# Vocabulary Cache

## How the vocabulary cache works

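The vocabulary cache is DL4J's mechanism for general-purpose natural language tasks, including plain TF-IDF, word vectors, and certain information retrieval techniques. Its goal is to be a one-stop shop for text vectorization, encapsulating common techniques such as bag of words and word vectors.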

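The vocabulary cache uses an inverted index to store tokens, word frequencies, inverse document frequencies, and document occurrences. `InMemoryLookupCache` is the reference implementation.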
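To make the stored quantities concrete, here is a minimal plain-Java sketch of an inverted index that tracks word frequency, document occurrences, and an inverse document frequency. It is an illustration only and does not reproduce `InMemoryLookupCache`: the class `TinyInvertedIndex`, its method names, and the unsmoothed `log(N / df)` formula are all assumptions chosen for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration; not the DL4J implementation. */
final class TinyInvertedIndex {
    // word -> ids of the documents that contain it (the inverted index)
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // word -> total number of occurrences across all documents
    private final Map<String, Integer> counts = new HashMap<>();
    private int numDocs = 0;

    /** Indexes one tokenized document and returns the id assigned to it. */
    int addDocument(List<String> tokens) {
        int docId = numDocs++;
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    /** Total occurrences of the word in the corpus, or 0 if unseen. */
    int wordFrequency(String word) {
        return counts.getOrDefault(word, 0);
    }

    /** Ids of the documents in which the word occurs. */
    Set<Integer> documentsContaining(String word) {
        return Collections.unmodifiableSet(
                postings.getOrDefault(word, Collections.emptySet()));
    }

    /** Number of documents in which the word occurs. */
    int docFrequency(String word) {
        return documentsContaining(word).size();
    }

    /** Unsmoothed idf = ln(N / df); returns 0 for unseen words (an assumption). */
    double idf(String word) {
        int df = docFrequency(word);
        return df == 0 ? 0.0 : Math.log((double) numDocs / df);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.addDocument(Arrays.asList("the", "cat", "sat"));
        index.addDocument(Arrays.asList("the", "dog"));
        System.out.println("wordFrequency(the) = " + index.wordFrequency("the"));
        System.out.println("docFrequency(cat)  = " + index.docFrequency("cat"));
        System.out.println("idf(cat)           = " + index.idf("cat"));
    }
}
```

A word that appears in every document gets an idf of zero under this formula, which is why very common words contribute little to a TF-IDF vector.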

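To use the vocabulary cache while iterating over text and indexing tokens, you need to decide whether each token should be included in the vocabulary. The usual criterion is that the token occurs in the corpus more than a preconfigured number of times. Below that frequency, a token is not a vocabulary word; it is just a token.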
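The token versus vocabulary word distinction can be sketched in plain Java as follows. This is not DL4J code: the class `MinFrequencyVocab`, the `minWordFrequency` parameter, and the choice to promote a token once its count reaches the threshold (rather than strictly exceeds it) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of frequency-based vocabulary membership. */
final class MinFrequencyVocab {
    private final int minWordFrequency;
    // every token seen so far -> number of occurrences
    private final Map<String, Integer> tokenCounts = new HashMap<>();
    // tokens promoted to vocabulary words -> assigned index
    private final Map<String, Integer> vocabIndex = new LinkedHashMap<>();

    MinFrequencyVocab(int minWordFrequency) {
        this.minWordFrequency = minWordFrequency;
    }

    /** Counts one occurrence and promotes the token once it is frequent enough. */
    void observe(String token) {
        int count = tokenCounts.merge(token, 1, Integer::sum);
        if (count >= minWordFrequency && !vocabIndex.containsKey(token)) {
            vocabIndex.put(token, vocabIndex.size());
        }
    }

    boolean isToken(String word) {
        return tokenCounts.containsKey(word);
    }

    boolean isVocabWord(String word) {
        return vocabIndex.containsKey(word);
    }

    /** Index of a vocabulary word, or -1 if the word is only a token or unseen. */
    int indexOf(String word) {
        return vocabIndex.getOrDefault(word, -1);
    }

    public static void main(String[] args) {
        MinFrequencyVocab vocab = new MinFrequencyVocab(2);
        for (String token : Arrays.asList("deep", "learning", "deep", "nets")) {
            vocab.observe(token);
        }
        System.out.println("deep is a vocabulary word: " + vocab.isVocabWord("deep"));
        System.out.println("nets is a vocabulary word: " + vocab.isVocabWord("nets"));
    }
}
```

Keeping rare tokens out of the vocabulary bounds the size of the resulting vectors while still letting their counts accumulate in case they cross the threshold later.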

Tokens are tracked as well. To track a token, do the following:

```java
    // Track "myword" as a token with an initial frequency of 1.0.
    addToken(new VocabWord(1.0,"myword"));
```

When you want to add a vocabulary word, do the following:

```java
    // Assign index 0 to the unknown-word token.
    addWordToIndex(0, Word2Vec.UNK);
    // Declare it a vocabulary word.
    putVocabWord(Word2Vec.UNK);
```

Adding the word to the index sets its index. You then declare it a vocabulary word. (Declaring it a vocabulary word pulls the word from the index.)
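One possible reading of this two-step order, sketched in plain Java: the index assignment is recorded first, and the later declaration looks the word up in the index. The class `TwoStepVocab` and its internals are hypothetical; only the method names mirror the calls shown above, and the real `InMemoryLookupCache` may behave differently (for example in how it treats a word that has no index yet).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the index-then-declare order; not DL4J code. */
final class TwoStepVocab {
    // tracked tokens -> frequency
    private final Map<String, Double> tokens = new HashMap<>();
    // word -> index assigned by addWordToIndex
    private final Map<String, Integer> wordToIndex = new HashMap<>();
    // declared vocabulary words -> index pulled from wordToIndex
    private final Map<String, Integer> vocab = new HashMap<>();

    /** Tracks a token, accumulating its frequency. */
    void addToken(String word, double frequency) {
        tokens.merge(word, frequency, Double::sum);
    }

    /** Step 1: assign an index to the word. */
    void addWordToIndex(int index, String word) {
        wordToIndex.put(word, index);
    }

    /** Step 2: declare the word a vocabulary word, pulling its index. */
    void putVocabWord(String word) {
        Integer index = wordToIndex.get(word);
        if (index == null) {
            // Assumption: this sketch rejects words that were never indexed.
            throw new IllegalStateException("No index set for word: " + word);
        }
        vocab.put(word, index);
    }

    boolean isToken(String word) {
        return tokens.containsKey(word);
    }

    boolean isVocabWord(String word) {
        return vocab.containsKey(word);
    }

    /** Index of a declared vocabulary word, or -1 otherwise. */
    int indexOf(String word) {
        return vocab.getOrDefault(word, -1);
    }

    public static void main(String[] args) {
        TwoStepVocab cache = new TwoStepVocab();
        cache.addToken("myword", 1.0);   // tracked, but not a vocabulary word
        cache.addWordToIndex(0, "UNK");  // step 1: set the index
        cache.putVocabWord("UNK");       // step 2: declare the vocabulary word
        System.out.println("index of UNK: " + cache.indexOf("UNK"));
        System.out.println("myword in vocab: " + cache.isVocabWord("myword"));
    }
}
```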

