Tokenization
Breaking text down into individual words for language processing in DL4J.
What is tokenization?
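Tokenization is the process of breaking text down into tokens, usually individual words. The following is a minimal sketch that uses only the JDK and does not use DL4J. It treats any run of characters that are not letters or digits as a separator. The class name `RegexTokenizerSketch` is hypothetical, and real tokenizers handle many cases this sketch ignores, such as contractions, hyphenation and abbreviations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, JDK-only illustration of tokenization (not DL4J's implementation):
// split text into word tokens on runs of characters that are neither letters nor digits.
class RegexTokenizerSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // "[^\\p{L}\\p{N}]+" matches one or more non-letter, non-digit characters
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            // split() can yield a leading empty string when the text starts with a separator
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [The, cat, sat, on, the, mat, It, purred]
        System.out.println(tokenize("The cat sat on the mat. It purred!"));
    }
}
```

Splitting on non-alphanumeric runs discards punctuation entirely; if punctuation matters for a task, a tokenizer would instead emit it as separate tokens.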
Example
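import java.util.List;
import org.deeplearning4j.text.tokenization.tokenizer.Tokenizer;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.UimaTokenizerFactory;

// Tokenization with lemmatization, part-of-speech tagging and sentence segmentation
// (UimaTokenizerFactory is in the deeplearning4j-nlp-uima module; its constructor can throw a checked exception)
TokenizerFactory tokenizerFactory = new UimaTokenizerFactory();
Tokenizer tokenizer = tokenizerFactory.create("mystring");
// Iterate over the tokens one at a time
while (tokenizer.hasMoreTokens()) {
    String token = tokenizer.nextToken();
}
// Get the whole list of tokens. The loop above consumes the tokens,
// so create a new tokenizer for the same text first.
List<String> tokens = tokenizerFactory.create("mystring").getTokens();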
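The example above reads tokens through `hasMoreTokens()`/`nextToken()` and also requests the full list with `getTokens()`. To make that contract concrete without a UIMA dependency, here is a hypothetical `SimpleTokenizer` (not part of DL4J) that uses the same three method names but is backed by a plain whitespace split. In this sketch, `getTokens()` returns only the tokens that have not been consumed yet, so calling it after a full loop yields an empty list. DL4J's own tokenizers appear to behave similarly, but it is worth verifying against the version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class for illustration only: mimics the hasMoreTokens() / nextToken() /
// getTokens() shape of a DL4J Tokenizer, using a whitespace split instead of UIMA.
class SimpleTokenizer {
    private final List<String> tokens;
    private int index = 0;   // position of the next token to return

    SimpleTokenizer(String text) {
        String trimmed = text.trim();
        // split on runs of whitespace; all-whitespace input yields no tokens
        tokens = trimmed.isEmpty() ? new ArrayList<String>() : Arrays.asList(trimmed.split("\\s+"));
    }

    boolean hasMoreTokens() {
        return index < tokens.size();
    }

    String nextToken() {
        // throws IndexOutOfBoundsException if called when no tokens remain
        return tokens.get(index++);
    }

    // Collects the tokens not yet consumed by nextToken(); this advances the tokenizer,
    // so after a full hasMoreTokens()/nextToken() loop it returns an empty list.
    List<String> getTokens() {
        List<String> rest = new ArrayList<>();
        while (hasMoreTokens()) {
            rest.add(nextToken());
        }
        return rest;
    }

    public static void main(String[] args) {
        SimpleTokenizer t = new SimpleTokenizer("deep learning for java");
        while (t.hasMoreTokens()) {
            System.out.println(t.nextToken());   // deep, learning, for, java (one per line)
        }
        System.out.println(t.getTokens());       // prints [] because the loop consumed everything
        System.out.println(new SimpleTokenizer("deep learning for java").getTokens());
        // prints [deep, learning, for, java]
    }
}
```

Keeping a cursor (`index`) rather than copying the list on each call keeps iteration cheap, at the cost of making `getTokens()` stateful; a fresh tokenizer is the simplest way to get the complete list again.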