java - Tokenize a string using Apache Lucene
How can I tokenize a string based on a pattern?
For example, in the following string:
arg1:aaa,bbb , arg2:ccc or arg3:ddd,eee,fff
First, I want to tokenize based on "," and "or", so that I get:
token set 1: arg1:aaa,bbb
token set 2: arg2:ccc
token set 3: arg3:ddd,eee,fff
Later, I want to pass these individual token sets to a method and tokenize them based on ":":
token set 1
  token 1: aaa
  token 2: bbb
token set 2
  token 1: ccc
token set 3
  token 1: ddd
  token 2: eee
  token 3: fff
How can I tokenize using a custom pattern with Lucene?
To perform custom tokenization, implement your own Tokenizer. The primary method that needs to be implemented is TokenStream.incrementToken(). Your tokenizer can then be incorporated into an Analyzer.
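Before wiring anything into Lucene, it can help to pin down the splitting rules themselves. The following is a minimal plain-Java sketch of the two-stage split described in the question. It does not use any Lucene API. The class name TwoStageSplit and its methods are hypothetical, and it assumes that the stage-one delimiters are a comma or the word "or" surrounded by whitespace, so that the commas inside "aaa,bbb" are left alone. The same rules could then drive the logic inside a custom incrementToken().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; plain Java, not a Lucene class.
class TwoStageSplit {
    // Assumption: a stage-one delimiter is "," or "or" with whitespace on both sides.
    private static final Pattern SET_DELIMITER = Pattern.compile("\\s+,\\s+|\\s+or\\s+");

    // Stage 1: split the whole input into token sets such as "arg1:aaa,bbb".
    static List<String> splitIntoSets(String input) {
        return Arrays.asList(SET_DELIMITER.split(input.trim()));
    }

    // Stage 2: drop everything up to the first ":" and split the rest on ",".
    static List<String> splitSet(String tokenSet) {
        // If there is no ":", indexOf returns -1 and the whole string is used.
        String values = tokenSet.substring(tokenSet.indexOf(':') + 1);
        List<String> tokens = new ArrayList<>();
        for (String value : values.split(",")) {
            String token = value.trim();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "arg1:aaa,bbb , arg2:ccc or arg3:ddd,eee,fff";
        for (String set : splitIntoSets(input)) {
            System.out.println(set + " -> " + splitSet(set));
        }
    }
}
```

If the real input can contain a comma without surrounding spaces between two argument groups, the stage-one pattern would need to change, for example by looking ahead for the next "name:" prefix.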