java - Tokenize a string using Apache Lucene


How can I tokenize a string based on a pattern?

For example, take the following string:

arg1:aaa,bbb , arg2:ccc or arg3:ddd,eee,fff 

First, I want to tokenize it based on "," and "or",

so that I get:

token set 1: arg1:aaa,bbb

token set 2: arg2:ccc

token set 3: arg3:ddd,eee,fff

Later, I want to pass these individual token sets to a method and tokenize them based on ":", giving:

token set 1: token 1 aaa, token 2 bbb

token set 2: token 1 ccc

token set 3: token 1 ddd, token 2 eee, token 3 fff
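To make the desired result concrete, here is a minimal sketch of the two-stage split in plain Java using only the JDK (it does not use Lucene). The class and method names are hypothetical. It assumes that the "," and "or" separating token sets are always surrounded by whitespace, while commas inside a token set are not, and that the second stage drops the name before ":" and then splits the remaining values on ",".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, plain JDK only (no Lucene involved).
class TokenSetSplitter {

    // Stage 1: split the input into token sets.
    // Assumption: the separating "," or "or" has whitespace on both sides,
    // so the commas inside "aaa,bbb" are left untouched.
    static List<String> splitIntoTokenSets(String input) {
        List<String> sets = new ArrayList<>();
        for (String part : input.trim().split("\\s+(?:,|or)\\s+")) {
            if (!part.isEmpty()) {
                sets.add(part);
            }
        }
        return sets;
    }

    // Stage 2: drop everything up to the first ":" and split the rest on ",".
    // Assumption: a token set looks like "name:value1,value2,...".
    static List<String> splitValues(String tokenSet) {
        int colon = tokenSet.indexOf(':');
        String values = colon >= 0 ? tokenSet.substring(colon + 1) : tokenSet;
        List<String> tokens = new ArrayList<>();
        for (String value : values.split(",")) {
            String token = value.trim();
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "arg1:aaa,bbb , arg2:ccc or arg3:ddd,eee,fff ";
        for (String set : splitIntoTokenSets(input)) {
            System.out.println(set + " -> " + splitValues(set));
        }
    }
}
```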

How can I tokenize using a custom pattern with Lucene?

To perform a custom tokenization, you can implement your own Tokenizer. The primary method that needs to be implemented is TokenStream.incrementToken().

Your tokenizer can then be incorporated into an Analyzer.
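The pull-style contract behind incrementToken() can be illustrated without a Lucene dependency. The class below is a hypothetical stand-in, not Lucene code: each call to incrementToken() advances to the next token and returns false once the input is exhausted. In a real Lucene Tokenizer you would extend Tokenizer, read from its Reader, and write the token text into a CharTermAttribute rather than a plain field. The delimiter pattern is an assumption based on the example string in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, Lucene-free stand-in that mimics the incrementToken() contract.
class TokenSetStream {

    // Assumed delimiter: "," or "or" with whitespace on both sides.
    private static final Pattern DELIMITER = Pattern.compile("\\s+(?:,|or)\\s+");

    private final String text;
    private final Matcher matcher;
    private int start = 0;        // offset where the next token begins
    private boolean done = false; // true once the final token has been consumed
    private String term;          // current token, valid after incrementToken() returned true

    TokenSetStream(String input) {
        this.text = input.trim();
        this.matcher = DELIMITER.matcher(text);
    }

    // Advances to the next non-empty token; returns false when none is left.
    boolean incrementToken() {
        while (!done) {
            int end;
            int next;
            if (matcher.find()) {
                end = matcher.start();
                next = matcher.end();
            } else {
                // No further delimiter: the rest of the text is the last token.
                end = text.length();
                next = end;
                done = true;
            }
            String candidate = text.substring(start, end);
            start = next;
            if (!candidate.isEmpty()) {
                term = candidate;
                return true;
            }
        }
        return false;
    }

    String term() {
        return term;
    }

    public static void main(String[] args) {
        TokenSetStream stream =
                new TokenSetStream("arg1:aaa,bbb , arg2:ccc or arg3:ddd,eee,fff ");
        // Same consumption loop shape a caller would use with a Lucene TokenStream.
        while (stream.incrementToken()) {
            System.out.println(stream.term());
        }
    }
}
```

Keeping the state (current offset, current term) inside the object is what lets the caller drive tokenization one token at a time instead of building the whole list up front.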

