C# - How do I get a list of found words using Lucene.Net?
I have indexed documents. They have the following content:
Document 1:
The green table stood in the room. The room was small.
Document 2:
The green tables stood in the room. The room was large.
I'm looking for "green table". The search finds Document 1 and Document 2. I want to show which phrases were found: in the first document, "green table"; in the second document, "green tables". How do I get a list of the found words ("green table" and "green tables")? I'm using Lucene.Net version 3.0.3.
You can use the Highlighter to mark the "found words". If you want to find them for some other reason, you can still use the Highlighter and then use a regex (or a simple substring loop) to extract the words.
For example:
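    // Assumes strQuery, searcher, analyzer and topRelatedDocs (the TopDocs returned by the search) already exist.
    // A TermQuery matches a single indexed term; for multi-word input, build the query with a QueryParser instead.
    Query objQuery = new TermQuery(new Term("content", strQuery));
    QueryScorer scorer = new QueryScorer(objQuery, "content");
    SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<b>", "</b>");
    Highlighter highlighter = new Highlighter(formatter, scorer);
    highlighter.TextFragmenter = new SimpleFragmenter(9999);

    for (int i = 0; i < topRelatedDocs.ScoreDocs.Length; i++)
    {
        int docId = topRelatedDocs.ScoreDocs[i].Doc;
        Document doc = searcher.Doc(docId);
        TokenStream stream = TokenSources.GetAnyTokenStream(searcher.IndexReader, docId, "content", analyzer);
        string strSnippet = highlighter.GetBestFragment(stream, doc.Get("content"));

        // Here you can do whatever you want with the snippet: add it to your results or,
        // for example, extract the words (not with a regex; this is only an example, use whatever you need):
        List<string> foundPhrases = new List<string>();
        while (strSnippet != null && strSnippet.IndexOf("<b>") > -1)
        {
            // Skip past the opening tag so it is not included in the extracted word.
            int indexStart = strSnippet.IndexOf("<b>") + "<b>".Length;
            int indexEnd = strSnippet.IndexOf("</b>", indexStart);
            foundPhrases.Add(strSnippet.Substring(indexStart, indexEnd - indexStart));
            // Continue after the closing tag; otherwise the next pass finds this same "</b>" first.
            strSnippet = strSnippet.Substring(indexEnd + "</b>".Length);
        }
    }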
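If you would rather go the regex route mentioned above, here is a rough sketch. It assumes the snippet was produced with "<b>" and "</b>" as the highlight tags and that each matched word is wrapped separately, which is what I would expect from SimpleHTMLFormatter but is worth checking against your own output. The names HighlightExtractor and ExtractPhrases are made up for this example and are not part of Lucene.Net. Highlighted words separated only by whitespace are merged into one phrase, so you get "green tables" rather than "green" and "tables".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Lucene.Net: pulls highlighted phrases
// out of a snippet that uses <b>...</b> as the highlight markers.
public static class HighlightExtractor
{
    // One or more <b>...</b> runs separated only by whitespace,
    // so "<b>green</b> <b>table</b>" is captured as a single match.
    private static readonly Regex HighlightedRun =
        new Regex(@"<b>[^<]*</b>(?:\s+<b>[^<]*</b>)*", RegexOptions.Compiled);

    // Matches an opening or closing highlight tag so it can be stripped.
    private static readonly Regex Tag =
        new Regex(@"</?b>", RegexOptions.Compiled);

    public static List<string> ExtractPhrases(string snippet)
    {
        var phrases = new List<string>();

        // The snippet may be null or empty when nothing was highlighted.
        if (string.IsNullOrEmpty(snippet))
        {
            return phrases;
        }

        foreach (Match run in HighlightedRun.Matches(snippet))
        {
            // Remove the tags and keep the words with their original spacing.
            phrases.Add(Tag.Replace(run.Value, string.Empty));
        }

        return phrases;
    }
}
```

Called as HighlightExtractor.ExtractPhrases(strSnippet) inside the loop, a snippet such as "The <b>green</b> <b>tables</b> stood in the room." yields a list with the single entry "green tables". Highlights that have other text between them stay as separate entries.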
omri