
Getting Started Quickly with Lucene 5.3

Published: 2015-09-12

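Lucene is an open-source full-text search toolkit from the Apache Software Foundation; it started out as a subproject of the Jakarta project. It is not a complete, ready-to-run search engine but an architecture for building one: it provides a full query engine and indexing engine, along with text analysis components for Western languages such as English and German. Its goal is to give developers a simple, easy-to-use toolkit for adding full-text search to a target system, or for building a complete search engine on top of it. In the Java world, Lucene is a mature, free, open-source tool.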

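At the time of writing the latest Lucene release is 5.3. From a usage standpoint there are two steps: building an index and querying it (querying has a few finer points such as multi-field search and highlighting). First, the jar files involved: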

lucene-core-5.3.0.jar

lucene-analyzers-common-5.3.0.jar

lucene-queries-5.3.0.jar

lucene-queryparser-5.3.0.jar

lucene-highlighter-5.3.0.jar

lucene-memory-5.3.0.jar
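
To make the "build an index, then query it" workflow concrete before the Lucene code, here is a toy inverted index in plain Java. It is only an illustration under simplifying assumptions (naive tokenization, no scoring, no persistence); the class name ToyInvertedIndex and its methods are invented for this sketch and are not part of Lucene, whose real index structures are far more elaborate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy class: maps each lower-cased word to the ids of the documents containing it.
final class ToyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // "Indexing": split the text on anything that is not a letter or digit
    // and record the document id under every resulting word.
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // "Querying": return the ids of the documents containing the word, in ascending order.
    List<Integer> search(String word) {
        Set<Integer> ids = postings.getOrDefault(
                word.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet());
        return new ArrayList<>(ids);
    }
}
```

Adding two documents and then calling search with a word returns the ids of the documents that contain it. Lucene does the same job at scale, with pluggable analyzers in place of the naive split used here.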

First, building the index:

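import java.io.IOException;
import java.nio.file.Paths;
import java.util.HashMap;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.FSDirectory;
import org.wltea.analyzer.lucene.IKAnalyzer; // from the separate IK Analyzer jar

IKAnalyzer analyzer = new IKAnalyzer();
// Use smart segmentation
analyzer.setUseSmart(true);
// Alternatives: new StandardAnalyzer() as the analyzer, new RAMDirectory() as an in-memory index
IndexWriterConfig config = new IndexWriterConfig(analyzer);

// The path is reproduced as published; its separators appear to have been lost,
// so substitute your own index directory.
// "al" is the caller's list of records, each a HashMap with "title", "content" and "url" keys.
try (FSDirectory directory = FSDirectory.open(Paths.get("c:hongxuejingLuceneIndex"));
     IndexWriter w = new IndexWriter(directory, config)) {
    for (int i = 0; i < al.size(); i++) {
        HashMap hm = (HashMap) al.get(i);
        addDoc(w, (String) hm.get("title"), (String) hm.get("content"), (String) hm.get("url"));
    }
} catch (IOException e) {
    e.printStackTrace();
}

private static void addDoc(IndexWriter w, String title, String content, String url) throws IOException {
    Document doc = new Document();
    // Strip HTML tags from the body text
    content = content.replaceAll("<[^>]+>", "");
    // Remove spaces (as published; the original argument may have been "&nbsp;" before the page was rendered)
    content = content.replaceAll(" ", "");
    // title and content are analyzed for full-text search; url is indexed as a single exact token
    doc.add(new TextField("title", title, Field.Store.YES));
    doc.add(new TextField("content", content, Field.Store.YES));
    doc.add(new StringField("url", url, Field.Store.YES));
    // updateDocument replaces any existing document with the same url, so re-indexing
    // does not create duplicates; use w.addDocument(doc) to append unconditionally.
    w.updateDocument(new Term("url", url), doc);
}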
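
The text clean-up step inside addDoc can be tried on its own. The following is a minimal sketch in plain Java, assuming the goal is to drop tags and turn the &nbsp; entity into an ordinary space; the class name HtmlTextCleaner is invented for this example, and a regular expression is only a rough approximation (a real HTML parser is more robust, for instance when the text itself contains "<" or ">").

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes HTML tags so that only the visible text gets indexed.
final class HtmlTextCleaner {
    // A tag is "<", then one or more characters that are not ">", then ">".
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static String stripTags(String html) {
        String text = TAG.matcher(html).replaceAll("");
        // Assumption: non-breaking-space entities should become ordinary spaces.
        return text.replace("&nbsp;", " ").trim();
    }
}
```

Calling HtmlTextCleaner.stripTags on a string such as "<p>Hello&nbsp;<b>Lucene</b> 5.3</p>" yields the plain text that would be passed to the "content" field.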

Querying the index:

// Search the same keyword in both fields; SHOULD means a match in either field is enough.
// MultiFieldQueryParser.parse throws org.apache.lucene.queryparser.classic.ParseException.
// keyword, startRecord, endRecord, totalNum and pageAL come from the surrounding code.
IKAnalyzer analyzer = new IKAnalyzer();
String[] queries = { keyword, keyword };
String[] fields = { "title", "content" };
BooleanClause.Occur[] clauses = { BooleanClause.Occur.SHOULD, BooleanClause.Occur.SHOULD };
Query query = MultiFieldQueryParser.parse(queries, fields, clauses, analyzer);

FSDirectory directory = FSDirectory.open(Paths.get(AppDbConfig.getInstance().getLucenIndexPath()));
IndexReader reader = DirectoryReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);
// Collect at most the 10000 top-scoring hits
TopScoreDocCollector collector = TopScoreDocCollector.create(10000);
searcher.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
// Capped at 10000; collector.getTotalHits() returns the uncapped number of matches
totalNum = hits.length;

// Highlighter (createHighlighter and highlight are the author's helper methods, not shown in this article)
Highlighter highlighter = createHighlighter(query, "<font color='red'>", "</font>", 300);
for (int i = 0; i < hits.length; i++) {
    // Keep only the hits on the requested page (startRecord is 1-based)
    if (i >= (startRecord - 1) && i < endRecord) {
        Document doc = searcher.doc(hits[i].doc);
        HashMap<String, String> hm = new HashMap<String, String>();
        hm.put("title", highlight(doc, highlighter, analyzer, "title"));
        hm.put("content", highlight(doc, highlighter, analyzer, "content"));
        hm.put("url", doc.get("url"));
        pageAL.add(hm);
    }
}
reader.close();    // release the index reader
directory.close(); // release the directory
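
The paging condition in the loop above (i >= startRecord - 1 && i < endRecord) selects the hits whose 1-based position lies between startRecord and endRecord. A small stand-alone sketch of the same window logic follows; the class PageWindow and its slice method are invented for illustration and do not appear in the original code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: extracts one page from a list of hits.
final class PageWindow {
    // Returns the items whose 1-based position lies in [startRecord, endRecord].
    static <T> List<T> slice(List<T> hits, int startRecord, int endRecord) {
        int from = Math.max(startRecord - 1, 0);       // convert to a 0-based, inclusive index
        int to = Math.min(endRecord, hits.size());     // exclusive upper bound, clamped to the list
        if (from >= to) {
            return new ArrayList<>();                  // page lies beyond the available hits
        }
        return new ArrayList<>(hits.subList(from, to));
    }
}
```

Computing the window once and then loading only those documents avoids touching hits that will not be shown on the current page.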

IKAnalyzer is used here for Chinese word segmentation.
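
The query code calls two helpers, createHighlighter and highlight, whose bodies the article does not show. The following is a guess at what they might look like, built on the classes in lucene-highlighter-5.3.0.jar; the wrapper class HighlightHelpers is invented here, and the fallback to the unhighlighted field text is an assumption of this sketch, not the author's code.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;

// Hypothetical container for the two helpers used by the query code.
final class HighlightHelpers {

    // Builds a highlighter that wraps matched terms in prefix/suffix
    // and returns fragments of roughly fragmentSize characters.
    static Highlighter createHighlighter(Query query, String prefix, String suffix, int fragmentSize) {
        Highlighter highlighter =
                new Highlighter(new SimpleHTMLFormatter(prefix, suffix), new QueryScorer(query));
        highlighter.setTextFragmenter(new SimpleFragmenter(fragmentSize));
        return highlighter;
    }

    // Returns the best highlighted fragment of a stored field,
    // or the raw stored value when nothing in the field matches the query.
    static String highlight(Document doc, Highlighter highlighter, Analyzer analyzer, String field)
            throws IOException, InvalidTokenOffsetsException {
        String text = doc.get(field);
        if (text == null) {
            return null; // the field was not stored
        }
        String best = highlighter.getBestFragment(analyzer, field, text);
        return best != null ? best : text;
    }
}
```

The field must be stored (Field.Store.YES, as in addDoc) for doc.get to return its text, and the analyzer passed to highlight should match the one used for searching so that token offsets line up with the query terms.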
