A simplified Python implementation of the Apache Lucene search engine, which may help in understanding how an enterprise search engine really works.
In practice, companies rarely use Lucene directly; they use Elasticsearch, a distributed, RESTful search engine built on top of Lucene.
- anatomy of analyzer: Elasticsearch Analyzer Doc
- token graph: Elasticsearch token graph doc, and "Lucene's TokenStreams are actually graphs!"
- index writer: AlibabaCloud blog: Lucene IndexWriter, An In-Depth Introduction
- token filters: Identify portions of the text that are dialogue
- in-depth implementation of Lucene: In-depth analysis of the implementation principle of Lucene lightweight full-text index
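
To make the pieces named in the list above more concrete, here is a minimal sketch of how an analyzer (a tokenizer followed by token filters) can feed an index writer that builds an inverted index. It is not this project's code and not Lucene's API: the names `analyze`, `STOPWORDS`, and `TinyIndexWriter` are hypothetical, chosen only for illustration, and the sketch skips token graphs, segments, and scoring.

```python
import re
from collections import defaultdict

# Hypothetical stopword list, for illustration only.
STOPWORDS = frozenset({"a", "an", "the", "of", "and", "is"})


def tokenize(text):
    """Tokenizer: split raw text into word tokens."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Token filter: normalize every token to lowercase."""
    return [t.lower() for t in tokens]


def stop_filter(tokens):
    """Token filter: drop tokens that appear in STOPWORDS."""
    return [t for t in tokens if t not in STOPWORDS]


def analyze(text):
    """Analyzer: one tokenizer followed by a chain of token filters."""
    return stop_filter(lowercase_filter(tokenize(text)))


class TinyIndexWriter:
    """Toy index writer: keeps an in-memory inverted index."""

    def __init__(self):
        # term -> {doc_id: [positions]}
        self.postings = defaultdict(dict)
        self.next_doc_id = 0

    def add_document(self, text):
        """Analyze the text, record its terms, and return the new doc id."""
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        # Simplification: positions are counted after stopword removal.
        for pos, term in enumerate(analyze(text)):
            self.postings[term].setdefault(doc_id, []).append(pos)
        return doc_id

    def search(self, query):
        """AND query: sorted ids of documents containing every query term."""
        terms = analyze(query)
        if not terms:
            return []
        doc_sets = [set(self.postings.get(t, {})) for t in terms]
        return sorted(set.intersection(*doc_sets))
```

With this sketch, indexing "The quick brown fox" and "A quick search engine" and then calling `search("quick fox")` returns only the first document's id, because the query goes through the same analyzer as the documents and both remaining terms must match.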