Citegeist is the name of the relevancy engine that ranks search results on CourtListener. It’s the product of years of enhancements and a deep background in legal research.
Citegeist works by combining classic search relevancy algorithms with state-of-the-art legal ranking technology. At a high level, Citegeist has two ranking algorithms: keyword search and semantic search.
Below, we provide a high-level summary of the ways we rank search results. This information is provided to help legal researchers understand how the CourtListener system works.
Keyword search finds content that matches the words and connectors in your query, and uses a variety of techniques to rank the results so that the best ones are at the top.
When you run a keyword search, Citegeist ranks the results using the technologies listed in the table below.
Not all of these technologies are used in all of our search engines. The table shows where each one is used:
| Feature | Case Law | RECAP Archive | Oral Arguments | Judges |
|---|---|---|---|---|
| BM25 | ✅ | ✅ | ✅ | ✅ |
| Synonyms | ✅ | ✅ | ✅ | ✅ |
| Field Boosting | ✅ | ✅ | ✅ | ✅ |
| Phrase Boosting | ✅ | ✅ | ✅ | ✅ |
| Relevance Decay | ✅ | ✅ | ✅ | ✅ |
| Jurisdiction Boosting | ✅ | ❌ | ❌ | ❌ |
| Citation Boosting | Coming Soon | ❌ | ❌ | ❌ |
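To make the first row of the table concrete, here is a minimal sketch of BM25 scoring in Python. It is an illustration only, not CourtListener's implementation: the whitespace tokenizer, the function name `bm25_scores`, and the parameter values `k1=1.2` and `b=0.75` (common defaults) are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25.

    Hypothetical helper: tokenization is a plain lowercase whitespace split,
    and k1/b are common defaults rather than values taken from Citegeist.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: the number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "the court affirmed the judgment",
    "the contract was breached by the seller",
    "breach of contract damages contract law",
]
scores = bm25_scores("contract breach", docs)
```

In this toy corpus the first document contains neither query term and scores zero, while the third, which contains both terms, outscores the second. Note that "breached" does not match "breach" here, which is the kind of gap that synonym and stemming support is meant to close.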
Keyword search aims to provide intuitive results that clearly match the words, filters, and connectors in your query, but it has no notion of the underlying intent or meaning of that query. This can be limiting.
Semantic search, also known as AI search or vector search, is a modern approach to querying large data sets. Instead of matching particular keywords, it identifies the underlying meaning of your query and returns results with similar meanings.
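The idea of "similar meanings" is usually implemented by comparing vectors. The sketch below assumes each query and document has already been turned into a vector by a language model; the three-dimensional vectors, the document IDs, and the function names are made up for illustration, and a real system would use much larger vectors and an approximate index rather than this exhaustive comparison.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_meaning(query_vec, doc_vecs, top_k=3):
    """Return the IDs of the `top_k` documents closest to the query vector."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Toy embeddings (hypothetical values, not produced by any real model).
doc_vecs = {
    "landlord_eviction": [0.9, 0.1, 0.0],
    "patent_claim": [0.0, 0.2, 0.9],
    "tenant_rights": [0.8, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]
top_two = rank_by_meaning(query_vec, doc_vecs, top_k=2)
```

With these toy values, the two housing-related documents rank above the patent document even though no keywords are compared at all.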
Semantic search is currently available for case law via our API, and we will be bringing it to our website soon.
Semantic search can provide a number of advantages over keyword search:
Ranking can be better: Because semantic search understands the underlying meaning of your query, it often provides better results than keyword search, particularly for users who simply type in their problem in plain English, without using advanced query operators.
Long queries are just as fast: Keyword search engines slow down as you add words to your query. This limits how complex queries can be. Semantic search does not have this problem. Long queries are as fast as short ones, allowing you to provide more information and context to Citegeist.
Synonyms are automatic: In a keyword search engine, an administrator must create a list of synonyms for the system to use. Semantic search engines are able to automatically broaden your search to match relevant synonyms.
Hybrid search with both semantics and keywords: In addition to pure semantic search, you can enclose specific keywords in quotation marks to invoke hybrid search. That will retrieve both semantically relevant results and results with high BM25 scores on the enclosed keywords.
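As a rough illustration of the hybrid behavior described above, the sketch below pulls quoted keywords out of a query and merges two ranked result lists. Everything here is an assumption made for the example: the function names are hypothetical, the page does not say whether quoted words are also included in the semantic part of the query, and the merge uses reciprocal rank fusion, a common technique that is swapped in here because the source does not say how Citegeist combines the two result sets.

```python
import re


def split_hybrid_query(query):
    """Split a query into its full text and its double-quoted keywords.

    Assumption: the semantic side sees the whole query with the quotation
    marks stripped, and the keyword side sees only the quoted phrases.
    """
    keywords = re.findall(r'"([^"]+)"', query)
    semantic_text = " ".join(query.replace('"', " ").split())
    return semantic_text, keywords


def merge_rankings(semantic_ids, keyword_ids, k=60):
    """Merge two ranked ID lists with reciprocal rank fusion (RRF)."""
    scores = {}
    for ranking in (semantic_ids, keyword_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents near the top of either list earn more credit.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


text, keywords = split_hybrid_query(
    'tenant evicted without notice "constructive eviction"'
)
merged = merge_rankings(["a", "b", "c"], ["b", "d"])
```

In this example document `b` appears in both lists, so it rises to the top of the merged ranking.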
There are also some reasons to choose keyword search:
It is predictable: Semantic search engines are powerful, but they can be hard to understand, and sometimes it’s unclear why certain results are returned. Keyword search returns only the results that match.
It is complete: Many legal documents are only a few words long (e.g., SCOTUS cases that simply say, “AFFIRMED” or “CERT DENIED”). Such decisions do not have much actual meaning except in the context of other results, and are not well-suited to semantic search engines. We simply do not add these records to the semantic search engine, so only keyword search covers them.
You can go deep: Semantic search engines only provide the top results. Keyword search engines allow you to deeply research a particular topic.
Semantic search uses a language model and an approximate nearest neighbor algorithm to measure semantic similarity between documents and queries. The quality of the model that we use determines how well the system works. To provide the best results possible, we created a domain-adapted fine-tuned model, which we released to the public for free.
Citegeist uses this model in conjunction with BM25, date decay, and jurisdiction boosting to rank results (see above).
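The page does not say how these signals are combined, so the following is only one plausible shape for it: a base relevance score multiplied by an exponential date decay and an optional jurisdiction boost. The function name, the 20-year half-life, and the multiplicative form are all assumptions chosen for illustration.

```python
from datetime import date


def adjusted_score(base_score, filed, today,
                   half_life_years=20.0, jurisdiction_boost=1.0):
    """Apply a hypothetical date decay and jurisdiction boost to a score.

    The score halves every `half_life_years`; documents dated in the future
    are treated as brand new. None of these values come from Citegeist.
    """
    age_years = max((today - filed).days, 0) / 365.25
    decay = 0.5 ** (age_years / half_life_years)
    return base_score * decay * jurisdiction_boost


today = date(2020, 1, 1)
fresh = adjusted_score(10.0, date(2020, 1, 1), today)
older = adjusted_score(10.0, date(2000, 1, 1), today)
boosted = adjusted_score(10.0, date(2000, 1, 1), today, jurisdiction_boost=1.5)
```

Under these assumptions, a decision filed 20 years before `today` keeps about half of its base score, and a boost of 1.5 for a preferred jurisdiction recovers part of that loss.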
These services are sponsored by Free Law Project and users like you. We provide these services in furtherance of our mission to make the legal sector more innovative and equitable.