BM25
A probabilistic keyword-ranking algorithm that scores documents by term frequency and inverse document frequency.
BM25 (Best Matching 25) is a bag-of-words retrieval function that ranks documents by how often query terms appear, weighted by how rare each term is across the collection and normalized for document length. It excels at matching exact terminology, acronyms, and proper nouns that semantic search sometimes misses.
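To make the scoring concrete, here is a minimal Python sketch of one common BM25 variant. The function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, and the smoothed IDF form are illustrative assumptions; production search engines differ in tokenization, defaults, and exact IDF formula. Documents and the query are assumed to be pre-tokenized lists of strings, and the corpus is assumed to be non-empty.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document (illustrative sketch, not a library API)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a higher weight; the "1 +" keeps the IDF positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    ["gdpr", "article", "17", "erasure"],
    ["data", "retention", "policy"],
    ["gdpr", "gdpr", "fines"],
]
scores = bm25_scores(["gdpr"], docs)
```

In this toy corpus the second document scores zero because it never mentions the query term, and the third outranks the first because it is shorter and repeats the term.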
In hybrid retrieval pipelines, BM25 complements vector search. Merging the two ranked lists with reciprocal rank fusion often recovers relevant documents that either method alone would miss, which matters for compliance workflows where overlooking a relevant clause could have legal consequences.
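Reciprocal rank fusion can be sketched in a few lines of Python. Each document receives 1 / (k + rank) from every ranked list it appears in, and the contributions are summed. The function name `reciprocal_rank_fusion` and the document IDs are hypothetical; `k=60` is a commonly used constant, not a requirement.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list (illustrative sketch)."""
    fused = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit in each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)


bm25_ranking = ["a", "b", "c"]    # hypothetical IDs from keyword search
vector_ranking = ["c", "a", "d"]  # hypothetical IDs from vector search
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
# fused == ["a", "c", "b", "d"]
```

Because only ranks are used, the raw BM25 scores and vector similarities never need to be put on a shared scale, which is one reason this fusion method is popular for hybrid search.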
More AI/ML Terms
Retrieval-Augmented Generation (RAG)
An AI architecture that combines information retrieval with text generation to produce answers grounded in source documents.
Vector Embedding
A numerical representation of text as a high-dimensional vector, enabling semantic similarity comparisons between passages.
Chunking
The process of splitting large documents into smaller, overlapping segments optimized for retrieval and embedding.
Hallucination
When an AI model generates plausible-sounding but factually incorrect or fabricated information.
Large Language Model (LLM)
A neural network trained on massive text corpora that can understand and generate human language.
Fine-Tuning
The process of further training a pre-trained model on domain-specific data to improve its performance on targeted tasks.