There are two ways to build scoring systems for retrieval. Dual-encoder: encode your query, encode your document, compare the embeddings. Cross-encoder: feed the query AND the document into a transformer together and ask it to score the pair directly. Counterintuitive? A little. More effective? Often, yes.

The difference is architectural. Dual encoders work in isolation: they never see the query-document pair together, so they can't capture interaction patterns specific to that pair. Cross-encoders see the full context and can learn fine-grained relevance signals. A query like 'lightweight Python web frameworks' might encode similarly to 'heavy Python web frameworks' in a dual-encoder world (both are about Python frameworks), but a cross-encoder can distinguish them.

The catch is computational cost. Dual-encoder document embeddings can be pre-computed once and reused across every query; a cross-encoder must score each query-document pair on the fly.

So in practice, hybrid approaches are common. Use dual-encoders (cheap, fast) for initial retrieval, then use cross-encoders (expensive, accurate) to rerank the top candidates. This pattern is called retrieve-then-rerank, and it works remarkably well.

Modern cross-encoders are typically BERT-style transformer models fine-tuned on relevance datasets. They learn subtle patterns about what makes a document relevant to a specific query, and they're particularly good at understanding negation, specificity, and semantic nuance.

The tradeoff is latency versus accuracy. With a million documents, reranking the top 100 candidates with a cross-encoder is feasible; reranking all million is not. Synap's cross-encoder implementation handles both initial scoring and reranking workflows, optimizing for your specific performance requirements and accuracy targets.
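The two-stage pipeline above can be sketched in plain Python. Everything here is a hypothetical stand-in: the hand-written embeddings and the term-overlap `cross_encoder_score` are toy placeholders for learned models, meant only to show the shape of retrieve-then-rerank, not how a real cross-encoder scores pairs.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Stage 1: dual-encoder retrieval. Document embeddings are pre-computed
# once and reused for every query -- this is what makes the stage cheap.
# (These vectors are made up for illustration.)
doc_embeddings = {
    "doc_a": [0.9, 0.1, 0.3],
    "doc_b": [0.8, 0.2, 0.4],
    "doc_c": [0.1, 0.9, 0.2],
}

def retrieve(query_embedding, top_k=2):
    # Rank all documents by embedding similarity; keep the top k.
    scored = sorted(doc_embeddings.items(),
                    key=lambda kv: cosine(query_embedding, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:top_k]]

# Stage 2: cross-encoder reranking. The scorer sees the (query, document)
# pair together, and it runs once per pair -- affordable only for the
# small candidate set that stage 1 produced.
def cross_encoder_score(query, doc_text):
    # Placeholder for a fine-tuned transformer: a trivial term-overlap score.
    q_terms, d_terms = set(query.split()), set(doc_text.split())
    return len(q_terms & d_terms) / len(q_terms)

def rerank(query, candidates, doc_texts):
    return sorted(candidates,
                  key=lambda d: cross_encoder_score(query, doc_texts[d]),
                  reverse=True)

doc_texts = {
    "doc_a": "heavy python web frameworks compared",
    "doc_b": "lightweight python web frameworks overview",
    "doc_c": "rust systems programming guide",
}
query = "lightweight python web frameworks"
candidates = retrieve([0.85, 0.15, 0.35])   # cheap first pass over all docs
ranked = rerank(query, candidates, doc_texts)  # expensive pass over top-k only
```

Note how the pair-level scorer flips the order: both candidates look nearly identical to the dual-encoder stage, but seeing the query and document text together lets the reranker penalize the "heavy frameworks" document.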
Why It Matters
Cross-encoder scoring is a relatively simple technique that dramatically improves ranking accuracy, and the computational cost is worth it when you're only reranking top candidates. It's the secret sauce behind why some retrieval systems feel much more accurate than others: the difference between a system that gets lucky sometimes and one that consistently surfaces the right information.
Example
A legal AI searches case law for 'exceptions to copyright protection.' A dual-encoder might conflate this with 'copyright protection laws' because the semantic similarity is high. A cross-encoder sees the full query and each document together, understanding that 'exceptions' is critical and that documents about copyright protection laws (without the exceptions angle) are less relevant. The ranking improves noticeably.