Search Toolkit is an AI productivity tool: an open-source framework for building production-ready search and retrieval pipelines. It is best suited to software developers and engineers, data scientists and analysts, and scientists and researchers.
Frequently Asked Questions
What is Search Toolkit used for?
Search Toolkit is used to build production search and RAG pipelines. It handles the complete workflow, from document ingestion and extraction through embedding to retrieval. Companies like CMA CGM use Search Toolkit alongside Voxtral to process audio from multiple sources and return alerts within 15 seconds.
Is Search Toolkit open source?
Yes, Search Toolkit is an open-source Python framework that you can use and customize. However, some features, such as Mistral OCR extraction and the embedding models, require a Mistral API key and are billed by usage at Mistral API pricing.
Which file formats does Search Toolkit support?
Search Toolkit supports PDF, DOCX, PPTX, ODT, HTML, plain text, and audio files. It includes a specialized extractor for each type: OCR for scanned documents, HTML parsing that strips boilerplate, and audio transcription with speaker diarization.
Who is Search Toolkit for?
Search Toolkit is designed for developers and data scientists building RAG applications or enterprise search systems. It is particularly useful for teams that need full control over their document processing pipeline and want to deploy production-grade search infrastructure.
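To make the workflow described above more concrete (ingestion, extraction, embedding, retrieval), here is a minimal, standard-library-only Python sketch of those four stages. It does not use Search Toolkit's API: every name in it (`Chunk`, `extract`, `embed`, `ingest`, `retrieve`) is hypothetical, the extractor only strips HTML-like tags, and the "embedding" is a toy normalized term-count vector standing in for a real embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    """One indexed document: its id, extracted text, and sparse vector."""
    doc_id: str
    text: str
    vector: dict[str, float]


def extract(raw: str) -> str:
    """Hypothetical extractor: drop HTML-like tags and collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", no_tags).strip()


def embed(text: str) -> dict[str, float]:
    """Toy embedding: L2-normalized term counts (a stand-in for a real model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def ingest(docs: dict[str, str]) -> list[Chunk]:
    """Ingestion: extract each raw document, embed it, and add it to the index."""
    index = []
    for doc_id, raw in docs.items():
        text = extract(raw)
        index.append(Chunk(doc_id, text, embed(text)))
    return index


def retrieve(index: list[Chunk], query: str, top_k: int = 2) -> list[tuple[str, float]]:
    """Retrieval: rank documents by cosine similarity to the query vector."""
    q = embed(query)
    scored = [
        (sum(w * chunk.vector.get(term, 0.0) for term, w in q.items()), chunk.doc_id)
        for chunk in index
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k]]


if __name__ == "__main__":
    docs = {
        "a": "<p>Invoices are processed by the finance team.</p>",
        "b": "<p>Audio alerts are transcribed and routed to operators.</p>",
    }
    index = ingest(docs)
    print(retrieve(index, "audio alerts", top_k=1))
```

In a production pipeline each stage would be replaced by something heavier (format-specific extractors, a hosted embedding model, a vector store), but the data flow between the stages stays the same.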





