Code: link
In the era of information overload, efficient and intelligent search mechanisms have become indispensable. Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm in this context, combining the strengths of large language models with the ability to retrieve and leverage external knowledge. This approach not only enhances the accuracy and relevance of search results but also allows for more nuanced and context-aware responses to user queries.
RAG works by first retrieving relevant documents or passages from a knowledge base in response to a query, and then using these retrieved pieces to augment the context given to a language model. This allows the model to generate responses that are both informed by its pre-trained knowledge and grounded in specific, relevant information from the knowledge base.
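To make this flow concrete, here is a deliberately toy sketch of the retrieve-then-augment pattern; the corpus, the overlap-based scoring, and the prompt format are stand-ins for illustration, not the components the project actually uses:
from typing import List

KNOWLEDGE_BASE = [
    "Qdrant is a vector database used for similarity search.",
    "Ollama serves large language models on local hardware.",
    "LlamaIndex connects language models to external data sources.",
]

def retrieve(query: str, k: int = 2) -> List[str]:
    """Rank documents by naive word overlap with the query (a toy retriever)."""
    q_words = set(query.lower().split())
    return sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(query: str, context: List[str]) -> str:
    """Augment the question with the retrieved context before handing it to an LLM."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}\nAnswer:"

print(build_prompt("What does Qdrant do?", retrieve("What does Qdrant do?")))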
The advantages of RAG are numerous: answers are grounded in specific, relevant documents rather than the model's parametric memory alone, the knowledge base can be updated without retraining the model, and responses stay accurate and context-aware even for domain-specific queries.
While RAG offers significant benefits, there is a growing concern about data privacy and security when utilizing cloud-based or third-party solutions. This is where local, private implementations of RAG systems become crucial. By hosting such systems locally, organizations and individuals can keep sensitive documents on their own hardware, retain full control over how their data is stored and processed, and avoid sending private information to third-party services.
LoPAI Search (Locally Hosted Private AI-Powered Search) is an open-source project that exemplifies the principles of local and private RAG-based search. Let’s look into its architecture and implementation details.
Video link.
LoPAI Search is designed with modularity and scalability in mind. The project leverages Docker Compose to orchestrate multiple services, each responsible for a specific aspect of the RAG pipeline: Qdrant for vector storage, a LlamaIndex-based backend for ingestion and querying, and Ollama for serving the language model locally.
The docker-compose.yml file orchestrates these services:
version: '3.3'
services:
  qdrant:
    image: qdrant/qdrant:v1.11.5
    # ... (configuration details)
  llama-index:
    build:
      context: .
      dockerfile: Dockerfile
    # ... (configuration details)
  ollama:
    build:
      context: .
      dockerfile: Dockerfile.ollama
    # ... (configuration details)
volumes:
  ollama_data:
  qdrant_data:
  documents:
This modular approach allows for easy scaling and maintenance of individual components.
The heart of LoPAI Search lies in two main classes: CollectionManager and Collection. These classes work together to provide a flexible and powerful interface for managing multiple document collections.
The CollectionManager class serves as the central hub for managing multiple collections. Its key functionalities include creating, deleting, and renaming collections:
class CollectionManager:
    def __init__(self):
        self.collections: Dict[str, Collection] = {}
        # ... (initialization code)

    def create_collection(self, name: str) -> Dict:
        # ... (implementation details)

    def delete_collection(self, name: str) -> Dict:
        # ... (implementation details)

    def rename_collection(self, old_name: str, new_name: str) -> Dict:
        # ... (implementation details)
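Assuming the interface above, managing collections could look like the following sketch; the collection names are made up for illustration:
manager = CollectionManager()
manager.create_collection("contracts")           # create a new, empty collection
manager.rename_collection("contracts", "legal")  # rename it
manager.delete_collection("legal")               # and remove it again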
The Collection class encapsulates the logic for an individual document collection. It handles querying, file uploads, and document deletion:
class Collection:
    def __init__(self, client, name: str):
        # ... (initialization code)

    def query(self, question: str) -> Dict:
        # ... (implementation details)

    def upload_files(self, files: List[UploadFile]) -> Dict:
        # ... (implementation details)

    def delete_documents(self, doc_ids: List[str]) -> Dict:
        # ... (implementation details)
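Putting the two classes together, a caller can reach an individual collection through the manager's collections dictionary and query it directly. The names and question below are purely illustrative:
manager = CollectionManager()
manager.create_collection("contracts")
collection = manager.collections["contracts"]  # the Dict[str, Collection] shown above
result = collection.query("What is the notice period for termination?")
print(result)  # a Dict containing the generated answer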
The project exposes its functionality through a FastAPI-based REST API, defined in main.py. This API provides endpoints for querying a collection, uploading files to it, and listing the available collections, among others:
@app.post("/collections/{collection_name}/query")
async def query(collection_name: str, question: Question):
# ... (implementation details)
@app.post("/collections/{collection_name}/upload_files")
async def upload_files(collection_name: str, files: List[UploadFile] = File(...)):
# ... (implementation details)
@app.get("/collections")
async def list_collections():
# ... (implementation details)
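From a client's perspective, the same functionality is reachable over plain HTTP. The sketch below uses the requests library; the host, port, collection name, and JSON field name are assumptions for illustration, not values taken from the project:
import requests

BASE_URL = "http://localhost:8000"  # assumed host/port of the FastAPI service

# Ask a question against a collection (the "question" field name is assumed from the Question model)
resp = requests.post(
    f"{BASE_URL}/collections/contracts/query",
    json={"question": "What is the termination notice period?"},
)
print(resp.json())

# Upload a file to the same collection
with open("contract.pdf", "rb") as f:
    requests.post(
        f"{BASE_URL}/collections/contracts/upload_files",
        files=[("files", ("contract.pdf", f, "application/pdf"))],
    )

# List all collections
print(requests.get(f"{BASE_URL}/collections").json())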
LoPAI Search uses HuggingFace’s embedding models (specifically, “BAAI/bge-base-en-v1.5”) for converting documents into vector representations. These embeddings are then stored and indexed in Qdrant, allowing for efficient similarity search.
Settings.embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL)
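For context, the snippet below sketches how the embedding model and Qdrant can be wired together with LlamaIndex. It is a minimal illustration assuming current LlamaIndex and qdrant-client APIs, not the project's exact code, and the host, port, and collection name are assumptions:
from qdrant_client import QdrantClient
from llama_index.core import Settings, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore

EMBEDDING_MODEL = "BAAI/bge-base-en-v1.5"
Settings.embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL)

# Connect to the Qdrant service (service name and port assumed from the Docker Compose setup)
client = QdrantClient(host="qdrant", port=6333)
vector_store = QdrantVectorStore(client=client, collection_name="contracts")
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)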
When a query is received, the system embeds the question, retrieves the most similar document chunks from Qdrant, and passes the retrieved context to the locally served language model, which generates an answer grounded in those chunks:
def query(self, question: str) -> Dict:
    response = self.query_engine.query(question)
    # ... (processing and formatting the response)
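The query engine is typically built from the index, and the response object carries both the generated answer and the source chunks it was grounded in. A hedged sketch of what that processing step could look like, continuing from the index built in the earlier snippet (the top_k value is an assumption):
query_engine = index.as_query_engine(similarity_top_k=3)

def answer(question: str) -> dict:
    response = query_engine.query(question)
    return {
        "answer": str(response),
        "sources": [
            {"text": n.node.get_content(), "score": n.score}
            for n in response.source_nodes
        ],
    }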
The system supports various document operations, such as uploading new files into a collection and deleting documents by their IDs.
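In LlamaIndex terms, such operations can map onto inserting into and deleting from the index. The calls below continue from the index built earlier, and the document text and ID are illustrative rather than taken from the project:
from llama_index.core import Document

doc = Document(text="Example contract text.", id_="contract-001")
index.insert(doc)                                                 # add a document to the index
index.delete_ref_doc("contract-001", delete_from_docstore=True)   # remove it again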
LoPAI Search demonstrates the feasibility and advantages of implementing a local, private RAG-based search system. Its modular architecture, leveraging Docker Compose, allows for easy deployment and scaling. The use of separate CollectionManager and Collection classes provides a clean separation of concerns and enables efficient management of multiple document collections.
Future enhancements to the system could include:
LoPAI Search serves as a demo for tech enthusiasts who want to run cutting-edge search systems on local hardware, with a simple UI and a solid skeleton to build upon. I personally use the system at home to search through local files and extract specific information from long documents such as contracts, personal notes, or academic papers.