
RAG-Based Local Search: A Deep Dive into LoPAI Search

Sep 27 2024 · 5 min read
#nlp #rag #llm #llamaindex

Code: link

Introduction

In the era of information overload, efficient and intelligent search mechanisms have become indispensable. Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm in this context, combining the strengths of large language models with the ability to retrieve and leverage external knowledge. This approach not only enhances the accuracy and relevance of search results but also allows for more nuanced and context-aware responses to user queries.

The Power of RAG

RAG works by first retrieving relevant documents or passages from a knowledge base in response to a query, and then using these retrieved pieces to augment the context given to a language model. This allows the model to generate responses that are both informed by its pre-trained knowledge and grounded in specific, relevant information from the knowledge base.
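
To make this flow concrete, here is a minimal, illustrative sketch of the retrieve-then-generate loop in Python. The names vector_store, embed, and llm are stand-ins for whatever retrieval and generation backends are used; this is not LoPAI's actual code.

# Minimal retrieve-then-generate loop (illustrative sketch, not LoPAI's code).
def rag_answer(query: str, vector_store, embed, llm, top_k: int = 5) -> str:
    # 1. Embed the query with the same model used for the documents.
    query_vector = embed(query)

    # 2. Retrieve the most similar document chunks from the knowledge base.
    chunks = vector_store.search(query_vector, limit=top_k)

    # 3. Augment the prompt with the retrieved context and generate an answer.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return llm(prompt)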

The advantages of RAG are numerous:

  1. Improved Accuracy: By grounding responses in retrieved information, RAG reduces hallucinations and improves the factual accuracy of generated content.
  2. Up-to-date Information: The knowledge base can be regularly updated, allowing the system to access and utilize current information beyond the model’s training cutoff.
  3. Domain Specificity: RAG can be tailored to specific domains by curating the knowledge base, making it highly adaptable to various use cases.
  4. Transparency: The retrieved documents provide a clear trail of where information is sourced from, enhancing explainability.

The Need for Local and Private Solutions

While RAG offers significant benefits, there’s a growing concern about data privacy and security when utilizing cloud-based or third-party solutions. This is where local, private implementations of RAG systems become crucial. By hosting such systems locally, organizations and individuals can:

  1. Ensure Data Privacy: Sensitive information never leaves the local network, reducing the risk of data breaches.
  2. Maintain Compliance: Local hosting helps in adhering to data protection regulations like GDPR or HIPAA.
  3. Customize Freely: Local systems can be tailored and fine-tuned without restrictions often imposed by cloud services.
  4. Reduce Latency: Local processing can offer faster response times, especially for large-scale operations.
  5. Control Costs: While initial setup might be resource-intensive, long-term costs can be more predictable and often lower than cloud-based solutions.

LoPAI Search: A Local, Private RAG Implementation

LoPAI Search (Locally Hosted Private AI-Powered Search) is an open-source project that exemplifies the principles of local and private RAG-based search. Let’s look into its architecture and implementation details.

Video link.

Project Structure and Modularity

LoPAI Search is designed with modularity and scalability in mind. The project leverages Docker Compose to orchestrate multiple services, each responsible for a specific aspect of the RAG pipeline. Let’s examine the key components:

  1. Qdrant: A vector database service for efficient similarity search.
  2. Ollama: A service for running large language models locally.
  3. LlamaIndex: The core service that integrates document processing, embedding, and querying functionalities.

The docker-compose.yml file orchestrates these services:

version: '3.3'

services:
  qdrant:
    image: qdrant/qdrant:v1.11.5
    # ... (configuration details)

  llama-index:
    build:
      context: .
      dockerfile: Dockerfile
    # ... (configuration details)

  ollama:
    build:
      context: .
      dockerfile: Dockerfile.ollama
    # ... (configuration details)

volumes:
  ollama_data:
  qdrant_data:
  documents:

This modular approach allows for easy scaling and maintenance of individual components.
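
As a rough sketch of how this wiring plays out in code, the llama-index service can reach the other containers by their Compose service names. The ports below are the Qdrant and Ollama defaults, and the model name is only an example; the actual settings in LoPAI Search may differ.

# Sketch of connecting to the qdrant and ollama containers over the Compose
# network. Service names, ports, and the model name are assumptions.
import qdrant_client
from llama_index.core import Settings
from llama_index.llms.ollama import Ollama

# Containers address each other by their Compose service names.
qdrant = qdrant_client.QdrantClient(host="qdrant", port=6333)

# Point LlamaIndex at the locally hosted model served by the ollama container.
Settings.llm = Ollama(base_url="http://ollama:11434", model="llama3", request_timeout=120.0)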

Core Classes: CollectionManager and Collection

The heart of LoPAI Search lies in two main classes: CollectionManager and Collection. These classes work together to provide a flexible and powerful interface for managing multiple document collections.

CollectionManager

The CollectionManager class serves as the central hub for managing multiple collections. Key functionalities include:

class CollectionManager:
    def __init__(self):
        self.collections: Dict[str, Collection] = {}
        # ... (initialization code)

    def create_collection(self, name: str) -> Dict:
        # ... (implementation details)

    def delete_collection(self, name: str) -> Dict:
        # ... (implementation details)

    def rename_collection(self, old_name: str, new_name: str) -> Dict:
        # ... (implementation details)
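
A hypothetical create_collection implementation might look like the following. The error handling, return payloads, and the assumption that the manager holds a shared Qdrant client are mine, not necessarily the project's.

# Hypothetical sketch of create_collection; LoPAI's actual implementation may differ.
# (Assumes the same typing imports as the original class and a self.client Qdrant client.)
def create_collection(self, name: str) -> Dict:
    if name in self.collections:
        return {"status": "error", "message": f"Collection '{name}' already exists"}
    # Collection wraps the Qdrant collection plus its LlamaIndex index.
    self.collections[name] = Collection(self.client, name)
    return {"status": "success", "message": f"Created collection '{name}'"}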

Collection

The Collection class encapsulates the logic for an individual document collection. It handles querying, file uploads, and document deletion:

class Collection:
    def __init__(self, client, name: str):
        # ... (initialization code)

    def query(self, question: str) -> Dict:
        # ... (implementation details)

    def upload_files(self, files: List[UploadFile]) -> Dict:
        # ... (implementation details)

    def delete_documents(self, doc_ids: List[str]) -> Dict:
        # ... (implementation details)
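
One plausible way to bind a Collection to its Qdrant collection with LlamaIndex is sketched below. The use of from_vector_store and the similarity_top_k value are assumptions rather than the project's exact code.

# Hedged sketch of Collection's initialization; LoPAI's actual code may differ.
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

class Collection:
    def __init__(self, client, name: str):
        self.name = name
        # Each collection gets its own Qdrant collection as the vector store.
        self.vector_store = QdrantVectorStore(client=client, collection_name=name)
        # Build the index on top of the existing vector store and expose a query engine.
        self.index = VectorStoreIndex.from_vector_store(self.vector_store)
        self.query_engine = self.index.as_query_engine(similarity_top_k=5)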

API Layer

The project exposes its functionalities through a FastAPI-based REST API, defined in main.py. This API provides endpoints for:

@app.post("/collections/{collection_name}/query")
async def query(collection_name: str, question: Question):
    # ... (implementation details)

@app.post("/collections/{collection_name}/upload_files")
async def upload_files(collection_name: str, files: List[UploadFile] = File(...)):
    # ... (implementation details)

@app.get("/collections")
async def list_collections():
    # ... (implementation details)
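
Interacting with the API from a client could look like the following. The host, port, response shapes, and the assumption that the Question model has a single question field are illustrative.

# Example client calls against the endpoints above; host/port and response
# schemas are assumptions.
import requests

BASE = "http://localhost:8000"

# Upload a document into a "contracts" collection.
with open("lease_agreement.pdf", "rb") as f:
    requests.post(f"{BASE}/collections/contracts/upload_files", files={"files": f})

# Ask a question against that collection.
resp = requests.post(f"{BASE}/collections/contracts/query",
                     json={"question": "When does the lease expire?"})
print(resp.json())

# List all collections.
print(requests.get(f"{BASE}/collections").json())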

Technical Deep Dive

Embedding and Indexing

LoPAI Search uses HuggingFace’s embedding models (specifically, “BAAI/bge-base-en-v1.5”) for converting documents into vector representations. These embeddings are then stored and indexed in Qdrant, allowing for efficient similarity search.

Settings.embed_model = HuggingFaceEmbedding(model_name=EMBEDDING_MODEL)
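
A hedged sketch of the indexing path with LlamaIndex and Qdrant might look like this; the collection name, document directory, and connection details are placeholders.

# Sketch of loading documents, embedding them, and storing the vectors in Qdrant.
# Collection name, paths, and connection details are illustrative.
import qdrant_client
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = qdrant_client.QdrantClient(host="qdrant", port=6333)
vector_store = QdrantVectorStore(client=client, collection_name="my_collection")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Load documents, chunk them, embed each chunk with the configured model,
# and write the vectors into Qdrant.
documents = SimpleDirectoryReader("/data/documents").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)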

Query Processing

When a query is received, the system performs the following steps:

  1. The query is embedded using the same model as the documents.
  2. Qdrant performs a similarity search to retrieve the most relevant document chunks.
  3. The retrieved chunks, along with the original query, are passed to the language model (either OpenAI’s GPT-4 or a locally hosted model via Ollama).
  4. The language model generates a response based on the query and the retrieved context.

The query method of the Collection class implements this flow:

def query(self, question: str) -> Dict:
    response = self.query_engine.query(question)
    # ... (processing and formatting the response)
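
The elided processing step presumably extracts the answer text and the retrieved source chunks from LlamaIndex's response object. A possible, though not necessarily exact, formatting looks like this:

# Possible formatting of the LlamaIndex response; the field names in the
# returned dict are illustrative, not necessarily LoPAI's exact schema.
def query(self, question: str) -> Dict:
    response = self.query_engine.query(question)
    return {
        "answer": str(response),
        # Each source node carries the retrieved chunk and its similarity score.
        "sources": [
            {"text": node.node.get_content(), "score": node.score}
            for node in response.source_nodes
        ],
    }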

Document Management

The system supports various document operations, mirroring the methods exposed by the Collection and CollectionManager classes:

  1. Uploading files into a collection, where they are parsed, chunked, embedded, and indexed.
  2. Deleting individual documents from a collection by their IDs.
  3. Creating, renaming, and deleting entire collections.

A possible shape of the upload path is sketched below.
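
This sketch assumes uploaded files are written to a temporary directory, parsed with SimpleDirectoryReader, and inserted into the collection's existing index; the temporary-file handling and return value are assumptions, not LoPAI's exact implementation.

# Hedged sketch of upload_files; details are illustrative.
import os
import tempfile
from typing import Dict, List

from fastapi import UploadFile
from llama_index.core import SimpleDirectoryReader

def upload_files(self, files: List[UploadFile]) -> Dict:
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Write the uploaded files to disk so SimpleDirectoryReader can parse them.
        for f in files:
            with open(os.path.join(tmp_dir, f.filename), "wb") as out:
                out.write(f.file.read())
        documents = SimpleDirectoryReader(tmp_dir).load_data()
    # Insert the parsed documents into the collection's existing index.
    for doc in documents:
        self.index.insert(doc)
    return {"status": "success", "indexed": len(documents)}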

Conclusion and Future Directions

LoPAI Search demonstrates the feasibility and advantages of implementing a local, private RAG-based search system. Its modular architecture, leveraging Docker Compose, allows for easy deployment and scaling. The use of separate CollectionManager and Collection classes provides a clean separation of concerns and enables efficient management of multiple document collections.

There is plenty of room for future enhancements to the system.

LoPAI Search serves as a demo for tech enthusiasts who want to run cutting-edge search systems on local hardware, offering a simple UI and a solid skeleton to build upon. I personally use the system at home to search through local files and extract specific information from long documents such as contracts, personal notes, or academic papers.