Introduction to Enterprise Knowledge Retrieval Challenges
Modern enterprises accumulate vast amounts of internal documentation, ranging from product specifications and compliance guidelines to operational procedures and technical manuals. New employees in particular face a flood of information: they struggle to locate specific details in lengthy documents, or unknowingly rely on outdated versions. This inefficiency highlights the need for an effective internal knowledge management system.
This article walks through building an internal Q&A system with Easysearch and Large Language Models (LLMs). The system interprets user questions and answers them from the most current internal documentation, using the LangChain framework to orchestrate a locally deployed LLM and Easysearch.
Architectural Overview
The proposed knowledge retrieval architecture comprises four key components:
- Data Sources: This includes various formats such as unstructured documents (PDF, DOCX, TXT), structured data, or even source code. For this demonstration, we focus on unstructured PDF documents.
- Large Language Model (LLM) Application: This component acts as the orchestrator, managing interactions with the LLM to generate relevant responses.
- Large Language Model(s): The core AI engine(s) responsible for understanding, reasoning, and generating answers. Multiple models can be employed for different tasks.
- Q&A Interface: A user-facing web application built with a framework like Flask, providing an interactive front-end for employees to submit questions and receive answers.
Document Ingestion
Our example utilizes an "INFINI Product Installation Manual.pdf" as the knowledge base. The initial step involves loading these documents into our system. LangChain's document_loaders module offers extensive support for various data formats, simplifying the ingestion process. For PDF files, we employ the PyPDFLoader class.
The following Python snippet demonstrates loading all PDF files from a specified directory:
import os
from langchain_community.document_loaders import PyPDFLoader
# Define the directory containing source documents
document_source_path = '.\\easysearch'
raw_documents = []
# Iterate through files and load PDFs
for doc_file_name in os.listdir(document_source_path):
    file_path = os.path.join(document_source_path, doc_file_name)
    if doc_file_name.endswith('.pdf'):
        loader = PyPDFLoader(file_path)
        raw_documents.extend(loader.load())
This code populates the raw_documents list with the content extracted from the PDF files (PyPDFLoader returns one Document per page, with the source file and page number in its metadata), preparing it for subsequent processing.
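One detail of the loop above is that endswith('.pdf') is case-sensitive, so a file named Manual.PDF would be skipped. The following is a minimal sketch of a case-insensitive filter that could be applied to the output of os.listdir; the helper name select_pdf_names and the sample file names are hypothetical and not part of the original code.

```python
import os


def select_pdf_names(file_names):
    """Keep names whose extension is .pdf, ignoring case (e.g. 'Manual.PDF')."""
    return [
        name for name in file_names
        if os.path.splitext(name)[1].lower() == ".pdf"
    ]


# Hypothetical directory listing used only for illustration
print(select_pdf_names(["install.pdf", "Manual.PDF", "notes.txt", "pdf"]))
# ['install.pdf', 'Manual.PDF']
```

Whether this matters depends on how the source documents are named; on a directory with consistent lowercase extensions the original check behaves the same way.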
Text Segmentation
Large documents often exceed the context window limitations of many LLMs and can be inefficient for retrieval and storage. Therefore, it's crucial to segment the text into smaller, manageable chunks. This approach ensures that during query time, only the most relevant document parts are retrieved, optimizing performance and relevance. LangChain provides robust tools for text splitting.
In this example, we divide documents into chunks of 1000 characters with an overlap of 200 characters between consecutive chunks. This overlap helps preserve context and reduces the risk of splitting important sentences or ideas.
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Initialize a text splitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
# Segment the loaded documents
processed_text_chunks = text_splitter.split_documents(raw_documents)
The processed_text_chunks list now holds the segmented content, ready for vectorization. Both chunk_size and chunk_overlap affect retrieval quality: chunks that are too large dilute the relevant passage, while chunks that are too small lose surrounding context, so expect to tune these values for your own documents.
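To make the two parameters concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not how RecursiveCharacterTextSplitter works internally (that class prefers to split on separators such as paragraph and line breaks); it only illustrates that a chunk size of 1000 with an overlap of 200 advances the window by 800 characters each step. The function name chunk_with_overlap is hypothetical.

```python
def chunk_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# A 2500-character sample: windows start at offsets 0, 800 and 1600
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_with_overlap(sample)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

The last 200 characters of each chunk reappear at the start of the next one, which is what keeps a sentence that straddles a boundary visible in at least one chunk.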
Vectorization and Easysearch Integration
The next step converts these text chunks into numerical vector representations and stores them in a vector database. This system generates embeddings with the mxbai-embed-large model, served locally by Ollama (the model must be downloaded first, for example with ollama pull mxbai-embed-large). The embeddings, along with the original text chunks, are then indexed in Easysearch.
Here’s how to configure the embedding model and integrate with Easysearch:
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import EcloudESVectorStore
# Define the embedding model
embedding_model_instance = OllamaEmbeddings(
    model="mxbai-embed-large",
)
# Easysearch cluster configuration
ES_HOST_URL = "https://192.168.56.3:9200" # Replace with your Easysearch URL
ES_USERNAME = "admin"
ES_PASSWORD = "e5ac1b537785ae27c187" # Replace with your Easysearch password
VECTOR_INDEX_NAME = "infini_knowledge"
# Create and populate the Easysearch vector store
vector_store_instance = EcloudESVectorStore.from_documents(
    processed_text_chunks,
    embedding_model_instance,
    es_url=ES_HOST_URL,
    user=ES_USERNAME,
    password=ES_PASSWORD,
    index_name=VECTOR_INDEX_NAME,
    verify_certs=False,  # Set to True for production with valid SSL/TLS certs
)
Upon successful execution, the text chunks are vectorized and stored in the specified Easysearch index, infini_knowledge. Each document in the index typically contains a text field for the original content and a vector field for its corresponding numerical embedding.
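The configuration above hard-codes the cluster password in the script. A common alternative is to read connection settings from environment variables. The sketch below assumes hypothetical variable names (EASYSEARCH_URL, EASYSEARCH_USER, EASYSEARCH_PASSWORD, EASYSEARCH_INDEX) and a hypothetical helper name; none of these come from the original setup.

```python
import os


def load_easysearch_settings(env=os.environ):
    """Read connection settings from environment variables (names are hypothetical)."""
    password = env.get("EASYSEARCH_PASSWORD")
    if not password:
        # Fail early rather than connecting with an empty password
        raise RuntimeError("EASYSEARCH_PASSWORD is not set")
    return {
        "es_url": env.get("EASYSEARCH_URL", "https://localhost:9200"),
        "user": env.get("EASYSEARCH_USER", "admin"),
        "password": password,
        "index_name": env.get("EASYSEARCH_INDEX", "infini_knowledge"),
    }


# Demonstration with an explicit mapping instead of the real environment
settings = load_easysearch_settings({"EASYSEARCH_PASSWORD": "secret"})
print(settings["es_url"], settings["index_name"])
```

The dictionary keys match the keyword arguments used in the from_documents call above, so the result could be passed as **settings in place of the individual constants.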
Retrieval and Answer Synthesis
This phase defines the generative LLM and constructs a RetrievalQA chain, which is responsible for fetching relevant information and synthesizing answers. The RetrievalQA chain critically depends on two main components:
- LLM: The large language model responsible for generating human-like responses.
- Retriever (vector_store_instance.as_retriever()): This component queries the vector database (Easysearch) to find the document chunks most semantically similar to the user's question. These "knowledge blocks" are then provided as context to the LLM. Retrieval is essential because LLMs, trained on public data, lack proprietary internal knowledge.
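"Semantically similar" here means that the question's embedding lies close to a chunk's embedding. As a rough illustration, the sketch below ranks a few hand-made three-dimensional vectors by cosine similarity. It assumes cosine as the metric and uses a brute-force scan; the metric and search algorithm Easysearch actually applies depend on the index configuration, and real embeddings have far more dimensions. All names and vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector, indexed_chunks, k=2):
    """Return the texts of the k chunks whose vectors are closest to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in indexed_chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy index: (text, embedding) pairs with made-up vectors
indexed_chunks = [
    ("install steps", [1.0, 0.0, 0.0]),
    ("logging config", [0.0, 1.0, 0.0]),
    ("log endpoint", [0.1, 0.9, 0.0]),
]
print(top_k_chunks([0.0, 1.0, 0.0], indexed_chunks))
# ['logging config', 'log endpoint']
```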
import logging
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.chains import RetrievalQA
from langchain_community.chat_models import ChatOllama
from langchain.prompts import PromptTemplate
# Configure logging for MultiQueryRetriever
logging.basicConfig()
logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)
# Instantiate the large language model (ensure qwen2:latest is running via Ollama)
language_model = ChatOllama(model="qwen2:latest")
# Define a prompt template for generating diverse query versions
retrieval_prompt_template = PromptTemplate(
    input_variables=["question"],
    template="""You are an expert AI assistant tasked with improving document retrieval.
Generate 3 distinct rephrased versions of the given user query in Chinese to maximize the chances
of finding relevant documents in a vector database. These alternative questions should cover
different facets or wordings of the original intent. Provide these alternative
questions, each on a new line. Original query: {question}""",
)
# Instantiate a MultiQueryRetriever to generate multiple perspectives of the user query
enhanced_retriever = MultiQueryRetriever.from_llm(
    retriever=vector_store_instance.as_retriever(),
    llm=language_model,
    prompt=retrieval_prompt_template,
    include_original=True,  # Also search with the user's original wording
)
# Instantiate the RetrievalQA chain
question_answering_pipeline = RetrievalQA.from_chain_type(
    language_model,
    retriever=enhanced_retriever,
    chain_type="stuff",  # 'stuff' combines all retrieved documents into a single prompt
)
Here, we use Ollama to locally host a qwen2:latest LLM, which serves both for query rewriting and answer generation. You can initiate the LLM by running ollama run qwen2.
Upon receiving a user query, the MultiQueryRetriever first uses the qwen2 model to generate three rephrased versions of the original question. Because include_original=True is set, Easysearch then runs vector retrieval for the original question as well as the three variants. The retrieved chunks are merged and de-duplicated, then passed together with the original question to the qwen2 LLM for final answer synthesis.
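The merge, de-duplicate, and "stuff" steps can be sketched in plain Python. This is an illustration of the flow only, not LangChain's implementation: the rephrased queries are hard-coded instead of being generated by an LLM, the search function is a stub standing in for the Easysearch retriever, and the prompt wording is hypothetical.

```python
def retrieve_unique(queries, search_fn):
    """Run every query and keep the first occurrence of each chunk, in order."""
    seen = set()
    unique_chunks = []
    for query in queries:
        for chunk in search_fn(query):
            if chunk not in seen:
                seen.add(chunk)
                unique_chunks.append(chunk)
    return unique_chunks


def build_stuffed_prompt(question, chunks):
    """Concatenate all chunks into one context block ahead of the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Stub search results keyed by query text (hypothetical data)
fake_hits = {
    "How do I set LOGGING_ES_ENDPOINT?": ["chunk A", "chunk B"],
    "Where is the logging endpoint configured?": ["chunk B", "chunk C"],
    "Which setting controls the log destination?": ["chunk A", "chunk D"],
}
queries = list(fake_hits)
merged = retrieve_unique(queries, lambda q: fake_hits.get(q, []))
print(merged)  # ['chunk A', 'chunk B', 'chunk C', 'chunk D']
prompt = build_stuffed_prompt(queries[0], merged)
```

Because every retrieved chunk ends up in a single prompt, the number of query variants and the chunk size together determine how much of the model's context window is consumed.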
A significant advantage of using Easysearch as the retrieval engine is its capability to perform both vector similarity search and traditional full-text search, allowing for a hybrid retrieval strategy that can further enhance accuracy and coverage.
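The pipeline in this article uses vector retrieval only, and the article does not specify how a hybrid strategy would combine the two result lists. One widely used option is reciprocal rank fusion (RRF), sketched below as a client-side merge under that assumption. The function name, the chunk IDs, and the constant k=60 are illustrative choices, not Easysearch behavior.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked ID lists; each hit contributes 1 / (k + rank) to its ID's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical chunk IDs returned by the two search modes
vector_hits = ["c1", "c2", "c3"]
text_hits = ["c3", "c1", "c4"]
print(reciprocal_rank_fusion([vector_hits, text_hits]))
# ['c1', 'c3', 'c2', 'c4']
```

Chunks that appear in both lists (c1 and c3 here) rise to the top, while chunks found by only one mode are still kept, which is the coverage benefit a hybrid strategy aims for.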
User Interface for Q&A
To provide an interactive experience, we build a simple Flask web application (install Flask first, for example with pip install flask) that accepts user queries and displays the generated answers. The interface is rendered from an index.html template, which Flask looks up in a templates directory next to the script.
from flask import Flask, request, render_template
app = Flask(__name__)  # Initialize Flask application
@app.route('/', methods=['GET', 'POST'])
def home_interface():
    if request.method == 'POST':
        # Retrieve user input from the form
        user_query = request.form.get('query_input')
        # Invoke the Q&A pipeline; the returned dict holds the answer under "result"
        response_data = question_answering_pipeline.invoke({"query": user_query})
        # Render the response on the webpage
        return render_template('index.html', query_result=response_data)
    # For GET requests or initial page load
    return render_template('index.html')
if __name__ == "__main__":
    # debug=True with host='0.0.0.0' suits local testing only; disable debug before exposing the app
    app.run(host='0.0.0.0', debug=True, port=5000)
This Flask application integrates the previously defined question_answering_pipeline to process user inputs and present the LLM-generated answers via a web interface.
Operational Demonstration
When a user poses a question through the web interface, the Q&A system processes it as described above. Response time depends on the local computing resources allocated to the LLM and Easysearch. For instance, if a user asks about a specific configuration parameter such as LOGGING_ES_ENDPOINT, the system retrieves relevant snippets from the indexed "INFINI Product Installation Manual.pdf" and provides a concise answer. Checking that answer against the original document is a straightforward way to confirm its accuracy and relevance. Results can be tuned further by modifying the prompt template (retrieval_prompt_template) or by experimenting with different large language models.