2024-7-15 Update
Langchain-Chatchat has since reached version 0.3.1, which changes how the CLI is invoked. The Step 4 commands below were written for 0.3.0 and no longer work as written on 0.3.1; follow the project’s official README for that step instead.
The 0.3.0 release of Langchain-Chatchat changed the architecture: it now has to be connected to a third-party model inference framework. The steps below cover setup and fixes for common errors.
Setup Workflow
- Set up the Langchain-Chatchat environment
- Install a compatible model serving framework (XInference selected here)
- Launch the serving framework and load required models before starting Langchain-Chatchat
- Initialize and launch Langchain-Chatchat
Step 1: Set Up Langchain-Chatchat Environment
# Create a dedicated Python 3.10 virtual environment
conda create -n chat-rag-py310 python=3.10
conda activate chat-rag-py310
# Install Langchain-Chatchat with Xinference integration
pip install "langchain-chatchat[xinference]" --upgrade
Step 2: Set Up XInference Environment
# Create a separate Python 3.10 virtual environment for inference
conda create -n model-serve-py310 python=3.10
conda activate model-serve-py310
# Install XInference with Transformers engine (supports most PyTorch-based LLMs/embedders)
# Alternative engines: vLLM, SGLang, etc., may be used if compatible with your models
pip install "xinference[transformers]"
Step 3: Launch XInference and Load Models
# Start XInference server on all interfaces, default port 9997
xinference-local --host 0.0.0.0 --port 9997
# Access the web UI at http://localhost:9997/ for monitoring
# Download your preferred LLM and embedding models locally (use ModelScope if Hugging Face is unavailable)
# Register custom/local models with: xinference register --model-type [TYPE] --file model-config.json --persist
# Launch a custom fine-tuned GLM-4 chat model
xinference launch --model-name my-glm4-chat --model-format pytorch --model-engine Transformers
# Launch a custom BGE large Chinese embedding model
xinference launch --model-name my-bge-large-zh-v1.5 --model-type embedding
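The xinference register command mentioned above reads a JSON model definition file. A minimal sketch of generating one for the embedding model follows. The field names (model_name, dimensions, max_tokens, language, model_id, model_uri) are assumed from the XInference custom embedding model schema and should be checked against the documentation for your installed version. The local path /path/to/bge-large-zh-v1.5 is a placeholder, and the helper name is illustrative.

```python
import json


def build_embedding_config(model_name: str, model_dir: str) -> dict:
    """Return a custom embedding model definition (field names are assumptions)."""
    return {
        "model_name": model_name,
        "dimensions": 1024,  # output vector size of bge-large-zh-v1.5
        "max_tokens": 512,   # maximum input length of bge-large-zh-v1.5
        "language": ["zh"],
        "model_id": "BAAI/bge-large-zh-v1.5",
        "model_uri": f"file://{model_dir}",  # placeholder local directory
    }


config = build_embedding_config(
    "my-bge-large-zh-v1.5", "/path/to/bge-large-zh-v1.5"
)

# Print the definition; redirect it into model-config.json when it looks right.
print(json.dumps(config, indent=2, ensure_ascii=False))
```

Saving the output as model-config.json and running xinference register --model-type embedding --file model-config.json --persist should make the name my-bge-large-zh-v1.5 available to the launch command.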
Step 4: Initialize and Launch Langchain-Chatchat
# Clone the full Langchain-Chatchat repository
git clone https://github.com/chatchat-space/Langchain-Chatchat.git
cd Langchain-Chatchat
# Activate the chat RAG environment
conda activate chat-rag-py310
# Configure default models and Xinference platform
chatchat-config model --default_llm_model my-glm4-chat
chatchat-config model --default_embedding_model my-bge-large-zh-v1.5
chatchat-config model --set_model_platforms "[
{
\"platform_name\": \"xinference\",
\"platform_type\": \"xinference\",
\"api_base_url\": \"http://127.0.0.1:9997/v1\",
\"api_key\": \"NONE\",
\"api_concurrencies\": 4,
\"llm_models\": [
\"my-glm4-chat\"
],
\"embed_models\": [
\"my-bge-large-zh-v1.5\"
],
\"image_models\": [],
\"reranking_models\": [],
\"speech2text_models\": [],
\"tts_models\": []
}
]"
# Attempt to rebuild knowledge base (note known 0.3.0 issues below)
chatchat-kb -r --embed-model=my-bge-large-zh-v1.5
# Launch full Langchain-Chatchat service
chatchat -a
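The hand-escaped JSON passed to --set_model_platforms above is easy to break with a single missing backslash. One option, sketched here under the assumption that the 0.3.0 chatchat-config CLI from this step is in use, is to build the same structure in Python and let json.dumps and shlex handle the quoting.

```python
import json
import shlex

# Same platform definition as the command above, as a plain Python structure.
platforms = [
    {
        "platform_name": "xinference",
        "platform_type": "xinference",
        "api_base_url": "http://127.0.0.1:9997/v1",
        "api_key": "NONE",
        "api_concurrencies": 4,
        "llm_models": ["my-glm4-chat"],
        "embed_models": ["my-bge-large-zh-v1.5"],
        "image_models": [],
        "reranking_models": [],
        "speech2text_models": [],
        "tts_models": [],
    }
]

# Serialize once; json.dumps guarantees valid JSON.
payload = json.dumps(platforms)

# Argument list form avoids shell escaping entirely.
command = ["chatchat-config", "model", "--set_model_platforms", payload]

# Print a shell-safe version for copy and paste.
print(shlex.join(command))
```

The printed line can be pasted into the chat-rag-py310 shell, or the command list can be passed to subprocess.run(command, check=True) from within that environment.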
Troubleshooting Common Errors
Error 1: Connection Refused to Port 9997
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=9997): Max retries exceeded with url: /v1/cluster/auth (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7febe88d6200>: Failed to establish a new connection: [Errno 111] Connection refused'))
Root Cause
The XInference server was not running when Langchain-Chatchat tried to connect to it during initialization.
Fix
Reactivate the model-serve-py310 environment and start XInference:
conda activate model-serve-py310
xinference-local --host 0.0.0.0 --port 9997
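Since the error means nothing is listening on port 9997, a quick TCP probe can confirm the server is reachable before Langchain-Chatchat is started. A minimal sketch using only the standard library; the function names and the retry values are illustrative.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused and timeouts
        return False


def wait_for_port(host: str, port: int, attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll host:port until it accepts connections or attempts run out."""
    for _ in range(attempts):
        if port_is_open(host, port):
            return True
        time.sleep(delay)
    return False


# Example: block until XInference accepts connections, then start chatchat.
# ready = wait_for_port("127.0.0.1", 9997)
```

An open port only shows that the server process is listening; the models still need to be launched as in Step 3.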
Error 2: Knowledge Base Initialization Hangs
The process freezes indefinitely after displaying a warning message.
Root Cause
Known unpatched bug in the 0.3.0 release branch.
Workaround
Skip knowledge base rebuilding entirely and start the core service directly:
chatchat -a
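If a rebuild is still worth attempting, one option is to run it under a time limit so that a hang cannot block the launch. This is a sketch rather than part of the original workaround: the helper name and the 600-second limit are arbitrary choices.

```python
import subprocess
import sys


def run_with_timeout(command: list[str], seconds: float) -> bool:
    """Run command; return True only if it exits with code 0 within the limit."""
    try:
        # subprocess.run kills the child process when the timeout expires.
        result = subprocess.run(command, timeout=seconds)
    except subprocess.TimeoutExpired:
        return False
    except FileNotFoundError:  # command is not on PATH in this environment
        return False
    return result.returncode == 0


# Example: give the rebuild ten minutes, then fall back to skipping it.
# rebuilt = run_with_timeout(
#     ["chatchat-kb", "-r", "--embed-model=my-bge-large-zh-v1.5"], 600
# )
```

Whether the rebuild finishes or times out, chatchat -a can be started afterwards.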
Error 3: Remote Protocol Error During Chat
RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
Fix
Downgrade the OpenAI Python library to version 0.28.1. If the issue persists, delete the existing virtual environment and rebuild using Python 3.9 instead of 3.10.
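Before downgrading, it helps to know which version is installed in the chat-rag-py310 environment. A minimal sketch that reads the installed version with the standard library and prints the pin command when it differs from 0.28.1; the helper names are illustrative.

```python
from importlib import metadata

TARGET = "0.28.1"  # version named in the fix above


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pin_command(package: str, version: str) -> str:
    """Build the pip command that pins package to an exact version."""
    return f'pip install "{package}=={version}"'


current = installed_version("openai")
if current != TARGET:
    # Print the command instead of running it, so the change stays explicit.
    print(f"openai installed: {current}")
    print(pin_command("openai", TARGET))
```

Run it with the chat-rag-py310 environment active so that it inspects the same interpreter that Langchain-Chatchat uses.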