Deploying Langchain-Chatchat 0.3.0 with Xinference: Setup Walkthrough and Troubleshooting Tips

2024-7-15 Update

Langchain-Chatchat has since moved to version 0.3.1, which changes how the CLI is invoked. The Step 4 commands below no longer work on that release; follow the project’s official README instead.

The 0.3.0 release of Langchain-Chatchat changed the architecture: models are no longer loaded by Langchain-Chatchat itself but must be served by a separate third-party inference framework. Below is a practical walkthrough of the setup, followed by fixes for common errors.

Setup Workflow

  1. Set up the Langchain-Chatchat environment
  2. Install a compatible model serving framework (Xinference is used here)
  3. Launch the serving framework and load required models before starting Langchain-Chatchat
  4. Initialize and launch Langchain-Chatchat

Step 1: Set Up Langchain-Chatchat Environment

# Create a dedicated Python 3.10 virtual environment
conda create -n chat-rag-py310 python=3.10
conda activate chat-rag-py310

# Install Langchain-Chatchat with Xinference integration
pip install "langchain-chatchat[xinference]" --upgrade

Step 2: Set Up Xinference Environment

# Create a separate Python 3.10 virtual environment for inference
conda create -n model-serve-py310 python=3.10
conda activate model-serve-py310

# Install Xinference with the Transformers engine (supports most PyTorch-based LLMs and embedding models)
# Alternative engines such as vLLM or SGLang may be used if they are compatible with your models
pip install "xinference[transformers]"

Step 3: Launch Xinference and Load Models

# Start the Xinference server on all interfaces, default port 9997
xinference-local --host 0.0.0.0 --port 9997
# Access the web UI at http://localhost:9997/ for monitoring

# Download your preferred LLM and embedding models locally (use ModelScope if Hugging Face is unavailable)
# Register custom/local models with: xinference register --model-type [TYPE] --file model-config.json --persist

# Launch a custom fine-tuned GLM-4 chat model
xinference launch --model-name my-glm4-chat --model-format pytorch --model-engine Transformers

# Launch a custom BGE large Chinese embedding model
xinference launch --model-name my-bge-large-zh-v1.5 --model-type embedding

Step 4: Initialize and Launch Langchain-Chatchat

# Clone the full Langchain-Chatchat repository
git clone https://github.com/chatchat-space/Langchain-Chatchat.git
cd Langchain-Chatchat

# Activate the chat RAG environment
conda activate chat-rag-py310

# Configure default models and Xinference platform
chatchat-config model --default_llm_model my-glm4-chat
chatchat-config model --default_embedding_model my-bge-large-zh-v1.5
chatchat-config model --set_model_platforms "[
    {
        \"platform_name\": \"xinference\",
        \"platform_type\": \"xinference\",
        \"api_base_url\": \"http://127.0.0.1:9997/v1\",
        \"api_key\": \"NONE\",
        \"api_concurrencies\": 4,
        \"llm_models\": [
            \"my-glm4-chat\"
        ],
        \"embed_models\": [
            \"my-bge-large-zh-v1.5\"
        ],
        \"image_models\": [],
        \"reranking_models\": [],
        \"speech2text_models\": [],
        \"tts_models\": []
    }
]"
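
The backslash-escaped JSON above is easy to mistype. One possible alternative, sketched here on the assumption that chatchat-config receives the same string however the shell builds it, is to keep the JSON in a quoted here-document so no escaping is needed. The variable name PLATFORMS_JSON is illustrative and not part of Langchain-Chatchat.

```shell
# Hold the platform definition in a variable. The quoted 'EOF' delimiter
# stops the shell from interpreting quotes or backslashes inside the JSON.
PLATFORMS_JSON=$(cat <<'EOF'
[
    {
        "platform_name": "xinference",
        "platform_type": "xinference",
        "api_base_url": "http://127.0.0.1:9997/v1",
        "api_key": "NONE",
        "api_concurrencies": 4,
        "llm_models": ["my-glm4-chat"],
        "embed_models": ["my-bge-large-zh-v1.5"],
        "image_models": [],
        "reranking_models": [],
        "speech2text_models": [],
        "tts_models": []
    }
]
EOF
)

# Assumed usage, equivalent to the escaped form above (left commented out
# so that sourcing this snippet changes nothing):
# chatchat-config model --set_model_platforms "$PLATFORMS_JSON"
```

Printing the variable with printf '%s\n' "$PLATFORMS_JSON" before passing it on makes a stray comma or quote easier to spot.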

# Attempt to rebuild knowledge base (note known 0.3.0 issues below)
chatchat-kb -r --embed-model=my-bge-large-zh-v1.5

# Launch full Langchain-Chatchat service
chatchat -a

Troubleshooting Common Errors

Error 1: Connection Refused to Port 9997

urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=9997): Max retries exceeded with url: /v1/cluster/auth (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7febe88d6200>: Failed to establish a new connection: [Errno 111] Connection refused'))

Root Cause

The Xinference server was not running when Langchain-Chatchat tried to connect to it during initialization.

Fix

Activate the model-serve-py310 environment and start Xinference, then load the models as in Step 3 before retrying:

conda activate model-serve-py310
xinference-local --host 0.0.0.0 --port 9997
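
Because this error only shows up once Langchain-Chatchat tries to connect, it can help to confirm that the port accepts connections first. The following is a minimal sketch that assumes bash is installed, since it relies on bash's /dev/tcp redirection. The function name port_open is hypothetical and belongs to neither project.

```shell
# Return 0 if a TCP connection to host:port succeeds, non-zero otherwise.
# The probe runs in a bash subprocess because /dev/tcp is a bash feature.
port_open() {
    bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example: check the Xinference address used in this walkthrough.
if port_open 127.0.0.1 9997; then
    echo "Port 9997 is reachable"
else
    echo "Nothing is listening on 127.0.0.1:9997; start xinference-local first"
fi
```

This only tests that something is listening on the port. It does not confirm that the models from Step 3 have been launched.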

Error 2: Knowledge Base Initialization Hangs

The process freezes indefinitely after displaying a warning message.

Root Cause

Known unpatched bug in the 0.3.0 release branch.

Workaround

Skip knowledge base rebuilding entirely and start the core service directly:

chatchat -a

Error 3: Remote Protocol Error During Chat

RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)

Fix

Downgrade the OpenAI Python library to version 0.28.1. If the issue persists, delete the existing virtual environment and rebuild using Python 3.9 instead of 3.10.
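
The two steps of this fix might be carried out as follows. This is only a sketch: the function names and the chat-rag-py39 environment name are made up for illustration, and whether openai 0.28.1 coexists cleanly with the other installed packages has not been verified here.

```shell
# Version recommended in the fix above.
OPENAI_PIN="0.28.1"

# Option 1: pin the OpenAI library.
# Run this with the chat-rag-py310 environment already activated.
pin_openai() {
    python -m pip install "openai==${OPENAI_PIN}"
}

# Option 2 (only if the error persists): rebuild the environment on Python 3.9.
# Run "conda deactivate" first so the old environment is not in use.
rebuild_env_py39() {
    conda env remove -n chat-rag-py310 -y
    conda create -n chat-rag-py39 python=3.9 -y
    # Afterwards: conda activate chat-rag-py39, then reinstall as in Step 1.
}
```

Nothing is executed when the snippet is sourced. Call pin_openai first, and call rebuild_env_py39 only if the error remains.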

Tags: Langchain-Chatchat Xinference LLM Deployment RAG Model Serving

Posted on Thu, 04 Jun 2026 17:57:22 +0000 by kpetsche20