Complete Configuration Guide for django-haystack Full-Text Search with Chinese Support

Dependencies Installation

Install required packages first:

pip install django-haystack whoosh jieba

Core Settings Configuration

Add haystack to the end of INSTALLED_APPS in your project's settings.py to avoid resource override conflicts:

INSTALLED_APPS = [
    # Pre-existing application entries
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'blog',
    'haystack'
]

Append haystack connection and behavior configurations to settings.py:

import os
from pathlib import Path
BASE_DIR = Path(__file__).resolve().parent.parent

HAYSTACK_CONNECTIONS = {
    'default': {
        # Use custom Chinese-optimized Whoosh engine we will create later
        'ENGINE': 'blog.whoosh_cn_backend.WhooshEngine',
        'PATH': os.path.join(BASE_DIR, 'search_index_storage'),
    }
}
# Enable real-time index sync when database records are modified
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

Notes on these settings:

  • ENGINE specifies the search backend implementation. This guide uses a custom Jieba-integrated version instead of the stock Whoosh backend, whose default analyzer is geared toward English text
  • PATH points to the directory that stores the generated index files. It is created automatically during the first index build
  • HAYSTACK_SIGNAL_PROCESSOR enables real-time index updates, which suits low-write workloads such as personal blogs and documentation sites
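
For sites with heavier write traffic, one possible alternative is to keep haystack's default signal processor, which does nothing when a model is saved, and refresh the index on a schedule instead. The fragment below is a minimal sketch, assuming a cron job or similar scheduler (not covered in this guide) runs python manage.py update_index periodically. The page size of 10 is an arbitrary illustrative value.

```python
# settings.py (alternative to the realtime processor shown above)

# BaseSignalProcessor is haystack's default: it ignores model save/delete
# signals, so the index only changes when update_index or rebuild_index runs.
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.BaseSignalProcessor'

# Results per page in haystack's search views (haystack's own default is 20).
HAYSTACK_SEARCH_RESULTS_PER_PAGE = 10
```

The trade-off is freshness: new or edited posts only become searchable after the next scheduled index update.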

URL Routing Setup

You can use haystack's built-in routes for basic use, or define custom routes for extended logic:

# project/urls.py
from django.urls import path, include
from blog.views import CustomBlogSearchView

urlpatterns = [
    # Pre-existing route entries
    # Uncomment below line if no custom search logic is needed
    # path('search/', include('haystack.urls')),
    # Custom search endpoint
    path('blog-search/', CustomBlogSearchView.as_view(), name='haystack_search'),
]

Search Index Definition

Haystack requires a search_indexes.py file in each app that contains searchable models to define index rules:

# blog/search_indexes.py
from haystack import indexes
from .models import BlogPost

class BlogPostIndex(indexes.SearchIndex, indexes.Indexable):
    # Primary search field, required, must have document=True flag
    text = indexes.CharField(document=True, use_template=True)
    # Additional indexed fields for filtering or direct response return
    post_id = indexes.IntegerField(model_attr='id')
    title = indexes.CharField(model_attr='title')
    publish_date = indexes.DateTimeField(model_attr='published_at')

    def get_model(self):
        return BlogPost

    def index_queryset(self, using=None):
        # Only index publicly published posts
        return self.get_model().objects.filter(is_published=True)

Each index must have exactly one field with document=True. By convention it is named text, and it should carry the same name in every index so the backend can treat it consistently as the primary search field. use_template=True means the content of this field is defined in a separate template file.

Index Content Template

Create the index content template following the mandatory path format templates/search/indexes/<app_name>/<lowercase_model_name>_text.txt:

{# templates/search/indexes/blog/blogpost_text.txt #}
{{ object.title }}
{{ object.content }}
{{ object.summary }}
{{ object.category.name }}

This template defines which model fields are concatenated into the primary search text field for indexing.
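
The path convention can be sketched as a small helper. index_template_path is a hypothetical function written for illustration, not part of haystack. It assumes the path is relative to one of your configured template directories and that the field suffix matches the name of the use_template=True field.

```python
def index_template_path(app_label: str, model_name: str, field_name: str = "text") -> str:
    """Build the template path expected for a use_template=True index field."""
    # The model name is lowercased; the suffix is the index field's name.
    return f"search/indexes/{app_label}/{model_name.lower()}_{field_name}.txt"


print(index_template_path("blog", "BlogPost"))
# search/indexes/blog/blogpost_text.txt
```

If indexing produces empty text fields or a TemplateDoesNotExist error, comparing the file location against this pattern is a quick first check.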

Basic Search Results Template

Create a default results page at templates/search/search.html. The context variables used below, query and page_obj, are the ones supplied by haystack's class-based haystack.generic_views.SearchView. The legacy haystack.views.SearchView that haystack.urls wires up exposes the page as page instead of page_obj:

<!DOCTYPE html>
<html>
<head>
    <title>Search Results</title>
    <style>.pagination { margin: 20px 0; gap: 10px; display: flex; }</style>
</head>
<body>
{% if query %}
    <h3>Matching Blog Posts:</h3>
    {% for result in page_obj.object_list %}
        <div class="result-item">
            <a href="{% url 'blog:post_detail' result.object.id %}">{{ result.object.title }}</a>
            <p>{{ result.object.summary|truncatechars:150 }}</p>
        </div>
    {% empty %}
        <p>No posts matched your search query. Try different keywords or adjust filter conditions.</p>
    {% endfor %}

    {% if page_obj.has_other_pages %}
        <div class="pagination">
            {% if page_obj.has_previous %}
                <a href="?q={{ query|urlencode }}&amp;page={{ page_obj.previous_page_number }}">&laquo; Previous</a>
            {% endif %}
            <span>Page {{ page_obj.number }} of {{ page_obj.paginator.num_pages }}</span>
            {% if page_obj.has_next %}
                <a href="?q={{ query|urlencode }}&amp;page={{ page_obj.next_page_number }}">Next &raquo;</a>
            {% endif %}
        </div>
    {% endif %}
{% endif %}
</body>
</html>

Chinese Segmentation Configuration

To enable accurate Chinese search, integrate the Jieba analyzer:

  1. Locate the default Whoosh backend file in your Python site-packages directory: haystack/backends/whoosh_backend.py
  2. Copy this file to your application directory (e.g. blog/) and rename it to whoosh_cn_backend.py, so that it matches the ENGINE path configured in settings.py
  3. Add the Jieba analyzer import near the top of the copied file, below any from __future__ imports:
from jieba.analyse import ChineseAnalyzer
  4. Replace every StemmingAnalyzer() call in the copied file with ChineseAnalyzer(). The number of occurrences varies by haystack version, so search the whole file rather than relying on a fixed count.
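
The steps above can also be scripted. The following is a rough sketch using only the standard library, not an official haystack tool. patch_backend_source and copy_patched_backend are hypothetical helper names, and the sketch assumes the backend imports the analyzer with a single-line from whoosh.analysis import statement. It raises an error instead of guessing when the file looks different.

```python
import importlib.util
from pathlib import Path

JIEBA_IMPORT = "from jieba.analyse import ChineseAnalyzer\n"


def patch_backend_source(source: str) -> str:
    """Return source with the Jieba import added and StemmingAnalyzer() calls swapped."""
    if "StemmingAnalyzer()" not in source:
        raise ValueError("no StemmingAnalyzer() call found; inspect the file manually")
    lines = source.splitlines(keepends=True)
    for i, line in enumerate(lines):
        # Insert right after Whoosh's analyzer import so the new import stays
        # below any `from __future__` lines at the top of the file.
        if line.startswith("from whoosh.analysis import") and "(" not in line:
            lines.insert(i + 1, JIEBA_IMPORT)
            break
    else:
        raise ValueError("single-line whoosh.analysis import not found; patch manually")
    return "".join(lines).replace("StemmingAnalyzer()", "ChineseAnalyzer()")


def copy_patched_backend(target: Path) -> None:
    """Locate the installed backend, patch it, and write the copy to target."""
    # find_spec on the top-level package locates it without importing it,
    # so no Django settings are needed at this point.
    spec = importlib.util.find_spec("haystack")
    if spec is None or not spec.submodule_search_locations:
        raise RuntimeError("django-haystack is not installed in this environment")
    package_dir = Path(list(spec.submodule_search_locations)[0])
    original = package_dir / "backends" / "whoosh_backend.py"
    patched = patch_backend_source(original.read_text(encoding="utf-8"))
    target.write_text(patched, encoding="utf-8")


# Example call, run from the project root:
# copy_patched_backend(Path("blog/whoosh_cn_backend.py"))
```

Review the generated file before using it, and repeat the process after upgrading haystack, since the copied backend does not pick up upstream changes.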

Frontend Search Form

Add a search entry point to your page templates:

<form method="get" action="{% url 'haystack_search' %}" target="_blank">
    <input type="text" name="q" placeholder="Search blog posts..." required>
    <button type="submit">Search</button>
</form>

API Search Implementation

For REST API based search, use the drf-haystack extension. First install the additional dependencies:

pip install djangorestframework drf-haystack

Create search serializers:

# blog/serializers.py
from rest_framework.serializers import ModelSerializer
from drf_haystack.serializers import HaystackSerializerMixin
from .models import BlogPost
from .search_indexes import BlogPostIndex

class BlogPostBaseSerializer(ModelSerializer):
    class Meta:
        model = BlogPost
        fields = ['id', 'title', 'summary', 'published_at', 'view_count', 'cover_image']

class BlogPostSearchSerializer(HaystackSerializerMixin, BlogPostBaseSerializer):
    class Meta(BlogPostBaseSerializer.Meta):
        index_classes = [BlogPostIndex]
        search_fields = ['text', 'title', 'content']

Create the API search viewset:

# blog/views.py
from drf_haystack.viewsets import HaystackViewSet
from .serializers import BlogPostSearchSerializer
from .models import BlogPost

class BlogSearchViewSet(HaystackViewSet):
    index_models = [BlogPost]
    serializer_class = BlogPostSearchSerializer

Configure API routes:

# project/urls.py
from rest_framework.routers import DefaultRouter
from blog.views import BlogSearchViewSet

router = DefaultRouter()
router.register(r'api/blog-search', BlogSearchViewSet, basename='blog_api_search')

urlpatterns += router.urls
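
Once the route is registered, the endpoint can be queried with GET parameters. The sketch below only builds the request URL. build_search_url is a hypothetical helper, the host http://localhost:8000 is an assumed development server address, and the use of the index field name text as the query parameter reflects drf-haystack's default filtering as commonly documented, so check it against the version you install.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, **filters: str) -> str:
    """Build a GET URL for the blog search API from field=value filters."""
    # urlencode percent-encodes non-ASCII input such as Chinese keywords.
    return f"{base_url.rstrip('/')}/api/blog-search/?{urlencode(filters)}"


print(build_search_url("http://localhost:8000", text="django"))
# http://localhost:8000/api/blog-search/?text=django
```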

Index Generation

Run the following command to generate the initial full index manually:

python manage.py rebuild_index

Enter y when prompted to confirm deletion of the old index files, or pass --noinput to skip the prompt.

Custom Search Logic

Extend haystack's class-based SearchView (haystack.generic_views.SearchView) to implement custom business logic. Unlike the legacy haystack.views.SearchView, it provides as_view() and get_context_data():

# blog/views.py
from django.db.models import Q
from haystack.generic_views import SearchView
from .models import BlogPost

class CustomBlogSearchView(SearchView):
    # Use custom results template
    template_name = 'search/custom_blog_results.html'

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        # The class-based view passes the cleaned search string in as 'query'
        query = context.get('query') or ''
        # Fall back to a database substring match if full-text search returns no results
        if query and not context['object_list']:
            context['fallback_results'] = BlogPost.objects.filter(
                Q(title__icontains=query) | Q(content__icontains=query),
                is_published=True,
            ).order_by('-published_at')
        # Add custom context variables
        context['page_title'] = f'Search results for "{query}"'
        return context

Keyword Highlighting

Enable keyword highlighting in search results with haystack's built-in highlight template tag. Load the tag library first, then apply the tag to the text you want highlighted:

{% load highlight %}

{% highlight result.object.content with query css_class "highlight-keyword" html_tag "mark" max_length 200 %}

Available configuration parameters:

  • css_class: CSS class applied to highlighted keyword segments
  • html_tag: HTML element wrapped around matching keywords
  • max_length: Maximum length of the returned text snippet
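
To make the three parameters concrete, here is a simplified stand-in written in plain Python. It is not haystack's implementation: the function highlight below is hypothetical, it matches whitespace-separated query words case-insensitively, and it starts the snippet at the first match. The real tag's windowing and truncation behavior differs in detail.

```python
import html
import re


def highlight(text, query, html_tag="mark", css_class="highlight-keyword", max_length=200):
    """Simplified illustration: wrap query words in a tag and trim the snippet."""
    words = query.split()
    if not words:
        return html.escape(text[:max_length])
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    first = pattern.search(text)
    # Start the snippet at the first match so the keyword is visible.
    start = first.start() if first else 0
    window = text[start:start + max_length]
    parts, pos = [], 0
    for match in pattern.finditer(window):
        # Escape the plain text between matches, then wrap the match itself.
        parts.append(html.escape(window[pos:match.start()]))
        parts.append(f'<{html_tag} class="{css_class}">{html.escape(match.group(0))}</{html_tag}>')
        pos = match.end()
    parts.append(html.escape(window[pos:]))
    return "".join(parts)


print(highlight("Full-text search with Whoosh and Jieba", "whoosh", max_length=10))
# <mark class="highlight-keyword">Whoosh</mark> and
```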

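To change the highlighting behavior, subclass haystack.utils.highlighting.Highlighter in your project and point the HAYSTACK_CUSTOM_HIGHLIGHTER setting at the subclass. Alternatively, copy haystack/templatetags/highlight.py and haystack/utils/highlighting.py into your project and adjust the logic as needed.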

For advanced configuration options, refer to the official django-haystack documentation.

Tags: Django django-haystack Full-Text Search Whoosh Jieba

Posted on Sun, 10 May 2026 07:01:03 +0000 by ou812