Dependencies Installation
Install required packages first:
pip install django-haystack whoosh jieba
Core Settings Configuration
Add haystack to the end of INSTALLED_APPS in your project's settings.py, so that templates and other resources from your own apps take precedence over haystack's:
INSTALLED_APPS = [
    # Pre-existing application entries
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'blog',
    'haystack',
]
Append haystack connection and behavior configurations to settings.py:
import os
from pathlib import Path

BASE_DIR = Path(__file__).resolve().parent.parent

HAYSTACK_CONNECTIONS = {
    'default': {
        # Use custom Chinese-optimized Whoosh engine we will create later
        'ENGINE': 'blog.whoosh_cn_backend.WhooshEngine',
        'PATH': os.path.join(BASE_DIR, 'search_index_storage'),
    }
}

# Enable real-time index sync when database records are modified
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
- ENGINE specifies the search backend implementation; we use a Jieba-integrated custom version instead of the default English-only Whoosh backend
- PATH points to the directory for storing generated index files, which will be created automatically during the first index build
- HAYSTACK_SIGNAL_PROCESSOR enables real-time index updates, suitable for low-write workloads like personal blogs and documentation sites
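As a rough illustration of what the PATH setting resolves to, the standalone sketch below builds the same connection dictionary from a base directory. The helper name build_haystack_connections and the /srv/myproject directory are hypothetical and not part of haystack; only the dictionary keys and the engine path come from the settings above.

```python
import os
from pathlib import Path


def build_haystack_connections(base_dir, index_dirname="search_index_storage"):
    """Return a HAYSTACK_CONNECTIONS-style dict rooted at base_dir.

    Hypothetical helper for illustration; haystack itself only reads the dict.
    """
    return {
        "default": {
            "ENGINE": "blog.whoosh_cn_backend.WhooshEngine",
            "PATH": os.path.join(str(base_dir), index_dirname),
        }
    }


# Assumed project root, used only for this example.
connections = build_haystack_connections(Path("/srv/myproject"))
index_path = connections["default"]["PATH"]
print(index_path)
```

Keeping the index directory under the project root, as here, makes it easy to exclude from version control with a single ignore rule.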
URL Routing Setup
You can use haystack's built-in routes for basic use, or define custom routes for extended logic:
# project/urls.py
from django.urls import path, include
from blog.views import CustomBlogSearchView
urlpatterns = [
    # Pre-existing route entries
    # Uncomment below line if no custom search logic is needed
    # path('search/', include('haystack.urls')),
    # Custom search endpoint
    path('blog-search/', CustomBlogSearchView.as_view(), name='haystack_search'),
]
Search Index Definition
Haystack requires a search_indexes.py file in each app that contains searchable models to define index rules:
# blog/search_indexes.py
from haystack import indexes
from .models import BlogPost
class BlogPostIndex(indexes.SearchIndex, indexes.Indexable):
    # Primary search field, required, must have document=True flag
    text = indexes.CharField(document=True, use_template=True)
    # Additional indexed fields for filtering or direct response return
    post_id = indexes.IntegerField(model_attr='id')
    title = indexes.CharField(model_attr='title')
    publish_date = indexes.DateTimeField(model_attr='published_at')

    def get_model(self):
        return BlogPost

    def index_queryset(self, using=None):
        # Only index publicly published posts
        return self.get_model().objects.filter(is_published=True)
Exactly one field per index may set document=True. By convention it is named text, and using the same name in every index avoids conflicts inside the framework. use_template=True means the content of this primary field is defined in a separate template file.
Index Content Template
Create the index content template following the mandatory path format templates/search/indexes/<app_name>/<lowercase_model_name>_text.txt:
{# templates/search/indexes/blog/blogpost_text.txt #}
{{ object.title }}
{{ object.content }}
{{ object.summary }}
{{ object.category.name }}
This template defines which model fields are concatenated into the primary search text field for indexing.
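The naming rule can be sketched as a small function. This is only an illustration of the path convention described above, not haystack's own lookup code; the function name index_template_path is hypothetical.

```python
def index_template_path(app_label, model_name, field_name="text"):
    """Build the relative template path for a use_template=True index field.

    Mirrors the convention search/indexes/<app>/<lowercase model>_<field>.txt.
    """
    return f"search/indexes/{app_label}/{model_name.lower()}_{field_name}.txt"


# The BlogPost model in the blog app, with the document field named "text".
path = index_template_path("blog", "BlogPost")
print(path)
```

If the file is missing or misnamed, the index build fails when it tries to load the template, so checking the computed path against the file on disk is a quick first debugging step.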
Basic Search Results Template
Create a default results page at templates/search/search.html:
<!DOCTYPE html>
<html>
<head>
    <title>Search Results</title>
    <style>.pagination { margin: 20px 0; gap: 10px; display: flex; }</style>
</head>
<body>
{% if query %}
    <h3>Matching Blog Posts:</h3>
    {% for result in page_obj.object_list %}
        <div class="result-item">
            <a href="{% url 'blog:post_detail' result.object.id %}">{{ result.object.title }}</a>
            <p>{{ result.object.summary|truncatechars:150 }}</p>
        </div>
    {% empty %}
        <p>No posts matched your search query. Try different keywords or adjust filter conditions.</p>
    {% endfor %}
    {% if page_obj.has_other_pages %}
        <div class="pagination">
            {% if page_obj.has_previous %}
                <a href="?q={{ query|urlencode }}&page={{ page_obj.previous_page_number }}">« Previous</a>
            {% endif %}
            <span>Page {{ page_obj.number }} of {{ page_obj.paginator.num_pages }}</span>
            {% if page_obj.has_next %}
                <a href="?q={{ query|urlencode }}&page={{ page_obj.next_page_number }}">Next »</a>
            {% endif %}
        </div>
    {% endif %}
{% endif %}
</body>
</html>
Chinese Segmentation Configuration
To enable accurate Chinese search, integrate the Jieba analyzer:
- Locate the default Whoosh backend file in your Python site-packages directory: haystack/backends/whoosh_backend.py
- Copy this file to your application directory (e.g. blog/) and rename it to whoosh_cn_backend.py
- Add the Jieba analyzer import at the top of the copied file:
from jieba.analyse import ChineseAnalyzer
- Replace all instances of StemmingAnalyzer() in the file with ChineseAnalyzer(); there are only a few occurrences (approximately 2-3, depending on the haystack version).
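To see why a dedicated Chinese analyzer is needed, the sketch below compares whitespace splitting with a toy dictionary-based segmenter (forward maximum matching). This is a deliberately simplified stand-in, not Jieba's algorithm, which as far as I know combines a prefix dictionary with statistical methods; the function name and the two-word vocabulary are made up for illustration.

```python
def forward_max_match(text, vocabulary, max_word_len=4):
    """Segment text by greedily taking the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


sentence = "我爱北京天安门"
vocabulary = {"北京", "天安门"}  # toy dictionary, purely illustrative

# Chinese text has no spaces, so whitespace splitting yields one giant token.
whitespace_tokens = sentence.split()
dictionary_tokens = forward_max_match(sentence, vocabulary)
print(whitespace_tokens)
print(dictionary_tokens)
```

With a single token per sentence, a query for 北京 cannot match the indexed document, which is the problem the ChineseAnalyzer swap is meant to address.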
Frontend Search Form
Add a search entry point to your page templates:
<form method="get" action="{% url 'haystack_search' %}" target="_blank">
<input type="text" name="q" placeholder="Search blog posts..." required>
<button type="submit">Search</button>
</form>
API Search Implementation
For REST API based search, use the drf-haystack extension:
First install additional dependencies:
pip install djangorestframework drf-haystack
Create search serializers:
# blog/serializers.py
from rest_framework.serializers import ModelSerializer
from drf_haystack.serializers import HaystackSerializerMixin
from .models import BlogPost
from .search_indexes import BlogPostIndex
class BlogPostBaseSerializer(ModelSerializer):
    class Meta:
        model = BlogPost
        fields = ['id', 'title', 'summary', 'published_at', 'view_count', 'cover_image']


class BlogPostSearchSerializer(HaystackSerializerMixin, BlogPostBaseSerializer):
    class Meta(BlogPostBaseSerializer.Meta):
        index_classes = [BlogPostIndex]
        search_fields = ['text', 'title', 'content']
Create the API search viewset:
# blog/views.py
from drf_haystack.viewsets import HaystackViewSet
from .serializers import BlogPostSearchSerializer
from .models import BlogPost
class BlogSearchViewSet(HaystackViewSet):
    index_models = [BlogPost]
    serializer_class = BlogPostSearchSerializer
Configure API routes:
# project/urls.py
from rest_framework.routers import DefaultRouter
from blog.views import BlogSearchViewSet
router = DefaultRouter()
router.register(r'api/blog-search', BlogSearchViewSet, basename='blog_api_search')
urlpatterns += router.urls
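A client would then query the endpoint with a GET request. The sketch below only builds the request URL; it assumes the viewset filters on query parameters named after the search fields (here text), which is my understanding of drf-haystack's default filter and should be checked against its documentation. The host http://localhost:8000 and the helper name build_search_url are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urljoin, urlsplit


def build_search_url(base_url, keyword, field="text"):
    """Build a GET URL for the blog search API registered at api/blog-search."""
    # urlencode percent-encodes non-ASCII keywords such as Chinese text.
    query = urlencode({field: keyword})
    return urljoin(base_url, "/api/blog-search/") + "?" + query


url = build_search_url("http://localhost:8000", "django")
chinese_url = build_search_url("http://localhost:8000", "搜索")
print(url)
print(chinese_url)
```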
Index Generation
Run the following command to generate the initial full index manually:
python manage.py rebuild_index
Enter y when prompted to confirm deletion of old index files.
Custom Search Logic
Extend the base SearchView class to implement custom business logic:
# blog/views.py
from haystack.generic_views import SearchView
from .models import BlogPost


class CustomBlogSearchView(SearchView):
    # Use custom results template
    template_name = 'search/custom_blog_results.html'

    def get_context_data(self, *args, **kwargs):
        context = super().get_context_data(*args, **kwargs)
        # The generic view passes the submitted search text in as 'query'
        query = context.get('query') or ''
        # Fallback to partial match if full-text search returns no results
        if query and not self.get_queryset().count():
            fallback_results = BlogPost.objects.filter(
                is_published=True,
                title__icontains=query
            ) | BlogPost.objects.filter(
                is_published=True,
                content__icontains=query
            )
            context['fallback_results'] = fallback_results.distinct().order_by('-published_at')
        # Add custom context variables
        context['page_title'] = f'Search results for "{query}"'
        return context
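The fallback logic can be tried out without a database. The sketch below mimics it on plain dictionaries: substring matching on title or content, published posts only, newest first. The sample posts and both function names are invented for illustration and are not part of haystack or Django.

```python
from datetime import date


def fallback_search(posts, keyword):
    """Mimic the ORM fallback: published posts whose title or content contains keyword."""
    needle = keyword.lower()
    matches = [
        post for post in posts
        if post["is_published"]
        and (needle in post["title"].lower() or needle in post["content"].lower())
    ]
    # Newest first, matching order_by('-published_at').
    return sorted(matches, key=lambda post: post["published_at"], reverse=True)


def search_with_fallback(indexed_results, posts, keyword):
    """Return full-text results when present, otherwise the substring fallback."""
    if indexed_results:
        return indexed_results, False
    return fallback_search(posts, keyword), True


posts = [
    {"title": "Django tips", "content": "ORM tricks",
     "is_published": True, "published_at": date(2023, 1, 5)},
    {"title": "Search", "content": "Using django-haystack",
     "is_published": True, "published_at": date(2023, 3, 1)},
    {"title": "Draft on Django", "content": "",
     "is_published": False, "published_at": date(2023, 4, 1)},
]

# An empty list stands in for a full-text search that found nothing.
results, used_fallback = search_with_fallback([], posts, "django")
print([post["title"] for post in results], used_fallback)
```

Substring matching scans every row, so this kind of fallback is best reserved for small tables such as a personal blog.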
Keyword Highlighting
Enable keyword highlighting in search results using haystack's built-in highlight template tag. First load the tag library, then apply the tag to the text you want highlighted:
{% load highlight %}
{% highlight result.object.content with query css_class "highlight-keyword" html_tag "mark" max_length 200 %}
Available configuration parameters:
- css_class: CSS class applied to highlighted keyword segments
- html_tag: HTML element wrapped around matching keywords
- max_length: Maximum length of the returned text snippet
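To make the three parameters concrete, here is a simplified highlighter sketch. It is not haystack's implementation: it truncates from the start of the text rather than choosing a window around the matches, and the function name highlight_text is hypothetical.

```python
import html
import re


def highlight_text(text, query, css_class="highlighted", html_tag="span", max_length=200):
    """Wrap each query word found in text with an HTML tag, after truncating."""
    words = [re.escape(word) for word in query.split() if word]
    snippet = text[:max_length]
    if not words:
        return html.escape(snippet)
    # A capturing group makes re.split keep the matched words at odd indexes.
    pattern = re.compile("(" + "|".join(words) + ")", re.IGNORECASE)
    rendered = []
    for index, piece in enumerate(pattern.split(snippet)):
        escaped = html.escape(piece)
        if index % 2 == 1:
            rendered.append(f'<{html_tag} class="{css_class}">{escaped}</{html_tag}>')
        else:
            rendered.append(escaped)
    return "".join(rendered)


marked = highlight_text(
    "Django search with Haystack",
    "haystack",
    css_class="highlight-keyword",
    html_tag="mark",
)
print(marked)
```

Escaping each piece before wrapping it keeps user-supplied post content from injecting markup into the results page.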
You can modify the highlight implementation by copying haystack/templatetags/highlight.py and haystack/utils/highlighting.py to your project and adjusting the logic as needed.
For advanced configuration options, refer to the official django-haystack documentation.