Building a Scalable Log Processing Pipeline with Filebeat, Kafka, Logstash, and Elasticsearch

Distributed log processing systems are essential for modern application monitoring and analysis. A common approach involves using Filebeat for log collection, Kafka as a message buffer, Logstash for transformation, Elasticsearch for storage, and Kibana for visualization. Grafana can also integrate with Elasticsearch for real-time monitoring dashboards.

Filebeat is deployed on application servers to minimize resource contention, handling only log reading and forwarding. Logstash, Elasticsearch, and Kibana typically run on dedicated servers; Logstash's filtering stage can be CPU-intensive, so filter configurations should be kept efficient.

Common Architecture Patterns

Direct Filebeat to Elasticsearch Integration

Filebeat sends logs directly to Elasticsearch, with Kibana providing search and visualization capabilities.

Buffered Pipeline with Kafka

Multiple Filebeat instances forward logs to a Kafka cluster. One to three Logstash nodes consume from Kafka and output to an Elasticsearch cluster. This design ensures data persistence; if Logstash fails, logs remain in Kafka until processing resumes. Kibana serves as the front-end for log exploration.
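A minimal sketch of the Kafka leg of this pipeline, assuming brokers at kafka1:9092 and kafka2:9092 and a topic named app-logs (all names are placeholders):

```yaml
# filebeat.yml - ship events to Kafka instead of Elasticsearch
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092"]
  topic: "app-logs"
  required_acks: 1     # wait for the partition leader only
  compression: gzip
```

On the Logstash side, a consumer pipeline might look like:

```
# logstash.conf - consume from Kafka, write daily indices
input {
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"
    topics => ["app-logs"]
    group_id => "logstash"
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch-host:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}
```

Running multiple Logstash nodes with the same group_id lets Kafka balance partitions across the consumers, which is what allows the one-to-three-node Logstash tier to scale.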

Filebeat Configuration and Deployment

Docker Deployment for Elasticsearch Output

Ensure the Filebeat and Elasticsearch versions match. Configure the application's logback.xml to emit JSON so that Filebeat can parse structured fields.

docker run --privileged --name filebeat --net=host -d -m 1000M \
      --log-driver json-file --log-opt max-size=1024m \
      -v /config/filebeat.yml:/usr/share/filebeat/filebeat.yml \
      -v /local/logs:/app/logs \
      -v /filebeat/data:/data \
      registry.example.com/filebeat:7.10.0
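The JSON output mentioned above can be produced with a minimal logback.xml sketch, assuming the logstash-logback-encoder dependency is on the classpath (file paths are placeholders):

```xml
<configuration>
  <!-- Write JSON lines that Filebeat can decode field-by-field -->
  <appender name="JSON_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>/local/logs/service-a/app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>/local/logs/service-a/app.%d{yyyy-MM-dd}.log</fileNamePattern>
      <maxHistory>7</maxHistory>
    </rollingPolicy>
    <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
  </appender>
  <root level="INFO">
    <appender-ref ref="JSON_FILE"/>
  </root>
</configuration>
```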

Example filebeat.yml configuration:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /app/logs/service-a/*.log
    - /app/logs/service-b/*.log
  ignore_older: 12h
  clean_inactive: 14h
  tags: ["primary-logs"]
  json.keys_under_root: true
  json.overwrite_keys: true

- type: log
  enabled: true
  paths:
    - /app/logs/service-c/*.log
  ignore_older: 12h
  clean_inactive: 14h
  tags: ["secondary-logs"]
  json.keys_under_root: true
  json.overwrite_keys: true

setup.ilm.enabled: false
setup.template.name: "app-logs"
setup.template.pattern: "app-logs-*"
setup.template.enabled: true
setup.template.overwrite: true
setup.template.settings:
  index.number_of_shards: 2
  index.number_of_replicas: 1
  index.codec: best_compression

output.elasticsearch:
  hosts: ["elasticsearch-host:9200"]
  indices:
    - index: "app-logs-primary-%{+yyyy.MM.dd}"
      when.contains:
        tags: "primary-logs"
    - index: "app-logs-secondary-%{+yyyy.MM.dd}"
      when.contains:
        tags: "secondary-logs"

processors:
  - decode_json_fields:
      fields: ["message"]
      target: ""
      overwrite_keys: true
  - rename:
      fields:
        - from: "exception"
          to: "app_exception"
      ignore_missing: true
  - drop_fields:
      fields: ["beat", "host", "input", "agent"]
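
The tag-based routing above can be sketched in Python to make the behavior concrete (a simplified model, not Filebeat's actual implementation):

```python
from datetime import date

def route_index(event, today):
    """Pick the daily index for an event based on its tags,
    mirroring the `indices` conditions in the config above."""
    day = today.strftime("%Y.%m.%d")
    tags = event.get("tags", [])
    if "primary-logs" in tags:
        return f"app-logs-primary-{day}"
    if "secondary-logs" in tags:
        return f"app-logs-secondary-{day}"
    # With no matching condition, Filebeat falls back to a default index
    return f"filebeat-{day}"

print(route_index({"tags": ["primary-logs"]}, date(2024, 1, 15)))
# app-logs-primary-2024.01.15
```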

Alternative Elasticsearch output configuration:

output.elasticsearch:
  hosts: ["es-node:9200"]
  index: "logs-%{[fields.log_source]}-%{+yyyy.MM.dd}"
  indices:
    - index: "logs-web-%{+yyyy.MM.dd}"
      when.equals:
        fields.log_source: "web_server"
    - index: "logs-app-%{+yyyy.MM.dd}"
      when.equals:
        fields.log_source: "application"
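
The fields.log_source values referenced by these conditions must be attached in the inputs; a sketch (the paths are placeholders):

```yaml
filebeat.inputs:
- type: log
  paths:
    - /var/log/nginx/*.log
  fields:
    log_source: web_server   # matches the "logs-web-*" condition
- type: log
  paths:
    - /app/logs/*.log
  fields:
    log_source: application  # matches the "logs-app-*" condition
```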

Multiline Log Aggregation

To handle Java exception stack traces, add a multiline section to the relevant log input so that continuation lines are appended to the preceding event:

multiline:
  pattern: '^[[:space:]]+(at|\.{3})[[:space:]]+\b|^Caused by:'
  negate: false
  match: after
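
A quick Python check of this style of pattern (translated from POSIX character classes to \s; a sketch, not Filebeat's actual Go/RE2 engine) shows which lines are treated as continuations:

```python
import re

# Python translation of the Java stack-trace continuation pattern
CONTINUATION = re.compile(r'^\s+(at|\.{3})\s+\b|^Caused by:')

lines = [
    "java.lang.IllegalStateException: boom",          # new event
    "\tat com.example.Service.run(Service.java:42)",  # continuation
    "\t... 23 common frames omitted",                 # continuation
    "Caused by: java.io.IOException: disk full",      # continuation
    "2024-01-15 10:00:00 INFO next request",          # new event
]
for line in lines:
    print(bool(CONTINUATION.search(line)), line)
```

With negate: false and match: after, every line the pattern matches is appended to the preceding event, so an entire stack trace becomes a single document.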

Custom Elasticsearch Index Templates

Enable custom field mappings:

setup.template.json.enabled: true
setup.template.json.path: "/usr/share/filebeat/index_template.json"
setup.template.json.name: "custom_template"

Add volume mounts for template files:

-v /config/fields.yml:/usr/share/filebeat/fields.yml
-v /config/index_template.json:/usr/share/filebeat/index_template.json
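
A minimal index_template.json sketch in the legacy template format that Filebeat 7.x loads (the level mapping is an assumption; app_exception comes from the rename processor earlier, and the settings mirror those shown above):

```json
{
  "index_patterns": ["app-logs-*"],
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1,
    "index.codec": "best_compression"
  },
  "mappings": {
    "properties": {
      "app_exception": { "type": "text" },
      "level": { "type": "keyword" }
    }
  }
}
```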

Example fields.yml (the field entry is illustrative; app_exception matches the rename processor above):

- key: app-logs
  title: Application Logs
  description: "Custom log schema for application monitoring"
  fields:
    - name: app_exception
      type: text
      description: "Exception details renamed by the Filebeat processor"

Tags: Filebeat, Kafka, Logstash, Elasticsearch, Kibana

Posted on Fri, 15 May 2026 04:29:59 +0000 by wgh