Building an NL2SQL Chatbot with FastGPT for Business Data Queries

Overview

Large language models have opened new possibilities for natural language to SQL conversion. This article demonstrates a practical implementation using FastGPT to enable non-technical users to query business data through conversational interfaces.

Database Schema

The solution uses a single denormalized table containing achievement metrics across different time periods:

CREATE TABLE `achievement_facts` (
  `period_key` varchar(50) DEFAULT NULL COMMENT 'Time period identifier',
  `parent_dept_code` varchar(50) DEFAULT NULL COMMENT 'Parent department code',
  `parent_dept_name` varchar(50) DEFAULT NULL COMMENT 'Parent department name',
  `dept_code` varchar(50) DEFAULT NULL COMMENT 'Department code',
  `dept_name` varchar(50) DEFAULT NULL COMMENT 'Department name',
  `achievement_value` decimal(38,4) DEFAULT NULL COMMENT 'Actual achievement',
  `target_value` varchar(150) DEFAULT NULL COMMENT 'Target value',
  `completion_ratio` decimal(38,18) DEFAULT NULL COMMENT 'Completion percentage',
  `product_category` varchar(50) DEFAULT NULL COMMENT 'Product line'
) COMMENT='Department Achievement Facts';

Period formats include: yearly (2024), monthly (202411), and weekly (2024082).

Workflow Architecture

The FastGPT workflow follows these sequential steps:

  1. Knowledge Retrieval - Match user query against existing Q&A pairs
  2. Schema Injection - Provide AI with relevant table schema and field descriptions
  3. SQL Generation - Translate natural language to SQL with context awareness
  4. Database Execution - Run the generated query against the data warehouse
  5. Result Presentation - Display formatted results to the user

Prompt Engineering for SQL Generation

The core prompt defines the AI as an NL2SQL specialist with specific guidelines:

**Role**
You are an expert NLP-to-SQL translator. Users provide natural language queries along with database schema information. Generate appropriate SQL statements with brief explanations.

**Reasoning Steps**
1. Determine if the query requires single-table or multi-table join operations
2. Identify required output columns based on user requirements, ensuring all relevant fields are returned
3. Apply user context filters (department, role, time constraints)
4. Enforce data format requirements from system guidelines
5. Validate generated SQL against schema for correctness
6. Format output according to conversion rules

**User Context Variables**
- Current user name: [variable]
- Employee ID: [variable]
- Department code: [variable]
- Department name: [variable]
- Department level: [variable]
- Current timestamp: [variable]

**Schema Reference**
```sql
CREATE TABLE `achievement_facts` (
  `period_key` varchar(50) DEFAULT NULL,
  `parent_dept_code` varchar(50) DEFAULT NULL,
  `parent_dept_name` varchar(50) DEFAULT NULL,
  `dept_code` varchar(50) DEFAULT NULL,
  `dept_name` varchar(50) DEFAULT NULL,
  `achievement_value` decimal(38,4) DEFAULT NULL,
  `target_value` varchar(150) DEFAULT NULL,
  `completion_ratio` decimal(38,18) DEFAULT NULL,
  `product_category` varchar(50) DEFAULT NULL
);

Field Semantics

  • Dimension fields: period_key (formats: 2024-year, 202411-month, 2024113-week), parent_dept_name, dept_name, product_category
  • Metric fields: achievement_value, target_value, completion_ratio
  • Calculation: completion_ratio = achievement_value / target_value
  • Parent hierarchy: Use parent_dept_name when querying organizational subtrees
  • Product matching: Apply partial match using LIKE operator

Query Transformation Examples

Input: "What is the group's annual sales achievement this year?"

{
  "sql": "SELECT 'Group' AS dept_name, period_key AS year,
    achievement_value/10000 AS total_achievement_wan,
    period_key AS data_period,
    product_category AS product_line
  FROM achievement_facts
  WHERE dept_name = 'Group' AND period_key = 2024 AND product_category = 'All'",
  "explanation": "Retrieves 2024 annual achievement for the Group entity"
}

Input: "Show performance and completion status for East China Theater and subsidiaries for the full year."

{
  "sql": "SELECT 
    t1.dept_name AS department,
    t1.achievement_value/10000 AS sales_wan,
    t1.target_value/10000 AS target_wan,
    (t1.achievement_value/t1.target_value) AS completion_pct,
    t1.period_key AS time_period,
    t1.product_category AS category
  FROM achievement_facts t1
  WHERE t1.period_key = '2024' 
    AND (t1.parent_dept_name = 'East China Theater' OR t1.dept_name = 'East China Theater')
    AND t1.product_category = 'All'",
  "explanation": "Returns achievement metrics for East China region and all nested departmants"
}

Output Requirements

  1. Return only raw SQL without markdown formatting or explanations
  2. Use exact match for categorical filters (product lines, department names)
  3. Always include numerator and denominator for ratio calculations
  4. Format currency values in wan units with 2 decimal precision and thousand separators
  5. Express percentages with 2 decimal places and % symbol
  6. Prefer name fields over code fields when both are available

## Knowledge Base Configuration

The knowledge base contains curated Q&A pairs to improve query matching accuracy. Each entry consists of:

- **Question**: Natural language query pattern
- **Answer**: Corresponding SQL statement with explanation

This approach provides fallback responses for common queries and helps the system handle variations in user phrasing.

## Key Implementation Details

The system leverages GPT-4 for SQL generation with several safeguards:

- **Data Security**: User context variables enable row-level security filtering
- **Query Validation**: Generated SQL undergoes schema validation before execution
- **Graceful Degradation**: Knowledge base provides reliable responses for standard queries
- **Format Enforcement**: Prompt instructions ensure consistent output formatting

The workflow handles both simple single-entity queries and complex hierarchical organization queries requiring recursive parent-child relationships.

Tags: NL2SQL FastGPT LLM Workflow prompt engineering

Posted on Sun, 10 May 2026 15:36:35 +0000 by phpmania1