Building an Intelligent NL2SQL Data Assistant with FastGPT

As Large Language Models (LLMs) continue to evolve, many industries are exploring practical applications such as Natural Language to SQL (NL2SQL) to simplify data querying. Given the high cost of fine-tuning models, most enterprises rely on a combination of advanced prompt engineering and Retrieval-Augmented Generation (RAG) using knowledge bases. This article demonstrates how to implement a data-querying assistant using FastGPT and GPT-4.

1. Data Schema Preparation

The first step involves organizing the data into a structured format. In this scenario, we use a unified performance table that stores data across different time granularities: yearly (e.g., 2024), month (e.g., 202405), and weekly (e.g., 202422). Below is the SQL structure for the performance tracking table:

CREATE TABLE `sales_performance_metrics` (
  `report_period` varchar(50) DEFAULT NULL COMMENT 'Time cycle (Year, Month, or Week)',
  `parent_org_id` varchar(50) DEFAULT NULL COMMENT 'ID of the parent department',
  `parent_org_name` varchar(100) DEFAULT NULL COMMENT 'Name of the parent department',
  `dept_id` varchar(50) DEFAULT NULL COMMENT 'Unique department ID',
  `dept_name` varchar(100) DEFAULT NULL COMMENT 'Name of the department',
  `revenue_actual` decimal(18,4) DEFAULT NULL COMMENT 'Actual revenue achieved',
  `revenue_target` decimal(18,4) DEFAULT NULL COMMENT 'Target revenue goal',
  `achievement_rate` decimal(18,8) DEFAULT NULL COMMENT 'Completion percentage',
  `product_category` varchar(50) DEFAULT NULL COMMENT 'Specific product line'
) COMMENT='Departmental Sales Performance Table';

2. Workflow Configuration in FastGPT

The core logic is implemented through a FastGPT workflow. The system follows a multi-step process to ensure accuracy:

  • Knowledge Base Integration: Stores common query patterns and table descriptions to provide context.
  • Schema Injection: A code node passes the relevant table schema and metadata to the LLM.
  • Intent Analysis: The LLM (specifically GPT-4) interprets the user's natural language and maps it to the database schema.
  • SQL Generation: The model generates a precise SQL query based on the prompt constraints.
  • Data Execution: The gneerated SQL is executed against the database, and the results are formatted for the user.

3. Optimizing the NL2SQL Prompt

The effectiveness of the assistant depends heavily on the system prompt. Below is a structured prompt designed to guide the model through complex SQL generation tasks:

**Role**
You are a Senior Data Engineer and NL2SQL expert. Your task is to convert natural language questions into valid SQL queries based on the provided database schema.

**Reasoning Steps**
1. Analyze the schema to determine if the query requires a single table or multiple joins.
2. Identify necessary columns. Always include grouping fields if the user asks for aggregated data (e.g., if asking for monthly performance, include the month column).
3. Incorporate user context (Department ID, Name, Current Time) if provided.
4. Validate the SQL syntax against the specific table structure provided.
5. Apply formatting rules for currency (converted to 'Ten Thousand' units) and percentages.

**Contextual Constraints**
- Use exact matches for department names and product categories.
- When querying subordinates, use the `parent_org_name` field.
- Default to using names rather than IDs for readability unless specified otherwise.
- Output revenue values in units of 10,000, rounded to two decimal places.

**Example Schema Context**
Table: `sales_performance_metrics`
Fields: report_period, parent_org_name, dept_name, revenue_actual, revenue_target, achievement_rate, product_category.

**Output Format**
Return only the raw SQL statement. Do not include markdown blocks or additional explanations unless specifically requested in a separate field.

**Example Conversion**
Input: "Show me the 2024 performance for the South-East Division and its sub-units."
Output:
SELECT 
    dept_name, 
    revenue_actual / 10000 AS revenue_wan, 
    revenue_target / 10000 AS target_wan, 
    achievement_rate * 100 AS completion_pct,
    report_period
FROM sales_performance_metrics
WHERE report_period = '2024' 
AND (parent_org_name = 'South-East Division' OR dept_name = 'South-East Division');

4. Knowledge Base Refinement

To improve the model's performence on edge cases, the FastGPT knowledge base should be populated with "Question-SQL" pairs. This allows the system to perform a semantic search before generating a query, providing the LLM with relevant few-shot examples that match the user's intent. For instance, if a user asks about "達成率" (achievement rate), the knowledge base ensures the LLM knows to use the revenue\_actual / revenue\_target logic if the pre-calculated column is unavailable or needs validation.

Tags: NL2SQL FastGPT LLM Workflow prompt engineering

Posted on Sun, 10 May 2026 15:36:35 +0000 by phpmania1