MySQL SQL Parser Architecture: Understanding Flex and Bison Integration
This technical overview examines how MySQL 8.0.34 implements SQL query parsing using bison-generated parser code. The analysis focuses on key implementation details and customizations that differ from standard bison usage patterns.
Grammar Definition Location
MySQL stores its SQL grammar specification in sql/sql_yacc.yy. This file serves as the foundation for the entire parsing system and defines all SQL syntax rules that MySQL recognizes.
Parser Function Naming Convention
The standard bison parser entry point yyparse undergoes significant renaming in MySQL's build configuration. The transformation occurs through the name-prefix directive found in sql/CMakeLists.txt:
--name-prefix=MYSQL
This configuration causes the parser function to be renamed from yyparse to MYSQLparse. In more recent MySQL versions, an alternative approach using api.prefix further transforms this to my_sql_parser_parse through the directive %define api.prefix {my_sql_parser_}.
Parser Invocation Point
The generated parser function is invoked from within the THD::sql_parser() method, which resides in sql/sql_class.cc. This method serves as the main entry point for all SQL parsing operations within a thread descriptor context.
Function Signature and Parameter Passing
The parser function declaration follows this signature:
extern int MYSQLparse(class THD *thd, class Parse_tree_root **root);
This signature is constructed through bison parameter declarations in the grammar file:
%parse-param { class THD *YYTHD }
%parse-param { class Parse_tree_root **parse_tree }
Utilizing Parameters in Semantic Actions
The YYTHD parameter provides access to the thread descriptor, enabling semantic actions to manipulate session state. Consider this example demonstrating parameter access:
prepare:
PREPARE_SYM identifier FROM source_expr
{
THD *current_thread = YYTHD;
LEX *lex_context = current_thread->lex;
lex_context->sql_command = SQLCOM_PREPARE;
lex_context->prepared_stmt_name = to_lex_cstring($2);
lex_context->contains_plaintext_password = true;
}
;
The parse_tree paramter serves as an output mechanism for returning the constructed parse tree to the caller:
simple_statement_or_begin:
simple_statement { *parse_tree = $1; }
| begin_stmt
;
Value Stack Type Definition Strategy
MySQL deviates from standard bison practices by not using the %union directive for defining YYSTYPE. Instead, the type is explicitly defined in a separate header file sql/parser_yystype.h:
union YYSTYPE {
Lexer_yystype lexer;
};
The Lexer_yystype union encapsulates various lexer-generated value types:
union Lexer_yystype {
LEX_STRING text_content;
LEX_SYMBOL token_symbol;
const CHARSET_INFO *character_set;
PT_hint_list *optimization_hints;
LEX_CSTRING hint_data;
};
Nonterminal symbol type are then specified using the dot notation:
%type <lexer.text_content> TEXT_STRING_sys TEXT_STRING
Lexical Analyzer Implementation
While bison invokes yylex for token generation, MySQL replaces this with MYSQLlex due to the naming prefix. However, MySQL does not employ the flex tool for lexical analysis. Instead, the lexer is implemented entire within the Lex_input_stream class, providing custom tokenization logic tailored to SQL parsing requirements.