Problem Context
The objective is to retrieve a list of customer names from a database table, specifically exclduing those who were referred by the customer with ID 2. A naive approach might involve a direct comparison operator in the WHERE clause.
SELECT name
FROM customer
WHERE referee_id <> 2;
However, executing this query often yields unexpected results, failing to return rows where the referee_id is NULL. This discrepancy arises from the SQL standard's implementation of logic.
The Logic of NULLs
SQL databases operate on a three-valued logic system comprising TRUE, FALSE, and UNKNOWN. In this system, any comparison operation involving a NULL value—such as NULL = 2 or NULL <> 2—automatically evaluates to UNKNOWN. Since the WHERE clause only filters for rows that evaluate to TRUE, rows resulting in FALSE or UNKNOWN are discarded.
Robust Query Solutions
To accurately capture the required data, the query must explicitly account for the UNKNOWN state. There are two primary methods to handle this.
Method 1: Explicit NULL Checking
The most standard SQL approach involves expanding the condition to explicitly include NULL values using the IS NULL predicate.
SELECT name
FROM customer
WHERE referee_id <> 2
OR referee_id IS NULL;
Method 2: Value Substitution
Alternatively, one can use a function to substitute the NULL value with a placeholder (such as 0) that will satisfy the inequality condition. The COALESCE function (or IFNULL in MySQL) is suitable for this.
SELECT name
FROM customer
WHERE COALESCE(referee_id, 0) <> 2;
Deep Dive: Three-Valued Logic
The Nature of NULL
In databace theory, NULL serves as a marker for the absence of a value, which generally falls into two categories:
- Unknown: The value exists but is currently missing (e.g., the color of a person's eyes in a photo where they are wearing sunglasses).
- Inapplicable: The attribute does not exist for the entity (e.g., the email adress of a rock).
While these concepts are distinct logically, SQL Relational Database Management Systems (RDBMS) typically treat them as a single NULL state.
Predicates and NULLs
Predicates in SQL (such as =, <, BETWEEN, IN) are functions that return boolean values. Because NULL is not a value itself but a placeholder, applying a comparison predicate to it is semantically invalid, resulting in UNKNOWN.
This explains why IS NULL is required rather than = NULL. The IS NULL construct should be viewed as a single, atomic predicate (conceptually similar to IS_NULL) designed specifically to detect the presence of this marker. Unlike standard comparison operators, it returns TRUE when the marker is detected.
Logical Operations with UNKNOWN
When combining UNKNOWN with standard boolean values, the result follows specific priority rules. It is crucial to distinguish between the boolean value unknown and the SQL NULL marker.
NOT unknownresults inunknown.unknown = unknownresults intrue(treating them as the same boolean state).UNKNOWN = UNKNOWN(comparing the markers) results inunknown.
Understanding the interaction in AND and OR operations is vital for writing complex conditions:
-- OR Operations: TRUE dominates
TRUE OR unknown => TRUE
unknown OR unknown => unknown
FALSE OR unknown => unknown
-- AND Operations: FALSE dominates
TRUE AND unknown => unknown
unknown AND unknown => unknown
FALSE AND unknown => FALSE
A useful mnemonic for precedence is:
- For
AND:FALSE>unknown>TRUE - For
OR:TRUE>unknown>FALSE