Database and SQL
Query Execution Order
The order of execution for SQL statements:
FROM- identifies the source tablesJOIN- combines rows from multiple tablesON- specifies join conditionsWHERE- filters rows based on conditionsGROUP BY- groups rows by specified columnsHAVING- filters groups after aggregationSELECT- selects columns to displayDISTINCT- removes duplicate rowsORDER BY- sorts resultsLIMIT- restricts number of returned rows
Window Functions
Window functions perform calculations across a set of rows related to the current row. Common types include:
- Aggregate functions (SUM, AVG, COUNT)
- Ranking functions (RANK, DENSE_RANK, ROW_NUMBER)
- Analytical functions (LAG, LEAD)
Index Usage
To check if an index is being used:
EXPLAIN SELECT * FROM table_name WHERE column_name = 'value';
Key fields to examine:
type:indexorrangeindicates index usagepossible_keysandkey: matching values indicate index usageExtra:Using indexsuggests index hit
Python Concepts
Variables Scope
Global variables are defined outside functions. Local variables are defined inside functions.
When a local variable shares the same name as a global one, it shadows the global variable within the function scope.
Example:
global_var = 100
def test_function():
local_var = 123 # New local variable
print(local_var) # Prints 123
test_function()
Iterators vs Generators
Generators are a special type of iterator created using yield keyword.
def simple_generator():
yield 1
yield 2
yield 3
for value in simple_generator():
print(value)
Regular Expressions
re.match() matches from the start of string.
re.search() searches the entire string.
Greedy vs non-greedy matching:
import re
text = "abc123def456"
# Greedy matching
greedy_pattern = re.compile(r'\d+')
greedy_match = greedy_pattern.search(text)
print(greedy_match.group()) # Outputs: 123
# Non-greedy matching
non_greedy_pattern = re.compile(r'\d+?')
non_greedy_match = non_greedy_pattern.search(text)
print(non_greedy_match.group()) # Outputs: 1
Class and Object
A class defines a blueprint with attributes and methods. An object is an instance of a class.
class Vehicle:
def __init__(self, make, model):
self.make = make
self.model = model
def start_engine(self):
print(f"{self.make} {self.model} engine started")
# Create object
my_vehicle = Vehicle("Honda", "Civic")
my_vehicle.start_engine()
Expection Handling
def risky_operation():
try:
result = 10 / 0
except ZeroDivisionError as e:
print(f"Error: {e}")
raise # Re-raise exception
# To get exact line numbers:
import traceback
try:
risky_operation()
except Exception as e:
traceback.print_exc()
DataFrame Operations
To view all data in pandas DataFrame:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
print(df)
# Or simply type df in interactive environment
Spark and Big Data
Concurrency vs Parallelism
Concurrency: Multiple tasks handled by single processor (logical simultaneity). Parallelism: Multiple tasks handled simultaneously by multiple processors (physical simultaniety).
Count Function Differences
COUNT(*): Counts all rows including NULL valuesCOUNT(column): Excludes NULL values in that columnCOUNT(1): Similar to COUNT(*), but may have different performance characteristics
Composite Index Rules
In composite indexes, MySQL uses leftmost prefix principle. If you have an index on (A,B,C,D) and query with A=1 AND B=2 AND C>3 AND D=4, only A and B will use the index.