Pandas Fundamentals: Data Structures and Operations

Pandas is a powerful Python library for data manipulation and analysis. It provides two primary data structures: Series (1D) and DataFrame (2D), along with numerous functions for data processing.

Importing Pandas

# Import necessary libraries
import numpy as np
import pandas as pd

Reading and Writting Data

Pandas supports various file formats for data import and export:

Reading CSV Files

# Read a CSV file
customer_data = pd.read_csv('customers.csv')
customer_data.head()

Reading Text Files

# Read a text file with custom separator
log_data = pd.read_table('server_logs.txt', sep='|')
log_data.head()

Reading Excel Files

# Read an Excel file
sales_data = pd.read_excel('monthly_sales.xlsx')
sales_data.head()

Reading Parameters

Key parameters for reading files:

  • header=None: Treat first row as data, not column names
  • index_col: Use specified column(s) as index
  • usecols: Select specific columns to read
  • nrows: Limit number of rows to read
# Example with parameters
sample_data = pd.read_csv('large_dataset.csv', 
                        index_col=['ID'],
                        usecols=['ID', 'Name', 'Age'],
                        nrows=100)

Basic Data Structures

Series

A Series is a one-dimensional labeled array capable of holding any data type:

# Create a Series
product_prices = pd.Series(data=[19.99, 29.99, 9.99, 49.99],
                         index=['Product_A', 'Product_B', 'Product_C', 'Product_D'],
                         dtype='float64',
                         name='Price_List')
product_prices

DataFrame

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types:

# Create DataFrame from list of lists
employee_records = [[101, 'John Doe', 'Engineering', 75000],
                   [102, 'Jane Smith', 'Marketing', 68000],
                   [103, 'Bob Johnson', 'Sales', 72000]]
employees = pd.DataFrame(data=employee_records,
                        columns=['ID', 'Name', 'Department', 'Salary'])
employees

DataFrame Operaitons

# Select a single column
employees['Name']

# Select multiple columns
employees[['Name', 'Department']]

# Transpose DataFrame
employees.T

Essential DataFrame Functions

Basic Information

# Get column names
employees.columns

# View first n rows (default 5)
employees.head(3)

# View last n rows (default 5)
employees.tail(2)

# Get DataFrame info
employees.info()

# Get descriptive statistics
employees.describe()

Statistical Functions

# Calculate mean
employees['Salary'].mean()

# Calculate maximum value
employees['Salary'].max()

# Calculate quantile
employees['Salary'].quantile(0.75)

# Count non-null values
employees['Department'].count()

# Find index of maximum value
employees['Salary'].idxmax()

Unique Values

# Get unique values
employees['Department'].unique()

# Count unique values
employees['Department'].nunique()

# Value counts
employees['Department'].value_counts()

Duplicates

# Remove duplicates
unique_employees = employees.drop_duplicates(['Department', 'Salary'])

# Check for duplicates
employees.duplicated(['Department', 'Salary'])

Data Replacement

Mapping Replacement

# Replace values using mapping
employees['Department'].replace({'Engineering': 'Tech', 'Marketing': 'BizDev'})

Logical Replacement

# Replace based on condition
employees['Salary'].where(employees['Salary'] > 70000, 70000)

# Mask values based on condition
employees['Salary'].mask(employees['Salary'] < 65000, 65000)

Numerical Operations

# Round values
employees['Salary'].round(-3)

# Absolute values
employees['Salary'].abs()

# Clip values to range
employees['Salary'].clip(lower=60000, upper=80000)

Sorting Data

Value Sorting

# Sort by single column
employees.sort_values('Salary', ascending=False)

# Sort by multiple columns
employees.sort_values(['Department', 'Salary'], ascending=[True, False])

Index Sorting

# Sort by index
employees.sort_index(level=['Department', 'Name'], ascending=[True, False])

Apply Function

The apply() method alllows applying custom functions to DataFrame rows or columns:

# Define custom function
def salary_category(salary):
    if salary > 70000:
        return 'High'
    elif salary > 60000:
        return 'Medium'
    else:
        return 'Low'

# Apply function to column
employees['Salary_Category'] = employees['Salary'].apply(salary_category)

Tags: Pandas DataFrame series data-analysis python

Posted on Mon, 08 Jun 2026 18:42:23 +0000 by phpcoder