Python Requests Library for HTTP Operations

Installation

Install the library via pip:

pip install requests

If the default index is slow or unreachable, point pip at a mirror with the -i (--index-url) option:

pip install requests -i https://pypi.mirrors.ustc.edu.cn/simple/

Available mirrors include:

  • Tsinghua University: https://pypi.tuna.tsinghua.edu.cn/simple
  • Alibaba Cloud: http://mirrors.aliyun.com/pypi/simple/
  • University of Science and Technology of China: https://pypi.mirrors.ustc.edu.cn/simple/
  • Huazhong University of Science and Technology: http://pypi.hustunique.com/
  • Shandong University of Technology: http://pypi.sdutlinux.org/
  • Douban: http://pypi.douban.com/simple/

Core Functionality

GET Requests

Retrieve web resources using requests.get(). The method returns a response object.

import requests

target_url = 'https://httpbin.org/get'
request_headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
# A timeout keeps the call from waiting indefinitely on an unresponsive server.
response = requests.get(url=target_url, headers=request_headers, timeout=10)
print(response.text)

Pass query parameters using a dictionary assigned to the params argument.
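As a minimal sketch of the params argument, the snippet below prepares a request without sending it, so the encoded URL can be inspected offline. The parameter names and values are made up for illustration.

```python
import requests

# Hypothetical query parameters, chosen only for illustration.
query = {"q": "python requests", "page": 2}

# Request.prepare() builds the final request without sending it,
# which makes the URL-encoded query string easy to inspect.
prepared = requests.Request("GET", "https://httpbin.org/get", params=query).prepare()
print(prepared.url)

# Sending it for real would look like this:
# response = requests.get("https://httpbin.org/get", params=query, timeout=10)
```

The library handles URL encoding, so values containing spaces or special characters do not need to be escaped by hand.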

Response Object

The object returned by a request provides several attributes and methods:

  • text: Decoded response body as a string.
  • content: Raw response body as bytes, suitable for binary files.
  • encoding: Character encoding used to decode text. Can be read, or overridden before accessing text.
  • status_code: HTTP status code.
  • headers: Response headers.
  • cookies: Cookies sent by the server.
  • json(): Parses a JSON response body into a Python object.
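The attributes above can be gathered in one place for quick inspection. The helper below is a sketch, not part of the requests API: describe_response is a hypothetical name, and it takes any response object returned by the library.

```python
import requests


def describe_response(response):
    """Collect commonly used attributes of a requests.Response into a dict."""
    content_type = response.headers.get("Content-Type", "")
    info = {
        "status_code": response.status_code,
        "encoding": response.encoding,
        "content_type": content_type,
        "size_bytes": len(response.content),  # raw body length in bytes
        "cookie_names": list(response.cookies.keys()),
    }
    if "application/json" in content_type:
        info["json"] = response.json()  # parsed Python object
    else:
        info["text_preview"] = response.text[:80]  # first 80 decoded characters
    return info


# Usage (performs a real network request):
# resp = requests.get("https://httpbin.org/get", timeout=10)
# print(describe_response(resp))
```

Checking the Content-Type header before calling json() avoids a decoding error when the server returns HTML or plain text instead.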

Custom Headers and Parameters

Include headers to mimic a browser or pass authentication tokens. Query parameters can be appended to the URL or passed as a dictionary to params.
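A short sketch of both ideas, built as prepared requests so nothing is sent over the network. The token value and the Bearer scheme are assumptions for illustration; the actual header format depends on the API being called.

```python
import requests

API_TOKEN = "example-token"  # hypothetical placeholder, not a real credential

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "application/json",
    "Authorization": f"Bearer {API_TOKEN}",  # assumed scheme; check the API's docs
}

# Inspect the headers that would be sent.
prepared = requests.Request("GET", "https://httpbin.org/headers", headers=headers).prepare()
for name, value in prepared.headers.items():
    print(f"{name}: {value}")

# Query parameters: written into the URL by hand, or passed via params.
url_inline = requests.Request("GET", "https://httpbin.org/get?page=2").prepare().url
url_params = requests.Request("GET", "https://httpbin.org/get", params={"page": 2}).prepare().url
print(url_inline == url_params)
```

Header names are matched case-insensitively, so prepared.headers["authorization"] returns the same value as the capitalized form.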

Cookie Management

Cookies maintain state across requests. They can be handled in two ways:

  1. Included in the headers dictionary under the Cookie key.
  2. Passed as a dictionary to the cookies parameter.

Example of passing cookies directly:

import requests

cookie_data = {"session_id": "abc123xyz"}
resp = requests.get('https://example.com', cookies=cookie_data)
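To compare the two approaches side by side, the sketch below prepares one request per approach and prints the resulting Cookie header. Nothing is sent; the session_id value is the same placeholder used above.

```python
import requests

url = "https://example.com"

# Approach 1: write the Cookie header by hand.
via_header = requests.Request(
    "GET", url, headers={"Cookie": "session_id=abc123xyz"}
).prepare()

# Approach 2: let the library build the header from a dictionary.
via_param = requests.Request(
    "GET", url, cookies={"session_id": "abc123xyz"}
).prepare()

print(via_header.headers["Cookie"])
print(via_param.headers["Cookie"])
```

Both requests end up with the same header here. The cookies parameter tends to be easier to maintain once there is more than one cookie, since the library joins the pairs itself.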

POST Requests

Submit data to a server using requests.post(). The data is typically sent in the request body.

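import requests

api_endpoint = 'https://httpbin.org/post'
payload = {'username': 'test_user', 'password': 'secure_pass'}
# requests sets this Content-Type automatically when data is a dictionary;
# it is written out here only to make the encoding explicit.
req_headers = {'Content-Type': 'application/x-www-form-urlencoded'}
post_response = requests.post(url=api_endpoint, data=payload, headers=req_headers, timeout=10)
print(post_response.status_code)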
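What ends up in the request body depends on which argument carries the payload. The sketch below prepares the same dictionary once as form data and once through the json argument, which is also part of requests.post(), and prints the body and Content-Type of each without sending anything.

```python
import json

import requests

api_endpoint = "https://httpbin.org/post"
payload = {"username": "test_user", "password": "secure_pass"}

# data= with a dictionary produces a URL-encoded form body.
form_request = requests.Request("POST", api_endpoint, data=payload).prepare()

# json= serializes the dictionary to JSON and sets the matching Content-Type.
json_request = requests.Request("POST", api_endpoint, json=payload).prepare()

print(form_request.headers["Content-Type"], form_request.body)
print(json_request.headers["Content-Type"], json_request.body)

# The JSON body round-trips back to the original dictionary.
print(json.loads(json_request.body) == payload)
```

Which form to use depends on what the server expects: HTML login forms usually accept form data, while many REST APIs expect JSON.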

Session Objects

A Session object persists cookies, default headers, and other settings across multiple requests, and reuses the underlying connection to the same host.

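import requests

with requests.Session() as session:
    # Headers set on the session are sent with every request it makes.
    session.headers.update({'User-Agent': 'CustomAgent/1.0'})
    # Cookies set by the first response are sent automatically with the second.
    first_resp = session.get('https://example.com/login')
    second_resp = session.get('https://example.com/dashboard')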
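To see what a session contributes to a request, the sketch below prepares a request through the session instead of sending it. The cookie is set by hand as a stand-in for one that a login response would normally set; its name and value are placeholders.

```python
import requests

with requests.Session() as session:
    session.headers.update({"User-Agent": "CustomAgent/1.0"})
    # Stand-in for a cookie that a login response would normally set.
    session.cookies.set("session_id", "abc123xyz", domain="example.com", path="/")

    request = requests.Request(
        "GET",
        "https://example.com/dashboard",
        headers={"Accept": "text/html"},  # per-request header
    )
    # prepare_request() merges session-level settings into the request.
    prepared = session.prepare_request(request)

print(prepared.headers["User-Agent"])
print(prepared.headers["Accept"])
print(prepared.headers.get("Cookie"))
```

Per-request headers take precedence over session defaults with the same name, so the Accept value given on the request replaces the session's default.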

Anti-Scraping Countermeasures

Common techniques to avoid detection include:

  • User-Agent Spoofing: Rotate or mimic legitimate browser identifiers.
  • Proxy Rotation: Use services like Bright Data (formerly Luminati) or Oxylabs to distribute requests across IP addresses.
  • CAPTCHA Solving: Employ third-party services such as Anti-Captcha or 2Captcha.
  • Dynamic Content Rendering: Utilize tools like Selenium or Puppeteer to execute JavaScript.
  • Encryption Analysis: Reverse-engineer client-side encryption algorithms.
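The first two techniques map directly onto arguments of requests.get(). The sketch below cycles through a small list of User-Agent strings and optionally routes traffic through a proxy. The helper names, the User-Agent list, and any proxy URL are assumptions for illustration; a real proxy service supplies its own endpoint and credentials.

```python
import itertools

import requests

# Illustrative User-Agent strings; a real list should be kept up to date.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0",
]
_agent_cycle = itertools.cycle(USER_AGENTS)


def next_request_options(proxy_url=None):
    """Build keyword arguments for requests.get() with the next User-Agent."""
    options = {"headers": {"User-Agent": next(_agent_cycle)}, "timeout": 10}
    if proxy_url:
        # The same proxy is used for both schemes in this sketch.
        options["proxies"] = {"http": proxy_url, "https": proxy_url}
    return options


def fetch(url, proxy_url=None):
    """Send a GET request using rotated headers and an optional proxy."""
    return requests.get(url, **next_request_options(proxy_url))


# Usage (performs a real network request):
# response = fetch("https://httpbin.org/headers")
```

Keeping the option-building step separate from the network call makes the rotation logic easy to check without sending any traffic.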

Tags: python Requests HTTP web scraping API

Posted on Tue, 23 Jun 2026 16:28:07 +0000 by bobbfwed