Installation
Install the library via pip:
pip install requests
For unstable networks, specify a mirror source:
pip install requests -i https://pypi.mirrors.ustc.edu.cn/simple/
Available mirrors include:
- Tsinghua University: https://pypi.tuna.tsinghua.edu.cn/simple
- Alibaba Cloud: http://mirrors.aliyun.com/pypi/simple/
- University of Science and Technology of China: https://pypi.mirrors.ustc.edu.cn/simple/
- Huazhong University of Science and Technology: http://pypi.hustunique.com/
- Shandong University of Technology: http://pypi.sdutlinux.org/
- Douban: http://pypi.douban.com/simple/
Core Functionality
GET Requests
Retrieve web resources using requests.get(). The method returns a response object.
import requests
target_url = 'https://httpbin.org/get'
request_headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
response_text = requests.get(url=target_url, headers=request_headers).text
print(response_text)
Pass query parameters using a dictionary assigned to the params argument.
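As a minimal sketch of the params argument, the snippet below prepares a request without sending it, so the encoded URL can be inspected offline. The parameter names page and tags are made up for illustration; the endpoint is the one used in the example above.

```python
import requests

# Hypothetical query parameters, chosen only for illustration.
query = {"page": 2, "tags": ["python", "http"]}

# Build and prepare the request without sending it, so the encoded URL
# can be inspected without network access.
prepared = requests.Request("GET", "https://httpbin.org/get", params=query).prepare()

# List values are expanded into repeated keys in the query string.
print(prepared.url)
```

To actually send the request, pass the same dictionary to requests.get('https://httpbin.org/get', params=query).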
Response Object
The object returned by a request provides several properties:
- text: Decoded response body as a string.
- content: Raw response body as bytes, suitable for binary files.
- encoding: Character encoding of the response. Can be viewed or set.
- status_code: HTTP status code.
- headers: Response headers.
- cookies: Cookies sent by the server.
- json(): Parses a JSON response body into a Python object.
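The properties listed above can be gathered in one place with a small helper. This is only a sketch: describe_response is a hypothetical name, not part of the library, and the rule of parsing JSON only when the Content-Type mentions it is an assumption made for this example.

```python
import requests


def describe_response(resp: requests.Response) -> dict:
    """Collect commonly inspected fields of a response into a dictionary."""
    content_type = resp.headers.get("Content-Type", "")
    return {
        "status_code": resp.status_code,
        "encoding": resp.encoding,
        "content_type": content_type,
        "size_bytes": len(resp.content),      # raw body length in bytes
        "cookies": resp.cookies.get_dict(),   # cookies as a plain dict
        # Only attempt JSON parsing when the server declares a JSON body.
        "json": resp.json() if "json" in content_type else None,
    }
```

A typical call would be describe_response(requests.get('https://httpbin.org/get', timeout=10)).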
Custom Headers and Parameters
Include headers to mimic a browser or pass authentication tokens. Query parameters can be appended to the URL or passed as a dictionary to params.
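The sketch below combines both options: a query string already present in the URL and an extra params dictionary, plus custom headers. The token value is a placeholder and the parameter names lang and q are assumptions for illustration. The request is prepared but not sent.

```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Authorization": "Bearer <your-token>",  # placeholder, not a real token
}

# The URL already carries a query string; params is merged into it.
request = requests.Request(
    "GET",
    "https://httpbin.org/get?lang=en",
    headers=headers,
    params={"q": "requests"},
)
prepared = request.prepare()

print(prepared.url)
print(prepared.headers["Authorization"])
```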
Cookie Management
Cookies maintain state across requests. They can be handled in two ways:
- Included in the headers dictionary under the Cookie key.
- Passed as a dictionary to the cookies parameter.
Example of passing cookies directly:
cookie_data = {"session_id": "abc123xyz"}
resp = requests.get('https://example.com', cookies=cookie_data)
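To compare the two approaches, the following sketch prepares one request per approach without sending either, then prints the Cookie header each would carry. The cookie value is the illustrative one from the example above.

```python
import requests

cookie_data = {"session_id": "abc123xyz"}
url = "https://example.com"

# Option 1: let requests build the Cookie header from a dictionary.
via_param = requests.Request("GET", url, cookies=cookie_data).prepare()

# Option 2: write the Cookie header by hand.
via_header = requests.Request(
    "GET", url, headers={"Cookie": "session_id=abc123xyz"}
).prepare()

print(via_param.headers["Cookie"])
print(via_header.headers["Cookie"])
```

The dictionary form is usually easier to maintain, since requests handles the header formatting.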
POST Requests
Submit data to a server using requests.post(). The data is typically sent in the request body.
api_endpoint = 'https://httpbin.org/post'
payload = {'username': 'test_user', 'password': 'secure_pass'}
req_headers = {'Content-Type': 'application/x-www-form-urlencoded'}
post_response = requests.post(url=api_endpoint, data=payload, headers=req_headers)
print(post_response.status_code)
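As a sketch of how the request body is built, the snippet below prepares the same payload twice without sending it: once with data (form-encoded) and once with json (JSON-encoded). In both cases requests sets the Content-Type header itself, so setting it by hand is optional.

```python
import json

import requests

api_endpoint = "https://httpbin.org/post"
payload = {"username": "test_user", "password": "secure_pass"}

# data= form-encodes the dictionary and sets the Content-Type automatically.
form_req = requests.Request("POST", api_endpoint, data=payload).prepare()
print(form_req.headers["Content-Type"])
print(form_req.body)

# json= serializes the dictionary as a JSON document instead.
json_req = requests.Request("POST", api_endpoint, json=payload).prepare()
print(json_req.headers["Content-Type"])
print(json.loads(json_req.body))
```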
Session Objects
A session persists settings such as cookies and headers across multiple requests and reuses the underlying connection.
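with requests.Session() as session:
    # Headers set on the session are sent with every request it makes.
    session.headers.update({'User-Agent': 'CustomAgent/1.0'})
    first_resp = session.get('https://example.com/login')
    # Cookies received from the first response are sent automatically here.
    second_resp = session.get('https://example.com/dashboard')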
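To see what a session carries over, the sketch below sets a cookie on the session by hand (a stand-in for one a server would set at login; the value is hypothetical) and prepares a follow-up request without sending it. Session.prepare_request merges the session's headers and cookies into the outgoing request.

```python
import requests

with requests.Session() as session:
    session.headers.update({"User-Agent": "CustomAgent/1.0"})
    # Stand-in for a cookie the server would set during login (hypothetical value).
    session.cookies.set("session_id", "abc123xyz")

    # prepare_request merges session-level state into the outgoing request.
    prepared = session.prepare_request(
        requests.Request("GET", "https://example.com/dashboard")
    )
    sent_user_agent = prepared.headers["User-Agent"]
    sent_cookie = prepared.headers["Cookie"]

print(sent_user_agent)
print(sent_cookie)
```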
Anti-Scraping Countermeasures
Common techniques to avoid detection include:
- User-Agent Spoofing: Rotate or mimic legitimate browser identifiers.
- Proxy Rotation: Use services like Luminati (now Bright Data) or Oxylabs to distribute requests.
- CAPTCHA Solving: Employ third-party services such as Anti-Captcha or 2Captcha.
- Dynamic Content Rendering: Utilize tools like Selenium or Puppeteer to execute JavaScript.
- Encryption Analysis: Reverse-engineer client-side encryption algorithms.
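The User-Agent and proxy items above could be sketched with plain requests options as follows. This is an illustration under assumptions, not a recommended setup: the User-Agent strings are examples, the proxy address is a hypothetical placeholder, and next_request_options is a made-up helper name.

```python
import itertools

import requests

# Example browser identifiers to rotate through (illustrative values).
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:89.0) Gecko/20100101 Firefox/89.0",
]

# Hypothetical proxy endpoint; replace with a real one before use.
PROXIES = {
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
}

_user_agent_cycle = itertools.cycle(USER_AGENTS)


def next_request_options() -> dict:
    """Return keyword arguments for requests.get() with the next User-Agent."""
    return {
        "headers": {"User-Agent": next(_user_agent_cycle)},
        "proxies": PROXIES,
        "timeout": 10,  # seconds; avoids hanging on an unresponsive proxy
    }
```

Each call would then look like requests.get(target_url, **next_request_options()), where target_url is the page being fetched.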