The Problem with Traditional Approaches
When scraping with requests, sites that require authentication force developers to inspect network traffic, read JavaScript source code, and hand-build complex requests. Anti-crawling measures such as CAPTCHAs, JavaScript obfuscation, and signature parameters raise the barrier further. When data is computed by JavaScript, developers must reproduce the entire calculation themselves, which makes development slow and tedious.
Browser automation sidesteps many of these obstacles, but traditional browser-based approaches suffer from poor performance.
Introducing DrissionPage
DrissionPage was designed to combine the strengths of both approaches: it lets you switch between the two modes as requirements change and exposes an intuitive API. The library wraps common operations at the page level, so developers can concentrate on what a script should do rather than on low-level details.
Key Advantages
The library runs on its own custom-built core and includes many practical features, offering significant improvements over Selenium:
- Stealth Operation: No webdriver fingerprints, avoiding detection by websites
- Driver Independence: No need to download different drivers for different browser versions
- Enhanced Performance: Faster execution compared to traditional solutions
- Cross-frame Element Location: Find elements across iframes without context switching
- Simplified Frame Handling: Treat an iframe as a regular element; locate it once, then search within it
- Multi-tab Support: Operate on multiple browser tabs simultaneously, including inactive ones
- Direct Cache Access: Read browser cache directly to save images without GUI interactions
- Full Page Screenshots: Capture the entire page including off-viewport content (browser version 90+)
Additional Features
The library also includes many user-friendly design choices:
- Concise Syntax: Integrated common operations with elegant code patterns
- Robust Element Location: More powerful and stable element finding capabilities
- Automatic Wait and Retry: Handles unstable networks gracefully, improving stability
- Built-in Download Manager: Reliable download functionality during browser operations
- Browser Reusability: Reuse existing browser instances without restarting, simplifying debugging
- Configuration Management: INI file support for common settings with automatic loading
- High-speed Parsing: Integrated lxml engine for faster parsing
- POM Support: Page Object Model implementation ready for testing
- Integrated Utilities: Convenient features throughout
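The configuration-management item can be pictured with Python's standard configparser module. This is only a sketch of the general idea of loading defaults from an INI file. The section and key names ([browser], headless, download_path, [session], timeout) and the load_settings helper are hypothetical and are not DrissionPage's actual configuration schema.

```python
import configparser

# Hypothetical INI content; DrissionPage's real file layout may differ.
SAMPLE_INI = """
[browser]
headless = true
download_path = ./downloads

[session]
timeout = 10
"""


def load_settings(ini_text):
    """Parse INI text into a plain dict with typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Fallbacks apply when a section or key is missing from the file.
    return {
        'headless': parser.getboolean('browser', 'headless', fallback=False),
        'download_path': parser.get('browser', 'download_path', fallback='.'),
        'timeout': parser.getfloat('session', 'timeout', fallback=30.0),
    }


settings = load_settings(SAMPLE_INI)
print(settings)
```

Keeping defaults in a file like this means a script can start with sensible settings without repeating them in code.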
Architecture Overview
WebPage inherits from both ChromiumPage and SessionPage. The former controls the browser, while the latter sends and receives HTTP requests. As a result, WebPage can drive a browser, send and receive requests, and share login state between the two modes.
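One way to picture this arrangement is a small toy model. The classes below (ToyBrowserPage, ToySessionPage, ToyWebPage) are hypothetical stand-ins written for illustration. They do not reproduce DrissionPage's real implementation and make no browser or network calls. They only show how a class that inherits from both sides can switch modes while carrying cookies across.

```python
class ToyBrowserPage:
    """Stand-in for the browser-controlling side."""

    def __init__(self):
        super().__init__()
        self.browser_cookies = {}

    def browser_login(self, user):
        # Pretend the site set a session cookie after a browser login.
        self.browser_cookies['session_id'] = f'token-for-{user}'


class ToySessionPage:
    """Stand-in for the HTTP-request side."""

    def __init__(self):
        super().__init__()
        self.session_cookies = {}

    def session_get(self, url):
        # Report which cookies would accompany the request.
        return {'url': url, 'cookies': dict(self.session_cookies)}


class ToyWebPage(ToyBrowserPage, ToySessionPage):
    """Inherits both sides and copies cookies when the mode changes."""

    def __init__(self):
        super().__init__()
        self.mode = 'browser'

    def change_mode(self):
        if self.mode == 'browser':
            self.session_cookies.update(self.browser_cookies)
            self.mode = 'session'
        else:
            self.browser_cookies.update(self.session_cookies)
            self.mode = 'browser'


page = ToyWebPage()
page.browser_login('alice')   # log in through the "browser"
page.change_mode()            # cookies travel to the request side
result = page.session_get('https://example.com/api')
print(page.mode, result['cookies'])
```

The point of the sketch is the hand-off: a login performed in one mode remains usable after switching to the other.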
Practical Comparisons
Fetching Element Content
Using requests:
import requests
from lxml import etree
url = 'https://example.com/product'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)
tree = etree.HTML(response.text)
heading = tree.xpath('//h1')[0]
product_name = heading.text
Using DrissionPage:
from DrissionPage import WebPage
page = WebPage()
page.get('https://example.com/product')
product_name = page('tag:h1').text
File Download
Using requests:
import requests
url = 'https://example.com/data.zip'
destination = r'C:\temp'
response = requests.get(url, stream=True)
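# Raw f-string keeps the backslash literal; a plain f-string would turn '\a' into a bell character
with open(rf'{destination}\archive.zip', 'wb') as file:
    for chunk in response.iter_content(chunk_size=8192):
        file.write(chunk)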
Using DrissionPage:
from DrissionPage import WebPage
page = WebPage()
page.download('https://example.com/data.zip', r'C:\temp', 'archive')
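Manual path handling in the requests version is easy to get wrong, because backslashes in ordinary string literals can start escape sequences. A minimal sketch of a safer pattern builds the target path with pathlib and writes chunks from any iterable, such as response.iter_content(...). The save_chunks helper and its parameters are hypothetical names chosen for this illustration and are not part of requests or DrissionPage.

```python
import tempfile
from pathlib import Path


def save_chunks(chunks, directory, filename):
    """Write an iterable of byte chunks to directory/filename and return the path."""
    target = Path(directory) / filename  # no manual separators or escapes
    target.parent.mkdir(parents=True, exist_ok=True)
    with open(target, 'wb') as file:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                file.write(chunk)
    return target


# Demonstration with in-memory chunks and a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_chunks([b'ab', b'', b'cd'], tmp, 'archive.zip')
    content = saved.read_bytes()
print(saved.name, content)
```

With requests, the same helper could be fed response.iter_content(chunk_size=8192) in place of the in-memory list.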
Element Location with Wait
Selenium approach:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
# WebDriverWait requires a timeout; wait up to 10 seconds here
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, '//*[contains(text(), "keyword")]'))
)
DrissionPage approach:
from DrissionPage import WebPage
page = WebPage()
element = page('keyword')
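Both versions rest on the same idea: poll for the element until it appears or a timeout expires. The helper below is a generic sketch of such a polling loop, not the actual implementation in Selenium or DrissionPage. The names wait_until, timeout, and interval are assumptions made for this example.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once timeout seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        time.sleep(interval)


# Simulated lookup that only succeeds on the third attempt.
attempts = {'count': 0}


def find_element():
    attempts['count'] += 1
    return 'element' if attempts['count'] >= 3 else None


found = wait_until(find_element, timeout=2.0, interval=0.01)
print(found, attempts['count'])
```

A library that builds this loop into every lookup spares the caller from writing the wait explicitly, which is the difference the comparison above highlights.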
Tab Navigation
Selenium approach:
driver.switch_to.window(driver.window_handles[0])
DrissionPage approach:
page.to_tab(0)
Dropdown Selection
Selenium approach:
from selenium.webdriver.support.select import Select
dropdown = Select(element)
dropdown.select_by_visible_text('Option Text')
DrissionPage approach:
element.select('Option Text')
Drag and Drop
Selenium approach:
from selenium.webdriver.common.action_chains import ActionChains
ActionChains(driver).drag_and_drop(source, target).perform()
DrissionPage approach:
source.drag_to(target)
XPath Attribute Extraction
Selenium approach:
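# A locator must resolve to an element, so reading the attribute needs a second, chained call
css_class = driver.find_element('xpath', '//div[@id="container"]').get_attribute('class')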
DrissionPage approach:
css_class = element('xpath://div[@id="container"]/@class')
content = element('xpath://div[@id="container"]/text()[2]')
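For comparison, plain XPath can already return attribute values and individual text nodes. The sketch below runs the same two expressions with lxml, the parser used in the requests example earlier, against a made-up HTML fragment. It illustrates the XPath semantics only and does not go through DrissionPage.

```python
from lxml import etree

# Made-up fragment for illustration.
html = '<div id="container" class="box">first<br>second</div>'
tree = etree.HTML(html)

# '/@class' selects the attribute value rather than the element.
css_class = tree.xpath('//div[@id="container"]/@class')[0]

# 'text()[2]' selects the second text node directly under the div.
content = tree.xpath('//div[@id="container"]/text()[2]')[0]

print(css_class, content)
```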
Sample Application
Monitoring product availability and sending notifications:
from DrissionPage import MixPage
import requests
import time
browser = MixPage()
browser.get('https://shop.example.com/items/12345')
browser.to_iframe('mainFrame')
stock_status = browser.ele('@class=stock-count').text
print(f'Current stock: {stock_status}')
if int(stock_status) > 0:
    webhook = 'https://api.notify.service.com/send'
    payload = {
        'token': 'abc123',
        'title': 'Item Available',
        'message': f'Product now has {stock_status} units in stock',
        'format': 'html'
    }
    requests.post(webhook, json=payload)
time.sleep(2)
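The script above checks the stock once. To keep watching, the check can be wrapped in a polling loop. The following is a sketch under assumptions: monitor_stock, fetch_stock, and notify are hypothetical names, and the two callables stand in for the page lookup and the webhook call so that the loop can be exercised without a browser or network.

```python
import time


def monitor_stock(fetch_stock, notify, interval=2.0, max_checks=10):
    """Poll fetch_stock() until it reports stock, then call notify(count).

    Returns the stock count, or 0 if every one of max_checks polls was empty.
    """
    for check in range(max_checks):
        count = int(fetch_stock())
        if count > 0:
            notify(count)
            return count
        if check < max_checks - 1:
            time.sleep(interval)  # pause between checks
    return 0


# Simulated readings: out of stock twice, then five units.
readings = iter(['0', '0', '5'])
sent = []

result = monitor_stock(lambda: next(readings), sent.append, interval=0.01)
print(result, sent)
```

In the real script, fetch_stock would return the text of the stock-count element and notify would wrap the requests.post call.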
Installation
pip install DrissionPage
Import Options
The WebPage class provides the most comprehensive functionality, controlling both browser and network packets:
from DrissionPage import WebPage
Additional classes available for specific use cases:
# Browser-only control (no network packet support)
from DrissionPage import ChromiumPage
# Network packet handling only (no browser control)
from DrissionPage import SessionPage
# Browser launch configuration
from DrissionPage import DriverOptions
# Session object configuration
from DrissionPage import SessionOptions
# Action chain for mouse/keyboard simulation
from DrissionPage import ActionChains
Class Hierarchy
- ChromiumPage: Controls Chromium-based browsers
- SessionPage: Handles HTTP requests
- MixPage: Combines ChromiumPage and SessionPage capabilities
- WebPage: Full-featured class supporting both modes with automatic switching