Web automation is frequently used to monitor e-commerce prices. The following script tracks a product price and sends an API notification once the price falls to a target threshold. It uses WebPage, the mixed-mode page object, for both navigation and data extraction.
from DrissionPage import WebPage
import requests
import time
# Initialize the unified page object
browser = WebPage()
# Navigate to the target product page
browser.get('https://www.example-shop.com/product-id')
# Handle page elements, such as switching to an embedded iframe if necessary
browser.to_iframe('product-frame-id')
# Extract the price string and convert to integer (int() requires bare digits such as "1200")
price_text = browser.ele('@class=current-price').text
current_price = int(price_text)
print(f"Detected price: {current_price}")
# Define purchase threshold
TARGET_PRICE = 1200
if current_price <= TARGET_PRICE:
    # Prepare notification payload
    notification_token = 'your_api_token_here'
    alert_title = 'Price Drop Alert'
    message_body = f'The price has dropped to {current_price}'
    # Pass the fields as params so requests URL-encodes spaces and symbols
    endpoint = 'https://api.notify-service.com/send'
    params = {'token': notification_token, 'title': alert_title,
              'content': message_body, 'template': 'html'}
    # Send the notification
    requests.get(endpoint, params=params, timeout=10)
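Price labels on real pages often carry currency symbols and thousands separators, and notification text can contain spaces or `&`. The following is a minimal sketch of two helpers that cope with both, using only the standard library. The names `parse_price` and `build_notify_url` are hypothetical, and the parser assumes `,` is a thousands separator and `.` is the decimal point.

```python
import re
from decimal import Decimal
from urllib.parse import urlencode


def parse_price(text):
    """Extract the first number from a price label such as '$1,199.00'.

    Assumption: ',' is a thousands separator and '.' is the decimal point.
    """
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return Decimal(match.group().replace(",", ""))


def build_notify_url(base, token, title, content):
    """Build the notification URL with percent-encoded query parameters."""
    query = urlencode(
        {"token": token, "title": title, "content": content, "template": "html"}
    )
    return f"{base}?{query}"


if __name__ == "__main__":
    price = parse_price("¥1,199.00")
    if price <= 1200:
        print(build_notify_url(
            "https://api.notify-service.com/send",  # placeholder endpoint
            "your_api_token_here",
            "Price Drop Alert",
            f"The price has dropped to {price}",
        ))
```

Returning a `Decimal` rather than an `int` keeps any fractional part of the price intact while still allowing direct comparison with an integer threshold.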
Design Philosophy
Traditional web scraping often forces a choice between using requests or a browser automation tool like Selenium. The requests library is efficient for data transfer but struggles with dynamic content, complex authentication, and obfuscated JavaScript, requiring significant reverse engineering. Conversely, browser tools bypass these hurdles but suffer from high resource consumption and slower execution speeds.
This library is designed to merge these approaches into a single framework. It allows developers to switch between packet-based data transmission and browser-controlled rendering based on the specific requirements of the task. By abstracting common web operations into a simplified API, it reduces boilerplate code and improves development velocity.
Core Features
DrissionPage runs on its own self-developed kernel, which builds in a number of conveniences and optimizations. Compared with Selenium, it offers the following advantages:
- Stealth Mode: It lacks the standard webdriver attributes, making it significantly harder for websites to detect bot activity.
- Driver Management: There is no need to manually download and configure browser drivers for different versions.
- Performance: Scripts run faster than equivalent Selenium code.
- Frame Handling: Elements can be located across iframes without the need to manually switch context; iframes are treated as standard containers.
- Tab Management: It supports interacting with multiple browser tabs simultaneously, regardless of which tab is currently active.
- Resource Access: Direct access to browser caches allows for efficient image retrieval without GUI-based saving.
- Full Capture: Supports full-page screenshots, including content outside the viewport (supported in Chromium 90+).
Architecture Overview
The class structure is designed to maximize flexibility. The WebPage class inherits from both ChromiumPage and SessionPage. The former handles browser control and rendering, while the latter manages HTTP packet requests. This inheritance allows WebPage to control the browser, manage network requests, and share login states (cookies) between the two modes seamlessly.
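The inheritance relationship described above can be pictured with a toy model. This is only an illustrative sketch: `BrowserSide`, `SessionSide`, `UnifiedPage`, and `switch_mode` are hypothetical stand-ins, not DrissionPage's source code, and cookies are modelled as plain dictionaries.

```python
class SessionSide:
    """Stand-in for the HTTP-request role played by SessionPage."""

    def __init__(self):
        self.session_cookies = {}

    def request(self, url):
        # Describe the request that would be sent, including current cookies.
        sent = "; ".join(f"{k}={v}" for k, v in self.session_cookies.items())
        return f"GET {url} [Cookie: {sent}]"


class BrowserSide:
    """Stand-in for the browser-control role played by ChromiumPage."""

    def __init__(self):
        self.browser_cookies = {}

    def log_in(self, user):
        # Pretend the browser completed a login and received a cookie.
        self.browser_cookies["sessionid"] = f"token-for-{user}"


class UnifiedPage(BrowserSide, SessionSide):
    """Toy counterpart of WebPage: one object exposing both roles."""

    def __init__(self, mode="browser"):
        BrowserSide.__init__(self)
        SessionSide.__init__(self)
        self.mode = mode

    def switch_mode(self):
        # Copy cookies to the side being activated so the login carries over.
        if self.mode == "browser":
            self.session_cookies.update(self.browser_cookies)
            self.mode = "session"
        else:
            self.browser_cookies.update(self.session_cookies)
            self.mode = "browser"


if __name__ == "__main__":
    page = UnifiedPage()
    page.log_in("alice")        # browser side handles the login
    page.switch_mode()          # cookies are copied to the session side
    print(page.request("https://example.com/profile"))
```

The point of the sketch is that a login performed through the browser side remains usable for lightweight requests afterwards, because both roles live on the same object.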
Code Comparison and Syntax Efficiency
The following examples illustrate how DrissionPage simplifies common tasks compared to conventional libraries.
Comparison with requests
Extracting Element Text:
url = 'https://baike.baidu.com/item/python'
# Traditional requests approach
from lxml import etree
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'}
response = requests.get(url, headers=headers)
tree = etree.HTML(response.text)
title_text = tree.xpath('//h1')[0].text
# DrissionPage approach
from DrissionPage import WebPage
page = WebPage('s') # 's' indicates session mode
page.get(url)
title_text = page('tag:h1').text
File Downloading:
file_url = 'https://example.com/image.png'
save_directory = r'C:\Downloads'
# Traditional requests approach
r = requests.get(file_url, stream=True)
with open(f'{save_directory}\\image.png', 'wb') as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)
# DrissionPage approach
page.download(file_url, save_directory, 'image') # Handles renaming and conflicts automatically
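To see what automatic conflict handling saves, here is a minimal sketch of the kind of helper the plain requests version would need before writing a file. The function `unique_path` is hypothetical, and the `_1`, `_2` suffix scheme is an assumption chosen for illustration; it is not necessarily the naming DrissionPage itself produces.

```python
from pathlib import Path


def unique_path(directory, stem, suffix):
    """Return a path inside `directory` that does not clash with an existing file.

    Tries 'stem.suffix' first, then 'stem_1.suffix', 'stem_2.suffix', and so on.
    """
    directory = Path(directory)
    candidate = directory / f"{stem}{suffix}"
    counter = 1
    while candidate.exists():
        candidate = directory / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


if __name__ == "__main__":
    # Prints a path in the current directory that is currently unused.
    print(unique_path(".", "image", ".png"))
```

The helper only chooses a name. Two processes calling it at the same moment could still pick the same path, so a production version would need to open the file in exclusive-create mode as well.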
Comparison with Selenium
Locating Elements with Explicit Wait:
# Selenium approach
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec
from selenium.webdriver.common.by import By
elem = WebDriverWait(driver, 10).until(
    ec.presence_of_element_located((By.XPATH, '//*[contains(text(), "target text")]'))
)
# DrissionPage approach
elem = page('target text') # Integrated waiting and retry logic
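The idea behind integrated waiting is a polling loop: keep trying the lookup until it succeeds or a deadline passes. The sketch below shows that general pattern with the standard library only. `wait_for` and its parameters are hypothetical names for illustration and do not describe DrissionPage's internal implementation.

```python
import time


def wait_for(find, timeout=10.0, interval=0.5):
    """Call `find` repeatedly until it returns a truthy value or `timeout` elapses.

    `find` is any zero-argument callable, for example a lambda wrapping an
    element lookup. Raises TimeoutError if nothing is found in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing found within {timeout} seconds")
        time.sleep(interval)


if __name__ == "__main__":
    calls = []

    def fake_lookup():
        # Simulate an element that appears on the third attempt.
        calls.append(1)
        return "element" if len(calls) >= 3 else None

    print(wait_for(fake_lookup, timeout=2.0, interval=0.1))
```

Using `time.monotonic()` rather than `time.time()` keeps the deadline correct even if the system clock is adjusted while the loop is running.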
Switching Tabs:
# Selenium approach
driver.switch_to.window(driver.window_handles[0])
# DrissionPage approach
page.to_tab(0)
Selecting Dropdown Options:
# Selenium approach
from selenium.webdriver.support.select import Select
select_obj = Select(element)
select_obj.select_by_visible_text('Option Name')
# DrissionPage approach
element.select('Option Name')
Complex XPath Retrieval:
# DrissionPage allows direct attribute or text node access via XPath
css_class = element('xpath://div[@id="main"]/@class')
node_text = element('xpath://div[@id="main"]/text()[2]')
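It helps to be clear about which pieces of markup `/@class` and `text()[2]` refer to. The sketch below uses a hypothetical HTML snippet and the standard library's ElementTree, which does not accept these XPath forms directly, so the equivalent values are read through element attributes instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a div with two text nodes separated by a child element.
SNIPPET = '<div id="main" class="box wide">first<b>bold</b>second</div>'

div = ET.fromstring(SNIPPET)

# Equivalent of //div[@id="main"]/@class: the raw attribute value.
css_class = div.get("class")

# Equivalent of text()[1]: text before the first child element.
first_text = div.text

# Equivalent of text()[2]: text following the first child, stored as its tail.
second_text = div[0].tail

print(css_class, first_text, second_text)
```

In other words, `text()[2]` selects the second run of text that belongs directly to the div, skipping the text inside the nested `<b>` element.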
Installation and Setup
To integrate DrissionPage into your project, install the package via pip:
pip install DrissionPage
While WebPage is the primary class that unifies browsing and session capabilities, individual classes can be imported for more granular control:
from DrissionPage import WebPage
# For browser-only control
from DrissionPage import ChromiumPage
# For HTTP request-only operations
from DrissionPage import SessionPage
# For managing browser launch arguments
from DrissionPage import DriverOptions
# For managing session configurations
from DrissionPage import SessionOptions
# For simulating complex input chains
from DrissionPage import ActionChains