Scrapy Framework Setup and XPath Querying Techniques
Core Framework Architecture
Scrapy operates as an asynchronous web scraping framework built upon Twisted. The main components coordinating data flow are:
Engine: Orchestrates triggers and overall data handling.
Scheduler: Accepts requests, maintains a priority queue, and deduplicates URLs.
Downloader: Retrieves page content via non-blocking I/O ...
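The scheduler's two jobs named above (priority ordering and URL deduplication) can be sketched in plain Python. This is a toy illustration, not Scrapy's actual scheduler: the class name `ToyScheduler`, its methods, and the example URLs are all hypothetical, and serving higher priority numbers first is simply a choice made for this sketch.

```python
import heapq
import itertools


class ToyScheduler:
    """Toy model of a scheduler: a priority queue plus URL deduplication."""

    def __init__(self):
        self._heap = []                    # entries: (-priority, sequence, url)
        self._seen = set()                 # every URL accepted so far
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def enqueue(self, url, priority=0):
        """Queue a URL unless it was seen before; return True if queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so negate priority to pop larger values first.
        heapq.heappush(self._heap, (-priority, next(self._counter), url))
        return True

    def next_url(self):
        """Pop the highest-priority URL, or return None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


scheduler = ToyScheduler()
scheduler.enqueue("https://example.com/a")
scheduler.enqueue("https://example.com/b", priority=10)
scheduler.enqueue("https://example.com/a")  # duplicate, silently skipped
```

Keeping the seen-set separate from the heap means a URL stays deduplicated even after it has been popped and downloaded, which is usually the behavior a crawler wants.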
Posted on Mon, 15 Jun 2026 17:21:19 +0000 by Buffas
Web Scraping with XPath: Extracting News Headlines from 36Kr
Having previously explored the powerful BeautifulSoup library for HTML parsing and techniques for capturing HTTP requests through online tools, we now turn our attention to another fundamental web scraping approach: XPath. XPath serves as a query language designed to navigate and select specific portions of XML documents. While originally deve ...
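As a rough illustration of XPath selecting portions of a document, the sketch below uses Python's standard-library `xml.etree.ElementTree`, which supports only a limited subset of XPath. The sample document, its tag names, and the `category` attribute are invented for this example and are not taken from 36Kr.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up purely for illustration.
SAMPLE = """
<news>
  <item category="tech"><title>First headline</title></item>
  <item category="finance"><title>Second headline</title></item>
  <item category="tech"><title>Third headline</title></item>
</news>
""".strip()

root = ET.fromstring(SAMPLE)

# ".//title" selects every <title> element at any depth below the root.
all_titles = [t.text for t in root.findall(".//title")]

# A predicate in square brackets filters by attribute value before
# descending to the <title> child.
tech_titles = [t.text for t in root.findall(".//item[@category='tech']/title")]
```

Full XPath 1.0 features such as functions and most axes are outside what ElementTree handles, so a dedicated XPath library is the usual choice for real pages.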
Posted on Fri, 12 Jun 2026 18:32:30 +0000 by 9mm
Mobile App Element Interaction Methods in UI Automation
Click action: element.click()
Text input: element.send_keys("text_value")
Value setting: element.set_value("new_value")
Clear content: element.clear()
Visibility check: element.is_displayed() returns boolean
Enabled status: element.is_enabled() returns boolean
Selection state: element.is_selected() returns boolean
Attribute ...
Posted on Wed, 10 Jun 2026 16:22:19 +0000 by bbaker
Applying XPath Expressions with Python's lxml Library
Installation
Install the library using pip:
pip install lxml
XPath Core Concepts
Node Types
XPath defines seven node types: element, attribute, text, namespace, processing instruction, comment, and the document (root) node. An XML document is represented as a node tree, with the root of the tree being the document or root node.
Consider this s ...
Posted on Wed, 20 May 2026 18:13:14 +0000 by alego
Introduction to the Scrapy Framework for Web Scraping
This article explores the fundamentals of web scraping using Python, covering aspects from basic browser automation to the powerful Scrapy framework.
Web Scraping with Selenium
Selenium is a popular tool for browser automation, enabling the simulation of user interactions with web pages. The following example demonstrates how to use Selenium wi ...
Posted on Sun, 17 May 2026 23:33:31 +0000 by strago
Practical XPath Parsing with Python's lxml Library
The lxml library serves as a powerful Pythonic wrapper around C libraries like libxml2 and libxslt, delivering exceptional performance for parsing HTML and XML documents. Its comprehensive support for XPath 1.0 makes it an ideal choice for targeted data extraction tasks.
Setup
Install the package via pip:
pip install lxml
Initializing Parser O ...
Posted on Fri, 15 May 2026 15:42:04 +0000 by zackcez
Scraping Douban Book Data with Scrapy
Scrapy is an asynchronous web crawling framework built on Twisted, enabling efficient and scalable data extraction in Python. To begin scraping book information from Douban’s website, first install Scrapy using pip:
pip install Scrapy -i https://pypi.tuna.tsinghua.edu.cn/simple
Create a new project named douban:
scrapy startproject douban
cd ...
Posted on Fri, 15 May 2026 10:30:34 +0000 by webmaster1
Working with XML in C#: A Practical Guide
XML (eXtensible Markup Language) serves as a versatile data storage and interchange format where developers define their own tags according to specific needs. Unlike HTML with its predefined tags, XML provides complete flexibility in structuring data. While Document Type Definitions (DTD) can enforce validation rules on XML documents, they are ...
Posted on Thu, 14 May 2026 08:41:54 +0000 by robche