Web Scraping with XPath: Extracting News Headlines from 36Kr

Having previously explored the BeautifulSoup library for HTML parsing and techniques for capturing HTTP requests with online tools, we now turn to another fundamental web scraping approach: XPath. XPath is a query language designed to navigate and select specific parts of XML documents. While originally deve ...

Posted on Fri, 12 Jun 2026 18:32:30 +0000 by 9mm

Applying XPath Expressions with Python's lxml Library

Installation. Install the library using pip: pip install lxml. XPath Core Concepts. Node Types. XPath defines seven node types: element, attribute, text, namespace, processing instruction, comment, and the document (root) node. An XML document is represented as a node tree, with the root of the tree being the document or root node. Consider this s ...
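
The node types listed in this excerpt can be illustrated with a minimal sketch, assuming lxml is installed as described. The sample XML document and the variable names are hypothetical, made up for illustration, and do not come from the original post.

```python
from lxml import etree

# Hypothetical sample document, invented for this illustration.
XML = b"""<library>
  <!-- catalogue -->
  <book id="b1"><title>Dune</title></book>
  <book id="b2"><title>Emma</title></book>
</library>"""

# Parse the bytes into an element tree; `root` is the <library> element.
root = etree.fromstring(XML)

titles = root.xpath("//book/title")         # element nodes
ids = root.xpath("//book/@id")              # attribute nodes (returned as strings)
texts = root.xpath("//book/title/text()")   # text nodes (returned as strings)
comments = root.xpath("//comment()")        # comment nodes

print([t.tag for t in titles])
print(ids)
print(texts)
print([c.text.strip() for c in comments])
```

Each expression selects a different kind of node from the same tree, which is why the result lists hold elements in some cases and plain strings in others.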

Posted on Wed, 20 May 2026 18:13:14 +0000 by alego

Practical XPath Parsing with Python's lxml Library

The lxml library is a Pythonic wrapper around the C libraries libxml2 and libxslt, delivering fast parsing of HTML and XML documents. Its support for XPath 1.0 makes it a good fit for targeted data extraction tasks. Setup. Install the package via pip: pip install lxml. Initializing Parser O ...

Posted on Fri, 15 May 2026 15:42:04 +0000 by zackcez