Quick Python Web Scraping Guide: Choose Your Meal in Minutes

This process isn't technically complex; it mostly takes patience and attention to detail. That's why many people take up web scraping as a side job: although it's time-consuming, the technical barrier is relatively low. After this lesson, you won't find web scraping hard anymore. Later you may run into challenges like session management or bypassing CAPTCHAs, since sites that are hard to scrape usually don't want to be crawled. These advanced topics are worth exploring in future studies.

Today, we'll tackle a real-life problem, deciding what to eat, with a recipe-based approach.

How Web Scrapers Work

Web scraping mimics a user browsing a site: it starts from a page, looks for links worth following, then fetches their content. When the desired text or images are found, they are extracted or downloaded. That fetch, follow, extract loop is the whole basic architecture.
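
The "checking for clickable links" step can be sketched with nothing but the standard library. Here the page is an inline string rather than a live site, and the URLs are made up for illustration:

```python
from html.parser import HTMLParser

# Collect every href from <a> tags -- the link-discovery step of a crawler.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

page = '<a href="/recipe/1">Braised ribs</a> <a href="/recipe/2">Fried rice</a>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/recipe/1', '/recipe/2']
```

A real crawler would feed each collected link back into the fetch step, repeating the loop.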

Parsing HTML Content

In web scraping, the first step is typically sending an HTTP request to retrieve data. Often, we expect JSON responses for processing, but for HTML pages, we need to parse the raw HTML. Python offers various libraries for this, and here's one example:

from urllib.request import urlopen, Request

# A browser-like User-Agent, since some sites reject urllib's default one
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0'
}
req = Request("https://www.meishij.net/?from=space_block", headers=headers)
html = urlopen(req)
# Decode the raw response bytes into text
html_text = html.read().decode()
print(html_text)

Usually, this returns the full HTML source of the recipe page, the same markup you would see in your browser's developer tools (F12).

Extracting Elements

The brute-force method is string parsing, but Python has better tools: BeautifulSoup greatly simplifies HTML parsing. Other approaches exist too; you can find what you need with a quick search rather than memorizing everything.

Popular Recipes

Let's extract popular recipes from the homepage:

from urllib.request import urlopen, Request
from bs4 import BeautifulSoup as bf

# Pretend to be a regular browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0'
}
req = Request("https://www.meishij.net/?from=space_block", headers=headers)
html = urlopen(req)
html_text = html.read().decode()
# Parse the page with the built-in html.parser backend
obj = bf(html_text, 'html.parser')

# Each popular recipe sits inside an <a class="sancan_item"> block
index_hotlist = obj.find_all('a', class_='sancan_item')
for ul in index_hotlist:
    for li in ul.find_all('strong', class_='title'):
        print(li.get_text())

We locate elements visually in the browser, tell BeautifulSoup how to identify them, and extract the desired content. Here, we pull the titles out of the list items.

Random Recipe Picker

Choosing what to cook can be tricky, so let's scrape all recipes into a list and randomly select one:

from urllib.request import urlopen
from bs4 import BeautifulSoup as bf

# Fetch the first three listing pages of the category
for i in range(3):
    url = f"https://www.meishij.net/chufang/diy/jiangchangcaipu/?&page={i}"
    html = urlopen(url)
    html_text = html.read().decode()
    obj = bf(html_text, 'html.parser')
    # Recipe thumbnails carry the dish name in their alt attribute
    for p in obj.find_all('img'):
        if p.get('alt'):
            print(p.get('alt'))

We fetch the first three pages of recipes and print their names.
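
With the names collected, the random pick itself is just `random.choice`. A minimal sketch, with a hard-coded list standing in for the scraped one:

```python
from random import choice

# Stand-in for the list of dish names scraped above
all_food = ["Braised pork ribs", "Mapo tofu", "Egg fried rice"]
my_food = choice(all_food)
print(f"Today's meal: {my_food}")
```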

Recipe Instructions

After selecting a dish, we can fetch its instructions:

from urllib.request import urlopen
import urllib.parse
import string
from bs4 import BeautifulSoup as bf

# Search for the dish; percent-encode the non-ASCII query characters
url = "https://so.meishij.net/index.php?q=红烧排骨"
url = urllib.parse.quote(url, safe=string.printable)
html = urlopen(url)
html_text = html.read().decode()
obj = bf(html_text, 'html.parser')

# The first search result links to the recipe page
index_hotlist = obj.find_all('a', class_='img')
url = index_hotlist[0].get('href')
html = urlopen(url)
html_text = html.read().decode()
obj = bf(html_text, 'html.parser')

# The cooking steps live in <div class="step_content"> blocks
index_hotlist = obj.find_all('div', class_='step_content')
for div in index_hotlist:
    for p in div.find_all('p'):
        print(p.get_text())
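
The quoting step matters because `urlopen` cannot handle URLs containing raw non-ASCII characters. A small offline sketch of just the encoding, with the dish name as an example string:

```python
import string
import urllib.parse

query = "红烧排骨"  # braised pork ribs
url = f"https://so.meishij.net/index.php?q={query}"
# safe=string.printable leaves the ASCII URL skeleton untouched and
# percent-encodes only the non-ASCII bytes
encoded = urllib.parse.quote(url, safe=string.printable)
print(encoded)
# Decoding restores the original URL
assert urllib.parse.unquote(encoded) == url
```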

Wrapping Up

To make things easier, we encapsulate the above steps into a simple console app:

# Import required modules
from urllib.request import urlopen, Request
import urllib, string
from bs4 import BeautifulSoup as bf
from random import choice, sample
from colorama import init
from os import system
from termcolor import colored
from readchar import readkey

FGS = ['green', 'yellow', 'blue', 'cyan', 'magenta', 'red']

print(colored('Searching recipes...', choice(FGS)))
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0'
}
req = Request("https://www.meishij.net/?from=space_block", headers=headers)
html = urlopen(req)
html_text = bytes.decode(html.read())
hot_list = []
all_food = []
food_page = 3

def clear():
    # "CLS" is the Windows shell command; use "clear" on Linux/macOS
    system("CLS")

def draw_menu(menu_list):
    clear()
    for idx, i in enumerate(menu_list):
        print(colored(f'{idx}:{i}', choice(FGS)))

def hot_list_func():
    global html_text
    obj = bf(html_text, 'html.parser')
    index_hotlist = obj.find_all('a', class_='sancan_item')
    for ul in index_hotlist:
        for li in ul.find_all('strong', class_='title'):
            hot_list.append(li.get_text())

def search_food_detail(food):
    print('Searching for tutorial, please wait...')
    url = f"https://so.meishij.net/index.php?q={food}"
    url = urllib.parse.quote(url, safe=string.printable)
    html = urlopen(url)
    html_text = bytes.decode(html.read())
    obj = bf(html_text, 'html.parser')
    index_hotlist = obj.find_all('a', class_='img')
    url = index_hotlist[0].get('href')
    html = urlopen(url)
    html_text = bytes.decode(html.read())
    obj = bf(html_text, 'html.parser')
    random_color = choice(FGS)
    print(colored(f"{food} instructions:", random_color))
    index_hotlist = obj.find_all('div', class_='step_content')
    for div in index_hotlist:
        for p in div.find_all('p'):
            print(colored(p.get_text(), random_color))

def get_random_food():
    global food_page
    if not all_food:
        for i in range(food_page):
            url = f"https://www.meishij.net/chufang/diy/jiangchangcaipu/?&page={i}"
            html = urlopen(url)
            html_text = bytes.decode(html.read())
            obj = bf(html_text, 'html.parser')
            index_hotlist = obj.find_all('img')
            for p in index_hotlist:
                if p.get('alt'):
                    all_food.append(p.get('alt'))
    my_food = choice(all_food)
    print(colored(f'Today’s meal: {my_food}', choice(FGS)))
    return my_food

init()
hot_list_func()
print(colored('Search complete!', choice(FGS)))

# readkey() returns strings, so the digit menu keys must be strings too
my_key = ['q', 'c', 'd', 'm'] + [str(i) for i in range(9)]

# Start with an empty menu so digit/'d' presses before 'm' are ignored
random_food = []
my_food = ''

while True:
    while True:
        move = readkey()
        if move in my_key:
            break
    if move == 'q':
        break
    if move == 'c':
        clear()
    if move == 'm':
        random_food = sample(hot_list, 8)
        draw_menu(random_food)
    if move.isdigit() and random_food and int(move) <= len(random_food):
        if int(move) == 8:
            my_food = get_random_food()
        else:
            my_food = random_food[int(move)]
        print(my_food)
    if move == 'd' and my_food:
        search_food_detail(my_food)
        my_food = ''

Creating a simple crawler takes less than five minutes once you understand the fundamentals. It's meticulous work that often means searching for solutions online: if one method works, stick with it; otherwise, try another.

Conclusion

This guide aims to help beginners get started with web scraping. While the basics are straightforward, real-world challenges like login requirements or CAPTCHA can arise. It's best to start with simpler sites like news portals before moving to more complex ones. Once you understand the fundamentals, you can add components or use third-party libraries to overcome obstacles.
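
As one example of such a component, a browser-like User-Agent plus a small retry loop is often the first line of defense against flaky fetches or sites that block urllib's default client. A minimal sketch using only the standard library; the wrapper and its parameters are illustrative, not a fixed recipe:

```python
import time
from urllib.request import urlopen, Request
from urllib.error import URLError

# Hypothetical helper: fetch a URL with a browser-like header, retrying
# a few times on transient network errors before giving up.
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

def fetch(url, retries=3, delay=1.0):
    """Return the decoded body of url, retrying on transient failures."""
    for attempt in range(retries):
        try:
            req = Request(url, headers=HEADERS)
            return urlopen(req).read().decode()
        except URLError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)

# Works on any URL urlopen accepts, e.g. an offline data: URL:
print(fetch("data:text/plain;charset=utf-8,ok"))  # ok
```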

Tags: web scraping python beautifulsoup urllib Requests

Posted on Mon, 11 May 2026 06:37:04 +0000 by sahel