Security Testing: Comprehensive Guide to Offensive and Defensive Penetration Testing Information Gathering Methods and Tools

Significance of Information Gathering

Information gathering is the first and one of the most important stages of penetration testing. As the saying goes, "Know yourself and know your enemy, and you will never be defeated." The more we learn about the target website or host, the more effectively the rest of the test can be carried out.

Information gathering can be divided into two types: active and passive.

  • Active Information Gathering: This involves directly accessing the website, performing operations on it, or scanning it. This method generates network traffic that reaches the target server.

  • Passive Information Gathering: This relies on public channels, such as search engines, to obtain information without directly interacting with the target system, so it leaves minimal traces.

Each method has trade-offs. Active information gathering obtains more detail but leaves obvious traces that make it easier to trace back to the source. Passive information gathering does not scan the website directly, so it typically yields less information, but the target host does not detect it. We therefore need to combine both methods flexibly to gather information as completely as possible.

Website Information Gathering

Operating System

Servers mostly run one of two operating systems, Windows or Linux, with Linux the more prevalent choice for enterprise servers. There are three common methods for identifying the operating system:

  • Ping command: The default initial TTL is generally 128 on Windows and 64 on Linux, and each router hop lowers it by one. A reply TTL greater than 100 therefore usually indicates Windows, while a value of 64 or below usually indicates Linux. This method is not 100% accurate: the default TTL can be changed, so some Windows servers also reply with TTL values in the dozens, and servers that disable ping cannot be checked this way.
  • Nmap scanning: Use the -O or -A parameters to fingerprint the operating system. The advantage is that it can identify the specific OS version; the disadvantage is that the scan leaves obvious traces and is easily detected.
  • Case sensitivity: File paths are case-insensitive on Windows and case-sensitive on Linux. When accessing a website, try changing part of the path to uppercase. If the request still succeeds, the server is probably Windows; if it fails, the server is probably Linux.
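
The TTL heuristic from the first bullet can be sketched as a small function. This is only an illustration: the function name and the exact cut-offs (64 and 128) are assumptions based on the default values mentioned above, and the result is a guess, not a reliable fingerprint.

```python
def guess_os_from_ttl(ttl: int) -> str:
    """Guess the OS family from the TTL value seen in a ping reply.

    Each router hop decrements the TTL by one, so the observed value is at
    or slightly below the sender's initial TTL (64 for Linux, 128 for Windows).
    """
    if not 1 <= ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    if ttl <= 64:
        return "Linux"
    if ttl <= 128:
        return "Windows"
    # Values above 128 do not match either default discussed here.
    return "unknown"


if __name__ == "__main__":
    for observed in (52, 64, 113, 128, 250):
        print(observed, guess_os_from_ttl(observed))
```

Treat the output as one signal among several, since administrators can change the default TTL.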

Web Service/Container Type

Common web servers include Apache, Nginx, Tomcat, and IIS. Because different web server versions have different vulnerabilities, we need to detect the specific version as well as the server type. For example, old Nginx releases (0.8.37 and earlier) have a null-byte parsing vulnerability, IIS 6.0 has filename parsing vulnerabilities, and IIS 7.0 has malformed parsing vulnerabilities.
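
One common place to read the server type and version is the HTTP Server response header. The sketch below is a minimal example using only the Python standard library; the function names are hypothetical, and many servers hide or fake this header, so an empty or odd result proves nothing.

```python
import re
import urllib.error
import urllib.request
from typing import Optional, Tuple


def parse_server_header(value: str) -> Tuple[str, Optional[str]]:
    """Split a Server header such as 'nginx/1.18.0' into (product, version)."""
    match = re.match(r"\s*([^/\s]+)(?:/([\w.\-]+))?", value)
    if not match:
        return ("", None)
    return (match.group(1), match.group(2))


def fetch_server_header(url: str, timeout: float = 5.0) -> Optional[str]:
    """Send a HEAD request and return the raw Server header, if present."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.headers.get("Server")
    except urllib.error.HTTPError as err:
        # Error responses (403, 404, ...) still carry response headers.
        return err.headers.get("Server")


if __name__ == "__main__":
    print(parse_server_header("Apache/2.4.41 (Ubuntu)"))
```

Only run fetch_server_header against hosts you are authorized to test.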

Script Type

Common script types for websites include PHP, JSP, ASP, ASPX, and Python.

  • Identify by the file extension in the website URL (for example, .php or .jsp).
  • Use Google search: site:xxx filetype:php
  • Use the Wappalyzer plugin.
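
Identifying the script type from the URL can be sketched as an extension lookup. The mapping below only covers the script types listed above, and the function name is a hypothetical one chosen for illustration. Sites that use URL rewriting expose no extension, in which case the function returns None.

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse

# Extensions for the script types named in this section.
SCRIPT_TYPES = {
    ".php": "PHP",
    ".jsp": "JSP",
    ".asp": "ASP",
    ".aspx": "ASPX",
    ".py": "Python",
}


def script_type_from_url(url: str) -> Optional[str]:
    """Return the script type implied by the URL path extension, if any."""
    path = urlparse(url).path          # drops the query string and fragment
    extension = posixpath.splitext(path)[1].lower()
    return SCRIPT_TYPES.get(extension)


if __name__ == "__main__":
    print(script_type_from_url("http://example.com/index.php?id=1"))
```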

Database Type

Common database types include:

  • MySQL: A relational database management system developed by MySQL AB (now owned by Oracle). It is one of the most popular RDBMSs and is best suited for web applications, often used with PHP pages. Default port: 3306.
  • SQL Server: A relational database management system developed by Microsoft. It is a relatively large database. Default port: 1433. Database file extension: .mdf.
  • Access: Microsoft Office Access, a relational database management system. A small database whose performance degrades when it reaches around 100 MB. Database file extension: .mdb. It is commonly used with ASP web pages.
  • Oracle: Also known as Oracle RDBMS, a relational database management system from Oracle Corporation. It is often used for larger websites. Default port: 1521.

Common Combinations:

  • ASP and ASPX: ACCESS, SQL Server + Windows
  • PHP: MySQL + Windows/Linux, PostgreSQL + Linux
  • JSP: Oracle, MySQL + Windows/Linux

CMS Identification

Common CMS: DedeCMS (ZhiMeng), Discuz, PHPcms, etc.

Sensitive Directories, Backend

Common Directory Types

  • Admin backend: Weak passwords, universal passwords (SQL injection login bypass), brute force attacks.
  • Backup files: May reveal database information or even the website source code.
  • Upload directory: Truncation attacks, uploading image webshells, etc.
  • MySQL management interface: Weak passwords, brute force, universal passwords to obtain database information.
  • Installation page: May allow the application to be reinstalled, bypassing existing restrictions.
  • phpinfo: Exposes various configuration details.
  • Editors: FCKeditor (FCK), KindEditor (KE), etc.
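
Directory scanners typically combine a base URL with a wordlist of likely paths and then request each result. The sketch below shows only the URL-building step. The paths in CANDIDATE_PATHS are hypothetical examples, one per directory type above, and are not taken from any particular tool's wordlist.

```python
from typing import Iterable, List
from urllib.parse import urljoin

# Illustrative paths only; real wordlists are far larger.
CANDIDATE_PATHS = [
    "admin/",
    "backup.zip",
    "upload/",
    "phpmyadmin/",
    "install/",
    "phpinfo.php",
]


def build_candidate_urls(base_url: str,
                         paths: Iterable[str] = CANDIDATE_PATHS) -> List[str]:
    """Join each candidate path onto the base URL."""
    # A trailing slash makes urljoin append instead of replacing the last segment.
    base = base_url if base_url.endswith("/") else base_url + "/"
    return [urljoin(base, path.lstrip("/")) for path in paths]


if __name__ == "__main__":
    for url in build_candidate_urls("http://example.com"):
        print(url)
```

Each generated URL would then be requested and judged by its status code, which is the part that counts as active information gathering.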

Common Tools

WAF Identification

A Web Application Firewall (WAF) protects web applications by applying a set of security policies to HTTP/HTTPS traffic.

Functions:

  • Prevents common network attacks such as SQL injection, XSS, CSRF, and web shells.
  • Prevents automated attacks like brute force, credential stuffing, bulk registration, and auto-posting.
  • Blocks other threats like crawlers, zero-day attacks, code analysis, sniffing, data tampering, unauthorized access, sensitive information leakage, application-layer DDoS, remote file inclusion, hotlinking, privilege escalation, and scanning.

Identification Methods:

Domain Name Information Gathering

Domain Name Introduction

A domain name is a series of labels separated by dots that identifies one or more IP addresses on the Internet. It serves as a human-readable locator for computers or services and is mapped to IP addresses by the Domain Name System (DNS).

Domain Name Classification

  • Top-Level Domain (TLD):
    • Government: .gov
    • Commercial: .com
    • Education: .edu
  • Second-Level Domain (SLD): e.g., baidu.com
  • Third-Level Domain: e.g., www.baidu.com
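
The level structure can be illustrated by splitting a hostname on its dots and rebuilding it from the right. This is a naive sketch with a hypothetical function name: it assumes a single-label TLD such as .com and does not handle multi-label public suffixes.

```python
from typing import List


def domain_levels(hostname: str) -> List[str]:
    """Return the domain at each level, from the TLD to the full hostname."""
    labels = hostname.strip(".").lower().split(".")
    # Take the last 1, 2, 3, ... labels to build each level.
    return [".".join(labels[-count:]) for count in range(1, len(labels) + 1)]


if __name__ == "__main__":
    # Prints the top-level, second-level, and third-level domain in order.
    for level, name in enumerate(domain_levels("www.baidu.com"), start=1):
        print(level, name)
```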

Whois

Whois is a query/response protocol used to retrieve information about domain names, IP addresses, and their owners. The underlying database contains details such as the domain registrant, the registrar, and the registration and expiration dates.

Whois protocol: Establish a TCP connection to port 43, send the query keyword followed by carriage return and line feed, then receive the server's response.
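
The exchange described above can be sketched with a plain TCP socket. The default server used here, whois.verisign-grs.com (the registry Whois server for .com and .net), is an assumption made for the example, as are the function names; other TLDs use different servers.

```python
import socket


def build_whois_query(keyword: str) -> bytes:
    """Encode the query keyword followed by carriage return and line feed."""
    return keyword.strip().encode("ascii") + b"\r\n"


def whois_query(keyword: str,
                server: str = "whois.verisign-grs.com",  # assumed server for .com/.net
                port: int = 43,
                timeout: float = 10.0) -> str:
    """Send one Whois query over TCP port 43 and return the full response."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(build_whois_query(keyword))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:          # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(whois_query("example.com"))
```

The server signals the end of the response by closing the connection, which is why the loop reads until recv returns no data.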

Whois Query Methods:

ICP Filing Query

ICP (Internet Content Provider) filing is required for websites in China. Query methods:

Reverse Whois (Registrant/Email Reverse Lookup)

First obtain the registrant and email via Whois, then use them to find other domains.

Disadvantage: Many companies register domains through DNS resolution providers, so the retrieved information may represent the provider instead of the actual company.

Subdomain Enumeration

A subdomain is a domain that is part of a larger domain. For example, mail.baidu.com and bbs.baidu.com are subdomains of baidu.com.

  1. Search Engines (Google Hacking):
    site:hetianlab.com
    
  2. Third-party Web Services:
  3. Cyberspace Search Engines:
  4. SSL Certificate Search:
  5. Automated Tools:
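
As a rough illustration of what automated enumeration can look like, the sketch below builds candidate names from a wordlist and keeps those that resolve in DNS. The function names and the sample words are hypothetical, and the approach gives false positives on domains with wildcard DNS records.

```python
import socket
from typing import Dict, Iterable, List


def candidate_subdomains(domain: str, words: Iterable[str]) -> List[str]:
    """Prefix each non-empty word to the domain, e.g. 'mail' -> 'mail.baidu.com'."""
    domain = domain.strip(".").lower()
    return [f"{word.strip().lower()}.{domain}" for word in words if word.strip()]


def resolve_existing(hosts: Iterable[str]) -> Dict[str, str]:
    """Return a mapping of the hosts that resolve to their first IPv4 address."""
    found = {}
    for host in hosts:
        try:
            found[host] = socket.gethostbyname(host)
        except OSError:
            # Resolution failed, so the name probably does not exist.
            continue
    return found


if __name__ == "__main__":
    names = candidate_subdomains("baidu.com", ["mail", "bbs", "www"])
    for host, address in resolve_existing(names).items():
        print(host, address)
```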

IP Information Gathering

CDN Detection

A Content Delivery Network (CDN) distributes content across multiple geographically distributed servers to improve access speed and reliability.

Method 1: Use a multi-location ping service to check whether the target resolves to different IP addresses from different regions. If it does, it likely uses a CDN.

Method 2: Use nslookup. If the domain resolves to multiple IP addresses, a CDN is likely in use.
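
Method 2 can be approximated in code by resolving the domain and counting the distinct addresses. This is a sketch with hypothetical function names. Multiple addresses can also come from ordinary DNS load balancing, so the result is a hint and not proof of a CDN.

```python
import socket
from typing import Iterable, List


def resolve_ipv4(domain: str) -> List[str]:
    """Return the distinct IPv4 addresses the local resolver gives for a domain."""
    infos = socket.getaddrinfo(domain, None,
                               family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def likely_uses_cdn(addresses: Iterable[str]) -> bool:
    """Apply the heuristic: more than one distinct address suggests a CDN."""
    return len(set(addresses)) > 1


if __name__ == "__main__":
    ips = resolve_ipv4("www.baidu.com")
    print(ips, likely_uses_cdn(ips))
```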

CDN Bypass

For more details, refer to: https://www.cnblogs.com/tomyyyyy/p/13699134.html

Reverse IP Lookup (IP to Domain)

If the target is a virtual host, a reverse IP lookup is valuable because multiple virtual hosts may share one IP address. Compromising one of the neighboring sites on the same server can open a path to the target (a shared-host, or "neighbor site", attack).

C-Class Network Host Discovery
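
Assuming "C-class network" here means the /24 network that contains the target IP address, the candidate hosts can be listed with the standard ipaddress module. The function name is hypothetical, and the sketch only enumerates addresses; it does not probe them.

```python
import ipaddress
from typing import List


def class_c_hosts(ip: str) -> List[str]:
    """List the usable host addresses in the /24 network containing the IP."""
    # strict=False lets us pass a host address instead of the network address.
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return [str(host) for host in network.hosts()]


if __name__ == "__main__":
    hosts = class_c_hosts("192.0.2.77")
    print(len(hosts), hosts[0], hosts[-1])
```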

Port Scanning

Ports are used for network communication via TCP or UDP protocols.

Protocol Ports

  • TCP port: Connection-oriented, reliable.
  • UDP port: Connectionless, unreliable.
  • TCP and UDP ports are independent of each other.
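
Because TCP is connection-oriented, the simplest way to test a TCP port is to attempt a full connection. The sketch below is a minimal connect check with a hypothetical function name. It is not how tools such as Nmap implement their faster scan types, and it does not cover UDP.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (21, 22, 80, 3306):
        state = "open" if tcp_port_open("127.0.0.1", port) else "closed or filtered"
        print(port, state)
```

Only scan hosts you are authorized to test, since a connect scan is active information gathering and is easy to log.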

Port Types

  • Well-known ports: 0–1023, e.g., port 80 for HTTP.
  • Registered ports: 1024–49151, assigned to user processes.
  • Dynamic ports: 49152–65535, not fixed to specific services.
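
The three ranges translate directly into a small classifier. The function name is a hypothetical one used for illustration.

```python
def classify_port(port: int) -> str:
    """Return the port type for a TCP or UDP port number."""
    if not 0 <= port <= 65535:
        raise ValueError("port must be between 0 and 65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"


if __name__ == "__main__":
    for number in (80, 3306, 8080, 50000):
        print(number, classify_port(number))
```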

Common Penetration Testing Ports

FTP - 21

File Transfer Protocol. Uses TCP ports 20 (data) and 21 (control).

  • Brute force: Hydra, MSF modules.
  • Anonymous access: username anonymous, password empty/any email.
  • vsftpd backdoor: version 2.3.4 (compromised source package, CVE-2011-2523).
  • Packet sniffing (requires local network access).
  • Remote code execution, FTP bounce attack.

SSH - 22

Secure Shell for secure remote login.

  • Brute force: Hydra, MSF modules.
  • SSH backdoors.
  • User enumeration (CVE-2018-15473).

HTTP - 80

  • Web server/middleware vulnerabilities (IIS, Apache, Nginx).
  • Common web application vulnerabilities.

NetBIOS/SMB - 139/445

  • 139: NetBIOS session service (file and printer sharing).
  • 445: SMB (Server Message Block).
  • MS17-010 (EternalBlue) exploit.
  • Other SMB vulnerabilities (MS06-040, MS08-067).
  • IPC$ connection for further exploitation.

MySQL - 3306

  • Brute force attacks.
  • Malicious UDF (User Defined Function) for command execution.
  • SQL injection, LOAD_FILE() function.

RDP - 3389

Remote Desktop Protocol.

  • Brute force attack.
  • MS12-020 (denial of service that causes a Blue Screen of Death).
  • CVE-2019-0708 (BlueKeep).
  • Enabling RDP via registry or MSF.

Redis - 6379

In-memory data structure store.

  • Brute force.
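  • Unauthorized access, which can be used to write an SSH public key and log in to the server.
  • Remote code execution via master-slave replication (loading a malicious module).
  • Lua sandbox escape in Debian/Ubuntu packages (CVE-2022-0543).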

WebLogic - 7001

Java EE middleware.

  • Weak passwords (e.g., weblogic/Oracle@123).
  • Deploy WAR backdoor via admin console.
  • SSRF vulnerability.
  • Deserialization vulnerabilities.

Tomcat - 8080

Open-source Java servlet container and web server.

  • CVE-2019-0232 (RCE through the CGI servlet on Windows).
  • CVE-2017-12615 (arbitrary file upload via HTTP PUT).
  • Weak password for manager console to get shell.

Historical Vulnerability Information

Tags: Penetration Testing, Information Gathering, Security Testing, Vulnerability Assessment, Ethical Hacking

Posted on Mon, 18 May 2026 06:11:55 +0000 by V34