Security Testing: Comprehensive Guide to Offensive and Defensive Penetration Testing Information Gathering Methods and Tools

Significance of Information Gathering

Information gathering is crucial in the early stages of penetration testing. As the saying goes, "Know yourself and know your enemy, and you will never be defeated." The quality of information gathering largely determines whether a penetration test succeeds: only with enough information about the target website or host can we test it effectively.

Information gathering can be divided into two types: active and passive.

Active Information Gathering: This involves directly accessing the website, performing operations on it, or scanning it. This method generates network traffic that passes through the target server.
Passive Information Gathering: This relies on public channels, such as search engines, to obtain information without directly interacting with the target system, leaving minimal traces.

Both methods have their advantages. Active information gathering can obtain more details but leaves more obvious traces, making it easier to trace back to the source. Passive information gathering does not involve targeted scanning of the website, so it typically yields less information, but it is far less likely to be noticed by the target host. Therefore, we must combine the two flexibly to make the collected information as complete as possible.

Website Information Gathering

Operating System

Servers mainly run one of two operating systems, Windows or Linux, with Linux the more prevalent on enterprise servers. There are three common methods for telling them apart:

Ping command: The default initial TTL is generally 128 for Windows and 64 for Linux, and each router hop on the path lowers it by one. A reply TTL greater than 100 therefore usually indicates Windows, while a value of 64 or slightly below indicates Linux. However, this method is not 100% accurate: the default TTL can be changed, so some Windows servers also reply with values in the tens, and some servers disable ping, making this method unusable.
Nmap scanning: Use the -O or -A parameters to scan the operating system. The advantage is that it can identify the specific OS version; the disadvantage is that the scan leaves obvious traces and is easily detected.
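Case sensitivity: File paths are case-insensitive on Windows and case-sensitive on Linux, which can also be used to determine the operating system. When accessing a website, change part of the path to uppercase and see whether the request still succeeds. If it does, the server is probably Windows; if it fails, it is probably Linux. URL rewriting or application-level routing can distort the result, so treat it as a hint only.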
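
The TTL heuristic described above can be sketched in a few lines of Python. This is only an illustration, not a reliable fingerprint: the function name guess_os_from_ttl and the third "other" bucket are assumptions added here, and the thresholds simply follow the common defaults of 64 and 128.

```python
def guess_os_from_ttl(observed_ttl: int) -> str:
    """Guess the remote OS family from the TTL seen in a ping reply.

    Assumes the common initial TTLs (64 for Linux/Unix, 128 for Windows)
    and that every router hop on the path lowered the value by one.
    """
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    if observed_ttl <= 64:
        return "Linux/Unix (initial TTL 64)"
    if observed_ttl <= 128:
        return "Windows (initial TTL 128)"
    # Assumption: values above 128 started at 255, which some
    # network devices and Unix variants use.
    return "Other (initial TTL 255)"
```

For example, a reply with TTL 113 would be classified as Windows and one with TTL 57 as Linux/Unix. Because the default TTL is configurable, the result should be confirmed with another method.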

Web Service/Container Type

Common web servers include Apache, Nginx, Tomcat, and IIS. After identifying the web server type, we also need to detect the specific version. For example, Nginx versions < 0.83 have parsing vulnerabilities, IIS 6.0 has filename parsing vulnerabilities, and IIS 7.0 has malformed parsing vulnerabilities. Different web server versions have different vulnerabilities.

Browser developer tools (F12): Check the Server field in the response headers.
whatweb: https://www.whatweb.net/
wappalyzer: https://www.wappalyzer.com/
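
Checking the Server header can also be scripted. The sketch below is a minimal example using only the Python standard library; the helper names fetch_server_header and parse_server_header are hypothetical, and many servers hide or falsify this header, so an empty or misleading result is common.

```python
import re
import urllib.request
from typing import Optional, Tuple


def parse_server_header(value: str) -> Tuple[str, Optional[str]]:
    """Split a value such as 'nginx/1.18.0 (Ubuntu)' into (product, version)."""
    match = re.match(r"\s*([^/\s]+)(?:/(\S+))?", value)
    if not match:
        return ("unknown", None)
    return (match.group(1).lower(), match.group(2))


def fetch_server_header(url: str, timeout: float = 5.0) -> Optional[str]:
    """Send a HEAD request and return the Server header, if present.

    Only run this against targets you are authorized to test.
    Note that urlopen raises HTTPError for 4xx/5xx responses.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Server")
```

A call such as parse_server_header("Microsoft-IIS/7.5") separates the product from the version, which can then be compared against known vulnerable releases.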

Script Type

Common script types for websites include PHP, JSP, ASP, ASPX, and Python.

Identify by the file extension in the website URL (e.g., .php, .jsp, .aspx).
Use Google search: site:xxx filetype:php
Use the Wappalyzer plugin.

Database Type

Common database types include:

MySQL: A relational database management system developed by MySQL AB (now owned by Oracle). It is one of the most popular RDBMSs and is best suited for web applications, often used with PHP pages. Default port: 3306.
SQL Server: A relational database management system developed by Microsoft. It is a relatively large database. Default port: 1433. Database file extension: .mdf.
Access: Microsoft Office Access, a relational database management system. A small database whose performance degrades when it reaches around 100 MB. Database file extension: .mdb. It is commonly used with ASP web pages.
Oracle: Also known as Oracle RDBMS, a relational database management system from Oracle Corporation. It is often used for larger websites. Default port: 1521.

Common Combinations:

ASP and ASPX: ACCESS, SQL Server + Windows

PHP: MySQL + Windows/Linux, PostgreSQL + Linux

JSP: Oracle, MySQL + Windows/Linux

CMS Identification

Common CMS: DedeCMS (ZhiMeng), Discuz, PHPcms, etc.

Online Identification Tools:
- http://whatweb.bugscaner.com/look/
- https://www.yunsee.cn/
Onlinetools:
- https://github.com/iceyhexman/onlinetools
- https://pentest.gdpcisa.org/
Ehole: https://github.com/EdgeSecurityTeam/EHole
Glass: https://github.com/s7ckTeam/Glass

Sensitive Directories, Backend

Common Directory Types

Admin backend: Weak passwords, universal passwords, brute force attacks.
Backup files: Obtain database information or even website source code.
Upload directory: Truncation attacks, uploading image webshells, etc.
MySQL management interface: Weak passwords, brute force, universal passwords to obtain database information.
Installation page: Could be reinstalled to bypass restrictions.
phpinfo: Exposes various configuration details.
Rich-text editors: FCKeditor (FCK), KindEditor (KE), etc.

Common Tools

dirsearch: https://github.com/maurosoria/dirsearch
Yujian (Sword) Backend Scanner: https://03w43c.lanzous.com/ivPY5ohubda Password: gw3m
7kbscan: https://github.com/7kbstorm/7kbscan-WebPathBrute
dirmap: https://github.com/H4ckForJob/dirmap
GitHack (.git exposure): https://github.com/lijiejie/GitHack
svnExploit (.svn exposure): https://github.com/admintony/svnExploit
JSFinder (JS files): https://github.com/Threezh1/JSFinder

WAF Identification

A Web Application Firewall (WAF) protects web applications by enforcing a set of security policies on HTTP/HTTPS traffic.

Functions:

Prevents common network attacks such as SQL injection, XSS, CSRF, and web shells.
Prevents automated attacks like brute force, credential stuffing, bulk registration, and auto-posting.
Blocks other threats like crawlers, zero-day attacks, code analysis, sniffing, data tampering, unauthorized access, sensitive information leakage, application-layer DDoS, remote file inclusion, hotlinking, privilege escalation, and scanning.

Identification Methods:

wafw00f: https://github.com/EnableSecurity/wafw00f
whatwaf: https://github.com/Ekultek/WhatWaf

nmap WAF detection:

nmap -p 80,443 --script http-waf-detect <target_ip>
nmap -p 80,443 --script http-waf-fingerprint <target_ip>

Visual identification: Summary of common WAF interception pages. https://mp.weixin.qq.com/s/PWkqNsygi-c_S7tW1y_Hxw

Domain Name Information Gathering

Domain Name Introduction

A domain name is a string of names separated by dots used to identify one or more IP addresses on the Internet. It serves as a human-readable locator for computers or services, mapping to IP addresses via the Domain Name System (DNS).

Domain Name Classification

Top-Level Domain (TLD):
- Government: .gov
- Commercial: .com
- Education: .edu
Second-Level Domain (SLD): e.g., baidu.com
Third-Level Domain: e.g., www.baidu.com

Whois

Whois is a query/response protocol used to retrieve information about domain names, IP addresses, and their owners. Whois servers expose a database containing details such as the domain registrant, the registrar, and the registration and expiration dates.

Whois protocol: Establish a TCP connection to port 43, send the query keyword followed by carriage return and line feed, then receive the server's response.
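
The exchange described above is simple enough to reproduce with a raw socket. The following is a minimal sketch, not a full Whois client: the function names are hypothetical, and the default server whois.verisign-grs.com is an assumption that only fits .com and .net domains (other TLDs use other servers).

```python
import socket


def build_whois_query(keyword: str) -> bytes:
    """Return the query keyword terminated by carriage return and line feed."""
    return keyword.strip().encode("ascii") + b"\r\n"


def whois_lookup(keyword: str,
                 server: str = "whois.verisign-grs.com",
                 timeout: float = 10.0) -> str:
    """Connect to TCP port 43, send the query, and read until the server closes."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(build_whois_query(keyword))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling whois_lookup("baidu.com") should return the same kind of text the whois command prints, although registry servers often return only a referral to the registrar's own Whois server.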

Whois Query Methods:

Web Interface:
- Alibaba Cloud Whois: https://whois.aliyun.com/
- Global Whois: https://www.whois365.com/cn/
- ThreatBook: https://x.threatbook.cn/
- Chinaz Whois: http://whois.chinaz.com/
Command Line:
```
whois baidu.com
```

ICP Filing Query

ICP (Internet Content Provider) filing is required for websites in China. Query methods:

Official site: https://beian.miit.gov.cn/#/Integrated/recordQuery
Chinaz: http://icp.chinaz.com/

Reverse Whois (Registrant/Email Reverse Lookup)

First obtain the registrant and email via Whois, then use them to find other domains.

Disadvantage: Many companies register domains through a DNS hosting or registration service provider, so the retrieved Whois record may describe the provider instead of the actual company.

Subdomain Enumeration

A subdomain is a domain that is part of a larger domain. For example, mail.baidu.com and bbs.baidu.com are subdomains of baidu.com.

Search Engines (Google Hacking):
```
site:hetianlab.com
```
Third-party Web Services:
- https://dnsdumpster.com/
- http://tool.chinaz.com/subdomain/
Cyberspace Search Engines:
- Fofa: https://fofa.so/
```
domain="baidu.com"
```
- ZoomEye: https://www.zoomeye.org/
```
site:"baidu.com"
```
- Shodan: https://www.shodan.io/
```
hostname:baidu.com
```
SSL Certificate Search:
- https://crt.sh/
- https://developers.facebook.com/tools/ct/search/
Automated Tools:
- Subdomain Scanner (Ziyuming Wajueji): https://03w43c.lanzous.com/i8pn2ohw5ha Password: hmuv
- SubDomainsBrute: https://github.com/lijiejie/subDomainsBrute
- OneForAll: https://github.com/shmilylty/OneForAll
- JSFinder: https://github.com/Threezh1/JSFinder
- wydomain: https://github.com/ring04h/wydomain
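
The dictionary-based approach used by brute-force tools can be illustrated with a short sketch. This is an assumption about the general technique, not the implementation of any tool listed above, and the function names and word list are hypothetical.

```python
import socket
from typing import Dict, Iterable, List


def candidate_subdomains(domain: str, words: Iterable[str]) -> List[str]:
    """Prefix each non-empty dictionary word to the domain."""
    return [f"{word.strip().lower()}.{domain}" for word in words if word.strip()]


def resolve_existing(hosts: Iterable[str]) -> Dict[str, str]:
    """Return only the candidates that resolve, mapped to one IPv4 address."""
    found = {}
    for host in hosts:
        try:
            found[host] = socket.gethostbyname(host)
        except (OSError, UnicodeError):
            continue  # does not resolve or is not a valid hostname
    return found
```

A run might look like resolve_existing(candidate_subdomains("example.com", ["www", "mail", "bbs"])). If the domain uses wildcard DNS, every candidate resolves, so results should be compared against the answer for a random, nonexistent label.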

IP Information Gathering

CDN Detection

A Content Delivery Network (CDN) distributes content across multiple geographically distributed servers to improve access speed and reliability.

Method 1: Use multi-location ping services to check if the target resolves to multiple IP addresses. If it does, it likely uses a CDN.

Method 2: Use nslookup. If the domain resolves to multiple IP addresses, CDN is likely in use.
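
Both methods reduce to the same check: collect the addresses returned for the domain and see whether more than one distinct IP appears. A minimal sketch follows, with hypothetical function names; note that several IPs may also come from plain DNS round-robin load balancing, so this is a hint rather than proof.

```python
import socket
from typing import Iterable, Set


def resolve_ipv4(domain: str) -> Set[str]:
    """Return the distinct IPv4 addresses the local resolver gives for a domain."""
    infos = socket.getaddrinfo(domain, None, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def likely_behind_cdn(answers_per_vantage: Iterable[Set[str]]) -> bool:
    """Return True if the combined answers contain more than one distinct IP.

    Each element is the set of IPs seen from one vantage point, for example
    one location of a multi-location ping service or one nslookup run.
    """
    distinct: Set[str] = set()
    for answers in answers_per_vantage:
        distinct |= set(answers)
    return len(distinct) > 1
```

For a single local check, likely_behind_cdn([resolve_ipv4("www.example.com")]) mirrors Method 2; passing results gathered from several locations mirrors Method 1.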

CDN Bypass

Access from abroad: Many CDN configurations only cover domestic traffic.
- http://asm.ca.com/en/ping.php
Query subdomain IPs: Subdomains may not be behind the CDN and might share the same IP or Class C (/24) network as the main site.
- https://ip.tool.chinaz.com/ipbatch
phpinfo file: Look for the SERVER_ADDR.
MX records: Mail servers often have the real IP.

Cyberspace Search Engines:

Fofa: Query by certificate serial number (convert from hex to decimal).
```
cert="17144636119767802547749573191550762477"
```
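
The hex-to-decimal conversion mentioned above is a one-liner in Python. In this sketch the function names are hypothetical, and the input is assumed to be the serial number as shown by a browser or openssl, with or without colon separators.

```python
def serial_hex_to_decimal(serial_hex: str) -> str:
    """Convert a certificate serial number from hexadecimal to decimal."""
    cleaned = serial_hex.replace(":", "").replace(" ", "")
    return str(int(cleaned, 16))


def fofa_cert_query(serial_hex: str) -> str:
    """Build a query string in the cert="..." form shown above."""
    return f'cert="{serial_hex_to_decimal(serial_hex)}"'
```

Python integers have arbitrary precision, so serial numbers of any length convert without overflow.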

Censys: https://censys.io/

443.https.tls.certificate.parsed.extensions.subject_alt_name.dns_names:hetianlab.com

Historical DNS records: A records captured before the CDN was enabled may still point to the origin server.

For more details, refer to: https://www.cnblogs.com/tomyyyyy/p/13699134.html

Reverse IP Lookup (IP to Domain)

If the target is a virtual host, this is valuable because multiple websites may share one IP. Compromising one site on the shared host can lead to compromising the others, an approach often called a side-site (shared-host) attack.

C-Class Network Host Discovery

Nmap (ping scan; -sP is the legacy spelling of -sn in current releases):

nmap -sP www.example.com/24
nmap -sP 192.168.1.*

Cwebscanner: https://github.com/se55i0n/Cwebscanner
Cyberspace search engines: Search for IPs in the same Class C (/24) subnet.
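
To see which addresses a /24 scan covers, the range can be expanded with the standard ipaddress module. This is a small sketch with a hypothetical function name; it only lists candidate addresses and sends no traffic.

```python
import ipaddress
from typing import List


def class_c_hosts(ip: str) -> List[str]:
    """List the usable host addresses in the /24 network containing ip."""
    # strict=False lets us pass a host address instead of the network address
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return [str(host) for host in network.hosts()]
```

For 192.168.1.57 this yields 192.168.1.1 through 192.168.1.254, the same range that the 192.168.1.* pattern in the Nmap command targets, minus the network and broadcast addresses.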

Port Scanning

Ports are used for network communication via TCP or UDP protocols.

Protocol Ports

TCP port: Connection-oriented, reliable.
UDP port: Connectionless, unreliable.
TCP and UDP ports are independent of each other.
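
Because TCP is connection-oriented, the simplest way to test a TCP port is to attempt a full connection. The sketch below shows such a connect check with a hypothetical function name; it is not how Nmap's default scan works, it does not cover UDP, and it should only be used against hosts you are authorized to test.

```python
import socket


def is_tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0
```

A closed port is usually refused at once, while a filtered port tends to time out, so a False result alone does not distinguish the two.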

Port Types

Well-known ports: 0–1023, e.g., port 80 for HTTP.
Registered ports: 1024–49151, assigned to user processes.
Dynamic ports: 49152–65535, not fixed to specific services.
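
The three ranges can be expressed as a small lookup. This is a sketch with a hypothetical function name that simply encodes the boundaries listed above.

```python
def classify_port(port: int) -> str:
    """Return the range a TCP or UDP port number falls into."""
    if not 0 <= port <= 65535:
        raise ValueError("port must be between 0 and 65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"
```

Under this classification port 80 is well-known, MySQL's 3306 is registered, and an ephemeral client port such as 50000 is dynamic.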

Common Penetration Testing Ports

FTP - 21

File Transfer Protocol. Uses TCP ports 20 (data) and 21 (control).

Brute force: Hydra, MSF modules.
Anonymous access: username anonymous, password empty/any email.
vsftpd backdoor: the compromised vsftpd 2.3.4 source release (CVE-2011-2523).
Packet sniffing (requires local network access).
Remote code execution, FTP bounce attack.

SSH - 22

Secure Shell for secure remote login.

Brute force: Hydra, MSF modules.
SSH backdoors.
User enumeration (CVE-2018-15473).

HTTP - 80

Web server/middleware vulnerabilities (IIS, Apache, Nginx).
Common web application vulnerabilities.

NetBIOS/SMB - 139/445

139: NetBIOS session service (file and printer sharing).
445: SMB (Server Message Block).
MS17-010 (EternalBlue) exploit.
Other SMB vulnerabilities (MS06-040, MS08-067).
IPC$ connection for further exploitation.

MySQL - 3306

Brute force attacks.
Malicious UDF (User Defined Function) for command execution.
SQL injection, LOAD_FILE() function.

RDP - 3389

Remote Desktop Protocol.

Brute force attack.
MS12-020 (denial of service that triggers a Blue Screen of Death).
CVE-2019-0708 (BlueKeep).
Enabling RDP via registry or MSF.

Redis - 6379

In-memory data structure store.

Brute force.
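Unauthorized access, for example writing an SSH public key to gain a shell.
Master-slave replication remote code execution (Redis 4.x/5.x, by loading a malicious module).
Lua sandbox escape on Debian/Ubuntu packages (CVE-2022-0543).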

WebLogic - 7001

Java EE middleware.

Weak passwords (e.g., weblogic/Oracle@123).
Deploy WAR backdoor via admin console.
SSRF vulnerability.
Deserialization vulnerabilities.

Tomcat - 8080

Open-source Java servlet container and web server.

CVE-2019-0232 (RCE).
CVE-2017-12615 (arbitrary file upload).
Weak password for manager console to get shell.

Historical Vulnerability Information

WooYun Mirror: http://wy.zone.ci/
WooYun Knowledge Base: https://wooyun.kieran.top/#!/
Exploit-DB: https://www.exploit-db.com/
KnownSec Seebug: https://www.seebug.org/
Summary of Critical Systems Vulnerabilities: https://www.cnblogs.com/tomyyyyy/p/14701925.html

Tags: Penetration Testing Information Gathering Security Testing Vulnerability Assessment Ethical Hacking

Posted on Mon, 18 May 2026 06:11:55 +0000 by V34
