Significance of Information Gathering
Information gathering is crucial in the early stages of penetration testing. As the saying goes, "Know yourself and know your enemy, and you will never be defeated." The more we learn about the target website or host, the more effectively we can conduct the test, so thorough information gathering is a precondition for a successful penetration test.
Information gathering can be divided into two types: active and passive.
- Active Information Gathering: This involves directly accessing the website, performing operations on it, or scanning it. This method generates network traffic that passes through the target server.
- Passive Information Gathering: This relies on public channels, such as search engines, to obtain information without directly interacting with the target system, leaving minimal traces.
Both methods have their advantages and drawbacks. Active information gathering obtains more detail but leaves more obvious traces, making it easier to trace back to the source. Passive information gathering does not scan the target directly, so it typically yields less information, but it is unlikely to be detected by the target host. We therefore need to combine both methods flexibly to make information gathering as complete as possible.
Website Information Gathering
Operating System
Servers mainly run one of two operating systems, Windows or Linux, with Linux being more prevalent on enterprise servers. There are three common methods for identifying which one a target uses:
- Ping command: The default TTL is generally 128 on Windows and 64 on Linux, and each router hop decrements it by one. An observed TTL above 100 therefore usually indicates Windows, while a value of a few dozen (up to 64) usually indicates Linux. This method is not fully reliable: the default TTL can be changed, so some Windows servers also show TTL values in the dozens, and servers that block ping cannot be tested this way.
- Nmap scanning: Use the -O or -A option to fingerprint the operating system. The advantage is that it can identify the specific OS version; the disadvantage is that the scan leaves obvious traces and is easily detected.
- Case sensitivity: File paths on Windows are case-insensitive, while on Linux they are case-sensitive. When accessing a website, try changing part of the path to uppercase and see whether the request still succeeds. If it does, the server is probably Windows, although URL rewriting or application routing can mask the difference.
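The TTL heuristic described above can be sketched as a small function. This is only an illustration under the thresholds stated in the text (above 100 suggests Windows, 64 or below suggests Linux); the function name `guess_os_from_ttl` is hypothetical, and the result is a guess, not a reliable fingerprint.

```python
def guess_os_from_ttl(ttl: int) -> str:
    """Guess the OS family from the TTL observed in a ping reply.

    Each router hop decrements the TTL by one, so the observed value is
    usually a little below the sender's default (128 or 64).
    """
    if not 1 <= ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    if 100 < ttl <= 128:
        return "Windows"   # default 128 minus a few hops
    if ttl <= 64:
        return "Linux"     # default 64 minus a few hops
    return "unknown"       # outside both ranges, e.g. a changed default


if __name__ == "__main__":
    for observed in (117, 52, 255):
        print(observed, guess_os_from_ttl(observed))
```

Values that fall between the two ranges are reported as unknown rather than forced into one category, since the default TTL can be reconfigured.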
Web Service/Container Type
Common web servers include Apache, Nginx, Tomcat, and IIS. After identifying the web server type, we also need to detect the specific version, because different versions have different vulnerabilities. For example, Nginx versions < 0.83 have parsing vulnerabilities, IIS 6.0 has filename parsing vulnerabilities, and IIS 7.0 has malformed parsing vulnerabilities.
- Browser developer tools (F12): Check the Server field in the response headers.
- whatweb: https://www.whatweb.net/
- wappalyzer: https://www.wappalyzer.com/
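Once the Server response header has been read, splitting it into product and version is straightforward. The sketch below is a minimal parser, assuming the common `product/version` layout; the helper name `parse_server_header` is hypothetical, and many servers hide or falsify this header, so the result should be treated as a hint.

```python
import re
from typing import Optional, Tuple


def parse_server_header(value: str) -> Tuple[str, Optional[str]]:
    """Split a Server header such as 'nginx/1.18.0' into (product, version).

    The version is None when the header carries only a product name.
    """
    match = re.match(r"\s*([^/\s]+)(?:/([\w.\-]+))?", value)
    if match is None:
        return ("", None)
    return (match.group(1), match.group(2))


if __name__ == "__main__":
    for header in ("nginx/1.18.0", "Apache/2.4.41 (Ubuntu)", "Microsoft-IIS/7.5"):
        print(parse_server_header(header))
```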
Script Type
Common script types for websites include PHP, JSP, ASP, ASPX, and Python.
- Identify by the website URL.
- Use Google search:
  site:xxx filetype:php
- Use the Wappalyzer plugin.
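Identifying the script type from the URL usually means looking at the file extension in the path. A minimal sketch, assuming the extension is visible in the URL; the function name `guess_script_type` and the mapping table are illustrative, and sites that use URL rewriting will not reveal an extension at all.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extension-to-language table built from the script types listed above.
SCRIPT_TYPES = {
    ".php": "PHP",
    ".jsp": "JSP",
    ".asp": "ASP",
    ".aspx": "ASPX",
    ".py": "Python",
}


def guess_script_type(url: str) -> Optional[str]:
    """Return the script type suggested by the URL path, or None."""
    path = urlparse(url).path          # drops the query string and fragment
    suffix = PurePosixPath(path).suffix.lower()
    return SCRIPT_TYPES.get(suffix)


if __name__ == "__main__":
    print(guess_script_type("http://example.com/index.php?id=1"))
```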
Database Type
Common database types include:
- MySQL: A relational database management system developed by MySQL AB (now owned by Oracle). It is one of the most popular RDBMSs and is best suited for web applications, often used with PHP pages. Default port: 3306.
- SQL Server: A relational database management system developed by Microsoft. It is a relatively large database. Default port: 1433. Database file extension: .mdf.
- Access: Microsoft Office Access, a relational database management system. A small database whose performance degrades when it reaches around 100 MB. Database file extension: .mdb. It is commonly used with ASP web pages.
- Oracle: Also known as Oracle RDBMS, a relational database management system from Oracle Corporation. It is often used for larger websites. Default port: 1521.
Common Combinations:
- ASP and ASPX: ACCESS, SQL Server + Windows
- PHP: MySQL + Windows/Linux, PostgreSQL + Linux
- JSP: Oracle, MySQL + Windows/Linux
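The combinations and default ports above can be kept as a small lookup table, so that a detected script type immediately suggests which databases and ports to check first. This is only a sketch of the lists in this section; the names `COMMON_DATABASES`, `DEFAULT_PORTS`, and `likely_databases` are hypothetical, and a port of None means that the text above gives no default port for that database.

```python
from typing import List, Optional, Tuple

# Default ports as listed in this section (Access is file-based).
DEFAULT_PORTS = {"MySQL": 3306, "SQL Server": 1433, "Oracle": 1521}

# Common script-type and database pairings from the list above.
COMMON_DATABASES = {
    "ASP": ["Access", "SQL Server"],
    "ASPX": ["Access", "SQL Server"],
    "PHP": ["MySQL", "PostgreSQL"],
    "JSP": ["Oracle", "MySQL"],
}


def likely_databases(script_type: str) -> List[Tuple[str, Optional[int]]]:
    """Return (database, default port or None) pairs for a script type."""
    names = COMMON_DATABASES.get(script_type.upper(), [])
    return [(name, DEFAULT_PORTS.get(name)) for name in names]


if __name__ == "__main__":
    print(likely_databases("php"))
```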
CMS Identification
Common CMS: DedeCMS (ZhiMeng), Discuz, PHPcms, etc.
- Online Identification Tools:
- Onlinetools:
- Ehole: https://github.com/EdgeSecurityTeam/EHole
- Glass: https://github.com/s7ckTeam/Glass
Sensitive Directories, Backend
Common Directory Types
- Admin backend: Weak passwords, "universal passwords" (SQL injection login bypass), brute force attacks.
- Backup files: May expose database information or even the website source code.
- Upload directory: Filename truncation attacks, uploading image webshells, etc.
- MySQL management interface: Weak passwords, brute force, or universal passwords to obtain database information.
- Installation page: May allow the site to be reinstalled, bypassing existing restrictions.
- phpinfo: Exposes various configuration details.
- Editors: FCKeditor, KindEditor, etc.
Common Tools
- dirsearch: https://github.com/maurosoria/dirsearch
- Yujian (Sword) Backend Scanner: https://03w43c.lanzous.com/ivPY5ohubda Password: gw3m
- 7kbscan: https://github.com/7kbstorm/7kbscan-WebPathBrute
- dirmap: https://github.com/H4ckForJob/dirmap
- GitHack (.git exposure): https://github.com/lijiejie/GitHack
- svnExploit (.svn exposure): https://github.com/admintony/svnExploit
- JSFinder (JS files): https://github.com/Threezh1/JSFinder
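Directory scanners such as the tools above work by combining a base URL with a wordlist of paths and requesting each candidate. The sketch below shows only the candidate-generation step; it is not how any of the listed tools is implemented, and the function name `candidate_urls` and the sample wordlist are hypothetical.

```python
from typing import Iterable, List


def candidate_urls(base_url: str, words: Iterable[str]) -> List[str]:
    """Join a base URL with each wordlist entry, avoiding doubled slashes."""
    base = base_url.rstrip("/")
    return [f"{base}/{word.lstrip('/')}" for word in words]


if __name__ == "__main__":
    # Hypothetical wordlist covering the directory types discussed above.
    wordlist = ["admin/", "backup.zip", "phpinfo.php", ".git/config"]
    for url in candidate_urls("http://example.com/", wordlist):
        print(url)
```

A real scan would request each URL and filter on status code and response length, which is exactly the active, easily logged behavior described at the start of this article.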
WAF Identification
A Web Application Firewall (WAF) protects web applications by applying a set of security policies to HTTP/HTTPS traffic.
Functions:
- Prevents common network attacks such as SQL injection, XSS, CSRF, and web shells.
- Prevents automated attacks like brute force, credential stuffing, bulk registration, and auto-posting.
- Blocks other threats like crawlers, zero-day attacks, code analysis, sniffing, data tampering, unauthorized access, sensitive information leakage, application-layer DDoS, remote file inclusion, hotlinking, privilege escalation, and scanning.
Identification Methods:
- wafw00f: https://github.com/EnableSecurity/wafw00f
- whatwaf: https://github.com/Ekultek/WhatWaf
- nmap WAF detection:
  nmap -p 80,443 --script http-waf-detect <target_ip>
  nmap -p 80,443 --script http-waf-fingerprint <target_ip>
- Visual identification: Summary of common WAF interception pages. https://mp.weixin.qq.com/s/PWkqNsygi-c_S7tW1y_Hxw
Domain Name Information Gathering
Domain Name Introduction
A domain name is a string of names separated by dots used to identify one or more IP addresses on the Internet. It serves as a human-readable locator for computers or services, mapping to IP addresses via the Domain Name System (DNS).
Domain Name Classification
- Top-Level Domain (TLD):
  - Government: .gov
  - Commercial: .com
  - Education: .edu
- Second-Level Domain (SLD): e.g., baidu.com
- Third-Level Domain: e.g., www.baidu.com
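The level structure above follows directly from splitting a hostname on its dots and reading from the right. A minimal sketch, assuming a single-label TLD such as .com; the function name `domain_levels` is hypothetical, and multi-label public suffixes (for example .com.cn) would need a public suffix list to be classified correctly.

```python
from typing import List


def domain_levels(hostname: str) -> List[str]:
    """Return the domain at each level, starting from the top-level domain."""
    labels = hostname.strip(".").lower().split(".")
    # Take the last 1, 2, 3, ... labels to build each level in turn.
    return [".".join(labels[-n:]) for n in range(1, len(labels) + 1)]


if __name__ == "__main__":
    print(domain_levels("www.baidu.com"))
    # → ['com', 'baidu.com', 'www.baidu.com']
```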
Whois
Whois is a query/response protocol used to retrieve information about domain names, IP addresses, and their owners. Whois servers expose a database containing details such as the domain registrant, the registrar, and the registration and expiration dates.
Whois protocol: Establish a TCP connection to port 43, send the query keyword followed by carriage return and line feed, then receive the server's response.
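The exchange described above is simple enough to write directly with a TCP socket. The sketch below follows those steps (connect to port 43, send the keyword plus CRLF, read until the server closes the connection); the function names are hypothetical, and the caller must supply the Whois server, which depends on the TLD.

```python
import socket


def build_whois_query(keyword: str) -> bytes:
    """Encode the query keyword followed by carriage return and line feed."""
    return keyword.strip().encode("ascii") + b"\r\n"


def whois_query(keyword: str, server: str, port: int = 43,
                timeout: float = 10.0) -> str:
    """Send one Whois query and return the server's full text response."""
    with socket.create_connection((server, port), timeout=timeout) as conn:
        conn.sendall(build_whois_query(keyword))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:          # the server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Assumption: whois.verisign-grs.com is used here as the .com registry server.
    print(whois_query("baidu.com", "whois.verisign-grs.com"))
```

The command-line whois client performs the same exchange and additionally follows referrals from the registry to the registrar's server.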
Whois Query Methods:
- Web Interface:
  - Alibaba Cloud Whois: https://whois.aliyun.com/
  - Global Whois: https://www.whois365.com/cn/
  - ThreatBook: https://x.threatbook.cn/
  - Chinaz Whois: http://whois.chinaz.com/
- Command Line:
  whois baidu.com
ICP Filing Query
ICP (Internet Content Provider) filing is required for websites in China. Query methods:
- Official site: https://beian.miit.gov.cn/#/Integrated/recordQuery
- Chinaz: http://icp.chinaz.com/
Reverse Whois (Registrant/Email Reverse Lookup)
First obtain the registrant and email via Whois, then use them to find other domains.
Disadvantage: Many companies register domains through DNS resolution providers, so the retrieved information may represent the provider instead of the actual company.
- https://whois.chinaz.com/reverse?ddlSearchMode=1
- http://whois.4.cn/reverse
- https://whois.aizhan.com/
Subdomain Enumeration
A subdomain is a domain that is part of a larger domain. For example, mail.baidu.com and bbs.baidu.com are subdomains of baidu.com.
- Search Engines (Google Hacking):
  site:hetianlab.com
- Third-party Web Services:
- Cyberspace Search Engines:
  - Fofa: https://fofa.so/
    domain="baidu.com"
  - ZoomEye: https://www.zoomeye.org/
    site:"baidu.com"
  - Shodan: https://www.shodan.io/
    hostname:baidu.com
- SSL Certificate Search:
- Automated Tools:
  - Subdomain Scanner (Ziyuming Wajueji): https://03w43c.lanzous.com/i8pn2ohw5ha Password: hmuv
  - SubDomainsBrute: https://github.com/lijiejie/subDomainsBrute
  - OneForAll: https://github.com/shmilylty/OneForAll
  - JSFinder: https://github.com/Threezh1/JSFinder
  - wydomain: https://github.com/ring04h/wydomain
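The query syntax shown for each search engine differs only slightly, so the queries for a target can be generated together, and the returned hostnames can then be filtered down to true subdomains. A minimal sketch using only the syntax listed above; the function names `subdomain_search_queries` and `is_subdomain` are hypothetical.

```python
from typing import Dict


def subdomain_search_queries(domain: str) -> Dict[str, str]:
    """Build the search string for each engine mentioned in this section."""
    return {
        "google": f"site:{domain}",
        "fofa": f'domain="{domain}"',
        "zoomeye": f'site:"{domain}"',
        "shodan": f"hostname:{domain}",
    }


def is_subdomain(hostname: str, domain: str) -> bool:
    """True only for names strictly below the domain, e.g. mail.baidu.com."""
    hostname = hostname.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    return hostname.endswith("." + domain)


if __name__ == "__main__":
    for engine, query in subdomain_search_queries("baidu.com").items():
        print(f"{engine}: {query}")
```

Checking for the leading dot matters: a plain suffix test would wrongly accept an unrelated name such as notbaidu.com.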
IP Information Gathering
CDN Detection
A Content Delivery Network (CDN) distributes content across multiple geographically distributed servers to improve access speed and reliability.
Method 1: Use multi-location ping services to check if the target resolves to multiple IP addresses. If it does, it likely uses a CDN.
- http://ping.chinaz.com/
- http://ping.aizhan.com/
- http://ce.cloud.360.cn/
- http://www.webkaka.com/Ping.aspx
Method 2: Use nslookup. If the domain resolves to multiple IP addresses, CDN is likely in use.
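The nslookup check can be approximated with the standard resolver. This sketch only looks at the answer from the local resolver, so it is weaker than the multi-location services in Method 1; the function names are hypothetical, and several addresses can also indicate plain DNS load balancing rather than a CDN.

```python
import socket
from typing import Iterable, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Return the distinct IPv4 addresses the local resolver gives for a name."""
    infos = socket.getaddrinfo(hostname, None,
                               family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def likely_behind_cdn(addresses: Iterable[str]) -> bool:
    """Heuristic from the text: more than one distinct IP suggests a CDN."""
    return len(set(addresses)) > 1


if __name__ == "__main__":
    ips = resolve_ipv4("www.example.com")
    print(ips, "CDN likely" if likely_behind_cdn(ips) else "single address")
```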
CDN Bypass
- Access from abroad: Many CDN configurations only cover domestic traffic.
- Query subdomain IPs: Subdomains may not be behind the CDN and might share the same IP or C-class network as the main site.
- phpinfo file: Look for the SERVER_ADDR value.
- MX records: Mail is usually not routed through the CDN, so the mail server's address often reveals the real IP or its network.
- Cyberspace Search Engines:
  - Fofa: Query by certificate serial number (convert from hex to decimal).
    cert="17144636119767802547749573191550762477"
  - Censys: https://censys.io/
    443.https.tls.certificate.parsed.extensions.subject_alt_name.dns_names:hetianlab.com
- Historical DNS records:
For more details, refer to: https://www.cnblogs.com/tomyyyyy/p/13699134.html
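The Fofa certificate query above needs the serial number in decimal, while browsers and openssl usually display it in hexadecimal. The conversion is a one-liner; the sketch below wraps it in two hypothetical helpers and accepts the colon-separated form that certificate viewers often show.

```python
def cert_serial_to_decimal(hex_serial: str) -> str:
    """Convert a hex certificate serial (with or without colons) to decimal."""
    cleaned = hex_serial.replace(":", "").replace(" ", "")
    return str(int(cleaned, 16))


def fofa_cert_query(hex_serial: str) -> str:
    """Build the cert="..." query string in the form shown above."""
    return f'cert="{cert_serial_to_decimal(hex_serial)}"'


if __name__ == "__main__":
    print(fofa_cert_query("0A:FF"))
    # → cert="2815"
```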
Reverse IP Lookup (IP to Domain)
If the target is a virtual host, this is valuable because multiple sites may share one IP. Compromising one site on the shared host can lead to compromising the others (a "side-site" attack).
C-Class Network Host Discovery
- Nmap:
  nmap -sP www.example.com/24
  nmap -sP 192.168.1.*
- Cwebscanner: https://github.com/se55i0n/Cwebscanner
- Cyberspace search engines: Search for IPs in the same C-class subnet.
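A "C-class" network here means the /24 block that contains the target IP, which is 254 usable host addresses. The standard ipaddress module can enumerate them; the function name `c_class_hosts` is hypothetical, and the output is only the candidate list that a scanner would then probe.

```python
import ipaddress
from typing import List


def c_class_hosts(ip: str) -> List[str]:
    """List the usable host addresses in the /24 network containing ip."""
    # strict=False lets us pass a host address instead of the network address.
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return [str(host) for host in network.hosts()]


if __name__ == "__main__":
    hosts = c_class_hosts("192.168.1.57")
    print(len(hosts), hosts[0], hosts[-1])
    # → 254 192.168.1.1 192.168.1.254
```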
Port Scanning
Ports identify the endpoints of network communication on a host; each service listens on a TCP or UDP port.
Protocol Ports
- TCP port: Connection-oriented, reliable.
- UDP port: Connectionless, unreliable.
- TCP and UDP ports are independent of each other.
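Because TCP is connection-oriented, the simplest way to test a TCP port is to attempt a full connection and see whether it is accepted. The sketch below is a basic connect check, not the technique any particular scanner uses; the function name `is_tcp_port_open` is hypothetical. UDP has no handshake, so this approach does not carry over to UDP ports.

```python
import socket


def is_tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for port in (21, 22, 80, 3306):
        state = "open" if is_tcp_port_open("127.0.0.1", port) else "closed"
        print(port, state)
```

A full connection like this is recorded by the target, which is the trade-off of active information gathering described earlier.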
Port Types
- Well-known ports: 0–1023, e.g., port 80 for HTTP.
- Registered ports: 1024–49151, assigned to user processes.
- Dynamic ports: 49152–65535, not fixed to specific services.
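The three ranges above translate directly into a classification function. A minimal sketch; the function name `port_category` is hypothetical.

```python
def port_category(port: int) -> str:
    """Classify a port number into the ranges listed above."""
    if not 0 <= port <= 65535:
        raise ValueError("port must be between 0 and 65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"


if __name__ == "__main__":
    for port in (80, 3306, 8080, 49152):
        print(port, port_category(port))
```

Note that several of the ports discussed below, such as 3306, 3389, and 8080, fall in the registered range rather than the well-known range.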
Common Penetration Testing Ports
FTP - 21
File Transfer Protocol. Uses TCP ports 20 (data) and 21 (control).
- Brute force: Hydra, MSF modules.
- Anonymous access: username anonymous, password empty or any email address.
- vsftpd backdoor: version 2.3.4 (the backdoored release, CVE-2011-2523).
- Packet sniffing (requires local network access).
- Remote code execution, FTP bounce attack.
SSH - 22
Secure Shell for secure remote login.
- Brute force: Hydra, MSF modules.
- SSH backdoors.
- User enumeration (CVE-2018-15473).
HTTP - 80
- Web server/middleware vulnerabilities (IIS, Apache, Nginx).
- Common web application vulnerabilities.
NetBIOS/SMB - 139/445
- 139: NetBIOS session service (file and printer sharing).
- 445: SMB (Server Message Block).
- MS17-010 (EternalBlue) exploit.
- Other SMB vulnerabilities (MS06-040, MS08-067).
- IPC$ connection for further exploitation.
MySQL - 3306
- Brute force attacks.
- Malicious UDF (User Defined Function) for command execution.
- SQL injection, LOAD_FILE() function.
RDP - 3389
Remote Desktop Protocol.
- Brute force attack.
- MS12-020 (Blue Screen of Death).
- CVE-2019-0708 (BlueKeep).
- Enabling RDP via registry or MSF.
Redis - 6379
In-memory data structure store.
- Brute force.
- Unauthorized access, for example writing an SSH public key to obtain a shell.
- Remote code execution via master-slave replication; Lua sandbox escape (CVE-2022-0543, Debian/Ubuntu packages).
WebLogic - 7001
Java EE middleware.
- Weak passwords (e.g., weblogic/Oracle@123).
- Deploy WAR backdoor via admin console.
- SSRF vulnerability.
- Deserialization vulnerabilities.
Tomcat - 8080
Open-source Java servlet container and web server.
- CVE-2019-0232 (RCE).
- CVE-2017-12615 (arbitrary file upload).
- Weak password for manager console to get shell.
Historical Vulnerability Information
- WooYun Mirror: http://wy.zone.ci/
- WooYun Knowledge Base: https://wooyun.kieran.top/#!/
- Exploit-DB: https://www.exploit-db.com/
- KnownSec Seebug: https://www.seebug.org/
- Summary of Critical Systems Vulnerabilities: https://www.cnblogs.com/tomyyyyy/p/14701925.html