Among the request methods in the HTTP protocol, GET is the most commonly used—it retrieves data from a resource on the server. Another important method is POST, which submits data to a resource.
But how do we identify resources on the server? How do we distinguish "this" resource from "that" resource?
From previous lessons, you already know the answer: URI, which stands for Uniform Resource Identifier. Because it frequently appears in the browser's address bar, it's commonly called a "web address" or simply "URL."
Strictly speaking, URI is not the same as URL. URI is the broader concept, covering two subsets: URL and URN. In the HTTP world, what we commonly call a "web address" is actually a URL (Uniform Resource Locator). However, because URLs are so ubiquitous, the two terms are often used interchangeably.
URIs are not only essential for everyday internet browsing but also crucial in development, testing, and operations work.
If you're developing iOS, Android, or mini-program applications, you'll need to connect to remote services, which involves calling lower-level APIs that use URIs to access those services.
If you work with Java or PHP for backend web development, you'll use functions like getPath() or parse_url() to handle URIs and parse their various components.
When configuring web servers like Apache or Nginx during testing and operations, you must correctly understand URIs to distinguish between static and dynamic resources, or set up rules for page redirects.
In summary, URIs are extremely important. To understand HTTP and web applications, you must understand URIs.
<h3>URI Format</h3>
Have you ever looked at the long string of characters in your browser's address bar while browsing the web? Some are short, others don't even fit on one line. Some are understandable at a glance, while others contain various strange characters that look like "cryptic scripts."
Once you understand the URI format, you can easily "decode" these confusing "cryptic scripts."
A URI is essentially a string that uniquely identifies the location or name of a resource.
It's important to note that URIs can identify not only web resources but also other types like email systems, local file systems, or any other resources. "Resources" can be static files on disk like text or page data, or dynamic services provided by Java or PHP.
The following diagram shows the most common form of a URI, consisting of four parts: scheme, host:port, path, and query. However, some parts can be omitted depending on the situation.
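To make the four parts concrete, here is a minimal sketch in Python. It uses the standard library's urlsplit() function, which plays a role similar to the Java getPath() and PHP parse_url() functions mentioned earlier. The sample URI is the lab address used later in this lesson. Note that Python calls the "host:port" part "netloc".

```python
from urllib.parse import urlsplit

# Sample URI from this lesson's lab environment.
uri = "http://www.example.com:8080/15-1?user_id=5678&category=electronics&source=homepage"

# urlsplit() cuts the string into named components.
parts = urlsplit(uri)

print("scheme:   ", parts.scheme)  # http
print("host:port:", parts.netloc)  # www.example.com:8080
print("path:     ", parts.path)    # /15-1
print("query:    ", parts.query)   # user_id=5678&category=electronics&source=homepage
```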
<h3>Basic URI Components</h3>
The first component of a URI is called scheme, which can be translated as "scheme name" or "protocol name." It indicates which protocol should be used to access the resource.
The most common is "http," indicating the HTTP protocol. There's also "https," indicating encrypted, secure HTTPS. There are other less common schemes like ftp, ldap, file, and news.
When a browser or application sees the scheme in a URI, it knows what to do next—it will call the corresponding lower-level HTTP or HTTPS API. Obviously, if a URI doesn't provide a scheme, even if the rest of the address is perfect, it cannot be processed.
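One quick way to see why the scheme matters is to ask a parser what it makes of an address that lacks one. In this Python sketch, split_or_reject() is a hypothetical helper, named only for illustration. It refuses any URI whose scheme is empty, which mirrors what a client has to do.

```python
from urllib.parse import urlsplit

def split_or_reject(uri: str) -> tuple[str, str, str]:
    """Hypothetical helper: return (scheme, authority, path), or fail without a scheme."""
    parts = urlsplit(uri)
    if not parts.scheme:
        # No protocol name means no way to pick a protocol handler.
        raise ValueError(f"no scheme in {uri!r}; cannot choose a protocol")
    return parts.scheme, parts.netloc, parts.path

print(split_or_reject("http://www.example.com/15-1"))
# ('http', 'www.example.com', '/15-1')

# Without "http://", the host name is swallowed into the path.
print(urlsplit("www.example.com/15-1"))
```

The sketch deliberately fails rather than guessing a scheme. The parser has no way to know that "www.example.com" was meant to be a host.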
After the scheme, there must be three specific characters: "://". They separate the scheme from the rest of the URI.
Honestly, this design is quite unusual. When I first started using the internet, I found the "//" in the address bar quite awkward, and even now I haven't fully adjusted to it. Tim Berners-Lee, the creator of URI, has privately admitted that "//" was unnecessary and originally "too hasty."
However, this design has been around for thirty years. Whether we like it or not, we have to accept it.
After "//" comes the part called "authority," which indicates where the resource is hosted. The usual form is "host:port," meaning hostname plus port number.
The hostname can be an IP address or domain name and is required—otherwise the browser won't find the server. But the port number can sometimes be omitted. Browsers and other clients will use default ports based on the scheme: port 80 for HTTP and port 443 for HTTPS.
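The default-port rule fits in a small lookup table. This is a sketch that assumes only the two schemes discussed here. The names host_and_port() and DEFAULT_PORTS are illustrative, not a standard API.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes covered in this lesson (assumption: no others).
DEFAULT_PORTS = {"http": 80, "https": 443}

def host_and_port(uri: str) -> tuple[str, int]:
    parts = urlsplit(uri)
    if not parts.hostname:
        # The host is mandatory: without it there is no server to contact.
        raise ValueError(f"no host in {uri!r}")
    # .port is None when the URI omits the port, so fall back to the scheme default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port

print(host_and_port("http://apache.org"))                 # ('apache.org', 80)
print(host_and_port("http://www.example.com:8080/15-1"))  # ('www.example.com', 8080)
```

Any other scheme raises KeyError here. Guessing a port for an unknown protocol would be worse than failing.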
With the protocol name, the host address and port number, and finally the path that marks the resource's location, the browser can connect to the server and access the resource.
URI paths use a similar representation to file system "directories" and "paths." Because early internet computers were mostly UNIX systems, they adopted UNIX's "/" style. It's actually quite intuitive—it's consistent with the "//" after the scheme.
I need to remind you again: whenever the path component of a URI is present, it must start with "/". That "/" right after the host belongs to the path, so don't mistake it for part of the authority.
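The following Python sketch shows where the slash ends up when a URI is split, and how an omitted path falls back to the root. request_path() is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def request_path(uri: str) -> str:
    """Hypothetical helper: return the path, or "/" (the root) when it is omitted."""
    return urlsplit(uri).path or "/"

parts = urlsplit("http://www.example.com:8080/15-1")
print(parts.netloc)  # www.example.com:8080  (no trailing slash)
print(parts.path)    # /15-1  (the leading slash belongs to the path)

print(request_path("http://apache.org"))  # /
```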
After so much "theory," let's look at some practical examples:
http://apache.org
http://www.example.com:8080/15-1
https://datatracker.ietf.org/doc/html/rfc7230
file:///C:/workspace/projects/
The first URI is the simplest: the protocol is "http," the hostname is "apache.org," the port is omitted so it defaults to 80, and the path is also omitted, defaulting to "/" (root directory).
The second URI is a dedicated URI for this course in our lab environment. The hostname is "www.example.com", the port is 8080, and the path is "/15-1".
The third is the URI for the HTTP protocol standard document RFC7230. The hostname is "datatracker.ietf.org," and the path is "/doc/html/rfc7230."
Pay attention to the last URI—its protocol is not "http" but "file," indicating a local file. Why are there three slashes?
If you listened carefully to the scheme introduction, you can understand: the first two slashes belong to the URI separator "//", and then "/C:/workspace/projects/" is the path, while the hostname in the middle is "omitted." This is actually a "special case" for file-type URIs—it allows omitting the hostname, defaulting to localhost.
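Splitting the file URI shows the empty host directly. The fallback to "localhost" in this sketch reflects the convention described above. file_uri_host() is an illustrative name.

```python
from urllib.parse import urlsplit

def file_uri_host(uri: str) -> str:
    """Illustrative helper: the host a file URI refers to."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    # An empty authority between "//" and the path means the local machine.
    return parts.netloc or "localhost"

parts = urlsplit("file:///C:/workspace/projects/")
print(repr(parts.netloc))  # ''
print(parts.path)          # /C:/workspace/projects/
print(file_uri_host("file:///C:/workspace/projects/"))  # localhost
```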
However, for network communication protocols like HTTP or HTTPS, the hostname cannot be omitted. As mentioned before, the browser wouldn't be able to find the server.
We can use Chrome in our lab environment to carefully examine the URI in HTTP messages.
Open Chrome, press F12 to open the developer tools, then enter "http://www.example.com:8080/15-1" in the address bar. The result is shown in the image below.
In the developer tools, select "Network" and then "Doc" to find the request. Then, on the Headers tab, look at Request Headers and click "view source" to see the raw request headers sent by the browser.
Did you notice anything special?
The URI in the HTTP message is "/15-1", which is quite different from the "http://www.example.com:8080/15-1" entered in the browser: the protocol name and hostname are gone, leaving only the path.
This is because the protocol name and hostname already appear in the HTTP version in the request line and the Host field in request headers, so there's no need to repeat them. Of course, using a complete URI in the request line is also acceptable—you can try this yourself after class.
Through this small experiment, we also reach a conclusion: the client and the server see different URIs. The client must see the complete URI, because it has to use a specific protocol to connect to a specific host. The server only sees the URI in the request line, with the protocol name and hostname removed.
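The client-side step of this experiment can be sketched in a few lines. The code takes the complete URI, keeps the path and query for the request line, and moves host:port into the Host header. This sketch assumes HTTP/1.1 and a plain GET. request_head() is a hypothetical name, and real clients send many more headers.

```python
from urllib.parse import urlsplit

def request_head(uri: str, method: str = "GET") -> str:
    """Hypothetical helper: build an HTTP/1.1 request line plus Host header."""
    parts = urlsplit(uri)
    # The request target keeps only the path (default "/") and the query.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # Drop any "user:passwd@" prefix; the Host header carries only host:port.
    host = parts.netloc.rpartition("@")[2]
    return f"{method} {target} HTTP/1.1\r\nHost: {host}\r\n\r\n"

print(request_head("http://www.example.com:8080/15-1"))
```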
If you've configured Nginx, you should now understand that Nginx, as a web server, operates on URIs in its location and rewrite directives—these actually refer to the path and subsequent parts of the real URI.
<h3>URI Query Parameters</h3>
Using "protocol + hostname + path" can precisely locate any resource on the network. But this isn't enough: we often want to attach extra parameters that modify how the resource is handled.
Here are a few examples: get a product image but want a 32x32 thumbnail version; get a product list but need pagination and sorting according to certain rules; redirect to a page but want to mark the original page before redirecting.
Using "protocol + hostname + path" alone cannot handle these scenarios. That's why a URI can have a "query" part. It follows the path and begins with "?", although the "?" itself is not part of the query. The query expresses additional requirements for the resource. The "?" is a very intuitive symbol, much better than "//", because it clearly signals a query.
The query parameters have their own format: multiple "key=value" strings, connected by the "&" character. Browsers and clients can parse this long string of query parameters into understandable dictionary or associative array forms according to this format.
You can try this URI with query parameters in our lab environment using Chrome:
http://www.example.com:8080/15-1?user_id=5678&category=electronics&source=homepage
Chrome's developer tools can also decode the key-value pairs in the query, saving us from manually parsing them.
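Outside the browser, the same decoding takes only a few lines. The hand-written parsing below follows the "key=value" plus "&" format literally, to show its structure. Python's parse_qsl() and parse_qs() do the same job and also undo URI encoding, which the hand-written version does not.

```python
from urllib.parse import urlsplit, parse_qs, parse_qsl

uri = "http://www.example.com:8080/15-1?user_id=5678&category=electronics&source=homepage"
query = urlsplit(uri).query

# Literal reading of the format: pairs joined by "&", each split at the first "=".
manual = dict(item.split("=", 1) for item in query.split("&"))

# Library versions: parse_qsl keeps order as a list of (key, value) tuples;
# parse_qs returns lists of values because a key may repeat.
pairs = parse_qsl(query)
params = parse_qs(query)

print(manual["user_id"])   # 5678
print(pairs)
print(params["category"])  # ['electronics']
```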
Let's look at another actual URI—this is a product search URI from an e-commerce website. It's quite complex, but I'm sure you can now easily distinguish the protocol name, hostname, path, and query parameters.
https://search.example.com/search?keyword=docker&enc=utf-8&sort=price_asc&page=1&filter=in_stock
<p>You can also enter this URI into Chrome's address bar and use developer tools to carefully examine its components.</p>
<h3>Complete URI Format</h3>
<p>Now that we've covered query parameters, the URI is complete. The vast majority of URIs used with HTTP take this form.</p>
<p>However, it's important to mention that URIs have a "truly" complete form, as shown in the diagram.</p>
<p>This "true" form has two additional parts compared to the basic form.</p>
<p>The first additional part is the <strong>userinfo</strong> "user:passwd@" after the protocol name but before the hostname. This represents the username and password for logging into the host, but this form is no longer recommended (per RFC7230) because it exposes sensitive information in plain text, creating serious security risks.</p>
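<p>Credentials in the authority are a liability, so a common defensive step is to strip them before a URI is logged or displayed. The following is a minimal Python sketch that uses the "user:passwd@" placeholders from the text. <code>strip_userinfo()</code> is a hypothetical helper.</p>

```python
from urllib.parse import urlsplit, urlunsplit

def strip_userinfo(uri: str) -> str:
    """Hypothetical helper: drop any "user:passwd@" prefix from the authority."""
    parts = urlsplit(uri)
    # Everything after the last "@" in the authority is host[:port].
    host = parts.netloc.rpartition("@")[2]
    return urlunsplit(parts._replace(netloc=host))

uri = "http://user:passwd@www.example.com:8080/15-1"
print(urlsplit(uri).username, urlsplit(uri).password)  # user passwd
print(strip_userinfo(uri))  # http://www.example.com:8080/15-1
```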
<p>The second additional part is the <strong>fragment identifier</strong> "#fragment" after the query parameters. It's an "anchor" or "tag" within the resource identified by the URI. Browsers can directly jump to the position it indicates after retrieving the resource.</p>
<p>However, fragment identifiers can only be used by clients like browsers—the server cannot see them. In other words, browsers never send URIs with "#fragment" to the server, and the server never processes resource fragments in this way.</p>
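<p>Python's <code>urllib.parse.urldefrag()</code> performs exactly this split. The first part is what a client would put in the request, and the fragment stays on the client side. The "#section-5.3" anchor below is an illustrative assumption.</p>

```python
from urllib.parse import urldefrag

uri = "https://datatracker.ietf.org/doc/html/rfc7230#section-5.3"

# Split off the fragment; only the first part is ever sent to the server.
sent, fragment = urldefrag(uri)
print(sent)      # https://datatracker.ietf.org/doc/html/rfc7230
print(fragment)  # section-5.3
```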
<h3>URI Encoding</h3>
<p>URIs can use only ASCII characters. But what if we need Chinese, Japanese, or other non-English text in a URI?</p>
<p>Also, certain special URIs may contain delimiter characters like "@&/" in the path or query, which can cause URI parsing errors. What should we do then?</p>
<p>Therefore, <strong>URIs introduce an encoding mechanism</strong>: <strong>for character sets beyond ASCII and special characters, a special operation converts them into a form that doesn't conflict with URI semantics</strong>. This is called "escape" and "unescape" in RFC specifications, commonly known as "encoding."</p>
<p><strong>The URI encoding rules are somewhat "simple and brutal": each byte of a non-ASCII or special character is converted to its two-digit hexadecimal value, prefixed with "%".</strong></p>
<p>For example, space is encoded as "%20," "?" is encoded as "%3F." Chinese and Japanese characters are typically encoded using UTF-8 before escaping—for example, "银河" becomes "%E9%93%B6%E6%B2%B3."</p>
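<p>The rule is mechanical enough to write out by hand. This sketch encodes text as UTF-8 and escapes every byte outside the RFC 3986 "unreserved" set (letters, digits, and "-._~"). It then checks itself against Python's <code>urllib.parse.quote()</code>. <code>percent_encode()</code> is a hypothetical name; in practice you would call <code>quote()</code> directly.</p>

```python
import string
from urllib.parse import quote

# RFC 3986 "unreserved" characters never need escaping.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def percent_encode(text: str) -> str:
    """Escape every UTF-8 byte outside the unreserved set as %XX."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        # Bytes >= 0x80 map to non-ASCII chr() values, so they are never in UNRESERVED.
        out.append(ch if ch in UNRESERVED else f"%{byte:02X}")
    return "".join(out)

for s in [" ", "?", "银河"]:
    print(repr(s), "->", percent_encode(s))
    assert percent_encode(s) == quote(s, safe="")
```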
<p>With this encoding rule, URIs become far more capable: they can mark resources using any character set in any language.</p>
<p>However, we usually don't see these escaped "gibberish" characters in the browser's address bar. This is "friendly" behavior on the browser's part, hiding the "ugly side" of URI encoding. If you don't believe me, try the following URI:</p>
<code>http://www.example.com:8080/15-1?后羿射日
</code>
<p>First, enter this URI with Chinese in the query into Chrome's address bar, then click the address bar and copy it to another editor—it will "reveal its true form":</p>
<code>http://www.example.com:8080/15-1?%E5%90%8E%E7%BE%BF%E5%B0%84%E6%97%A5
</code>
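<p>On the receiving side, the server undoes the escaping. The following Python check uses <code>urllib.parse.unquote()</code>, which decodes %XX sequences as UTF-8 by default, to turn the copied query back into the original text.</p>

```python
from urllib.parse import quote, unquote, urlsplit

uri = "http://www.example.com:8080/15-1?%E5%90%8E%E7%BE%BF%E5%B0%84%E6%97%A5"
raw_query = urlsplit(uri).query

print(unquote(raw_query))  # 后羿射日
# Encoding the text again reproduces the escaped form exactly.
print(quote("后羿射日") == raw_query)  # True
```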
<h3>Summary</h3>
<p>Today we learned about web addresses, also known as URIs. Here's a summary of today's content:</p>
<ol>
<li>A URI is <strong>a string that uniquely identifies a resource on a server</strong>, commonly also called a URL;</li>
<li>A URI typically consists of <strong>scheme, host:port, path, and query</strong>—some parts can be omitted;</li>
<li>The scheme is called "scheme name" or <strong>"protocol name"</strong>, indicating which protocol should be used to access the resource;</li>
<li>"host:port" represents the <strong>hostname and port number</strong> where the resource is located;</li>
<li>The path marks <strong>the location of the resource</strong>;</li>
<li>The query represents <strong>additional requirements</strong> for the resource;</li>
<li>In URIs, special characters like "@&/" and Chinese characters must be encoded, otherwise the server won't be able to process the HTTP message correctly.</li>
</ol>
<h3>Homework</h3>
<p><strong>Question 1:</strong> HTTP protocol allows using complete URIs in the request line, but why don't browsers do this?</p>
<p>Browsers typically don't use complete URIs in HTTP request lines for several reasons:</p>
<ol>
<li><strong>The protocol name and hostname already appear in the HTTP version in the request line and the Host field in request headers—there's no need to repeat them.</strong></li>
<li><strong>Efficiency and conciseness:</strong> Sending only the necessary parts reduces the data volume of the request, improving sending and processing efficiency.</li>
<li><strong>Convention and standards:</strong> Following common practices and standards of the HTTP protocol—usually sending only specific parts meets the server's identification and processing requirements.</li>
<li><strong>Compatibility and interoperability:</strong> Following widely accepted specifications helps ensure good compatibility and interoperability between different browsers, servers, and network components.</li>
<li><strong>Security considerations:</strong> Reducing unnecessary information exposure can lower potential security risks.</li>
<li><strong>Server configuration and processing logic:</strong> Server design and configuration are typically based on standard request formats; using complete URIs might cause incompatibility with existing server architectures and processing logic.</li>
</ol>
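<p>Point 1 can be made concrete. RFC 7230 (section 5.5, "Effective Request URI") describes how a server rebuilds the full URI from the request line and the Host header. The standard allows both the short origin-form ("/15-1") and the absolute-form ("http://..."). The Python sketch below handles just those two forms. <code>effective_uri()</code> is a hypothetical helper. The scheme is passed in as a parameter, an assumption standing in for the server's own configuration or connection type.</p>

```python
from urllib.parse import urlsplit

def effective_uri(target: str, host_header: str, scheme: str = "http") -> str:
    """Hypothetical helper: rebuild the full URI from a request-target and Host header."""
    if target.startswith("/"):
        # origin-form, what browsers send: "/15-1" plus "Host: www.example.com:8080"
        return f"{scheme}://{host_header}{target}"
    parts = urlsplit(target)
    if parts.scheme and parts.netloc:
        # absolute-form: the request line already carries the whole URI
        return target
    raise ValueError(f"unsupported request-target {target!r}")

print(effective_uri("/15-1", "www.example.com:8080"))
print(effective_uri("http://www.example.com:8080/15-1", "www.example.com:8080"))
```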
<p><strong>Question 2:</strong> URI query parameters and HTTP header fields are similar—both are key-value forms and can be arbitrarily customized. How should they be distinguished when used?
</p><p>Although URI query parameters and HTTP header fields are both key-value pairs and customizable, they differ in usage as follows:</p>
<ol>
<li><strong>Scope and purpose:</strong>
<ul>
<li>Query parameters are typically used to pass <strong>resource-specific</strong> parameters to the server, affecting how the server processes and returns that resource.</li>
<li>Header fields are used to pass metadata information about the entire request or response, such as client preferences, authentication information, cache control, etc.</li>
</ul>
</li>
<li><strong>Transmission method:</strong>
<ul>
<li>Query parameters are included in the URI—in the URL portion of the request.</li>
<li>Header fields are sent in the headers of the request or response.</li>
</ul>
</li>
<li><strong>Visibility and cacheability:</strong>
<ul>
<li>Query parameters are typically visible in the browser address bar and may be recorded in server access logs. Their impact on caching depends on server configuration.</li>
<li>Header fields are less visible to users and provide more targeted and flexible cache control.</li>
</ul>
</li>
<li><strong>Data type and length limits:</strong>
<ul>
<li>Query parameters may face some restrictions from URL specifications in terms of data type and length.</li>
<li>Header fields typically have more lenient support for data types and lengths, but may also be subject to server and client configuration limits.</li>
</ul>
</li>
<li><strong>Security considerations:</strong>
<ul>
<li>Sensitive information (like authentication tokens) shouldn't be placed in query parameters. HTTPS encrypts both the URL and the headers in transit. However, URLs are also shown in the address bar, saved in browser history, recorded in server access logs, and can leak through the Referer header, so header fields are the safer place for such data.</li>
</ul>
</li>
</ol>
<p>In summary, choose between query parameters and header fields based on specific needs and scenarios to achieve more efficient, secure, and accurate communication.</p>
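<p>Here is how this split might look when building a request with Python's <code>urllib.request.Request</code>, without sending it. The parameter names "page" and "sort" and the bearer token are placeholders invented for this sketch.</p>

```python
from urllib.parse import urlencode
from urllib.request import Request

# Resource-specific options: which slice of the resource to return.
query = urlencode({"page": 2, "sort": "price_asc"})

req = Request(
    f"http://www.example.com:8080/15-1?{query}",
    # Request-wide metadata such as credentials and preferences go in headers.
    headers={"Authorization": "Bearer <token>", "Accept": "text/html"},
)

print(req.full_url)                     # http://www.example.com:8080/15-1?page=2&sort=price_asc
print(req.get_header("Authorization"))  # Bearer <token>
```

<p>Nothing goes over the network until the request is passed to <code>urlopen()</code>.</p>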
<p><strong>Question 3:</strong> What is the difference between URI and URL?</p>
<p>The difference between URI (Uniform Resource Identifier) and URL (Uniform Resource Locator) lies mainly in the following aspects:</p>
<ol>
<li><strong>Scope:</strong> URI is a more general concept for identifying resources; URL is a subset of URI that not only identifies a resource but also specifies how to locate and access it.</li>
<li><strong>Function:</strong>
<ul>
<li>URI primarily serves to uniquely identify a resource, without necessarily providing information about how to obtain or manipulate that resource.</li>
<li>URL explicitly indicates the location and access method for obtaining the resource—for example, through a specific protocol (like HTTP, FTP), server address, port number, file path, etc.</li>
</ul>
</li>
<li><strong>Examples:</strong>
<ul>
<li>A URI might just be a name or identifier—for example: "urn:isbn:0451450523" (identifying a book's ISBN)</li>
<li>A URL like: "https://www.example.com/page.html" explicitly indicates retrieving "page.html" from server "www.example.com" via HTTPS protocol.</li>
</ul>
</li>
</ol>
<p>In summary, URL is a specific type of URI that can locate resources, while URI is a broader concept that includes URL and other identifiers that only identify resources without specifying access methods.</p>
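<p>The difference also shows up when the two examples are run through a URI parser. In this Python sketch, the URN yields a scheme and a name but no host, while the URL names a host to contact.</p>

```python
from urllib.parse import urlsplit

urn = urlsplit("urn:isbn:0451450523")
url = urlsplit("https://www.example.com/page.html")

# A URN only names the resource: no authority, so nothing to connect to.
print(urn.scheme, repr(urn.netloc), urn.path)  # urn '' isbn:0451450523
# A URL also says how (scheme) and where (host) to fetch it.
print(url.scheme, url.hostname, url.path)      # https www.example.com /page.html
```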