Parse a URL string to get all of its parts (protocol, origin, params, port, username and password, ...)
URL Parser breaks a URL down into its distinct components. These include the scheme (protocol), username, password, hostname, port, domain, subdomain, top-level domain (TLD), path, query string, hash, and more.
What is the URI?
URI stands for Uniform Resource Identifier. It identifies a resource by its name, by its location, or by both.
URIs come in two variants: URLs and URNs. In other words, URIs are a superset of URLs and URNs. Every URL and every URN is a URI, but a URI is not necessarily a URL or a URN. If the URI uses a protocol such as HTTP or HTTPS to say how the resource is reached, it is a URL. In the case of a URN, the scheme urn is used.
For example ftp://example.com/file.zip
What is the URL?
URL stands for Uniform Resource Locator. It is used to locate a resource on a computer network. The string identifies the resource's location and the scheme, such as HTTP or HTTPS, used to reach it.
The URL's primary function is to locate a resource, which can be a web page, an image, a file, or any other digital resource. You enter the URL in the browser's address bar to locate the specific resource on the network.
What is the URN?
URN stands for Uniform Resource Name. It identifies a resource on the network by name rather than by location. You can quickly tell which URIs are URNs, because URNs always use the scheme urn.
For example urn:uuid:7b889726-edf5-4b92-87bf-ce6f3bf8e261
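The distinction above can be sketched in a few lines of Python. This is a minimal illustration, not a complete classifier: the function name classify_uri is hypothetical, and it assumes that any URI with a scheme other than urn can be treated as a URL, which is a simplification of the rule described above.

```python
from urllib.parse import urlsplit


def classify_uri(uri: str) -> str:
    """Classify a URI by its scheme only (a deliberate simplification)."""
    scheme = urlsplit(uri).scheme  # urlsplit lowercases the scheme
    if scheme == "urn":
        return "URN"
    if scheme:
        # Assumption: any other scheme (http, https, ftp, ...) marks a URL.
        return "URL"
    # No scheme at all: a relative reference, not classified further here.
    return "relative reference"


print(classify_uri("ftp://example.com/file.zip"))
print(classify_uri("urn:uuid:7b889726-edf5-4b92-87bf-ce6f3bf8e261"))
```

The two calls use the example URIs from this page; the first is classified as a URL and the second as a URN.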
What is the structure of the URL?
Before going deeper, you need an idea of the different parts of a URL. A URL consists of several parts. Some of them are essential, and some are optional. You can break the URL down into the following components.
- Scheme: The scheme, or protocol, specifies how the resource is accessed. It indicates which protocol is used to reach the particular resource on the network.
Today, HTTPS (Hypertext Transfer Protocol Secure) is the most widely used protocol on the web. It tells the browser to encrypt the data that users enter, such as their username, password, or credit card information, to protect it from cybercriminals.
Other schemes include HTTP, FTP, SFTP, and mailto.
The scheme can also affect your SEO ranking, since search engines such as Google treat HTTPS as a ranking signal.
For example https://www.example.com
- Userinfo: An optional part of the URL that holds a username and password. It comes before the hostname and is followed by the "@" symbol. Its format is username:password, with the two values separated by a colon ":". Some websites use the word "Auth" rather than "Userinfo."
- Hostname: The hostname is an essential part of the URL. It is present in every web URL and indicates the target server. It consists of three parts: the subdomain, the domain, and the TLD.
- Subdomain: This part is optional. It identifies the particular section of the website that the web server should serve. If your website address is like your house address, the subdomain points toward a specific room in the house.
- Domain: This is the primary part of the hostname. The domain name tells people that they are visiting the website of a specific name or brand. A domain name related to your niche may have a positive effect on your SEO ranking.
- TLD: The top-level domain specifies the category under which your domain is registered. For example, the most popular one is .com, which is used by commercial entities, while .edu is used by educational institutions and .gov by government institutions.
- Port: Just as a seaport assigns specific docks to particular tasks, the port number determines which port on the target server is used to access the resource. The most used protocols, HTTP and HTTPS, use ports 80 and 443 by default, respectively. That is why the port is usually omitted from URLs. For example https://www.example.com:1234. The port follows the hostname and is separated from it by a colon ":".
- Path: The path identifies the specific resource on the website that the user wants to access. It expresses the hierarchy in a structured way. For example https://www.example.com/software/htp/cics/index.html On blogs, it mostly takes this form: https://www.example.com/post/blog-post-name
The "blog-post-name" is the slug in that particular URL.
- Query string: Whatever comes after the "?" sign is the query string. In AdWords, for instance, query parameters are used to track the URL. For example https://www.example.com/?source=google&medium=cpc Query strings are mostly found on dynamic pages. They use a key=value format, and the parameters are separated by the ampersand "&" sign. A URL cannot contain a literal space, so spaces inside values must be encoded.
- Fragment: Also known as the hash. The hash "#" sign introduces the fragment, which indicates a secondary resource within the primary resource. For example https://www.example.com/introduction.html#section2
- Authority: The authority part of the URL consists of the user info, hostname, and port. The user info and the port are optional. The most used protocols, HTTP and HTTPS, use ports 80 and 443 by default, respectively. That is why the port is usually omitted from URLs.
- File suffix: The file extension at the end of the path. It indicates the type of the file. Many URLs do not show it at all.
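The breakdown above can be sketched with Python's standard urllib.parse module. This is only an illustration under stated assumptions, not how this tool is implemented: parse_url and DEFAULT_PORTS are hypothetical names, the credentials in the sample URL are made up, and the subdomain/domain/TLD split naively assumes a single-label TLD such as .com (it would be wrong for suffixes like .co.uk and for IP addresses).

```python
from urllib.parse import parse_qs, urlsplit

# Default ports for the two schemes mentioned in the text.
DEFAULT_PORTS = {"http": 80, "https": 443}


def parse_url(url: str) -> dict:
    """Split a URL into the components described above."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".") if host else []

    # Naive assumption: the last label is the TLD, the one before it is
    # the domain, and everything in front of that is the subdomain.
    tld = labels[-1] if len(labels) >= 2 else ""
    domain = labels[-2] if len(labels) >= 2 else host
    subdomain = ".".join(labels[:-2])

    # File suffix: text after the last dot in the last path segment.
    last_segment = parts.path.rsplit("/", 1)[-1]
    suffix = last_segment.rsplit(".", 1)[-1] if "." in last_segment else ""

    # Fall back to the scheme's default port when none is written out.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)

    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "authority": parts.netloc,
        "hostname": host,
        "subdomain": subdomain,
        "domain": domain,
        "tld": tld,
        "port": port,
        "path": parts.path,
        "file_suffix": suffix,
        "query": parse_qs(parts.query),  # each key maps to a list of values
        "fragment": parts.fragment,
    }


sample = ("https://user:secret@www.example.com:1234"
          "/software/index.html?source=google&medium=cpc#section2")
for name, value in parse_url(sample).items():
    print(f"{name}: {value}")
```

For a production-grade domain split, a list of public suffixes would be needed instead of the last-label assumption used here.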