Percent-Encoding

Percent-encoding, also known as URL encoding, is a method for representing arbitrary data in a Uniform Resource Identifier (URI) using only a restricted set of US-ASCII characters.

Usage

Percent-encoding is used to encode a URI, a category that covers both Uniform Resource Locators (URLs) and Uniform Resource Names (URNs). Although it is often called URL encoding, it applies more generally to any URI.

Percent-encoding is also used to prepare data of the media type application/x-www-form-urlencoded, which is commonly used to submit HTML form data in HTTP requests.
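
As a rough illustration of form encoding, the sketch below uses Python's standard urllib.parse module. The field names and values are hypothetical and chosen only to show how characters with a special meaning are handled. Note that this media type writes a space as a plus sign rather than as %20.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form fields, chosen only for illustration.
form_fields = {"name": "Jane Doe", "query": "a&b=c?"}

# urlencode() builds an application/x-www-form-urlencoded body.
# Characters such as & = ? are percent-encoded; spaces become "+".
body = urlencode(form_fields)
print(body)  # name=Jane+Doe&query=a%26b%3Dc%3F

# parse_qs() reverses the encoding; each value comes back as a list.
decoded = parse_qs(body)
print(decoded)  # {'name': ['Jane Doe'], 'query': ['a&b=c?']}
```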

Note

Percent-encoding applies only to the path, query, and fragment of a URL. A hostname that contains characters outside the ASCII character set is encoded with Punycode instead.
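
The difference between the two schemes can be sketched with Python's standard library. The hostname and path below are hypothetical. The built-in idna codec is used here as one way to obtain the Punycode form, and quote() is assumed to encode non-ASCII characters as UTF-8 bytes, which is its default.

```python
from urllib.parse import quote

# Hypothetical internationalized hostname and path, for illustration only.
hostname = "bücher.example"
path = "/café"

# The hostname is converted with Punycode (via Python's idna codec).
ascii_hostname = hostname.encode("idna").decode("ascii")
print(ascii_hostname)  # xn--bcher-kva.example

# The path is percent-encoded; quote() leaves "/" untouched by default
# and encodes "é" as its two UTF-8 bytes.
encoded_path = quote(path)
print(encoded_path)  # /caf%C3%A9
```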

Character set

A URI can be composed of both reserved and unreserved characters. In this context, a reserved character has a special meaning, such as separating the parts of a URI, and must be percent-encoded when it is used as ordinary data. In contrast, an unreserved character has no special meaning and does not have to be percent-encoded.

The unreserved characters are:

  • Lowercase letters: a – z
  • Uppercase letters: A – Z
  • Digits: 0 – 9
  • Special characters:
    • hyphen: -
    • underscore: _
    • period: .
    • tilde: ~

Characters in this set do not have to be percent-encoded. Although %30 represents the digit 0 and should in theory be treated exactly like the literal character, this is not always the case in practice. For maximum compatibility, unreserved characters should therefore never be percent-encoded.
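
The following sketch checks the unreserved set against Python's urllib.parse.quote, assuming Python 3.7 or later (earlier versions percent-encode the tilde). It also shows why encoding an unreserved character is risky: the two spellings decode to the same character, but a plain string comparison treats them as different.

```python
import string
from urllib.parse import quote, unquote

# The unreserved set as listed above.
UNRESERVED = string.ascii_letters + string.digits + "-_.~"

# With safe="", quote() encodes everything except unreserved characters,
# so the whole set passes through unchanged.
encoded = quote(UNRESERVED, safe="")
print(encoded == UNRESERVED)  # True

# %30 decodes to the digit zero...
print(unquote("%30"))  # 0

# ...but software that compares URIs as raw strings sees two different values.
print("%30" == "0")  # False
```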

Percent-Encoding

Percent-encoding uses three characters to represent a reserved character: the percent sign %, followed by the two hexadecimal digits of the character's ASCII code. The hexadecimal digits are not case sensitive.
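
A minimal sketch of this rule is shown below. The function name percent_encode is hypothetical, and it encodes every character outside the unreserved set, which is stricter than necessary in many contexts. Handling non-ASCII input by way of UTF-8 bytes is an assumption that goes beyond the ASCII case described here.

```python
import string

UNRESERVED = set(string.ascii_letters + string.digits + "-_.~")

def percent_encode(text: str) -> str:
    """Encode every character outside the unreserved set as %XX."""
    parts = []
    for ch in text:
        if ch in UNRESERVED:
            parts.append(ch)
        else:
            # Assumption: non-ASCII characters are encoded byte by byte as UTF-8.
            parts.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(parts)

print(percent_encode("?"))      # %3F
print(percent_encode("a b&c"))  # a%20b%26c

# The hexadecimal digits are not case sensitive: both spellings give "?".
print(chr(int("3f", 16)) == chr(int("3F", 16)) == "?")  # True
```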

Note

A reserved character that does not have a reserved meaning in a specific context can still be percent-encoded, but it is not mandatory.

Example

For example, in a URL that includes a query parameter, the question mark ? indicates the beginning of the query section. If a question mark is needed for anything other than marking the start of the query section, it must be percent-encoded as %3f or %3F.

URL without percent-encoding (likely to fail)

https://example.at/really?.html?source=123

Percent-encoded URL

https://example.at/really%3f.html?source=123
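
To see how a parser might treat the two URLs above, the sketch below splits them with Python's urllib.parse.urlsplit. Other parsers and servers may behave differently, so this is only one illustration of the problem.

```python
from urllib.parse import quote, unquote, urlsplit

# Without percent-encoding, the first "?" ends the path too early.
raw = urlsplit("https://example.at/really?.html?source=123")
print(raw.path)   # /really
print(raw.query)  # .html?source=123

# With the question mark encoded, the path and query are split as intended.
encoded = urlsplit("https://example.at/really%3f.html?source=123")
print(encoded.path)           # /really%3f.html
print(encoded.query)          # source=123
print(unquote(encoded.path))  # /really?.html

# quote() with safe="" produces the encoded file name (uppercase hex digits).
file_name = quote("really?.html", safe="")
print(file_name)  # really%3F.html
```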

Takeaway

Percent-encoding is also known as URL encoding, and it refers to a method that uses a restricted set of US-ASCII characters to encode a URI.

Last updated: June 20, 2022