Percent-Encoding

Percent-encoding, also known as URL encoding, is a mechanism for representing characters in a URI using only the US-ASCII character set. Any octet can be represented as a percent sign (%) followed by two hexadecimal digits giving the octet's value; for example, a space (octet 0x20) becomes %20.

Usage

A URI may contain only a limited set of characters from the US-ASCII range. Characters outside this set, and reserved characters used as data rather than as delimiters, need a way to be represented without breaking the URI structure. Percent-encoding solves this by first converting the character to octets (UTF-8 in modern practice) and then replacing each octet with % followed by its two-digit hexadecimal value. For example, é is the two UTF-8 octets C3 A9 and therefore becomes %C3%A9.
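
The octet-by-octet procedure can be sketched in JavaScript. The percentEncode helper below is hypothetical (not a built-in), and it assumes UTF-8 as the character encoding, which is what browsers and the WHATWG URL Standard use. It leaves only the unreserved characters A-Z, a-z, 0-9, -, ., _ and ~ as-is.

```javascript
// Hypothetical helper (not a built-in): percent-encode a string octet by
// octet, assuming UTF-8, and leave only unreserved characters as-is.
function percentEncode(str) {
  // Step 1: convert the characters to UTF-8 octets.
  const octets = new TextEncoder().encode(str);
  let out = "";
  for (const octet of octets) {
    const ch = String.fromCharCode(octet);
    if (/^[A-Za-z0-9\-._~]$/.test(ch)) {
      // Unreserved ASCII character: copy it through unchanged.
      out += ch;
    } else {
      // Step 2: write the octet as "%" plus two uppercase hex digits.
      out += "%" + octet.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("café au lait")); // caf%C3%A9%20au%20lait
console.log(percentEncode("a/b?c"));        // a%2Fb%3Fc
```

Because é occupies two octets in UTF-8, it turns into two escapes, %C3%A9, while the space becomes a single %20.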

Encoding scope

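Percent-encoding applies to the userinfo, path, query, and fragment components of a URI. Internationalized hostnames use Punycode instead, which produces ASCII labels beginning with xn--. The scheme (such as https) and the port consist only of ASCII characters that never need encoding.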
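
The built-in WHATWG URL parser (available as URL in browsers and Node.js) illustrates this split. The hostname bücher.example.re and the path, query, and fragment values below are made-up examples.

```javascript
// Parse a URL with non-ASCII characters in every component.
const url = new URL("https://bücher.example.re/straße?q=ä#größe");

console.log(url.hostname); // xn--bcher-kva.example.re (Punycode)
console.log(url.pathname); // /stra%C3%9Fe (percent-encoded UTF-8)
console.log(url.search);   // ?q=%C3%A4
console.log(url.hash);     // #gr%C3%B6%C3%9Fe
```

The host is converted with Punycode, while the path, query, and fragment keep their structure and have only their non-ASCII octets percent-encoded.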

Reserved characters

RFC 3986 defines two classes of reserved characters, which serve as delimiters in URI syntax.

General delimiters (gen-delims):

Character   Encoded   Role
:           %3A       Scheme/port separator
/           %2F       Path segment separator
?           %3F       Query prefix
#           %23       Fragment prefix
[           %5B       IPv6 address start
]           %5D       IPv6 address end
@           %40       Userinfo delimiter

Subcomponent delimiters (sub-delims):

Character   Encoded   Role
!           %21       Subcomponent delimiter
$           %24       Subcomponent delimiter
&           %26       Query parameter separator
'           %27       Subcomponent delimiter
(           %28       Subcomponent delimiter
)           %29       Subcomponent delimiter
*           %2A       Subcomponent delimiter
+           %2B       Subcomponent delimiter
,           %2C       Subcomponent delimiter
;           %3B       Subcomponent delimiter
=           %3D       Key-value separator

When a reserved character appears as data rather than a delimiter, the character must be percent-encoded to prevent misinterpretation.
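
JavaScript's encodeURIComponent() escapes most reserved characters but leaves the sub-delims ! ' ( ) * unencoded. A minimal sketch of a stricter encoder that escapes every reserved character follows; the function name encodeRFC3986Component is hypothetical.

```javascript
// Hypothetical helper: encode a value for a single URI component,
// escaping all reserved characters. encodeURIComponent() already handles
// everything except ! ' ( ) *, so those five are escaped afterwards.
function encodeRFC3986Component(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase(),
  );
}

console.log(encodeRFC3986Component("it's (a) test!*"));
// it%27s%20%28a%29%20test%21%2A
console.log(encodeRFC3986Component("a:b/c?d#e[f]@g"));
// a%3Ab%2Fc%3Fd%23e%5Bf%5D%40g
```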

Unreserved characters

Unreserved characters carry no special meaning in URI syntax and do not require percent-encoding:

  • Uppercase letters: A-Z
  • Lowercase letters: a-z
  • Digits: 0-9
  • Hyphen: -
  • Underscore: _
  • Period: .
  • Tilde: ~

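Unreserved characters never need encoding, and percent-encoded forms of them are decoded during normalization. For example, a URI containing %41 (the encoded form of A) is equivalent to one containing A; RFC 3986 recommends that URI producers not create such escapes and that normalizers decode them when found.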
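
A normalizer can apply this rule mechanically. The sketch below is hypothetical and covers only the percent-encoding part of normalization: it decodes escapes of unreserved characters and, following RFC 3986's case-normalization rule, uppercases the hex digits of every other escape (so %2f becomes %2F).

```javascript
// Hypothetical helper: percent-encoding normalization only.
function normalizePercentEncoding(uri) {
  return uri.replace(/%([0-9A-Fa-f]{2})/g, (_match, hex) => {
    const ch = String.fromCharCode(parseInt(hex, 16));
    // Decode escapes of unreserved characters; keep everything else
    // encoded, with uppercase hex digits.
    return /^[A-Za-z0-9\-._~]$/.test(ch) ? ch : "%" + hex.toUpperCase();
  });
}

console.log(normalizePercentEncoding("/%41bc/%7euser/a%2fb"));
// /Abc/~user/a%2Fb
```

The encoded slash stays encoded because decoding it would turn data into a path separator.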

Common percent-encoded characters

These characters appear frequently in encoded URIs:

Character   Encoded   Common context
(space)     %20       Path and query values
/           %2F       Literal slash in values
?           %3F       Literal question mark
#           %23       Literal hash in values
&           %26       Literal ampersand
=           %3D       Literal equals sign
@           %40       Email in path or query
%           %25       Literal percent sign

The percent sign itself requires encoding as %25 when used as data. Failing to encode a literal % produces ambiguous URIs where %41 looks like an encoded A instead of the three characters %, 4, 1.
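
The round trip below uses JavaScript's built-in encodeURIComponent() and decodeURIComponent() to show why the % sign needs escaping. The sample string 100%41 is arbitrary.

```javascript
// Data that literally contains the characters "%", "4", "1".
const raw = "100%41";
const encoded = encodeURIComponent(raw);
console.log(encoded);                     // 100%2541
console.log(decodeURIComponent(encoded)); // 100%41 (data preserved)

// Left unencoded, a decoder reads "%41" as the letter A...
console.log(decodeURIComponent(raw));     // 100A

// ...and a "%" not followed by two hex digits is rejected as malformed.
try {
  decodeURIComponent("100%");
} catch (err) {
  console.log(err instanceof URIError);   // true
}
```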

Form encoding vs URI encoding

HTML form submissions using application/x-www-form-urlencoded follow a variant of percent-encoding defined in the WHATWG URL Standard. The most visible difference is how spaces are handled:

Context                                          Space encoding
URI path and query                               %20
Form data (application/x-www-form-urlencoded)    +

In form-encoded data, + represents a space and a literal + is encoded as %2B. In a URI path, + is a literal plus sign. This distinction matters when decoding: a + in a query string from a form submission means a space, but a + in a URI path means a plus character.
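
The difference is easy to see by encoding the same value both ways with built-in JavaScript APIs; the parameter names expr and t are arbitrary. URLSearchParams implements the WHATWG form serializer, which also escapes a few characters that encodeURIComponent() leaves alone, such as ~.

```javascript
const value = "1 + 1 = 2";

// URI-component encoding: space -> %20, plus -> %2B.
console.log(encodeURIComponent(value));
// 1%20%2B%201%20%3D%202

// Form encoding: space -> +, plus -> %2B.
console.log(new URLSearchParams({ expr: value }).toString());
// expr=1+%2B+1+%3D+2

// The sets of unencoded characters differ slightly as well.
console.log(encodeURIComponent("~"));                    // ~
console.log(new URLSearchParams({ t: "~" }).toString()); // t=%7E
```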

Plus sign ambiguity

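A + in a URL path is always a literal plus character, while a + in a query string produced by an HTML form represents a space. General-purpose decoders such as JavaScript's decodeURIComponent() do not convert + to a space, so decoding form data with them leaves stray plus signs, and decoding a path with a form-aware parser turns literal plus signs into spaces. Mixing up the two contexts leads to encoding bugs that are difficult to trace.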
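
On the decoding side, a form-aware parser and a plain percent-decoder disagree about +. The query string below is a made-up example, and decodeFormValue is a hypothetical helper, not a built-in.

```javascript
const query = "q=red+shoes%2Bsocks";

// URLSearchParams decodes "+" as a space and %2B as a literal plus.
console.log(new URLSearchParams(query).get("q")); // red shoes+socks

// decodeURIComponent() only handles %XX escapes, so both plus signs
// end up identical and the space is lost.
console.log(decodeURIComponent("red+shoes%2Bsocks")); // red+shoes+socks

// Hypothetical helper: decode one form-encoded value by hand.
// "+" must become a space *before* the %XX escapes are decoded.
function decodeFormValue(s) {
  return decodeURIComponent(s.replace(/\+/g, " "));
}
console.log(decodeFormValue("red+shoes%2Bsocks")); // red shoes+socks
```

The order of the two steps matters: decoding %2B first would produce a + that the space replacement would then wrongly turn into a space.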

Encoding in JavaScript

JavaScript provides two built-in functions for percent-encoding:

encodeURIComponent() encodes all characters except A-Z a-z 0-9 - _ . ~ ! ' ( ) *. Use this for encoding individual URI components like query parameter values.

encodeURIComponent("price=10&tax=2")
// "price%3D10%26tax%3D2"
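
When building a query string by hand, each key and value should be encoded separately with encodeURIComponent() so that = and & in the data cannot be mistaken for delimiters. The buildQuery helper below is a hypothetical sketch; URLSearchParams does the same job using form encoding.

```javascript
// Hypothetical helper: encode every key and value on its own, then join
// them with the literal delimiters "=" and "&".
function buildQuery(params) {
  return Object.entries(params)
    .map(([key, value]) =>
      `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`)
    .join("&");
}

console.log(buildQuery({ filter: "price=10&tax=2", page: 1 }));
// filter=price%3D10%26tax%3D2&page=1
```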

encodeURI() preserves URI structure characters (:, /, ?, #, @, !, $, &, ', (, ), *, +, ,, ;, =). Use this for encoding a complete URI where delimiters need to remain intact.

encodeURI("https://example.re/path?q=hello world")
// "https://example.re/path?q=hello%20world"
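
Because encodeURI() keeps the structural characters intact, it cannot tell a & that separates parameters from a & inside a value. The sketch below contrasts it with encoding only the value; the value salt&pepper is a made-up example.

```javascript
const value = "salt&pepper";

// encodeURI() leaves "&" alone, so the value splits into two parameters.
const wrong = encodeURI(`https://example.re/search?q=${value}`);
console.log(wrong);                                // https://example.re/search?q=salt&pepper
console.log(new URL(wrong).searchParams.get("q")); // salt

// Encoding only the value with encodeURIComponent() keeps it intact.
const right = "https://example.re/search?q=" + encodeURIComponent(value);
console.log(right);                                // https://example.re/search?q=salt%26pepper
console.log(new URL(right).searchParams.get("q")); // salt&pepper

// encodeURI() also escapes "%", so an already-encoded URI is double-encoded.
console.log(encodeURI("https://example.re/a%20b")); // https://example.re/a%2520b
```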

The URLSearchParams API applies application/x-www-form-urlencoded encoding, encoding spaces as + rather than %20.

new URLSearchParams({q: "hello world"}).toString()
// "q=hello+world"

Example

Reserved characters used as data must be percent-encoded to preserve the URI structure. The question mark ? separates the path from the query, so when a question mark appears in the path as data, encoding it as %3F prevents the parser from treating the rest of the path as a query string.

Without percent-encoding (the path ends at the first ?)

https://example.re/really?.html?source=123

With percent-encoding (unambiguous)

https://example.re/really%3F.html?source=123
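
The WHATWG URL parser built into browsers and Node.js shows how the two forms are actually split into components.

```javascript
// Unencoded: the parser ends the path at the first "?".
const broken = new URL("https://example.re/really?.html?source=123");
console.log(broken.pathname); // /really
console.log(broken.search);   // ?.html?source=123

// Encoding the path segment keeps "?" as data.
const segment = encodeURIComponent("really?.html"); // really%3F.html
const fixed = new URL(`https://example.re/${segment}?source=123`);
console.log(fixed.pathname);                   // /really%3F.html
console.log(fixed.searchParams.get("source")); // 123
```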

A search query whose value contains a space:

Without encoding

https://example.re/search?q=red shoes&size=10

Percent-encoded

https://example.re/search?q=red%20shoes&size=10
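
The same URL can be built with the URL API instead of string concatenation. Note that searchParams serializes with application/x-www-form-urlencoded rules, so it produces q=red+shoes rather than q=red%20shoes; both spellings decode to red shoes when the query is parsed as form data.

```javascript
// Build the search URL from parts; searchParams handles the encoding.
const url = new URL("https://example.re/search");
url.searchParams.set("q", "red shoes");
url.searchParams.set("size", "10");

console.log(url.href); // https://example.re/search?q=red+shoes&size=10

// Both spellings of the space parse back to the same value.
const viaPercent = new URL("https://example.re/search?q=red%20shoes");
console.log(viaPercent.searchParams.get("q")); // red shoes
console.log(url.searchParams.get("q"));        // red shoes
```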

A path containing a literal slash as data, encoded to prevent interpretation as a path separator:

https://example.re/files/2026%2F03%2Freport.pdf
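
Encoding the file name as a single segment with encodeURIComponent() produces the URL above. Whether a given server accepts %2F in a path is an assumption worth checking, since some servers (Apache httpd by default, for instance) reject encoded slashes.

```javascript
// Encode the whole name as one path segment, so each "/" becomes %2F.
const name = "2026/03/report.pdf";
const href = `https://example.re/files/${encodeURIComponent(name)}`;
console.log(href); // https://example.re/files/2026%2F03%2Freport.pdf

// The URL parser keeps %2F encoded, so the path still has two segments.
const parsed = new URL(href);
const segments = parsed.pathname.split("/").filter(Boolean);
console.log(segments.length); // 2

// Decoding the last segment recovers the original name.
console.log(decodeURIComponent(segments[1])); // 2026/03/report.pdf
```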

SEO and encoded URLs

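Googlebot decodes percent-encoded URLs during crawling, so a link written with a literal space and one written with %20 lead to the same URL (a literal space is not valid in a URI, and browsers send it as %20). Using hyphens instead of spaces keeps URLs readable in search results and when links are shared, since %20 sequences are hard to read.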
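
A minimal sketch of turning a page title into a hyphenated slug. The slugify helper is hypothetical; it strips accents through Unicode NFKD normalization and replaces every run of characters outside a-z and 0-9 with a single hyphen. That suits Latin-script titles, but it would discard non-Latin text entirely, so it is not a general-purpose solution.

```javascript
// Hypothetical helper: build a readable, hyphen-separated URL slug.
function slugify(title) {
  return title
    .normalize("NFKD")                // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining accent marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")     // collapse other runs into one hyphen
    .replace(/^-+|-+$/g, "");        // trim leading/trailing hyphens
}

console.log(slugify("Red Shoes: Size 10 (Café Edition)"));
// red-shoes-size-10-cafe-edition
```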

Takeaway

Percent-encoding represents reserved and non-ASCII characters in a URI by replacing each octet with a % followed by two hexadecimal digits. Reserved characters serve as delimiters in URI syntax and must be encoded when used as data. The application/x-www-form-urlencoded format adds a twist by encoding spaces as + instead of %20, a distinction worth noting when parsing form submissions versus URI components.

Last updated: March 11, 2026