Percent-Encoding
Percent-encoding, also known as URL encoding,
is a mechanism for representing characters in a
URI using only the US-ASCII character set.
Each encoded octet is written as a percent sign % followed
by two hexadecimal digits giving the octet's value. A space
(octet 0x20), for example, becomes %20.
Usage
A URI is composed of a limited set of
characters from the US-ASCII range. Characters outside
this set, and reserved characters used as data rather
than delimiters, need a way to be represented without
breaking the URI structure.
Percent-encoding solves this by replacing each
octet of the character (in practice, each byte of its
UTF-8 form) with % followed by the octet's two-digit
hexadecimal value.
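The octet-by-octet replacement can be sketched as follows. The helper name `percentEncodeAllOctets` is hypothetical, and UTF-8 is assumed as the character encoding. Unlike a real encoder, this sketch encodes every octet, including unreserved characters, to make the mapping visible.

```javascript
// Hypothetical helper: percent-encode every octet of a string's UTF-8 form.
// Real encoders leave unreserved characters alone; this sketch does not.
function percentEncodeAllOctets(text) {
  const octets = new TextEncoder().encode(text); // UTF-8 bytes
  return Array.from(octets, (octet) =>
    "%" + octet.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeAllOctets("\u00E9")); // "%C3%A9" (é is two octets in UTF-8)
console.log(percentEncodeAllOctets("A"));      // "%41"
```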
Encoding scope
Percent-encoding applies to the path, query,
and fragment of a URI. The hostname uses
Punycode for non-ASCII characters. The
scheme (https) and port are always ASCII.
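One way to observe the split between components is to parse a URL that contains non-ASCII characters. This is a sketch that assumes a runtime with the WHATWG `URL` class (modern browsers, Node.js); the host name and path are made up for illustration.

```javascript
// Assumes the WHATWG URL class is available (browsers, Node.js).
// The host name and path are invented for this example.
const parsed = new URL("https://bücher.example/straße?q=grün");

console.log(parsed.protocol); // "https:"
console.log(parsed.hostname); // Punycode form, starting with "xn--"
console.log(parsed.pathname); // "/stra%C3%9Fe"
console.log(parsed.search);   // "?q=gr%C3%BCn"
```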
Reserved characters
Two classes of reserved characters serve as URI delimiters.
General delimiters (gen-delims):
| Character | Encoded | Role |
|---|---|---|
| : | %3A | Scheme/port separator |
| / | %2F | Path segment separator |
| ? | %3F | Query prefix |
| # | %23 | Fragment prefix |
| [ | %5B | IPv6 address start |
| ] | %5D | IPv6 address end |
| @ | %40 | Userinfo delimiter |
Subcomponent delimiters (sub-delims):
| Character | Encoded | Role |
|---|---|---|
| ! | %21 | Subcomponent delimiter |
| $ | %24 | Subcomponent delimiter |
| & | %26 | Query parameter separator |
| ' | %27 | Subcomponent delimiter |
| ( | %28 | Subcomponent delimiter |
| ) | %29 | Subcomponent delimiter |
| * | %2A | Subcomponent delimiter |
| + | %2B | Subcomponent delimiter |
| , | %2C | Subcomponent delimiter |
| ; | %3B | Subcomponent delimiter |
| = | %3D | Key-value separator |
When a reserved character appears as data rather than a delimiter, the character must be percent-encoded to prevent misinterpretation.
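The misinterpretation can be sketched with a query value that happens to contain & and =. The sample value and the `q` parameter name are illustrative assumptions; the sketch relies on the WHATWG `URL` class to do the parsing.

```javascript
const value = "a&b=c"; // data that happens to contain reserved characters

// Unencoded: the parser treats & and = as delimiters.
const raw = new URL("https://example.re/search?q=" + value);
console.log(raw.searchParams.get("q")); // "a"
console.log(raw.searchParams.get("b")); // "c" (an unintended second parameter)

// Encoded: the whole value survives as data.
const safe = new URL("https://example.re/search?q=" + encodeURIComponent(value));
console.log(safe.search);                // "?q=a%26b%3Dc"
console.log(safe.searchParams.get("q")); // "a&b=c"
```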
Unreserved characters
Unreserved characters carry no special meaning in URI syntax and do not require percent-encoding:
- Uppercase letters: A-Z
- Lowercase letters: a-z
- Digits: 0-9
- Hyphen: -
- Underscore: _
- Period: .
- Tilde: ~
During normalization, percent-encoded unreserved characters
are decoded to their literal form. For example, a URI
containing %41 (the encoded form of A) and one containing A
are equivalent, and the decoded form is preferred.
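A minimal sketch of this normalization step follows. The helper name `normalizeEscapes` is hypothetical. It decodes only escapes that stand for unreserved characters and uppercases the hex digits of the rest (the form RFC 3986 recommends); other normalization steps, such as lowercasing the scheme and host, are left out.

```javascript
// Hypothetical helper: decode percent-encoded unreserved characters and
// uppercase the hex digits of every other escape sequence.
function normalizeEscapes(uri) {
  return uri.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) => {
    const char = String.fromCharCode(parseInt(hex, 16));
    // Reserved characters and non-ASCII octets must stay encoded.
    return /[A-Za-z0-9\-._~]/.test(char) ? char : "%" + hex.toUpperCase();
  });
}

console.log(normalizeEscapes("https://example.re/%41%7Euser/a%2fb"));
// "https://example.re/A~user/a%2Fb"
```

The encoded slash stays encoded: decoding it would turn data into a path separator and change the meaning of the URI.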
Common percent-encoded characters
These characters appear frequently in encoded URIs:
| Character | Encoded | Common context |
|---|---|---|
| (space) | %20 | Path and query values |
| / | %2F | Literal slash in values |
| ? | %3F | Literal question mark |
| # | %23 | Literal hash in values |
| & | %26 | Literal ampersand |
| = | %3D | Literal equals sign |
| @ | %40 | Email in path or query |
| % | %25 | Literal percent sign |
The percent sign itself requires encoding as %25
when used as data. Failing to encode a literal %
produces ambiguous URIs where %41 looks like an
encoded A instead of the three characters %,
4, 1.
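The built-in JavaScript functions show the effect. This is a small sketch with made-up input strings, not an exhaustive treatment of malformed input.

```javascript
console.log(encodeURIComponent("100%"));  // "100%25"
console.log(decodeURIComponent("%41"));   // "A"
console.log(decodeURIComponent("%2541")); // "%41" (literal percent, 4, 1)

// A bare % that is not followed by two hex digits is malformed input.
let malformed = false;
try {
  decodeURIComponent("100%");
} catch (error) {
  malformed = error instanceof URIError;
}
console.log(malformed); // true
```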
Form encoding vs URI encoding
HTML form submissions using
application/x-www-form-urlencoded follow a
variation of percent-encoding defined in the
WHATWG URL Standard. The key difference is
how spaces are handled:
| Context | Space encoding |
|---|---|
| URI path and query | %20 |
| Form data (application/x-www-form-urlencoded) | + |
In form-encoded data, + represents a space and a
literal + is encoded as %2B. In a URI path, +
is a literal plus sign. This distinction matters when
decoding: a + in a query string from a form
submission means a space, but a + in a URI path
means a plus character.
Plus sign ambiguity
A + in a URL path is always a literal plus
character. A + in query parameters from an
HTML form submission represents a space. Mixing
up these two contexts leads to encoding bugs
that are difficult to trace.
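The two decoding rules can be compared side by side. The helper name `decodeFormValue` is hypothetical; it is a minimal sketch of form-style decoding, shown next to the built-in `URLSearchParams`, which applies the same rule.

```javascript
// Hypothetical helper for values that came from a form submission:
// turn + into a space first, then undo the percent-encoding.
function decodeFormValue(value) {
  return decodeURIComponent(value.replace(/\+/g, " "));
}

console.log(decodeURIComponent("red+shoes")); // "red+shoes" (URI rules: + is literal)
console.log(decodeFormValue("red+shoes"));    // "red shoes" (form rules: + is a space)
console.log(decodeFormValue("1%2B1"));        // "1+1" (%2B is a literal plus)
console.log(new URLSearchParams("q=red+shoes").get("q")); // "red shoes"
```

The order matters: replacing + after decoding would also turn a decoded %2B into a space.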
Encoding in JavaScript
JavaScript provides two built-in functions for percent-encoding:
encodeURIComponent() encodes all characters
except A-Z a-z 0-9 - _ . ~ ! ' ( ) *. Use this
for encoding individual URI components like query
parameter values.
encodeURIComponent("price=10&tax=2")
// "price%3D10%26tax%3D2"
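Because encodeURIComponent() leaves ! ' ( ) * untouched, code that needs every reserved character encoded has to handle those five separately. One possible approach is sketched below; the helper name `encodeStrictComponent` is hypothetical.

```javascript
// Hypothetical helper: also escape ! ' ( ) *, which encodeURIComponent()
// leaves as-is, so every sub-delimiter in the value ends up encoded.
function encodeStrictComponent(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (char) => "%" + char.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("it's (new)!"));    // "it's%20(new)!"
console.log(encodeStrictComponent("it's (new)!")); // "it%27s%20%28new%29%21"
```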
encodeURI() preserves URI structure characters
(:, /, ?, #, @, !, $, &, ', (,
), *, +, ,, ;, =). Use this
for encoding a complete URI where delimiters need
to remain intact.
encodeURI("https://example.re/path?q=hello world")
// "https://example.re/path?q=hello%20world"
The URLSearchParams API applies
application/x-www-form-urlencoded encoding,
encoding spaces as + rather than %20.
new URLSearchParams({q: "hello world"}).toString()
// "q=hello+world"
Example
A URL that contains reserved characters as data
needs percent-encoding to preserve the
URI structure. The question mark ? separates the
path from the query. When a question mark appears
in the path as data, encoding prevents the parser
from treating the rest of the path as a query string.
Without percent-encoding (ambiguous)
https://example.re/really?.html?source=123
With percent-encoding (unambiguous)
https://example.re/really%3F.html?source=123
A search query containing spaces and special characters:
Without encoding
https://example.re/search?q=red shoes&size=10
Percent-encoded
https://example.re/search?q=red%20shoes&size=10
A path containing a literal slash as data, encoded to prevent interpretation as a path separator:
https://example.re/files/2026%2F03%2Freport.pdf
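Paths like the ones above can be produced by encoding each segment separately and only then joining the segments with real separators. The helper name `buildPath` is hypothetical, and the sketch assumes the caller already has the path split into segments.

```javascript
// Hypothetical helper: encode each path segment on its own, then join the
// segments with the real / separators.
function buildPath(segments) {
  return "/" + segments.map((segment) => encodeURIComponent(segment)).join("/");
}

console.log(buildPath(["files", "2026/03/report.pdf"]));
// "/files/2026%2F03%2Freport.pdf"
console.log(buildPath(["really?.html"]));
// "/really%3F.html"
```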
SEO and encoded URLs
Googlebot decodes percent-encoded URLs during
crawling and treats %20 and a literal space as
equivalent. Keeping URLs clean with hyphens
instead of encoded spaces improves readability
in search results and link sharing.
Takeaway
Percent-encoding represents reserved and
non-ASCII characters in a URI by replacing
each octet with a % followed by two hexadecimal
digits. Reserved characters serve as delimiters in
URI syntax and must be encoded when used as data.
The application/x-www-form-urlencoded format adds
a twist by encoding spaces as + instead of %20,
a distinction worth noting when parsing form
submissions versus URI components.
See also
- RFC 3986: Uniform Resource Identifier (URI)
- WHATWG URL Standard
- Punycode
- URL
- URI
- Data URLs
- HTTP headers