Canonical URLs
Canonical URLs solve a deduplication problem. When the same content is reachable at multiple addresses, search engines pick one version to index and consolidate ranking signals onto the preferred version. The HTTP Link header with rel="canonical" declares the preferred URL at the protocol level, delivering the signal before the HTML body arrives.
Usage
The same page often exists at several URLs. Protocol variants (http:// vs https://), www and non-www hostnames, trailing slashes, query parameters for tracking or sorting, print-friendly versions, and syndicated copies all create duplicate URLs pointing to identical or near-identical content. Without a canonical declaration, search engines decide on their own which URL to index, and ranking signals may be split across the duplicates.
The rel="canonical" link relation names the preferred URL. Search engines treat the declaration as a strong signal for consolidating indexing, link equity, and ranking signals onto the canonical URL.
Link: <https://example.re/page>; rel="canonical"
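To see what a client does with this header, here is a minimal sketch that parses a Link header value and returns the rel="canonical" target. It is not a complete RFC 8288 parser: the helper names are hypothetical, and it assumes parameter values contain no semicolons.

```python
from __future__ import annotations


def split_link_values(header: str) -> list[str]:
    """Split a Link header on commas that sit outside <...> and "..."."""
    values, buf = [], []
    in_uri = in_quote = False
    for ch in header:
        if ch == "<" and not in_quote:
            in_uri = True
        elif ch == ">" and not in_quote:
            in_uri = False
        elif ch == '"' and not in_uri:
            in_quote = not in_quote
        elif ch == "," and not in_uri and not in_quote:
            values.append("".join(buf).strip())
            buf = []
            continue
        buf.append(ch)
    values.append("".join(buf).strip())
    return [v for v in values if v]


def parse_link_header(header: str) -> list[tuple[str, dict[str, str]]]:
    """Return (target, params) pairs; assumes no ';' inside quoted values."""
    links = []
    for value in split_link_values(header):
        end = value.find(">")
        if not value.startswith("<") or end == -1:
            continue  # malformed link-value: skip it rather than guess
        params = {}
        for param in value[end + 1:].split(";"):
            key, sep, raw = param.partition("=")
            if sep:
                params[key.strip().lower()] = raw.strip().strip('"')
        links.append((value[1:end], params))
    return links


def canonical_from_link_header(header: str) -> str | None:
    """Return the first target whose rel list includes "canonical"."""
    for target, params in parse_link_header(header):
        if "canonical" in params.get("rel", "").lower().split():
            return target
    return None


print(canonical_from_link_header('<https://example.re/page>; rel="canonical"'))
# prints https://example.re/page
```

The comma splitter tracks angle brackets and quotes so that a comma inside a URL does not start a new link-value.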
Three delivery methods exist: the HTTP Link header, the HTML <link rel="canonical"> element, and inclusion in an XML sitemap. Google accepts all three, though it treats sitemap inclusion as a weaker signal than a rel="canonical" annotation. The HTTP Link header is the focus here because it operates at the server level and reaches crawlers before any HTML parsing begins.
Signal, not directive
Canonical declarations are strong signals, not directives. Search engines evaluate canonical annotations alongside redirects, internal links, sitemaps, and HTTPS preference to select the final canonical. When signals conflict, the search engine picks the canonical independently.
Early signal delivery
The HTTP Link header arrives with the response headers, ahead of the body. Once a crawler has read the headers of an HTTP response, the canonical URL is already known, and the crawler can decide whether to continue downloading and rendering the body or move on to the canonical URL instead. Search engines do not publicly document whether their crawlers act on the header this way.
This matters for crawl efficiency. Pages behind query parameters, session IDs, or tracking codes often return the same content as the canonical. Early canonical delivery can avoid wasting crawl bandwidth and rendering resources on that duplicate content.
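A crawler that reads headers first could express the decision as a small function. This is a hypothetical policy, not the documented behavior of any search engine; should_fetch_body is an illustrative name, and the regular expressions assume well-formed headers with no "<" inside parameter values. Relative targets are resolved against the request URL.

```python
from __future__ import annotations

import re
from urllib.parse import urljoin

# One link-value: the <target> and everything up to the next "<".
LINK_VALUE = re.compile(r"<([^>]*)>([^<]*)")
# The rel parameter, quoted or bare.
REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', re.IGNORECASE)


def header_canonical(link_header: str) -> str | None:
    """Return the first rel="canonical" target in a Link header, if any."""
    for target, params in LINK_VALUE.findall(link_header):
        match = REL_PARAM.search(params)
        if match is None:
            continue
        rels = match.group(1) if match.group(1) is not None else match.group(2)
        if "canonical" in rels.lower().split():
            return target
    return None


def should_fetch_body(request_url: str, link_header: str | None) -> bool:
    """Decide, from headers alone, whether the body is worth downloading."""
    canonical = header_canonical(link_header) if link_header else None
    if canonical is None:
        return True  # no header canonical: fall back to parsing the HTML
    # The response is a duplicate when its canonical names a different URL.
    return urljoin(request_url, canonical) == request_url
```

With Python's http.client, for instance, HTTPResponse.getheader("Link") is available as soon as getresponse() returns, before read() consumes the body, so a check like this could run at that point.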
103 Early Hints responses (RFC 8297) push this further. A server can send a 103 response with the canonical Link header before the final response is ready, so the client receives the canonical signal while the server is still generating the page. Crawler support for canonicals in 103 responses is not documented.
HTTP/1.1 103 Early Hints
Link: <https://example.re/page>; rel="canonical"
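The final response follows with the full headers and body. RFC 8297 treats 103 header fields as hints that do not replace the fields of the final response, so the final response must repeat the canonical Link header.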
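A server that wants to try this has to write the interim response itself. The sketch below uses Python's http.server: send_response_only writes the 103 status line, and end_headers flushes it before the page is generated. The canonical URL and the handler name are hypothetical, and the demo reads raw bytes over a socket so the interim response stays visible.

```python
import http.server
import socket
import threading

CANONICAL = "https://example.re/page"  # hypothetical canonical URL
LINK = f'<{CANONICAL}>; rel="canonical"'


class EarlyHintsHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # 1xx responses must not be sent to HTTP/1.0 (or 0.9) clients.
        if self.request_version not in ("HTTP/0.9", "HTTP/1.0"):
            # Interim 103: status line and hint headers only, flushed immediately.
            self.send_response_only(103, "Early Hints")
            self.send_header("Link", LINK)
            self.end_headers()

        body = b"<!doctype html><title>Page</title>"  # page generation happens here

        # Final 200: repeats the canonical, since 103 fields are only hints.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Link", LINK)
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def demo() -> bytes:
    """Serve one request on a free local port and return the raw response bytes."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), EarlyHintsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        with socket.create_connection(("127.0.0.1", port)) as sock:
            sock.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
            chunks = []
            while data := sock.recv(4096):
                chunks.append(data)
        return b"".join(chunks)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo().decode())
```

The raw output shows the 103 block, a blank line, then the 200 response carrying the same Link header again.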
SEO
Early canonical delivery through the Link header or 103 Early Hints can reduce wasted crawling and rendering: when the canonical points elsewhere, a crawler that acts on the header can skip the full download. This matters most for large sites with many parameterized duplicates consuming crawl budget.
Non-HTML resources
PDFs, images, downloadable files, and API responses have no HTML <head> element. The HTTP Link header is the only way to attach a rel="canonical" annotation to non-HTML resources.
A PDF accessible at multiple URLs uses the Link header to point crawlers to the preferred version.
HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <https://example.re/report.pdf>; rel="canonical"
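A common pattern is canonicalizing a PDF to its dedicated download page. The PDF itself is the raw file, but the download page can add context, metadata, and internal links. This works only when the download page carries the same content as the PDF, for example the full report text in HTML alongside the download link; a page with just a summary and a link is not a duplicate, and search engines may ignore the canonical. When the content matches, pointing the PDF's canonical to the HTML download page consolidates indexing signals onto the page with richer context.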
HTTP/1.1 200 OK
Content-Type: application/pdf
Link: <https://example.re/reports/annual>; rel="canonical"
The download page at /reports/annual then usually becomes the indexed URL. The PDF stays accessible at its direct URL but typically drops out of search results in favor of the HTML page.
Server configuration handles this without modifying the files themselves. Nginx add_header, Apache Header set, and CDN edge rules inject the Link header at the infrastructure level.
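For an application served over WSGI rather than directly by Nginx or Apache, the same header injection can be sketched as middleware. PDF_CANONICALS, canonical_link_middleware, and pdf_app are hypothetical names, and the path mapping is an assumption about the site's URL layout.

```python
# Hypothetical mapping from request paths to each file's canonical URL.
PDF_CANONICALS = {
    "/report.pdf": "https://example.re/report.pdf",
    "/downloads/report.pdf": "https://example.re/report.pdf",
}


def canonical_link_middleware(app, canonicals=PDF_CANONICALS):
    """Wrap a WSGI app so mapped paths get a rel="canonical" Link header."""

    def wrapped(environ, start_response):
        canonical = canonicals.get(environ.get("PATH_INFO", ""))

        def start_with_link(status, headers, exc_info=None):
            # Only annotate successful responses; the files themselves are untouched.
            if canonical and status.startswith("200"):
                headers = list(headers) + [("Link", f'<{canonical}>; rel="canonical"')]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_link)

    return wrapped


def pdf_app(environ, start_response):
    # Stand-in for the real file server.
    start_response("200 OK", [("Content-Type", "application/pdf")])
    return [b"%PDF-1.7\n"]


app = canonical_link_middleware(pdf_app)
```

Wrapping start_response keeps the middleware independent of how the inner application produces the file.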
Cross-domain canonical
The rel="canonical" link relation works across domains: the canonical URL can live on a different hostname. Syndicated content, white-label pages, and content distributed across partner sites use cross-domain canonicals to consolidate indexing signals back to the original publisher.
Link: <https://original.example.re/article>; rel="canonical"
Cross-domain canonical is a strong consolidation signal. When search engines honor it, the canonical URL accumulates the ranking value of the syndicated URLs pointing to it. The syndicated copies stay online but generally drop out of search results in favor of the canonical.
Trust requirement
Cross-domain canonical works because the syndicating site voluntarily points to the original. Search engines still compare the two pages and ignore cross-domain canonicals when the content on the two URLs is substantially different.
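Search engines do not publish how they measure "substantially different", so a publisher can only run a rough sanity check before adding a cross-domain canonical. The sketch compares word sequences with difflib.SequenceMatcher; it does not reproduce any search engine's logic, and the 0.9 threshold and function names are hypothetical.

```python
import difflib
import re


def words(text: str) -> list[str]:
    """Lowercased word tokens; punctuation and spacing differences are ignored."""
    return re.findall(r"\w+", text.lower())


def similarity(original: str, copy: str) -> float:
    """Ratio in [0, 1] of matching word runs between the two texts."""
    matcher = difflib.SequenceMatcher(None, words(original), words(copy), autojunk=False)
    return matcher.ratio()


def safe_for_cross_domain_canonical(original: str, copy: str, threshold: float = 0.9) -> bool:
    # Hypothetical threshold: below it, the copy has likely diverged from the original.
    return similarity(original, copy) >= threshold
```

Comparing word tokens rather than raw strings keeps formatting differences between the publisher's and the partner's templates from dominating the score.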
Conflict resolution
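When the HTTP Link header and the HTML <link rel="canonical"> element declare different canonical URLs for the same page, the declarations contradict each other. Google advises against specifying different canonical URLs for the same page through the same or different methods and does not document which one wins; with conflicting declarations, it may select the canonical from other signals instead.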
Mixing methods increases the chance of conflicting signals. Using one canonical method per page is the safest approach. If server-level configuration sets a Link header canonical and the CMS injects a different HTML canonical, the mismatch creates ambiguity and search engines resolve the conflict on their own terms.
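Because mismatches between the server layer and the CMS are easy to miss, a site can check for them automatically. The sketch collects canonicals from the Link header (with a simplified regex that assumes no "<" inside parameter values) and from <link rel="canonical"> elements via Python's html.parser, resolves them against the page URL, and reports any disagreement. The function and class names are hypothetical.

```python
from __future__ import annotations

import re
from html.parser import HTMLParser
from urllib.parse import urljoin

LINK_VALUE = re.compile(r"<([^>]*)>([^<]*)")
REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', re.IGNORECASE)


def header_canonicals(link_header: str) -> list[str]:
    """All rel="canonical" targets in a Link header."""
    found = []
    for target, params in LINK_VALUE.findall(link_header):
        match = REL_PARAM.search(params)
        if match:
            rels = match.group(1) if match.group(1) is not None else match.group(2)
            if "canonical" in rels.lower().split():
                found.append(target)
    return found


class CanonicalFinder(HTMLParser):
    """Collect href values of <link rel="canonical"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def conflicting_canonicals(page_url: str, link_header: str, html: str) -> list[str]:
    """Return the distinct canonical URLs declared, or [] when they agree."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    declared = {urljoin(page_url, url) for url in header_canonicals(link_header) + finder.hrefs}
    return sorted(declared) if len(declared) > 1 else []
```

A non-empty result means the page sends two different canonicals and should be fixed at whichever layer is wrong.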
Beyond explicit canonical declarations, search engines consider redirects, internal link patterns, sitemap URLs, HTTPS preference, and hreflang cluster membership when selecting the canonical URL.
Common mistakes
Missing self-reference. Every page benefits from a self-referencing canonical pointing to its own preferred URL. Without one, search engines rely entirely on other signals to pick the canonical.
Canonical pointing to a non-200 page. A canonical URL that returns a redirect, 404, or 410 undermines the declaration, and search engines are likely to ignore it. The canonical target should return 200.
Canonical on paginated content. Each page in a paginated series should carry a self-referencing canonical. Pointing all pages to page one can keep pages two and beyond out of the index.
Canonical combined with noindex. A page with noindex and rel="canonical" sends conflicting signals. The noindex tells search engines to drop the page from the index. The canonical tells them to index the canonical URL and consolidate signals onto it. Pick one.
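Relative URLs. Use absolute canonical URLs. RFC 6596 permits relative references, but they resolve against the URL that served the response, so a relative path returned on a duplicate such as http://www.example.re/page?ref=email resolves to that duplicate's protocol and hostname instead of the preferred ones. Google also recommends absolute URLs.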
Canonicalizing to unrelated content. The target URL must contain content identical or nearly identical to the source. Pointing a product page's canonical to the homepage consolidates nothing; search engines are likely to ignore the canonical or treat the page as a soft 404.
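Several of these mistakes can be caught mechanically. The sketch covers missing canonicals, relative URLs, non-200 targets, and noindex conflicts; pagination and content mismatches need more context than it takes. audit_canonical is a hypothetical name, and the caller is assumed to supply the target's status code and whether the page carries noindex.

```python
from __future__ import annotations

from urllib.parse import urljoin, urlsplit


def audit_canonical(
    page_url: str,
    canonical: str | None,
    target_status: int | None = None,
    noindex: bool = False,
) -> list[str]:
    """Flag a subset of common canonical mistakes for one page."""
    if canonical is None:
        return ["missing canonical: add a self-referencing one"]
    issues = []
    parts = urlsplit(canonical)
    if not (parts.scheme and parts.netloc):
        # Show where the relative reference actually lands for this request.
        issues.append(f"relative canonical resolves to {urljoin(page_url, canonical)}")
    if target_status is not None and target_status != 200:
        issues.append(f"canonical target returns {target_status}, expected 200")
    if noindex:
        issues.append("noindex combined with rel=canonical sends conflicting signals")
    return issues


print(audit_canonical("http://www.example.re/page?ref=email", "/page"))
# ['relative canonical resolves to http://www.example.re/page']
```

The relative case shows the risk concretely: served on the http, www duplicate, "/page" points back to that duplicate's host rather than the preferred one.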
Example
A product page accessible with and without query parameters. The canonical Link header consolidates signals onto the clean URL.
HTTP/1.1 200 OK
Content-Type: text/html
Link: <https://example.re/products/widget>; rel="canonical"
Both https://example.re/products/widget?ref=email and https://example.re/products/widget?sort=price return this same canonical header, pointing search engines to the parameter-free URL.
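For every variant to return the same header, the server needs a rule that maps a requested URL to its clean form. A minimal sketch, assuming that on this site the ref, sort, and utm_* parameters never change the content and that https is the preferred protocol; IGNORED_PARAMS and the function names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: on this site these parameters never change the page content.
IGNORED_PARAMS = {"ref", "sort"}
IGNORED_PREFIXES = ("utm_",)


def canonical_url(url: str) -> str:
    """Drop ignored query parameters and the fragment; force https (assumed preferred)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS and not key.startswith(IGNORED_PREFIXES)
    ]
    return urlunsplit(("https", parts.netloc.lower(), parts.path, urlencode(kept), ""))


def canonical_link_header(url: str) -> str:
    return f'<{canonical_url(url)}>; rel="canonical"'


for requested in (
    "https://example.re/products/widget?ref=email",
    "https://example.re/products/widget?sort=price",
):
    print(canonical_link_header(requested))
```

Parameters outside the ignore list are kept, so a parameter that does change the content still produces a distinct canonical.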
A 103 Early Hints response delivering the canonical before the final response.
HTTP/1.1 103 Early Hints
Link: <https://example.re/products/widget>; rel="canonical"
HTTP/1.1 200 OK
Content-Type: text/html
Link: <https://example.re/products/widget>; rel="canonical"
A PDF hosted at multiple URLs with a canonical Link header declaring the preferred version.
HTTP/1.1 200 OK
Content-Type: application/pdf
Content-Disposition: inline
Link: <https://example.re/docs/guide.pdf>; rel="canonical"
A response combining canonical with resource hints in a single Link header.
HTTP/1.1 200 OK
Content-Type: text/html
Link: <https://example.re/page>; rel="canonical", </css/main.css>; rel="preload"; as="style", <https://cdn.example.re>; rel="preconnect"
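Assembling such a header by hand is error-prone, so a server might build it from parts. A sketch with hypothetical helper names; it rejects relative canonical targets while allowing relative preload targets. Joining values with commas is equivalent to sending several Link field lines, because RFC 8288 defines Link as a comma-separated list.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def link_value(target: str, rel: str, params: dict[str, str] | None = None) -> str:
    """Format one link-value: <target>; rel="..."; key="value"..."""
    parts_of_url = urlsplit(target)
    if rel == "canonical" and not (parts_of_url.scheme and parts_of_url.netloc):
        raise ValueError(f"canonical target should be absolute: {target!r}")
    parts = [f"<{target}>", f'rel="{rel}"']
    parts += [f'{key}="{value}"' for key, value in (params or {}).items()]
    return "; ".join(parts)


def link_header(*values: str) -> str:
    """Combine link-values into one field value."""
    return ", ".join(values)


header = link_header(
    link_value("https://example.re/page", "canonical"),
    link_value("/css/main.css", "preload", {"as": "style"}),
    link_value("https://cdn.example.re", "preconnect"),
)
print(header)
```

The result matches the combined header above character for character.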
Takeaway
Canonical URLs declared through the HTTP Link header consolidate search engine indexing signals onto a preferred URL before the HTML body arrives. The header-based approach works for both HTML and non-HTML resources, delivers the canonical signal early in the crawl process, and can be repeated in a 103 Early Hints response, provided the final response carries it as well.
See also
- RFC 6596: The Canonical Link Relation
- RFC 8288: Web Linking
- Link
- Hreflang
- Redirects
- Soft 404
- Resource Hints
- HTTP headers