The major search engines support the canonical URL, a way of reducing duplicate content problems. This WebRankInfo guide, written by Olivier Duffez, explains everything about using the special link rel=canonical tag.
What is the canonical URL for?
The problem of duplicate content
There are many situations that make a page accessible at multiple URLs, usually because of poor site design (I've given tips for correcting duplicate content issues). This poses a problem for SEO because:
- to analyze a web page or any indexable web document (PDF, .doc, etc.), search engines work on the principle of 1 page = 1 URL: a page is identified by its URL.
- so if the URL is different, it is by default treated as another page and analyzed separately
As a result, when the same web page is accessible at several URLs, search engines assume by default that these are several pages. Imagine a product page that is accessible at multiple URLs (because of session IDs, tracking and affiliate parameters, the product appearing in multiple categories, and so on).
Sometimes strictly the same content appears at different URLs. In this case we talk about DUST, for Duplicate URL, Same Text.
If each of these versions receives different links (from other sites, for example), then each page competes with the others in Google's results pages.
Conversely, if the site is well built and a page is accessible at only one URL whatever the conditions, then that page concentrates all its SEO assets (including the all-important backlinks).
Conclusion: you will not get a red card from Google because some pages are accessible at many URLs at once, but you are less likely to rank well in search engines.
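To make the duplication concrete, here is a minimal Python sketch that groups URLs differing only by tracking or session parameters. The parameter names in IGNORED_PARAMS and the example.com URLs are assumptions chosen for illustration; a real site would need its own list.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that do not change the page content.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "affid"}


def normalize(url: str) -> str:
    """Drop ignored query parameters and the fragment, and sort what remains."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def group_duplicates(urls):
    """Map each normalized URL to the list of raw URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return dict(groups)
```

Any group containing more than one raw URL is a candidate for duplicate content: for a search engine, each of those URLs is a separate page by default.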
The solution of the canonical URL
The solution proposed by the main search engines is very simple and, on the face of it, very effective, so much so that one wonders why it was not introduced earlier.
OK, but then what is a canonical URL?
The principle is to let the webmaster indicate, for each page of the site, its canonical URL, that is, the official URL under which the engine must index the page.
In a way, the webmaster indicates the URL at which the page is supposed to be found. All other versions, which until now counted as duplicate content, will be treated by the engines as strictly the same page.
Although this is not the best way, the canonical URL helps to resolve duplicate content issues (to improve SEO).
The format of the rel=canonical tag
How do you add a canonical URL tag? There are 2 ways: in the HTML code or in the HTTP header, with slightly different syntax.
The canonical URL by a link tag in the HTML code
The simplest way in most cases is to place a tag in the HTML head of your page, with the following very simple format:
<link rel="canonical" href="URL" />
For example, for the page of this article, this gives:
<link rel="canonical" href="https://www.webrankinfo.com/dossiers/techniques/url-canonique" />
Where should the canonical URL be placed? Just between <head> and </head>.
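When auditing a page, the tag can be read back out of the HTML. The following is a minimal sketch using Python's standard html.parser module; the class and function names are my own, and it deliberately ignores any link tag found outside the head section.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attributes = dict(attrs)
            rel_values = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel_values and attributes.get("href"):
                self.canonicals.append(attributes["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonical(html: str):
    """Return the first canonical URL declared in <head>, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if finder.canonicals else None
```

If the canonicals list ends up with more than one entry, the page declares several canonical URLs, which is worth flagging as a likely template error.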
The canonical URL by an HTTP header
It is also possible (since June 2011) to define a canonical URL by adding a line to the HTTP response headers, for example through a directive in the .htaccess file. The format is as follows:
Link: <URL>; rel="canonical"
For example, for the page of this article, this gives:
Link: <https://www.webrankinfo.com/dossiers/techniques/url-canonique>; rel="canonical"
This is the same principle used to prevent the indexing of a non-HTML document, for example a PDF (because in that case we cannot use a robots noindex meta tag).
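To check what a server actually sends, the Link header value can be parsed. This is a simplified sketch with a regular expression, not a full parser for the Link header grammar: it assumes parameter values contain no commas or semicolons, and the function name is hypothetical.

```python
import re

# Matches one link-value: <target> followed by its ;-separated parameters.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def canonical_from_link_header(header: str):
    """Return the target of the first link-value whose rel contains "canonical"."""
    for target, params in LINK_VALUE.findall(header):
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() != "rel":
                continue
            rel_values = value.strip().strip('"').lower().split()
            if "canonical" in rel_values:
                return target
    return None
```

The argument is the value of the Link header only, without the "Link:" prefix, as an HTTP client library would typically hand it to you.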
When and how to use canonical URLs
Why use a canonical URL?
Here are several cases where the canonical URL can solve duplicate content problems:
- tracking parameters (affiliate links, RSS feeds, and so on)
- session IDs in the URL
- pages that are accessible at multiple URLs (example: a product located in multiple categories, a page accessible with optional parameters in the URL, a page accessible at both a rewritten URL and a "normal" one)
Is canonical URL a good solution?
Yes and no!
Yes, because it is simple enough to set up and avoids technical problems.
No, because it's a patch… It is far more efficient not to have internal duplicate content at all, even if it is managed by canonical URLs. If each product page is accessible at 2 URLs (because of 2 variants), this doubles the number of URLs to crawl, which is not good for your crawl budget.
The best approach is to make sure that each piece of content is always reachable at the same URL.
And if you need to fix a duplicate content problem, I think a good 301 redirect is more effective, and besides, Matt Cutts agreed with me 😉
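As an illustration of the 301 alternative, here is a minimal WSGI sketch in Python that answers duplicate paths with a permanent redirect to the single official path. The REDIRECTS mapping and the paths in it are hypothetical; on most sites this would instead be done in the web server or CMS configuration.

```python
# Hypothetical mapping from duplicate paths to the single official path.
REDIRECTS = {
    "/shoes/red-sneaker": "/products/red-sneaker",
    "/sale/red-sneaker": "/products/red-sneaker",
}


def app(environ, start_response):
    """WSGI app: answer duplicates with a 301, everything else with a 200."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # The duplicate URL never serves content, so there is nothing to deduplicate.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [f"content of {path}".encode("utf-8")]
```

Unlike a canonical tag, the redirect also sends visitors to the official URL, so new links tend to point at that URL directly.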
How to check canonical URLs?
Here's the procedure to follow:
- Crawl your entire site with a crawler that follows links from page to page, like Googlebot
- Build the list of all URLs found. Focus on the ones that respond correctly (code 200).
- For each one:
  - extract the canonical URL (be careful, it can be defined in the HTML code as well as in the HTTP header)
  - compare the URL of the crawled page to the declared canonical URL
  - if the 2 URLs are identical, there is no problem
  - if the canonical URL is inaccessible (code 404, 410, 500, etc.), the problem must be corrected
  - if the canonical URL redirects elsewhere (code 301, 302…), correct the problem by pointing directly to the correct final URL
  - if the content of the canonical page differs too much from the page studied, a better solution must be found
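The per-page checks in this procedure can be sketched as a small Python function. It assumes the crawl has already been done and that each page comes with its declared canonical URL and the HTTP status returned by that canonical URL; the CrawledPage structure and the result labels are my own. The content comparison is left as a manual review step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawledPage:
    url: str                         # URL of the crawled page (answered 200)
    canonical: Optional[str]         # canonical from HTML or HTTP header, if any
    canonical_status: Optional[int]  # HTTP status returned by the canonical URL


def audit(page: CrawledPage) -> str:
    """Classify one crawled page according to the checks in the procedure."""
    if page.canonical is None:
        return "no canonical declared"
    if page.canonical == page.url:
        return "ok: self-referencing"
    if page.canonical_status is None:
        return "unknown: canonical URL not fetched"
    if 300 <= page.canonical_status < 400:
        return "fix: canonical redirects, point to the final URL"
    if page.canonical_status >= 400:
        return "fix: canonical is inaccessible"
    return "review: compare content with the canonical page"
```

Running audit over every page that answered 200 gives a short list of pages to fix and a longer list to review by hand.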
How to add canonical URLs with Joomla?
You have to go through plugins, and there are plenty of them. For example, try EFSEO – Easy Frontend SEO or Sh404SEF, which also handles many other things for your SEO.
How to manage the canonical URL under WordPress?
There too, you have to use plugins, for example All In One SEO, SEOPress or Yoast SEO.
How to define the canonical URL in Drupal?
For example, you can install the free Metatag module, which lets you set it when editing a piece of content.
What are the most common mistakes?
Other questions and answers about the canonical URL
(if you have other questions, ask them in the WebRankInfo forum)
Is rel=canonical a directive?
No, it is a hint provided by the webmaster, which Google takes into account when deciding under which URL to index the page.
This differs from the robots meta tag, which is a directive that Google applies.
How to check if Google takes into account the canonical URL?
Use the info: command and see which URL appears in the Google result: this is the URL under which Google chose to index the page.
What is canonicalization?
This is the process of defining a single canonical URL for several URLs corresponding to the same content. There is also the verb "to canonicalize".
What is the relationship with Sitemaps?
In principle, the URLs you list in your sitemap file should be canonical URLs. It is not advisable to list in a sitemap URLs for which a different URL is declared as canonical (except in the special, temporary case of de-indexing pages).
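This consistency rule can be checked automatically. Here is a minimal Python sketch that reads the loc entries of a sitemap and lists those whose declared canonical is a different URL. The canonical_of dictionary (page URL to declared canonical) is assumed to come from a prior crawl, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str):
    """Return the <loc> values of a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def non_canonical_entries(xml_text: str, canonical_of: dict):
    """List sitemap URLs whose declared canonical is a different URL."""
    return [
        url
        for url in sitemap_urls(xml_text)
        # A URL with no known canonical is treated as its own canonical.
        if canonical_of.get(url, url) != url
    ]
```

Every URL returned by non_canonical_entries should normally be removed from the sitemap or replaced by its canonical.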
Should a relative or absolute URL be specified?
A relative URL will work fine (with Google), but I advise you to use absolute URLs (which start with “http://” or “https://”).
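To see what a relative canonical actually points to, it can be resolved against the URL of the page that declares it. This short Python sketch uses the standard urljoin function; the two helper names and the example.com URLs are assumptions for illustration.

```python
from urllib.parse import urljoin, urlsplit


def absolute_canonical(page_url: str, href: str) -> str:
    """Resolve a canonical href against the URL of the page that declares it."""
    return urljoin(page_url, href)


def is_absolute(href: str) -> bool:
    """True when the href carries both an http(s) scheme and a host."""
    parts = urlsplit(href)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

For instance, on the page https://example.com/shop/shoes?sort=price, the relative href /shop/shoes resolves to https://example.com/shop/shoes. Writing the absolute form directly in the tag removes any dependence on the URL under which the page happens to be served.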
Can the canonical URL be the URL of the page itself?
Yes, a page can designate itself as canonical. What's the point? It is purely preventive, in case the same content becomes reachable through other URL formats that you had not anticipated.
Can the canonical URL be on another domain name?
Yes: since December 2009, the canonical URL specified on a page may belong to another domain (this was not the case when the canonical URL tag launched in February 2009).
What happens if the different pages do not have exactly the same content?
Google tolerates slight differences, for example on a page that lists products according to a sort criterion. However, Google will probably need to crawl the different versions, and several can sometimes (still) appear in the results.
Two pages that (really) do not have the same content should not use the same canonical URL.
What happens if the canonical URL returns a 404 code?
Google will continue to index your pages, and its algorithm will try to find a canonical URL that works. Of course, it is advisable to make sure that your canonical URLs do not return any error code.
The canonical URL is missing: is that serious?
No, it is not essential, but as indicated above, it is better to define, for each piece of content, the URL under which it must be indexed.
Provided, of course, that the canonical URL you supply is accurate and accessible!
To learn more
I invite you to consult the following pages to learn more about the canonical URL tag:
- the discussion in the WebRankInfo forum at the launch of the link rel=canonical tag
- the initial announcement by Google