When search engine spiders crawl your site, one of the first things they request is the file robots.txt. If you are the webmaster, you should have a basic knowledge of the purpose and syntax of this file. This post won’t cover every part of it; it is a short, simple guide to robots.txt.
What is it, and where is it?
Robots.txt is a text file located in the root directory of a site that holds instructions for search engine robots. These instructions can, for example, forbid robots from crawling certain sections or pages of the site, or ask a robot to wait a certain interval between downloading documents from the server.
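To see how a robot reads such instructions, here is a minimal sketch using Python’s standard urllib.robotparser module. The rules, the /private/ folder and the ExampleBot name are hypothetical. Crawl-delay is the directive commonly used to request a pause between downloads, but support for it varies between search engines (Google, for one, ignores it).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one section and ask robots to pause
# 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("ExampleBot", "/private/report.html"))  # False: inside the blocked section
print(parser.can_fetch("ExampleBot", "/blog/hello-world/"))    # True: no rule covers it
print(parser.crawl_delay("ExampleBot"))                        # 10
```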
In this article, we will discuss what robots.txt should look like for a WordPress site. First of all, it is a plain text file that can easily be edited in Notepad. The file name is case sensitive and must always be written in lower case, “robots.txt”. If it is named differently (for example “Robots.txt”), spiders will not find it and will ignore it.
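Because robots only request the file from the root of the host, under the lower-case name, the robots.txt address for any page can be derived from the page URL alone. A small Python sketch; the function name and the example.com URLs are hypothetical:

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the address of the robots.txt file that governs page_url."""
    # Joining with an absolute path keeps the scheme, host and port of the
    # page but replaces its path, query string and fragment.
    return urljoin(page_url, "/robots.txt")


print(robots_url("https://example.com/2015/05/hello-world/?replytocom=3"))
# https://example.com/robots.txt
print(robots_url("http://blog.example.com:8080/wp-admin/"))
# http://blog.example.com:8080/robots.txt
```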
Now let’s look at the basic directives of the file. There are only a few of them.
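User-agent: the name of the robot that the following rules apply to. Each bot has its own name (for example, User-agent: Googlebot for Google’s crawler), and an asterisk (User-agent: *) addresses all robots.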
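Disallow: after the directive, specify the file or folder that robots must not access. Note that the path is relative to the root of your site and starts with a slash, with no space after it, for example Disallow: /admin.
Allow: the opposite of Disallow; it permits access to a path inside an otherwise blocked folder.
Sitemap: the full URL of your site map.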
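The sketch below, again using Python’s urllib.robotparser with hypothetical rules and URLs, shows two consequences of these directives: a robot obeys the group whose User-agent matches it and ignores the * group, and a Disallow value is a path prefix, so /admin also blocks /administrator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: Googlebot gets its own group; every other robot
# falls back to the "*" group, which blocks the whole site.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /admin

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Disallow values are prefixes of the path, so "/admin" also covers
# "/admin/settings" and "/administrator".
print(parser.can_fetch("Googlebot", "https://example.com/admin/settings"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/administrator"))   # False
print(parser.can_fetch("Googlebot", "https://example.com/about/"))          # True
# A robot without a group of its own follows the "*" group.
print(parser.can_fetch("SomeOtherBot", "https://example.com/about/"))       # False
```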
Now let’s create a robots.txt file with the following contents:
User-agent: *
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /xmlrpc.php
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /tag/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Disallow: */trackback/
Disallow: */feed/
Disallow: */comments/
Disallow: /?feed=
Disallow: /?s=
Allow: /wp-content/uploads/

Sitemap: http://domain/sitemap.xml
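With these lines we block robots from the WordPress login, registration and XML-RPC endpoints, the system folders (wp-admin, wp-includes, wp-content), tag pages, trackbacks, RSS feeds, comment feeds and search results. The Allow line then re-opens the media folder (/wp-content/uploads/): Google applies the most specific matching rule, so it takes precedence over Disallow: /wp-content. Finally, the Sitemap line gives the full address of the site map.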
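You can sanity-check the rules before uploading the file. This minimal Python sketch feeds them to urllib.robotparser and prints which sample paths are blocked; the sample paths and the ExampleBot name are made up. Two assumptions to note: the stdlib parser only does plain prefix matching, so the * wildcard lines are left out, and it applies the first matching rule in file order (Google picks the most specific one instead), so Allow: /wp-content/uploads/ is listed first to make both agree.

```python
from urllib.robotparser import RobotFileParser

# Prefix-only rules from the WordPress file; Allow is moved to the top
# because urllib.robotparser stops at the first matching rule.
ROBOTS_TXT = """\
User-agent: *
Allow: /wp-content/uploads/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /xmlrpc.php
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content
Disallow: /tag/
Disallow: /trackback/
Disallow: /feed/
Disallow: /comments/
Disallow: /?feed=
Disallow: /?s=

Sitemap: http://domain/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical sample paths mapped to the expected verdict (True = allowed).
checks = {
    "/wp-login.php": False,                          # login page
    "/wp-admin/options.php": False,                  # admin area
    "/wp-content/themes/x/style.css": False,         # system folder
    "/wp-content/uploads/2015/05/photo.jpg": True,   # media stays crawlable
    "/?s=robots": False,                             # search results
    "/?feed=rss2": False,                            # RSS feed
    "/2015/05/hello-world/": True,                   # ordinary post
}
for path, expected in checks.items():
    allowed = parser.can_fetch("ExampleBot", path)
    print(f"{path:45} {'allowed' if allowed else 'blocked'}")

print(parser.site_maps())  # ['http://domain/sitemap.xml']
```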
That’s it. This is the final robots.txt file; place it in the root directory of your domain. This is important if you don’t want search engines crawling your login page! Questions and comments are always welcome, so feel free to leave them below.