
How to Disallow Crawlers, Spiders and Bots from Websites

Sumit Pore

Published on July 10, 2014
Last updated on July 10, 2014


A website should be built to high coding and testing standards. Security, code quality, version control, and testing all deserve attention along the way.

Our usual practice is to develop and test a website on a staging server, and then demonstrate it to the client on that staging site. Once the client approves, we make the website live for users on the designated server. Staging sites play a vital role in this process. Since a staging site is meant only for us and the client, we don’t want any crawler or search engine to crawl its content.

To achieve this, I made a small change to the Apache configuration and created a robots.txt file in the root folder. The file contains instructions that tell web crawlers not to go any further into the site. A well-behaved crawler requests robots.txt before anything else, reads the instructions that disallow the whole site, and does not crawl any further.

Steps to Create Robots.txt File

You will need SSH access to your staging server. I have implemented this on Apache 2.4.7 installed on Ubuntu 14.04.

Open your terminal and connect to your server using SSH. Now create a file named ‘robots.txt’ in the /var/www folder. Execute the command below to create this file:
[pre]nano /var/www/robots.txt[/pre]
This opens the editor in the terminal. If it fails with a permission error, run the same command with sudo:

[pre]sudo nano /var/www/robots.txt[/pre]


Copy the two lines below into the robots.txt file.

[pre]User-agent: *
Disallow: /[/pre]
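If you would rather skip the editor, the same two lines can be written straight from the shell. The sketch below is one way to do it. `write_robots_txt` is a hypothetical helper name, not part of any tool, and it assumes the shell you run it from has write access to the target path.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): write a robots.txt
# that asks every crawler to stay out of the whole site.
write_robots_txt() {
    target="$1"
    # printf expands \n, so this writes exactly two lines.
    printf 'User-agent: *\nDisallow: /\n' > "$target"
}

# Example call on the staging server (needs write access to /var/www):
#   write_robots_txt /var/www/robots.txt
```

Taking the path as an argument keeps the helper easy to try on a scratch file before pointing it at /var/www.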


After adding the above two lines to the robots.txt file, save it. You will now need to change the owner and permissions of the robots.txt file. First, we will change the owner to the user who runs Apache. Execute the following command to find that user.

[pre]ps -ef | grep apache2 | grep -v `whoami` | grep -v root | head -n1 | awk '{print $1}'[/pre]


Most of the time it returns ‘www-data’. If that’s not the case, note down whatever it returns, replace ‘www-data’ with that username in the command below, and then run it to change the owner of robots.txt.

[pre]sudo chown www-data /var/www/robots.txt[/pre]
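On Debian and Ubuntu packages of Apache, the run user is also recorded in /etc/apache2/envvars as APACHE_RUN_USER, which gives a second way to confirm the name used in the chown command above. This is a minimal sketch that assumes that file layout. `apache_run_user` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the user Apache is configured to run as,
# assuming the Debian/Ubuntu layout where envvars contains a line like
#   export APACHE_RUN_USER=www-data
apache_run_user() {
    envvars="${1:-/etc/apache2/envvars}"
    # Print only the value that follows "export APACHE_RUN_USER=".
    sed -n 's/^export APACHE_RUN_USER=//p' "$envvars"
}

# Example call on the server:
#   apache_run_user
```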


Now we have to change the permissions of the robots.txt file so that the owner can read and write it and everyone else can only read it. We use the chmod command for this, like below.

[pre]sudo chmod 644 /var/www/robots.txt[/pre]
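It can be worth confirming that the owner and permissions actually changed. The sketch below prints both for a given file. It relies on `stat -c`, the GNU coreutils syntax available on Ubuntu, and `show_owner_and_mode` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "<owner> <octal mode>" for a file.
# Uses `stat -c`, which is the GNU coreutils syntax found on Ubuntu.
show_owner_and_mode() {
    stat -c '%U %a' "$1"
}

# Example call on the server; after the chown and chmod steps it should
# report the Apache user followed by 644:
#   show_owner_and_mode /var/www/robots.txt
```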


The robots.txt file is now ready. Next, let’s change the Apache configuration so that a request for robots.txt on any site defined as a virtual host is answered with the file we just created.

Steps to Change Apache Configuration

Let’s open apache2.conf, which lives in the /etc/apache2 directory. To open the file, execute this command:

[pre]sudo nano /etc/apache2/apache2.conf[/pre]


Add the line below at the bottom of that file.

[pre]Alias /robots.txt /var/www/robots.txt[/pre]

The above line tells Apache to serve the file /var/www/robots.txt whenever a request for /robots.txt is made. (The Alias directive is part of Apache’s mod_alias module.)
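If you set up several staging servers, you may prefer to add the Alias line from the shell rather than by hand. The sketch below appends it only when that exact line is not already in the file, so running it twice does not create a duplicate. `add_robots_alias` is a hypothetical helper name, and the default path assumes the Ubuntu layout used in this article.

```shell
#!/bin/sh
# Hypothetical helper: append the Alias line to an Apache config file
# only if that exact line is not already present.
add_robots_alias() {
    conf="${1:-/etc/apache2/apache2.conf}"
    line='Alias /robots.txt /var/www/robots.txt'
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! grep -qxF "$line" "$conf"; then
        # tee -a appends; sudo is needed because the config is root-owned.
        printf '%s\n' "$line" | sudo tee -a "$conf" > /dev/null
    fi
}

# Example call on the server:
#   add_robots_alias
```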

Save the file after adding the above line, then reload Apache using the following command.
[pre]sudo service apache2 reload[/pre]
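A typo in apache2.conf can make the reload fail, so it can help to check the syntax first. The sketch below runs `apache2ctl configtest` and reloads only if that check passes. `reload_apache_if_valid` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reload Apache only if the configuration parses.
reload_apache_if_valid() {
    # apache2ctl configtest exits non-zero on a syntax error,
    # so the reload is skipped when the config is broken.
    sudo apache2ctl configtest && sudo service apache2 reload
}

# Example call on the server:
#   reload_apache_if_valid
```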

You are done! Now, if you have created a site abc.com in your virtual hosts, a request for abc.com/robots.txt will show this content:
[pre]User-agent: *
Disallow: /[/pre]
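You can check this from the command line instead of a browser. The sketch below fetches a robots.txt with curl and succeeds if it contains a `Disallow: /` rule. It is a plain text check, not a full robots.txt parser, and `robots_blocks_all` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the robots.txt at the given URL
# contains a "Disallow: /" rule. This is a plain text check, not a
# full robots.txt parser.
robots_blocks_all() {
    url="$1"
    # -f: fail on HTTP errors, -s: no progress meter, -S: still show errors.
    curl -fsS "$url" | grep -Eq '^Disallow:[[:space:]]*/[[:space:]]*$'
}

# Example call, using the article's placeholder site abc.com:
#   robots_blocks_all http://abc.com/robots.txt && echo "crawlers are disallowed"
```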


You can also add HTTP Basic Authentication on top of the above method as an additional measure to keep crawlers off your website. It shows a username and password prompt before any page is served.

Thanks for reading 🙂

