One thing I have often noticed is that while a No Entry sign mostly suffices to keep people out of a restricted area, it is not foolproof. There will always be some people who disregard the sign and venture into the restricted area anyway. Using a robots.txt file to disallow crawlers from a website is similar. While the instructions in the robots.txt file ask crawlers, spiders and bots not to crawl your website, they do not enforce anything. There is a possibility that some spiders will still crawl your pages. Hence the need to actually block crawlers.
In an earlier article we wrote about How to Disallow Crawlers, Spiders and Bots from Websites. While that method is efficient, it is clearly not sufficient on its own. To resolve this we need a workaround, and that is exactly what I am going to provide. Instead of just disallowing crawlers with instructions in the robots.txt file, we are going to block them.
The methods given below to block crawlers have been tried on Apache 2.4.7 (installed on Ubuntu), and I expect they will work with any Apache 2.4.x release. If you are not able to implement them on your Apache, write to me in the comments section, mentioning your Apache version and server operating system. If you need to share any sensitive information, you can write to me at [email protected].
HTTP Basic Authentication to Block Crawlers
The first method I am going to demonstrate to block crawlers uses HTTP Basic Authentication. You might have come across the authentication box shown in the image below when trying to access certain websites.
This box appears when HTTP Basic Authentication is implemented. To set it up, you have to edit the virtual host configuration file of your domain.
Create a Password File
The first step is to create a password file containing the username and password. Connect to your server over SSH and execute the command below.
htpasswd -c <path_of_the_password_file> <username>
Replace <path_of_the_password_file> with the location where you want to create the file that stores the username and password combination (the password is stored in hashed form). For the sake of explanation, let's assume you provide the path /home/tahseen/Desktop. Replace <username> with the username you want. For demonstration purposes I am going to create the username wisdmlabs, so the command should look something like the one below.
htpasswd -c /home/tahseen/Desktop/password wisdmlabs
After replacing the password file location and username in the above command, hit Enter. It will ask you for the password of the user you want to add; type a password and hit Enter again. After adding the user to the file, it will show the message Adding password for user <username>, where <username> is the username you wanted to add. The image below shows what this looks like.
Note: In the above command we passed the -c option so that it creates the file. If you already have a file where the username-password combination should be saved, you don't need to provide the -c parameter.
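For example, to add a second user to the same file later (the username editor below is just a placeholder), run htpasswd without the -c option:
htpasswd /home/tahseen/Desktop/password editor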
Edit Configuration File
So far we have created the username and password. Now it is time to add this information to the site configuration; this step is what actually blocks crawlers from our website. Let's say you are implementing this for abc.com. The virtual host configuration for that domain will be in the /etc/apache2/sites-available directory, and I am assuming the configuration file for abc.com is abc.com.conf. Open that configuration file for editing using the command below.
sudo nano /etc/apache2/sites-available/abc.com.conf
Append the content below at the end of the VirtualHost block of the configuration file.
<Directory />
    # Allow internal IPs to access the website directly. If you don't have internal IPs, omit the Require ip line.
    Require ip 192.168.2.1/24
    # Replace /var/.password with the file path you provided to the htpasswd command.
    AuthType Basic
    AuthUserFile /var/.password
    AuthName "Authentication Required"
    Require valid-user
    Satisfy Any
</Directory>
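With Apache 2.4, the two Require directives at the same level act as alternatives, so requests from the internal subnet are let straight through while everyone else, including crawlers, must supply valid credentials.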
After adding the above content, save the file and reload Apache with the command below.
sudo service apache2 reload
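If the reload fails, or you just want to catch typos beforehand, you can check the configuration syntax with Apache's built-in test:
sudo apache2ctl configtest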
You are done! Now try to visit the website; it should ask you for a username and password (if you are not visiting from the internal network). If the authentication prompt appears, your attempt to block crawlers has worked!
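You can also verify this from the command line with curl, assuming the abc.com domain and wisdmlabs user from the examples above and running from a machine outside the internal IP range:
curl -I http://abc.com/                          # should return 401 Unauthorized
curl -I -u wisdmlabs:<password> http://abc.com/  # should return 200 OK with valid credentials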
Responding with 403 to Block Crawlers
The second method to block crawlers is to respond to them with a 403 (Forbidden) status. In this method, we detect the user agents of known crawlers and block them. The disadvantage is that if a crawler changes its user agent, it can still crawl the content.
You can add the content given below to the .htaccess file to block crawlers. If it does not work after adding it to the .htaccess file, you will have to edit the virtual host configuration file of the corresponding domain, as we did in the earlier method.
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^.*(googlebot|bingbot|yahoo|AhrefsBot|Baiduspider|Ezooms|MJ12bot|YandexBot|bot|agent|spider|crawler|extractor).*$ [NC]
    RewriteRule .* - [F,L]
</IfModule>
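To check that the rule is working, you can send a request with a crawler-like user agent using curl (again assuming the abc.com domain from earlier); it should receive a 403 response:
curl -I -A "AhrefsBot" http://abc.com/   # should return 403 Forbidden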
If it still does not work, make sure that the rewrite module is enabled. To check, run the command below.
apachectl -M
If it does not show rewrite_module in the output, you will have to enable it for the rules above to take effect. If you don't know how to enable it, refer to the article Enable Rewrite Module; a quick sketch for Ubuntu follows below.
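On Ubuntu, enabling the module usually comes down to the following two commands (the linked article covers this in more detail):
sudo a2enmod rewrite
sudo service apache2 restart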
The above two methods should be sufficient to help you block crawlers from your website. However, if you are still having any difficulties, feel free to get in touch with me through the comments section.