How to remove a URL(s) from Google

Author Joseph Stenhouse on December 19, 2010

-Removing a URL(s) from a search engine index.

-How to prevent incorrect URL crawls.

This topic is discussed a lot in forums across the WWW, but the advice given is rarely accurate. Today we will clear up much of the confusion about the best way to remove your URL(s) from search engines, specifically Google. We will go over the most solid ways to do it, and how to prevent incorrect URL crawls.

Why would you want to remove a URL from Google? Here are a few practical examples:

1) Say your online forum has been spammed by porn bots and it shows pornographic advertisements as far as the eye can see.
This happens all the time and will never go away, so it is a good example. Since you do not know WHEN Google is going to crawl your directories again, it is a good idea to "redirect" the spiders until the problem can be fixed.

2) (Preventative Measure) You are developing a new website and it is not quite ready to be published to the WWW yet. This is a good reason to set preventative measures that stop Google from crawling your website until you are ready. It is not good enough to just stick everything in a subdirectory and expect Google not to find it, because it will.

3) By mistake you have allowed various URLs from your website to be indexed when the links to them should have carried rel="nofollow".
Here is an example of a URL that leads to a printer-friendly version of a page: http://www.involutionmedia.com/kbase/index.php/article/printer/involution  Go ahead and click on it and you will see the problem.

BEST PREVENTATIVE MEASURES

First, let’s list some preventative measures to ensure that you will never have to actually go through the process of removing a URL.

1) .htaccess (password protect)

This is probably the most robust method if you want to be 100% sure that ANY AND ALL search engine robots stay out of a specified directory. Clearly, if a robot cannot get to your files then it cannot index or cache anything. An .htaccess file is a per-directory configuration file for the Apache web server. If you do not have one on your web server, create a plain text file and save it as .htaccess.

In your .htaccess file you can password protect a specified directory. No robot is going to try and guess a password. Rather, it will leave. Once your problem is fixed, simply remove the rule from the .htaccess file.

The actual .htaccess file might look something like this:

AuthType Basic
AuthName "Dave's Login Area"
AuthUserFile /home/dave/.htpasswd
Require user dave

 

And the .htpasswd file holds one entry per line in the form username:encryptedpassword, for example:

dave:XO5UAT7ceqPvc
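Normally you would create the entry with Apache's own htpasswd command-line tool. Purely to illustrate what a username:encryptedpassword line is, here is a minimal Python sketch that builds one in Apache's "{SHA}" format (Base64 of the SHA-1 digest). The function name and the credentials are hypothetical, this is not the same format as the example entry above, and the {SHA} scheme is unsalted and weak, so treat it as a temporary measure while you fix the real problem.

```python
import base64
import hashlib


def htpasswd_sha1_line(username: str, password: str) -> str:
    """Build one .htpasswd entry in Apache's "{SHA}" format.

    Assumption: the server accepts the {SHA} scheme, which stores the
    Base64-encoded raw SHA-1 digest of the password. It is unsalted,
    so it is only suitable as a short-term stopgap.
    """
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return f"{username}:{{SHA}}{encoded}"


# Hypothetical credentials, for illustration only.
line = htpasswd_sha1_line("dave", "example-password")
print(line)
```

Paste the printed line into the file that AuthUserFile points to, one entry per line.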

 

 

2) robots.txt

Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.

For quite some time this has been the industry standard for telling search engine robots which locations they may or may not access. A well-behaved robot fetches the robots.txt file before it requests anything else from the site.

User-agent: *

Disallow: /bad_directory/

The "User-agent: *" means this section applies to all robots. The "Disallow: /bad_directory/" tells the robot that it should not visit the directory bad_directory.

If you want to disallow your entire website then just use a single / like this:

Disallow: /
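Before relying on a rule it can help to check how a standards-following robot would read it. The sketch below uses Python's standard urllib.robotparser module to test the rules shown above. The helper name is_allowed and the www.example.com URLs are assumptions made for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, as they would appear in /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bad_directory/
"""

# The "block the whole site" variant.
BLOCK_EVERYTHING = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if a compliant robot may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs on a placeholder domain.
blocked = is_allowed(ROBOTS_TXT, "http://www.example.com/bad_directory/page.html")
allowed = is_allowed(ROBOTS_TXT, "http://www.example.com/index.html")
site_wide = is_allowed(BLOCK_EVERYTHING, "http://www.example.com/index.html")

print(blocked, allowed, site_wide)
```

This only tells you what a robot that honors the file would do. It says nothing about robots that ignore it.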

There are two important considerations when using /robots.txt:

  • Robots can ignore your /robots.txt. Malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention to it.
  • The /robots.txt file is publicly available. Anyone can see which sections of your server you don't want robots to visit.

This method is not foolproof, but it works well with most legitimate robots.

 

METHODS TO REMOVE A URL(s) FROM A SEARCH ENGINE’S INDEX

1)  <meta name="robots" content="noindex,nofollow" />


By adding this between the <head></head> tags, you tell robots not to index the page and not to follow its links. Some say that this is not totally reliable, but my personal experience has been very good: I have always had 100% success in telling a robot to go away with this method. Keep in mind that a robot has to be able to fetch the page in order to see the tag, so do not block the same page in robots.txt.
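To confirm that a page really carries the tag, you can parse its HTML and look for a robots meta element. The following is a minimal sketch using Python's standard html.parser module. The class name, the function name, and the sample pages are hypothetical, and it only inspects markup you hand it rather than fetching anything from the web.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag seen."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # "noindex,nofollow" -> {"noindex", "nofollow"}
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in `html`."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


# Hypothetical pages: one carries the tag, one does not.
TAGGED = '<html><head><meta name="robots" content="noindex,nofollow" /></head><body></body></html>'
UNTAGGED = "<html><head><title>Forum</title></head><body></body></html>"

print(robots_directives(TAGGED))
print(robots_directives(UNTAGGED))
```

An empty result means a robot reading that page finds no robots meta instruction at all.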

2) Google Webmaster Tools (URL REMOVAL TOOL)

With Google Webmaster Tools you can use the URL removal tool to remove a single URL. This method has been proven to work very well.

3) rel="nofollow"

"Nofollow" provides a way for webmasters to tell search engines "Don't follow links on this page" or "Don't follow this specific link."

Originally, the nofollow attribute appeared in the page-level meta tag, and instructed search engines not to follow (i.e., crawl) any outgoing links on the page. For example:

 <meta name="robots" content="nofollow" />

Before nofollow was used on individual links, preventing robots from following individual links on a page required a great deal of effort (for example, redirecting the link to a URL blocked in robots.txt). That's why the nofollow attribute value of the rel attribute was created. This gives webmasters more granular control: instead of telling search engines and bots not to follow any links on the page, it lets you easily instruct robots not to crawl a specific link. For example:

 <a href="signin.php" rel="nofollow">sign in</a>
 
The use of this is clear: you can mark individual links instead of an entire page.
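If you want to audit which links on a page are marked, a short script can list every link together with its nofollow status. This is a sketch built on Python's standard html.parser module. The class name, the function name, and the about.php link are made up for illustration, and only signin.php comes from the example above.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Record (href, is_nofollow) for every <a href="..."> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def collect_links(html: str) -> list:
    """Return a list of (href, is_nofollow) tuples in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# The sign-in link from the example plus a hypothetical unmarked link.
SNIPPET = (
    '<a href="signin.php" rel="nofollow">sign in</a> '
    '<a href="about.php">about</a>'
)

for href, is_nofollow in collect_links(SNIPPET):
    print(href, "nofollow" if is_nofollow else "followed")
```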
 
NOTE: YOU CAN ALSO USE THE ROBOTS.TXT FILE TO DISALLOW INDIVIDUAL URLS, NOT JUST DIRECTORIES.
 
Below is a link to remove a URL via the Yahoo removal tool:
LINK

 


Category: Search Engine Optimization (SEO)

Last updated on December 19, 2010 with 11022 views