Robots File, Disallowing certain files
Tigger
post Jun 19 2009, 08:20 PM
Post #1








Hi,

I noticed that GSiteCrawler is picking up URLs like /login.aspx?review&product=210 (the number on the end is different for each product).

Is there a way to put a Disallow in robots.txt so that they won't be crawled?
agkits
post Jun 19 2009, 10:44 PM
Post #2








I would think Disallow: /login.aspx would do the trick... but you already have it in your robots.txt.
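
For what it's worth, here is a minimal sketch of how that rule would be matched, using Python's standard urllib.robotparser module. The robots.txt content and the example.com host are assumptions for illustration, not the actual file from the site in question. Disallow rules match by path prefix, so /login.aspx should also cover the ?review&product=... variants for any crawler that honors robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real file on the site may differ.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login.aspx
"""


def is_allowed(url: str, agent: str = "*") -> bool:
    """Return True if a robots.txt-respecting crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.com stands in for the real store domain.
    for url in (
        "http://example.com/login.aspx?review&product=210",
        "http://example.com/default.aspx",
    ):
        print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

If a check like this reports the login URLs as blocked but they still show up in a crawl, the rule itself is probably fine and the tool is either not applying robots.txt or is only listing URLs it found as links.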
Tigger
post Jun 20 2009, 09:26 AM
Post #3








QUOTE (agkits @ Jun 19 2009, 11:54 PM) *
I would think Disallow: /login.aspx would do the trick... but you already have it in your robots.txt.


Yep, and it's not doing the trick lol.
ddavisNS
post Jun 20 2009, 01:33 PM
Post #4





If it is in your robots.txt, that just means that when your site is spidered, the spider won't navigate to that page. It won't prevent the spider from picking up links TO that page, just links FROM the contents of that page.

GSiteCrawler has some sort of filtering system that allows you to ignore or filter out certain URLs. You should be able to set it to ignore URLs containing login.aspx.
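
To illustrate the idea only (this is not GSiteCrawler's actual filter feature or its settings, just a generic sketch in Python): a crawler can discover a URL from links on other pages even when robots.txt stops it from fetching that URL, so the collected URL list needs its own filter. The URL list, the example.com host, and the `drop_matching` helper are all hypothetical.

```python
def drop_matching(urls, banned_substrings):
    """Keep only URLs containing none of the banned substrings (case-insensitive)."""
    banned = [s.lower() for s in banned_substrings]
    return [u for u in urls if not any(b in u.lower() for b in banned)]


# Hypothetical list of URLs a crawler discovered from links on product pages.
discovered = [
    "http://example.com/default.aspx",
    "http://example.com/login.aspx?review&product=210",
    "http://example.com/login.aspx?review&product=211",
    "http://example.com/products.aspx",
]

if __name__ == "__main__":
    # Print only the URLs that survive the "login.aspx" filter.
    for url in drop_matching(discovered, ["login.aspx"]):
        print(url)
```

A substring match is used here because the product number on the end changes for every URL, so an exact-match rule would miss most of them.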
Tigger
post Jun 20 2009, 02:12 PM
Post #5








QUOTE (ddavisNS @ Jun 20 2009, 02:43 PM) *
If it is in your robots.txt, that just means that when your site is spidered, the spider won't navigate to that page. It won't prevent the spider from picking up links TO that page, just links FROM the contents of that page.

GSiteCrawler has some sort of filtering system that allows you to ignore or filter out certain URLs. You should be able to set it to ignore URLs containing login.aspx.


Thanks, I'll look into it.
Network Solutions © 2009