> Avoiding Duplicate Content Issues
gumball
post Feb 11 2009, 01:43 PM
Post #1





Group: Verified NS Member
Posts: 37
Joined: 14-February 08
Member No.: 658



1. We are on 7.4 and have a custom HTML page for our home page.

Entering http://www.EXAMPLE.com, https://www.EXAMPLE.com, or http://EXAMPLE.com in the address bar serves up our home page, but according to the search engines 3 pages exist with the same content, which can get us banned for spam.

We spoke with the top "SEO guy" at NetSol and he was able to redirect https://www.EXAMPLE.com to http://www.EXAMPLE.com using a 200 redirect (we are waiting to hear back about http://EXAMPLE.com)... but with a 200 on both pages, the search engine wants to index both the start page and the target page. This is also a known spam method.

Our closest competitor is on the Yahoo platform and trying this for their site yields "Page not found" for all but http://www.EXAMPLE.com. We think this or a 301 would be a preferred method and would like to see a solution.

Unless NS can fix this, we suggest you avoid using the custom HTML for your home page (post date 02/10/09)

2. When you load an item and visit your shopping cart, assuming you have SSL, you are in secure mode (HTTPS). If you decide to go back to the main site to add another product, you transition from secure (HTTPS) to non-secure (HTTP). This transition uses a temporary 302 redirect, which also tells the SE's that there are 2 pages with the same content (spam trigger). What is needed is a 301 permanently moved redirect.

We called NS support about this numerous times and were told that there was no fix for the Custom HTML page and that #2 is a non-issue because the SE's are prevented from crawling cart.aspx by robots.txt.

We are not comfortable with these explanations and would like to see a fix. Posting this today to ensure that NS support/development teams are aware.
agkits
post Feb 11 2009, 02:25 PM
Post #2





Group: Verified NS Member
Posts: 734
Joined: 26-October 07
From: Syracuse NY
Member No.: 193



A 301 Redirect would be the best option. Number 2 is a non issue. You are disallowing cart.aspx in your robots.txt file so the SE's cannot crawl it.
nsAaron
post Feb 11 2009, 03:14 PM
Post #3





Group: Administrators
Posts: 307
Joined: 29-August 07
Member No.: 40



Hello gumball,

When you place pages on the site outside of the cart (e.g. an index.html page), you step outside the security and SEO blanket that the cart was designed to provide.

1.) The first issue you see with your homepage and the variations of https and www could actually extend beyond what you've listed. It can include the trailing /index.html as well, giving you 6 different ways to have your homepage indexed. This issue is commonly referred to as canonicalization. The cart by design eliminates 4 variations to minimize potential impacts. We allow for 2 variations to ensure our customers still have the flexibility they require. It is up to the end user to use each variation as appropriate.
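
To make the six variations concrete, here is a minimal Python sketch that collapses them onto a single URL. It is only an illustration: the preferred scheme and host, the DEFAULT_DOCUMENTS tuple and the canonical_url helper are all assumptions made for the example, not settings or functions of the cart software.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical choices for illustration only; the cart's real settings may differ.
PREFERRED_SCHEME = "http"
PREFERRED_HOST = "www.example.com"
DEFAULT_DOCUMENTS = ("/index.html",)


def canonical_url(url: str) -> str:
    """Collapse scheme, www and default-document variants onto one URL."""
    parts = urlsplit(url)
    # An empty path and the default document both mean "the homepage".
    path = parts.path or "/"
    if path.lower() in DEFAULT_DOCUMENTS:
        path = "/"
    # Host names are case-insensitive; fold both host spellings together.
    host = parts.netloc.lower()
    if host in ("example.com", "www.example.com"):
        host = PREFERRED_HOST
    return urlunsplit((PREFERRED_SCHEME, host, path, parts.query, ""))


# The three addresses from the original post, each with and without /index.html.
variants = [
    "http://www.EXAMPLE.com",
    "https://www.EXAMPLE.com",
    "http://EXAMPLE.com",
    "http://www.EXAMPLE.com/index.html",
    "https://www.EXAMPLE.com/index.html",
    "http://EXAMPLE.com/index.html",
]
distinct = {canonical_url(v) for v in variants}
print(distinct)
```

All six inputs map to the same address, which is the one a 301 redirect or a preferred-domain setting would point at.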

To ease your concerns, the search engines won't treat this as "spam" but rather as a canonicalization issue. You won't get banned for a canonicalization issue of this type. It may diminish your ability to rank effectively for the right page, as it causes confusion in the search engines, but it won't get you banned.

If you choose to keep using the custom index.html page, you can create a Webmaster account at Google and specify which version of your homepage you want treated as your homepage. How-to from Google: http://googlewebmastercentral.blogspot.com...red-domain.html.

The only really viable option you have to prevent http://example.com from occurring is to ensure that every link going to that page includes the www. However, you won't have control over anyone linking to your website without the www.

The preferred method to address all of your issues is to utilize the homepage included in the cart (index.aspx) and use the built-in CSS flexibility to hide the portions of the site you don't want showing up. By applying the "display: none;" declaration on the homepage you can effectively hide any portion of the homepage you don't want displayed, and then float anything you want there (your index.html content).


2.) This depends on how you have your cart configured. Once you add a product you can reload the page, reload the page with an alert, go to cart details, go to checkout, or go to the homepage. Depending on this setting you may or may not enter an https area of the site.

As you mentioned, we explicitly instruct search engine spiders via the robots.txt file not to crawl, index or follow any pages that would have you enter an https area. As a second level of instruction, we've also added a meta tag to every https page that instructs the spiders not to index the page.

While the cart may be utilizing a 302 to transition a user from a https version to a http version of your site, search engine spiders have been instructed to not even get to that page, therefore preventing the issue.
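
As a rough way to check the robots.txt layer described above, the sketch below uses Python's standard urllib.robotparser. The robots.txt text is a made-up example (the rules the cart actually ships with may differ), and may_crawl is a helper name invented for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, assumed for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart.aspx
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url: str, agent: str = "*") -> bool:
    """Return True if the parsed rules allow this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


cart_allowed = may_crawl("https://www.example.com/cart.aspx")
home_allowed = may_crawl("http://www.example.com/index.aspx")
print(cart_allowed, home_allowed)
```

With these assumed rules the cart page is off limits while the homepage stays crawlable; pointing the parser at a live robots.txt would show what a real site allows.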

It sounds like much of your pain is coming from using the index.html page for your homepage, therefore preventing the cart from blocking all of the pages you don't want indexed. I'd strongly suggest utilizing the index.aspx page with the "display: none" and JavaScript option for the containers you don't want to see displayed. It will reduce if not eliminate all the issues you listed.

Let me know if we can further assist.
-Aaron Eversgerd
gumball
post Feb 11 2009, 06:49 PM
Post #4





Group: Verified NS Member
Posts: 37
Joined: 14-February 08
Member No.: 658



Aaron,

Hmm...perplexed as to why NS changed the title of my posting? Maybe this version is more relevant.

1. I was not aware that we could customize the index.aspx page only and will look into this. Can you tell me where I might find more info about customizing it?

Network Solutions should discourage its customers from making and using custom index.htm pages (see why below).

2. Assuming our site was NEVER indexed using HTTPS, you're correct, it's a non-issue. Our site was indexed using HTTPS by MSN... their cache shows 2 versions of our home page - so I am very concerned this issue has caused problems with our search engine rankings. We've worked hard on SEO in the past year and the results don't seem to show. For some keywords our Google rankings have dropped significantly, from first page, after we switched from a non-optimized site with little content to the NS platform with loads of new content and relevant links.

I believe it is NS's responsibility to do the job correctly for its customers. Using a 302 for the purpose of directing users from the cart to the main site is clearly not correct... and stating that the problem couldn't happen assumes far too much about the community of NS websites. We're simply asking NS to follow what the broad SEO community says is right. Why can't NS change the redirect from 302 to 301?
ArcoJedi
post Feb 12 2009, 09:58 AM
Post #5


Jedi Master


Group: Verified NS Member
Posts: 1,142
Joined: 10-August 07
From: Galaxy Far, Far Away...
Member No.: 13



QUOTE (gumball @ Feb 11 2009, 06:07 PM) *
I believe it is NS's responsibility to do the job correctly for its customers. Using a 302 for the purpose of directing users from the cart to the main site is clearly not correct... and stating that the problem couldn't happen assumes far too much about the community of NS websites. We're simply asking NS to follow what the broad SEO community says is right. Why can't NS change the redirect from 302 to 301?
This has already been addressed in a post above but you may have missed it:
QUOTE (agkits @ Feb 11 2009, 01:43 PM) *
A 301 Redirect would be the best option. Number 2 is a non issue. You are disallowing cart.aspx in your robots.txt file so the SE's cannot crawl it.
Our SEO and online marketing experts have reviewed the cart software in depth and we are already doing a great job for our clients when it comes to the built-in optimize-capable features. You mentioned the broader SEO community, but it's good to keep in mind that SEO is an art and a science (and not an exact science).

I've read that 301 redirects are better than 302 redirects in most cases for pages that you want indexed, but I fail to see why it will help on the /cart.aspx page which is restricted in /robots.txt. I'm not the top-most expert here though, so if you could help point me to a relevant article or reference that could help our development team, I'd appreciate it. Thanks for your feedback.
AndyT - MC
post Feb 12 2009, 10:37 AM
Post #6





Group: Verified NS Member
Posts: 979
Joined: 22-October 07
From: St. Louis, MO
Member No.: 170



Most people who use an index.html page block the index.aspx page in robots.txt.

I am absolutely perplexed as to why your cart.aspx page is indexed, because there are entries in both robots.txt and in meta tags on the page that are supposed to prevent spiders from indexing the page. It seems Google is not playing by their own rules.
nsAaron
post Feb 12 2009, 01:41 PM
Post #7





Group: Administrators
Posts: 307
Joined: 29-August 07
Member No.: 40



gumball,

You have free rein over the containers of the cart by manipulating the .page-header, .page-footer, .page-column-left and .page-column-right CSS classes (there are quite a few others, but you need to sift through the source code to find exactly what you are looking for on the site; just look for the "class" names).

Something like <style>.page-column-left {display: none;}</style> in the HTML at the top of your homepage will prevent the left column from displaying altogether. Please note that it will still be in the source code, so a spider will still be able to see it. There are a couple of schools of thought about the potential implications of suppressing text via CSS/AJAX/JavaScript/etc. Sometimes the search engines are OK with it; other times they scoff at the thought that you could be doing something potentially malicious.
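
The point that hidden content stays in the source can be checked with a small sketch. It assumes a simple crawler that reads text nodes and ignores CSS; the page markup, the "page-content" class and the TextCollector name are made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical page for illustration; only .page-column-left comes from the post.
PAGE = """
<html><head><style>.page-column-left {display: none;}</style></head>
<body>
<div class="page-column-left">Category links</div>
<div class="page-content">Welcome to our store</div>
</body></html>
"""


class TextCollector(HTMLParser):
    """Collects text nodes without applying any CSS, as a naive spider might."""

    def __init__(self):
        super().__init__()
        self.in_style = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False

    def handle_data(self, data):
        # Skip the stylesheet text itself, keep everything else.
        text = data.strip()
        if text and not self.in_style:
            self.chunks.append(text)


collector = TextCollector()
collector.feed(PAGE)
visible_to_parser = collector.chunks
print(visible_to_parser)
```

The left column text still shows up in the collected list even though a browser would not display it.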

With that said, you do have the ability to manipulate the CSS as you see fit. So rather than hide a column altogether, you might consider adjusting its position, styling, etc. on the page. This way you don't eliminate one SEO issue and open another one.

You can see an example of both here: http://www.southsidesportsstl.com/testpage.aspx. I suppressed the left column, and modified the text in the footer to be big and red.


I hope that helps put you down the correct path,
-Aaron
gumball
post Feb 12 2009, 04:18 PM
Post #8





Group: Verified NS Member
Posts: 37
Joined: 14-February 08
Member No.: 658



QUOTE (AndyT - MC @ Feb 12 2009, 09:55 AM) *
I am absolutely perplexed as to why your cart.aspx page is indexed, because there are entries in both robots.txt and in meta tags on the page that are supposed to prevent spiders from indexing the page. It seems Google is not playing by their own rules.



To my knowledge Google does not have cart.aspx cached. What we have found is the two versions of our home page indexed in MSN... if this is possible with MSN, our concern is that it's possible with the other search engines.

A realistic scenario is a site links to us using HTTPS and Google follows that link and finds a 302 redirect... going forward we could have two versions in Google. There is much literature available about the proper use of redirects and the NS platform should conform to these standards.

No one has addressed why the technology can't handle a 301 vs. a 302.
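
For reference, the behaviour being asked for could look roughly like the Python sketch below: an https request for a content page is answered with a permanent (301) redirect to its http address, while secure pages are left alone. This is not how the NS platform is implemented; SECURE_PATHS, the /checkout.aspx entry and redirect_for are assumptions made for illustration.

```python
from http import HTTPStatus
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of pages that must stay on https (assumed, not from NS).
SECURE_PATHS = {"/cart.aspx", "/checkout.aspx"}


def redirect_for(url: str):
    """Return (status, location) if the request should redirect, else None."""
    parts = urlsplit(url)
    if parts.scheme == "https" and parts.path.lower() not in SECURE_PATHS:
        target = urlunsplit(("http", parts.netloc, parts.path, parts.query, ""))
        # 301 tells crawlers the move is permanent; a 302 marks it as temporary.
        return HTTPStatus.MOVED_PERMANENTLY, target
    return None


print(redirect_for("https://www.example.com/index.aspx"))
print(redirect_for("https://www.example.com/cart.aspx"))
```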
AndyT - MC
post Feb 12 2009, 04:42 PM
Post #9





Group: Verified NS Member
Posts: 979
Joined: 22-October 07
From: St. Louis, MO
Member No.: 170



It isn't that it can't handle it, it just wasn't designed to do so. Pages that are required to be secure have 2 layers of blocking, so they should never be indexed. All other pages are content pages, and as such, are not expected to be linked to using https.

We were only just recently made aware of this, so it will likely be addressed in an upcoming release.
gumball
post Feb 27 2009, 12:57 AM
Post #10





Group: Verified NS Member
Posts: 37
Joined: 14-February 08
Member No.: 658



QUOTE (AndyT - MC @ Feb 12 2009, 04:00 PM) *
It isn't that it can't handle it, it just wasn't designed to do so. Pages that are required to be secure have 2 layers of blocking, so they should never be indexed. All other pages are content pages, and as such, are not expected to be linked to using https.

We were only just recently made aware of this, so it will likely be addressed in an upcoming release.


Matt Cutts addressed a way to handle this problem in a post on his blog on 2/25/09...

<link rel="canonical" href="http://example.com/page.aspx"/>

See his video for details: http://www.mattcutts.com/blog/canonical-link-tag-video/
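
To confirm which address a page declares as canonical, a short Python sketch along these lines reads the link element out of the HTML. The sample page and the CanonicalFinder and find_canonical names are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page source for illustration.
PAGE = (
    '<html><head><link rel="canonical" '
    'href="http://example.com/page.aspx"/></head><body></body></html>'
)
print(find_canonical(PAGE))
```

Running the same check against the http and https versions of a page should report one and the same address if the element is set up as intended.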
AndyT - MC
post Mar 9 2009, 01:51 PM
Post #11





Group: Verified NS Member
Posts: 979
Joined: 22-October 07
From: St. Louis, MO
Member No.: 170



QUOTE (gumball @ Feb 27 2009, 01:15 AM) *
Matt Cutts addressed a way to handle this problem in a post on his blog on 2/25/09...

<link rel="canonical" href="http://example.com/page.aspx"/>

See his video for details: http://www.mattcutts.com/blog/canonical-link-tag-video/

Just to note, we added the canonical <link> element in the 7.5 release. This should alleviate concerns over search engines inadvertently indexing https pages.
Network Solutions © 2011