|
Author: Mike Valentine Article source: http://www.business-support.co.uk/. Used with author's permission.
There has been endless webmaster speculation and worry about
the so-called "Google Sandbox" - the indexing time delay for
new domain names - rumored to last for at least 45 days from
the date of first "discovery" by Googlebot. This recognized
listing delay came to be called the "Google Sandbox effect."
Ruminations on the algorithmic elements of this sandbox time
delay have ranged widely since the indexing delay was first
noticed in the spring of 2004. Some believe it to be an issue
of one single element of good search engine optimization, such
as linking campaigns. Link building has been the focus of
most discussion, but others have focused on the possibility
that the size of a new site, its internal linking structure, or
specific time delays are the most relevant algorithmic elements.
Rather than contribute to this speculation and further
muddy the Sandbox, we'll be looking at a case study of a
site on a new domain name, established May 11, 2005, and its
specific site structure, submission activity, and external and
internal linking. We'll see how this plays out in search
engine spider activity vs. indexing dates at the top four
search engines.
Ready? We'll give dates and crawler action in daily lists and
see how this all plays out on this single new site over time.
* May 11, 2005 - Basic text for a large site posted on newly
purchased domain name and live by day's end. Search-friendly
structure implemented, with text links making full discovery
of all content possible by robots. Home page updated, with
10 new text content pages added daily. Submitted site at
Google's "Add URL" submission page.
* May 12 - 14 - No visits by Slurp, MSNbot, Teoma or Google.
(Slurp is Yahoo's spider and Teoma is from Ask Jeeves)
Posted link on WebSite101 to the new domain, Publish101.com.
* May 15 - Googlebot arrives and eagerly crawls 245 pages
on new domain after looking for, but not finding, the
robots.txt file. Oops! Gotta add that robots.txt file!
* May 16 - Googlebot returns for 5 more pages and stops.
Slurp greedily gobbles 1480 pages and 1892 bad links!
Those bad links were caused by our email masking, meant
to keep out bad bots. How ironic that Slurp likes these.
* May 17 - Slurp finds 1409 more masking links & only 209
new content pages. MSNbot visits for the first time and
asks for robots.txt 75 times during the day, but leaves
when it finds that file missing! Finally get around to
adding robots.txt by day's end to stop Slurp crawling email
masking links and let MSNbot know it's safe to come in!
* May 23 - Teoma spider shows up for the first time and
crawls 93 pages. Site gets slammed by BecomeBot, a spider
that hits a page every 5 to 7 seconds and strains our
resources with 2409 rapid fire requests for pages. Added
BecomeBot to robots.txt exclusion list to keep 'em out.
* May 24 - MSNbot has not shown up for a week since
finding the robots.txt file missing. Slurp is showing up
every few hours, looking at robots.txt and leaving again
without crawling anything now that it is excluded from
the email masking links. BecomeBot appears to be honoring
the robots.txt exclusion but asks for that file 109 times
during the day. Teoma crawls 139 more pages.
* May 25 - We realize that we need to re-allocate server
resources and rework the database design, and this requires
changes to URLs, which means all previously crawled pages
are now bad links! Implement subdomains and wonder, what now?
Slurp shows up and finds thousands of new email masking
links, as the robots.txt was not moved to the new directory
structures. Spiders are getting error pages upon new
visits. Scrambling to put out fires after wide-ranging
changes to the site, we miss this for a week. Spider action
is spotty for 10 days until we fix robots.txt.
* June 4 - Teoma returns and crawls 590 pages! No others.
* June 5 - Teoma returns and crawls 1902 pages! No others.
* June 6 - Teoma returns and crawls 290 pages. No others.
* June 7 - Teoma returns and crawls 471 pages. No others.
* June 8-14 - Odd spider behavior, looking at robots.txt only.
* June 15 - Slurp gets thirsty, gulps 1396 pages! No others.
* June 16 - Slurp still thirsty, gulps 1379 pages! No others.
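A daily list like the one above can be pulled from raw server logs with a few lines of script. What follows is only a minimal sketch, not the method used for this case study. It assumes the server writes the common "combined" access log format, and the user-agent substrings and sample log lines are illustrative assumptions that you should check against your own logs.

```python
import re
from collections import Counter

# Substrings to look for in the User-Agent field.
# These are assumptions for illustration; verify them in your own logs.
CRAWLERS = {
    "Googlebot": "googlebot",
    "Slurp": "slurp",
    "MSNbot": "msnbot",
    "Teoma": "teoma",
    "BecomeBot": "becomebot",
}

# Combined log format (assumed):
# host ident user [day/Mon/year:time zone] "request" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] "(?P<request>[^"]*)" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def tally_crawlers(lines):
    """Count requests per (day, crawler) from an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        for name, needle in CRAWLERS.items():
            if needle in agent:
                counts[(match.group("day"), name)] += 1
                break
    return counts


# Made-up sample lines, using documentation-only IP addresses.
sample = [
    '192.0.2.1 - - [15/May/2005:10:00:00 -0700] "GET /robots.txt HTTP/1.1" '
    '404 208 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.1 - - [15/May/2005:10:00:05 -0700] "GET /index.html HTTP/1.1" '
    '200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.2 - - [16/May/2005:02:14:09 -0700] "GET /page1.html HTTP/1.0" '
    '200 4096 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
    '192.0.2.3 - - [16/May/2005:09:30:00 -0700] "GET /page2.html HTTP/1.1" '
    '200 4096 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

counts = tally_crawlers(sample)
for (day, name), hits in sorted(counts.items()):
    print(day, name, hits)
```

To use it on a real site, pass an open log file to tally_crawlers() in place of the sample list. Requests from ordinary browsers, like the last sample line, are simply ignored.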
So we'll take a break here at the five-week point and take note
of the very different behavior of the top crawlers. Googlebot
visits once and looks at a substantial number of pages but
doesn't return for over a month. Slurp finds bad links and
seems addicted to them, as it stops crawling good pages until
it is told to lay off the bad liquor, er, that is, links, by
getting robots.txt to slap Slurp to its senses. MSNbot visits
looking for that robots.txt and won't crawl any pages until
told what NOT to do by the robots.txt file. Teoma just crawls
like crazy, takes breaks, then comes back for more.
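Since so much of this behavior hinges on robots.txt, it is worth testing the rules before the spiders do. The sketch below uses Python's standard urllib.robotparser module to check a robots.txt against a few crawlers. The file contents are hypothetical, not the actual Publish101.com file, and the /mail/ path is only an assumed location for the email masking links.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
# "/mail/" is an assumed path for the email masking links.
ROBOTS_TXT = """\
User-agent: BecomeBot
Disallow: /

User-agent: *
Disallow: /mail/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if the given crawler may fetch the given path."""
    return parser.can_fetch(agent, path)


# Hypothetical paths, chosen to exercise each rule.
for agent, path in [
    ("BecomeBot", "/articles/page1.html"),
    ("Slurp", "/mail/contact.html"),
    ("Slurp", "/articles/page1.html"),
    ("MSNbot", "/articles/page1.html"),
]:
    print(agent, path, "allowed" if allowed(agent, path) else "blocked")
```

Keep in mind that crawlers request robots.txt separately for each host name, so a site that moves content onto subdomains needs the file served from each one.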
This behavior may mirror the differing personalities of the
software engineers who designed them. Teoma is tenacious and
hard working. MSNbot is timid, needs instruction and some
reassurance that it is doing the right thing, and picks up pages
slowly and carefully. Slurp has an addictive personality and
performs erratically on a random schedule. Googlebot takes a good
long look and leaves. Who knows whether it will be back, and when?
Now let's look at indexing by each engine. As of this writing
on July 7, each engine shows differing indexing behavior
as well. Google shows no pages indexed, although it crawled
250 pages nearly two months ago. Yahoo has three pages indexed
in a clear aging routine that doesn't list any of the nearly
8,000 pages it has crawled to date (not all itemized above).
MSN has 187 pages indexed while crawling fewer pages than
any of the others. Ask Jeeves has crawled more pages to date
than any search engine, yet has not indexed a single page.
Each of the engines will show the number of pages indexed if
you use the query operator "site:publish101.com" without the
quotes. MSN 187 pages, Ask none, Yahoo 3 pages, Google none.
The daily activity in the three weeks since June 16, not listed
above, has not varied dramatically, with Teoma crawling a bit
more than the other engines, Slurp erratically up and down, and
MSN slowly gathering 30 to 50 pages daily. Google is absent.
The linking campaign has been minimal, with posts to discussion
lists, a couple of articles and some blog activity. Looking
back over this time, it is apparent that a listing delay is
actually quite sensible from the view of the search engines.
Our site restructuring and bobbled robots.txt implementation
seem to have abruptly stalled crawling, but the indexing
behavior of each engine displays a distinctly different policy
by each major player.
The sandbox is apparently not just Google's playground, but
it is certainly tiresome after nearly two months. I think I'd
like to leave for home, have some lunch and take a nap now.
Back to class before we leave for the day, kiddies. What did
we learn today? Watch early crawler activity and be certain
to implement robots.txt early and adjust often for bad bots.
Oh yes, and the sandbox belongs to all search engines.

Mike Banks Valentine is a search engine optimization specialist
who operates http://WebSite101.com and will continue reports on
this case study chronicling search indexing of http://Publish101.com
|