The SEMpai SEM Blog

My Site Is Not Listed In Google. Help!

May 27th, 2008

One of the most common questions webmasters ask when they release a new site is “how do I get to number one on Google?”, but what many find is that they are not even listed at all. This tutorial will take you through a number of key checks that you can do to help you get your site back on track.

Are you really not listed?

Many people think that they are not listed in Google, but are really just not showing very prominently in the search results. The first check is to see if Google has your site in its index at all.

To do this, go to Google and enter the search query site:yoursitename.com.

If you see results like this:

Google Site Search - no results

… then your site is definitely not listed in Google, and you need to look through the issues below. However, if you see results like this:

Google Site Search - results found

… then your site has been picked up by Google, and you are just not showing well for the terms that you are interested in. In this case, you need to have a close look at your site content, but the problem is probably not one of the ones on this page.

Time Factors

The first thing to consider is how long your site has been live. If your site has only been running for a couple of days, then wait a week or so and check again. Google isn’t an instant directory of the web - it takes time to discover pages.

Backlinks

Has your site been running for a few weeks? Then the next thing to look at is backlinks.

Links are the lifeblood of the modern internet. Google doesn’t just know that a site exists - it needs to find it, and to find it, it needs people to link to it from sites it does know.

When you launch your site you need to work on getting links back from around the web. You could try the following basic methods:

  • Directory Sites - There are a number of directories around the web. Find ones that are relevant to your site, and submit your site to them.
  • Social Bookmarks - If your site content is relevant, submit it to social bookmarking sites like Digg, Reddit or StumbleUpon. Also look for niche vertical social bookmarking sites that could be more relevant for your site.
  • Other Webmasters - Got other sites, or has your friend got a relevant site? Ask them for a link.

Once you’ve got some links, give Google some time to discover them and start adding your site. This could take up to a week.

There is no quick and reliable way to check if you have backlinks registered in Google (the functionality that they do have is pretty broken), but Yahoo can help show you any backlinks that they have discovered. Go to Yahoo search and enter the query link:yoursitename.com - it will show you all pages linking back to your site.

Note that Yahoo, like Google, does take time to discover links, so give your link building efforts a week or so before trying the query above.

Robots.txt

The robots.txt file is a file that may be hosted on your site at yoursitename.com/robots.txt. Not all sites have one. If you get a “page not found” error when you try visiting that link then you don’t need to worry - just skip to the next step.

If you do have a robots.txt then you need to look closely at what it says. If, for example, it says this:

User-agent: *
Disallow: /

… then Google and all other search engines are being completely banned from indexing your site. Removing the robots.txt file, or replacing it with:

User-agent: *
Allow: /

… will allow Google to start indexing your site. Make sure you get the permission of the site administrator before you do this! The file could be there for a good reason.
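If you’d rather test a robots.txt file than read it by eye, the two examples above can be run through the robots.txt parser that ships with Python. This is only a rough sketch: the function name `can_crawl` and the URL are made up for illustration, and Python’s parser is not Google’s, so treat the answer as an approximation of what a spider will do.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt bodies discussed above.
BLOCK_ALL = """User-agent: *
Disallow: /
"""

ALLOW_ALL = """User-agent: *
Allow: /
"""


def can_crawl(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper: it parses the text locally instead of
    downloading yoursitename.com/robots.txt.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for label, body in (("block all", BLOCK_ALL), ("allow all", ALLOW_ALL)):
        allowed = can_crawl(body, "http://yoursitename.com/")
        print(label, "->", "crawlable" if allowed else "blocked")
```

To check a live file instead, `RobotFileParser` also has `set_url()` and `read()` methods that download the file before you call `can_fetch()`.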

More information on the robots.txt file can be found at Robotstxt.org. If you have values in your robots.txt file that you don’t understand then their resources should help.

META robots

You can also ban Google from indexing your pages by using a special bit of HTML code called a META tag. Go to your site, view the source code (right click on an empty part of the page, click View Source) and look for either of the following lines in your HTML code:

<meta name="robots" content="noindex,nofollow"> or
<meta name="robots" content="noindex">

If they exist then these lines of code are telling Google not to index your pages. Removing them will fix your problem.
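If you have a lot of pages to check, viewing the source of each one gets tedious. Here is a minimal sketch of the same check in Python, using the standard library HTML parser. The class and function names are invented for this example, and it only looks at the generic "robots" META tag, not at engine-specific variants.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        content = attributes.get("content", "")
        # "noindex,nofollow" -> ["noindex", "nofollow"]
        self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def blocks_indexing(html):
    """Return True if the page source asks search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives
```

You would pass the page source as a string, for example `blocks_indexing('<meta name="robots" content="noindex">')`.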

Header Response Codes

Header response codes are technical messages that your server sends to browsers and search engines. If the wrong messages are being sent then Google will not index your pages.

To check your site, go to our free HTTP response checker, enter your URL and press Go. Have a look at the first line of the results.

Does it say something like this?

HTTP/1.0 200 OK

If it does, then you’re ok. If it says something like this:

HTTP/1.0 404 Not Found

Then you have a problem - your server is returning the wrong information to Google about your pages. It is saying that they don’t actually exist. You need to contact your technical resource and ensure that your pages are returning a 200 response.
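The first line of the results is easy to pick apart in code, too. The sketch below splits a status line into its three parts and applies the simple rule from this section (200 is fine, anything else needs a look). The function names are hypothetical and the rule is deliberately crude.

```python
def parse_status_line(line):
    """Split 'HTTP/1.0 404 Not Found' into (version, code, reason)."""
    version, code, *reason = line.strip().split(" ", 2)
    return version, int(code), reason[0] if reason else ""


def page_is_ok(line):
    """Apply the simple rule from this section: only 200 counts as ok."""
    _, code, _ = parse_status_line(line)
    return code == 200


if __name__ == "__main__":
    for line in ("HTTP/1.0 200 OK", "HTTP/1.0 404 Not Found"):
        print(line, "->", "ok" if page_is_ok(line) else "problem")
```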

More information on understanding response codes can be found in our post HTTP Response Codes and SEO - An Introduction.

Load Times and Site Reliability

How long does it take for your site to load? Is it down a lot of the time? If your site has serious speed or reliability issues then search engines will be reluctant or unable to index it.

Try signing up for a site monitoring service like the free one at SiteUptime and watching it over a period of time. If your site is down regularly then you need to contact your web host for help, or change to a different hosting service.

Invalid HTML

Very few sites have fully valid HTML code, but serious issues with your HTML can mean that the search engines cannot read your content. Try the HTML validator at W3.org and look for serious HTML errors that could be preventing your site from working.

Google Webmaster Tools

Have you tried all of the above and are still not being indexed? How about asking Google themselves for some extra information? Go to Google Webmaster Tools and register your site. Give Google a day or so to gather information about your site, and then check out the various reports available:

  • The first page, “Overview”, will give you some basic index information:
    Webmaster Tools index stats
  • “Diagnostics > Web Crawl” will tell you if you have problems on your site like missing pages, unreachable pages or pages restricted by robots.txt
  • “Tools > Analyze robots.txt” will tell you how your robots.txt file is affecting Google’s indexing
  • The “Removed Content” tab on “Tools > Remove Content” will tell you if a removal request has been submitted to Google for the site

Look for spammy activity

Google has some strict rules to block spam. If none of the above has worked then it could mean that your site has been banned. Check out the Google Webmaster Guidelines and make sure that you’re not contravening any of the rules listed there. Examples of common issues include:

  • Keyword stuffing - Filling your site with keywords to help your search rankings. These days, this usually affects your listings in a negative way. Instead write for people, and people alone.
  • Cloaking - Hiding content from the search engines by detecting who they are and showing them different content.

If you have been contravening any of Google’s guidelines, fix the issues and then follow these instructions to request reconsideration.

Bad Neighbourhoods

Have you inherited your URL from someone else, or bought a previously used URL? If so, the old site could have been spammy and been banned, or could have a large number of links back to it from “bad neighbourhoods”. Try following the site reconsideration instructions to get your site re-evaluated, and see if you can get in touch with the people linking back to your site to get the links removed.

In the worst case scenario, it may be necessary to get a new domain that doesn’t have such negative history.

Conclusion

As seen, there are many reasons for a site not to appear in Google, and there are even more beyond the scope of this blog. Hopefully the points above have helped you on your quest to get your site listed in Google. If not, then there will be a number of experts willing and eager to help at forums like WebmasterWorld.

Good luck!

HTTP Response Codes and SEO: An Introduction

May 26th, 2008

Sometimes the most technical aspects of SEO are the most important, because they can dictate whether or not your pages end up in the search engine indexes at all. It doesn’t matter how well you write your copy or optimise your pages - if you can’t be indexed, you can’t be found.

HTTP headers are one such topic - they can be hard to understand for the non-technical SEO, but can completely decide the fate of your site in the SERPs.

An Introduction to HTTP Requests

When you point your browser in the direction of a website, the first thing it does is send a request to that website. This request details exactly what data it wants, in what format it will accept a response and generally, who it is. The request may look like this:

GET /sem-blog/ HTTP/1.1
Accept:*/*
Accept-Language: en-gb
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b1) Gecko/2007110703 Firefox/3.0b1
Host: www.thesempai.com
Connection: Keep-Alive

This request is saying: “I am the Firefox web browser v3.0b1. I want the page called /sem-blog/ from your site www.thesempai.com. I would prefer it if you return it in British English, and if you want to compress it and make the file size smaller, that’s fine by me.”
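To make the structure of the request a bit more concrete, here is a rough Python sketch that splits the example above into its request line and its header fields. The parser is hand-rolled for illustration only (the name `parse_request` is made up), and it ignores edge cases that a real web server has to handle.

```python
# The example request from above, with the line endings HTTP uses on the wire.
RAW_REQUEST = (
    "GET /sem-blog/ HTTP/1.1\r\n"
    "Accept: */*\r\n"
    "Accept-Language: en-gb\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b1) "
    "Gecko/2007110703 Firefox/3.0b1\r\n"
    "Host: www.thesempai.com\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
)


def parse_request(raw):
    """Return (method, path, version, headers) for a raw HTTP request."""
    head = raw.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


if __name__ == "__main__":
    method, path, version, headers = parse_request(RAW_REQUEST)
    print(method, path, "from", headers["host"])
    print("Language preference:", headers["accept-language"])
```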

The HTTP Response

The web server receives the request from the browser, and forms a response. This response is broken down into two parts - the header and the content.

The header tells the browser a bit about the web server, and a bit about what it thinks of the request. The content is the HTML code that makes up a web page.

The header of the response may look like this:

HTTP/1.1 200 OK
Date: Mon, 26 May 2008 18:24:21 GMT
Server: Apache/1.3.34 Ben-SSL/1.55
X-Pingback: http://www.thesempai.com/sem-blog/xmlrpc.php
X-Powered-By: PHP/4.4.8
Connection: close
Content-Type: text/html; charset=UTF-8

This says: “Your request was ok and the page is below, in HTML format, encoded in UTF-8. According to me, the date is the 26th of May 2008 and the time is 18:24:21 GMT. I’m running the Apache web server, and I’m powered by PHP 4.4.8. If you want to ping my blog use this URL…”.
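The response header can be read in the same spirit. The sketch below feeds the example header above to `parse_headers` from Python’s standard `http.client` module and pulls out a few fields. The helper name `read_header` is an assumption of this example, and the bytes are typed in by hand rather than fetched from a server.

```python
import io
from http.client import parse_headers

# The example response header from above, as it would arrive on the wire.
RAW_HEADER = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Mon, 26 May 2008 18:24:21 GMT\r\n"
    b"Server: Apache/1.3.34 Ben-SSL/1.55\r\n"
    b"X-Pingback: http://www.thesempai.com/sem-blog/xmlrpc.php\r\n"
    b"X-Powered-By: PHP/4.4.8\r\n"
    b"Connection: close\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"\r\n"
)


def read_header(raw):
    """Return (status_line, fields) for a raw HTTP response header."""
    stream = io.BytesIO(raw)
    # The first line is the status line; the header fields follow it.
    status_line = stream.readline().decode("iso-8859-1").strip()
    fields = parse_headers(stream)
    return status_line, fields


if __name__ == "__main__":
    status_line, fields = read_header(RAW_HEADER)
    print(status_line)
    print("Server:", fields["Server"])
    print("Type:", fields.get_content_type(), "/", fields.get_content_charset())
```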

Response Codes

From an SEO point of view, quite a lot of the response information can be useful, but the most important bit is the line:

HTTP/1.1 200 OK

This says “your request was ok - the document was found”. That is called a 200 response code.

There are a number of different response codes that can be returned by an HTTP server, but the most important ones are:

  • 200 - Everything is ok, the document was found and is below
  • 404 - The document could not be found. Instead the document below is what you should show to the user to tell them what to do next
  • 301 - The document has moved permanently. Go to the following web address instead.
  • 302 - The document has moved temporarily. Go to the following web address instead.
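If you ever need these codes in a script, Python’s standard library already knows their official names. The small sketch below looks up the four codes listed above. Note that the official name for 302 is “Found”, even though it is normally used as a temporary redirect. The `is_redirect` helper is just an illustration.

```python
from http import HTTPStatus

# The four codes discussed above, mapped to their standard reason phrases.
IMPORTANT_CODES = (200, 404, 301, 302)
PHRASES = {code: HTTPStatus(code).phrase for code in IMPORTANT_CODES}


def is_redirect(code):
    """Return True for any 3xx code, which includes both 301 and 302."""
    return 300 <= code < 400


if __name__ == "__main__":
    for code, phrase in PHRASES.items():
        kind = "redirect" if is_redirect(code) else "final answer"
        print(code, phrase, "(" + kind + ")")
```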

How does this affect SEO?

The way search engine spiders react when browsing your site very much depends on the response codes that they get back from your server. A badly configured server sending back the wrong response codes can stop your site from ever being indexed.

Some common examples of really bad server configurations are:

  • The server always returns the response code 404 - Just because you can see the site and browse it fine, doesn’t mean that your server isn’t returning a 404 response instead of a 200 response. Some badly programmed scripts that give sites “search engine friendly URLs” return 404 values instead of 200. In this case, the search engines won’t index these pages at all.
  • The server never returns 404, even when a page is not found - Say you go to a nonsense URL on your web server - chances are you’ll get a page telling you the content cannot be found. This is great for the user, but if this page isn’t returning a 404 code then the spider will assume the page is ok. This means that whenever you remove a page, or if content on your site expires, then the page will still be indexed in the search engines, but with the “Sorry this page cannot be found” text instead of the original content. This page could compete with your other pages in the search results, and creates unnecessary duplicate content throughout your listings.
  • The server redirects pages using 302, not 301 - Let’s say you have a special campaign which has a short URL like “/springoffer/” and it redirects to another page on your site like /garmin-nuvi-spring-offer/. It should use a 301 redirect to tell the search engines that the real page is /garmin-nuvi-spring-offer/, not /springoffer/. Using a 302 will confuse the search engines as you are saying “The real page is /springoffer/ and /garmin-nuvi-spring-offer/ is just a temporary page”. This will make it hard for /garmin-nuvi-spring-offer/ to be listed properly in the search results.

As you can see, the examples above can have a quite drastic impact on your search results. It is therefore extremely important to understand what response codes your site is giving in different situations.
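For the developers in the audience, here is a toy sketch of what the “right” behaviour looks like for the three cases above: a 200 for a real page, a 301 for the short campaign URL, and a genuine 404 (with a friendly page) for everything else. It uses Python’s built-in web server purely as an illustration. The page contents, the `respond` function and the port are all invented for this example, and a real site on Apache or IIS would be configured differently.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content for the example.
REDIRECTS = {"/springoffer/": "/garmin-nuvi-spring-offer/"}
PAGES = {"/garmin-nuvi-spring-offer/": "<h1>Garmin Nuvi spring offer</h1>"}
NOT_FOUND_PAGE = "<h1>Sorry, this page cannot be found</h1>"
HTML = {"Content-Type": "text/html; charset=UTF-8"}


def respond(path):
    """Return (status, headers, body) for a request path."""
    if path in REDIRECTS:
        # Permanent move: tell spiders which URL is the real one.
        return 301, {"Location": REDIRECTS[path]}, ""
    if path in PAGES:
        return 200, dict(HTML), PAGES[path]
    # Friendly error page for people, honest 404 code for spiders.
    return 404, dict(HTML), NOT_FOUND_PAGE


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers, body = respond(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Run the toy server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Keeping the decision in a plain function like `respond` means you can check the status codes for a list of paths without starting a server at all.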

Finding out response codes

There are numerous tools available which will allow you to enter a web address and find the server response header. One of them is on this site: our HTTP response checker.

These web based tools are great, because you can access them from any PC - for example on site with a client. However, for your desktop, you probably need something a little more interactive, like the Firefox Live HTTP Headers add-on. This Firefox plugin allows you to view the headers being received by your browser in real time while you surf the web:

The Firefox Plugin, Live HTTP Headers

Because Live HTTP Headers follows you as you surf, you can see step by step exactly what any visit to your site looks like, in great detail.
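If you are comfortable with a little scripting, you can also ask a server for its response code directly. The sketch below sends a HEAD request using Python’s standard `http.client` module, which does not follow redirects, so you see the first response exactly as a spider would. The function names are made up for this example, and some servers answer HEAD requests differently from GET requests, so treat the result as a first check only.

```python
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urlsplit


def request_target(url):
    """Return (use_tls, host, path) for a URL; the path defaults to '/'."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme == "https", parts.netloc, path


def fetch_status(url, timeout=10):
    """Return (code, reason, location) for the first response to url.

    location is the Location header, or None if the server sent none.
    """
    use_tls, host, path = request_target(url)
    connection_class = HTTPSConnection if use_tls else HTTPConnection
    connection = connection_class(host, timeout=timeout)
    try:
        connection.request("HEAD", path)
        response = connection.getresponse()
        return response.status, response.reason, response.getheader("Location")
    finally:
        connection.close()
```

A handy trick is to call `fetch_status` on a nonsense URL on your own site: if the code that comes back is not 404, you have the “server never returns 404” problem described earlier.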

I’ve found a problem, what next?

So you’ve gone through your site with Live HTTP Headers or our HTTP Response Codes tool, and you’ve found a few bad pages - what next? Hopefully you have some clever developer nearby who can help you fix the issue, but if not and it’s down to you, then you need to make some changes. Unfortunately this is a bit more complex than just changing your HTML code, and depends on what sort of setup you have on your web server.

The most common server setups are Apache with PHP, or Microsoft IIS with ASP. Here are a few resources for further reading:

Other Forms of Redirect

Just to make life more complicated, redirects do not just necessarily happen using HTTP headers. They can happen in HTML code too. Common examples are:

  • META redirects - As an example, the page http://www.somesillysite.com/the-first-page.html has a line of code such as:

    <meta http-equiv="Refresh" content="0; URL=http://www.somesillysite.com/the-real-destination.html" />

    Once the web page the-first-page.html has loaded, this code will cause the browser to jump instantly to the page the-real-destination.html.

  • JavaScript redirects - The page contains some JavaScript code that runs once it has fully loaded, and redirects the browser to a new page.
  • iFrame redirects - Although not strictly a redirect, an unfortunately common method of achieving the same result is to host one page within an iFrame on another page. For example, if you want my-friendly-page.html to point to some-really-unfriendly-url.html, you create my-friendly-page.html with an iFrame containing the contents of some-really-unfriendly-url.html.

These three methods of redirecting are often used by developers simply because they are easier to implement than a proper 301 redirect. However they are very difficult for a search engine spider to follow, and they provide no extra information to the spider about why the redirect is occurring. Is it permanent or is it temporary? Which content should they pay attention to? The spiders have to make their best guess.

In all of these cases the server would have returned a 200 response code, before passing the user on to a second page, also with a 200 response code.

In an ideal world, the developer would have used none of these methods, and instead used a 301 redirect to point from the initial page to the new page.

Please note: META and JavaScript redirects are bad for SEO, but the iFrame method is a messy botch job and I advise you avoid it like the plague.
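Because these redirects live in the HTML rather than in the headers, a response code check will not catch them. As a rough sketch, the Python below scans a page’s source for META refresh tags like the one shown above. The class and function names are invented for this example, and it makes no attempt to detect JavaScript or iFrame redirects.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects (delay, target_url) for every META refresh tag on a page."""

    def __init__(self):
        super().__init__()
        self.redirects = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "refresh":
            return
        # content looks like "0; URL=http://example.com/page.html"
        delay, _, rest = attributes.get("content", "").partition(";")
        rest = rest.strip()
        target = rest[4:].strip() if rest.lower().startswith("url=") else rest
        self.redirects.append((delay.strip(), target))


def find_meta_refresh(html):
    """Return a list of (delay, target_url) pairs found in the page source."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.redirects
```

An empty list means no META refresh was found. The delay is returned as text exactly as it appears in the tag.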

PPC Keyword Expansion Tutorial

May 26th, 2008

Keyword expansion is a vital part of any PPC campaign. Sure, you can cover all of the variations you want by broad matching on “widgets”, but you’ll just be able to bid one amount across the board, show one set of ad creatives and land the user on just one page. By expanding your campaign you can use different CPC values for “cheap blue widgets” and “quality red widgets” depending on conversion rate and average transaction value, and you can also refine your ad text and landing pages to ensure that you capture and maintain the buyer’s interest throughout their journey from click to order.

Keyword expansion is a surefire way to reduce campaign costs, increase traffic and optimise conversion.

Planning Your Campaign

The first step in every keyword expansion project is to decide on the structure of your campaigns - just hammering out keywords straight away is not going to get you the best results. Sit down and plan how you want to lay out your adgroups. In the example of a widget shop, you might have campaigns like this:

  • Generic widgets - plain “widget” related keywords. “widgets”, “cheap widgets”, “widgets online” etc.
  • Red widgets - keywords related to red widgets - so “red widget”, “red widgets”, “cheap red widget”, “red widgets uk”.
  • Green widgets - keywords related to green widgets, similar to Red widgets above.
  • Widget accessories - keywords such as “widget sprockets”, “cheap sprockets”, “widget accessories”
  • Brand - keywords related to the name of your site “widgetsite.com”, “widget site com”, “the widget site”.

Obviously a real shop is going to have a vastly bigger range of products and categories, and you’ll need to think carefully how to cover all of the variations.

Keyword Research Tools

Once you’ve got your account structure finished you can start filling in base keywords for each different section. This is where comprehensive keyword research is vital - you need to ensure that you are covering all of the keywords in your niche’s particular universe, and that you are using language that your users relate to, not just terms that are used within your company. Some great keyword research tools include Keyword Discovery, WordTracker, and Hitwise. You can also get comparative traffic volumes for key terms from Google Trends and Alexa WebRankings.

Try and fill in a set of core keywords for each of the campaigns and adgroups in your account design. Make sure you include synonyms and spelling errors.

The Red Widgets Example

Once you’ve got there, you’re ready to start generating keywords for each adgroup in earnest. For our example, let’s use the “red widgets” adgroup. The base keywords we’ve come up with are:

  • Red Widgets
  • Red Widget
  • Red Wigets
  • Red Wiget
  • Red Widjets
  • Red Widjet
  • RedWidgets
  • RedWidget

Keyword Generation

There are many pieces of keyword generation software available on the web - some for free, some for a small charge. You need to choose the one that is most appropriate for your needs, but for our example we’ll use our free keyword generator web application. Fire this up, and put your base keywords in the third column:

Filling in the base keywords

In the first column, let’s put some call to action keywords. In this simple case, “buy” and “find” are a good start. Not all users looking for “red widgets” will type “buy” or “find”, but some will - so select the “Optional” tickbox:

Filling in the first column keywords

In the second column, let’s put some adjectives that users might use to describe the sort of red widgets they’re looking for. The market for red widgets is competitive, so value words are a good start, but quality is also important. Let’s go for “cheap”, “cheapest”, “low price”, “low cost”, “discount”, “best”, “quality”. You also might like to consider words like “newest” or “latest”. Again, select this column as “Optional”.

Filling in the second column keywords

In the fourth column let’s capture users that want to buy online, so let’s add “online”, “on line”, and users that specify that they’re looking for a shop, so “shop”, “shops”, “store”, “stores”. Again this will be “Optional”.

Filling in the fourth column keywords

Finally, in the fifth column, let’s capture locational searches, so “uk”, “england”, “britain”, “britian”, “london”. Again, this column should be “Optional”. Note that if your searchers are very specific about the location that they want to buy from then you may want to create a separate adgroup for each area - you can then refer to their location in the ad text.

Filling in the fifth column keywords

Now before we generate our keywords, we need to select what match types we want to use. Again, generally speaking, the more the merrier, so let’s select all three: “Standard Match”, “Phrase Match” and “Exact Match”. Your form should look like this:

Ready to generate keywords

Now click on “Generate Keywords”. Wait a little while.

The generated keywords

You’ll see that the Keyword Generator has now generated 24,192 keywords! You’d probably want to refine your choices a bit based on what you learned from your keyword research. Once you’re done, simply copy and paste these into AdWords or the AdWords Editor, and away you go. You’ll now have a fully expanded red widgets adgroup!
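If you’re curious where a number like 24,192 comes from, it is simply every combination of the five columns, times three match types. The Python sketch below reproduces the idea. It is not the code behind our keyword generator, just an illustration under a few assumptions: an “Optional” column is modelled by adding an empty choice, the sixth base keyword is taken to be the misspelling “red widjet”, and phrase and exact match are written in the usual AdWords quote and square bracket notation.

```python
from itertools import product


def with_optional(words):
    """An 'Optional' column: any of the words, or nothing at all."""
    return [""] + list(words)


COLUMNS = [
    with_optional(["buy", "find"]),
    with_optional(["cheap", "cheapest", "low price", "low cost",
                   "discount", "best", "quality"]),
    # Base keywords (not optional); sixth entry assumed to be "red widjet".
    ["red widgets", "red widget", "red wigets", "red wiget",
     "red widjets", "red widjet", "redwidgets", "redwidget"],
    with_optional(["online", "on line", "shop", "shops", "store", "stores"]),
    with_optional(["uk", "england", "britain", "britian", "london"]),
]

# Standard match is the bare phrase, phrase match is quoted,
# exact match is wrapped in square brackets.
MATCH_FORMATS = {"standard": "{}", "phrase": '"{}"', "exact": "[{}]"}


def generate_keywords(columns, match_formats=MATCH_FORMATS):
    """Return every column combination in every match type."""
    keywords = []
    for combination in product(*columns):
        phrase = " ".join(part for part in combination if part)
        for template in match_formats.values():
            keywords.append(template.format(phrase))
    return keywords


if __name__ == "__main__":
    keywords = generate_keywords(COLUMNS)
    print(len(keywords), "keywords generated")
```

The arithmetic is 3 x 8 x 8 x 7 x 6 combinations (each optional column counts one extra for “nothing”), which is 8,064 phrases, multiplied by 3 match types.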

The SEMpai Released

May 26th, 2008

Hi and welcome to the first SEMpai blog post. This site is dedicated to providing free information and tools for SEO and PPC professionals. Hopefully you’ll see the site develop over the next few months as new tools are added and the blog expands.

The initial toolset is as follows:

  • Keyword Generator - this keyword generator is very similar to the Free PPC Keyword Generator application for Windows, but has some additional features (especially the vital “Optional” flag) and is web based so can be used from anywhere.

    The SEMpai's keyword generator

  • Page Keywords - searches a page for a keyphrase and returns all phrase match and partial match instances in different parts of the HTML. Great for seeing how relevant a page is to a particular theme.
  • HTTP Response Checker - returns full header information for any HTTP query, useful for finding out information about the web server, and especially for seeing what HTTP response codes are returned.
  • Multi URL HTTP Response Checker - similar to the above, but allows you to enter a list of up to 50 URLs at a time. Returns simpler information - just the response code at present.
  • ALT Tag List - lists all images on a page, including a preview, the URL, the TITLE tag and the ALT tag. Great for checking that ALT tags and TITLE tags are being used correctly on the page.
  • Page Link List - lists all outgoing links on a page, including TITLE tags and anchor text. This is good for spotting dodgy links that are not supposed to be there, and making sure that the anchor text and TITLE tag variables are fully optimised.

Have you got ideas for tools that you want us to develop? We have a comprehensive development list here which we’ll be working on over the next months, but we’d love to hear from you!

