Tag Archive: Web


With this blog post, I am pleased to announce that A Few Guys Coding has created a service with a different twist. Wedding customs and traditions are changing all the time, whether it is the customs, the music, the attire or even the colors (I once went to a wedding that had black and purple as the main colors). I had even heard of online RSVPs. This software stack, WeddingWare, was born out of necessity. For my own wedding, now 5 months away, we needed a way for guests to RSVP. Since we didn’t want to spend all the money on response cards, we are environmentally friendly (it saves trees!) and I am a software engineer, I suggested that we do RSVPs online (guests still receive a paper invitation with an invitation code). A couple of brainstorming sessions and much, much code later, WeddingWare is officially available for license.

While the primary focus is custom wedding RSVPs, the software can also handle shower invitations as well as bachelorette party invitations. Deep customization is a main focus of this software because, after all, who wants to go to a URL that looks like this: www.theknot.com/user/123456/mywedding/rsvp.html? Our software is simple and we will work with clients directly to set up an RSVP site where the look is clean and the URL is custom (for example, ours will eventually be at www.davidheartsrachel.com). The software has a full-featured backend that allows the party hosts to log in and 1) create new parties and invitations, 2) manage multiple events and 3) get metrics on guests (such as outstanding invitations vs. people who have responded). Guests are directed to a friendly, intuitive site that the hosts have fully customized, down to the color scheme. After RSVPing, the guest receives a confirmation email with hotels, driving directions, maps, etc. The site is mobile-browser friendly and we are working on a matching iOS application. Pricing is still to be determined, but it will be affordable and based on the number of guests. A live demo is available at http://dhr.afewguyscoding.com.

Disclaimer: I am not a “web guy” by nature. I have recently started experimenting with it more and more because I eventually want to become proficient with web development and design. These may be simple tips for “web people,” but a typical “desktop application” person might struggle with them.

  1. Debugging in Chrome, Firefox and Safari is easy; debugging in IE is hard: Chrome, Firefox and Safari all make it extremely easy to debug issues related to HTML, CSS and Javascript (AJAX included!) because Chrome and Safari have built-in “inspectors.”  Firefox requires an additional plugin, Firebug, but it is just as useful once you have it.  IE makes this process painful; however, in IE 8 (I haven’t verified anything before or after this version), the Tools menu has a “Developer Tools” window that attempts to do the same thing as Firebug and the built-in inspectors in Chrome and Safari.  While it is kind of like using rock knives compared to the other tools, it did work and I was able to debug my CSS in IE (Javascript is much harder). Microsoft has a nice writeup on this tool here.  In fact, go ahead and use Photoshop or Illustrator to create your designs – that’s fine.  However, when it comes time to test, always test in IE first since it has the poorest support.  Chances are that if it works in IE the first time, it’ll work in the other browsers the first time as well.  Oh yeah, and get CSSEdit (sorry Windows people – you’re on your own).
  2. RAILS_ENV != RACK_ENV in Rails 3: I was having a devil of a time getting my staging server to work with my staging database.  Passenger and Apache would connect and use the proper staging files directory, but the app would constantly connect to the production database.  The trick is that in Rails 3 (only!!!) you need to set ENV['RACK_ENV'] = "staging" or ENV['RACK_ENV'] = "production".  In Rails 2.x.x, you still use RAILS_ENV to tell it which environment to use.
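    For example, with Phusion Passenger under Apache the environment can be forced per virtual host – a sketch, where the host name and paths are placeholders:
    <VirtualHost *:80>
      ServerName staging.example.com
      DocumentRoot /var/www/myapp/staging/current/public
      # Rails 2.x reads RailsEnv; Rails 3, running as a Rack app, reads RackEnv
      RailsEnv staging
      RackEnv staging
    </VirtualHost>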
  3. Having a good deployment strategy and Capistrano recipe is key: Many months earlier, I had already put up a plain HTML site (no frameworks, *gasp*) at www.davidheartsrachel.com at a really cheap host, iPage (they were $30 for the year and believe me, you get what you pay for, but that was OK because it was just text and images). Everything was working just fine there while I was developing this Rails app. Initially (before I switched to Rails from Google App Engine), I thought that I would host the application at some place like Heroku, since their deployment looks dead simple. As it turned out, it would be much cheaper, by about $40 per month, to use my primary host (the one this site is using) and just buy some additional RAM for my VM to support multiple Passenger instances. Anyway, I knew that I would need to start testing outside of a development environment. In addition to the standard development, test and production environments given to you in Rails, I converted the test environment into a local version of a production environment (class caching, actual mail delivery, etc.); however, I kept my database adapter as SQLite. I also created an additional environment called staging. The theory behind this is that staging would run exactly as production does (using MySQL, class caching, sessions stored via :mem_cache, delayed_job to deliver mail, etc.) but with test fixtures and data in the database. Obviously, production is the production environment with live data. This allowed me to iron out any bugs that cropped up when switching to a more production-like environment and catch any assumptions I had made in development that wouldn’t be valid in production (when switching adapters, etc.). I dedicated a subdomain to deploying my staging environment and password protected it. If you are interested in my Capistrano file, environments file and Apache vhost file, they are here (with the obvious implementation and sensitive path/password redactions).
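    For reference, the staging entry in config/database.yml looks something like this (the adapter, names and credentials here are placeholders, not my real values):
    staging:
      adapter: mysql2
      database: myapp_staging
      username: deploy
      password: [redacted]
      host: localhost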
  4. Radio buttons: I was making some AJAX calls that needed to select a radio button in a group when the call returned, based on the :success or :error AJAX callback.  I was having trouble getting this to happen, but this code is solid:
    // this example had a yes/no. it can be extended to any number of radio buttons
    var radios = $('input:radio[name=radio_group_name]');
    radios.filter('[value=val1]').attr('checked', false);
    radios.filter('[value=val2]').attr('checked', true);
  5. Rails UJS: Rails 3 embraces a more MVC way of thinking by abstracting the Javascript framework choice (whether it is jQuery, Prototype, etc.). Before, the implementation was fairly specific and you sometimes had to mix framework-specific JS code into your Ruby code. With UJS, the implementation in Rails 3 takes advantage of the new HTML5 data-* attributes. Rails doesn’t spit out Prototype-based Javascript inline anymore (the old helpers are still there). Rather, the helpers just define an appropriate data attribute in the tag, and that’s where the Javascript comes in. Obviously this is a more flexible and reusable pattern. You can also plug and play different frameworks without the headache of writing a lot of the same code over again (I swapped jQuery in for Prototype in several minutes – just remember to get the jQuery versions of the Rails driver files, such as rails.js, or you’ll have weird errors that are hard to track down.  Check out jquery-rails.). Since I am far from a RoR or Javascript expert, this, this and this have more expert breakdowns (why re-invent the wheel).
  6. Remote form_for (AJAX) and :onsubmit(): In Rails 2 (I believe) and at least Rails 3, form_for with :remote => true overrides the :onsubmit => ‘func();’ handler and performs the actual form submission itself.  If you want to bind something to your form before it gets submitted (or during or after!), bind the form using jQuery.bind() and then observe the AJAX callback functions to do what you need (see the sketch after this example).
    <script type="text/javascript">
      function get_address_information() {
        // check the jQuery docs for all the possible callbacks; there are also success and error. complete gets called after both success and error
        var pars = 'param1=x&param2=y&param3=z';
        $.ajax({ type: "POST",
          url: "/get_invite_address",
          data: pars, dataType: 'script',
          beforeSend: function() {
            // do your before send checking here
          },
          complete: function(data, status) {
            // do some post processing here
          }
        });
      }
    </script>
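    Since the form is rendered with :remote => true, the jQuery UJS driver (rails.js from jquery-rails) also fires its own events on the element, which is usually a cleaner hook than a hand-rolled $.ajax call. A sketch – the form id is made up, and the callback signatures are as I understand the Rails 3-era jquery-ujs:
    $('#rsvp_form')
      .bind('ajax:beforeSend', function(evt, xhr, settings) {
        // validate fields or show a spinner before the request is sent
      })
      .bind('ajax:success', function(evt, data, status, xhr) {
        // update the page with the server's response
      })
      .bind('ajax:error', function(evt, xhr, status, error) {
        // surface the failure to the user
      })
      .bind('ajax:complete', function(evt, xhr, status) {
        // runs after both success and error
      });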
  7. Rails is as good as everyone says (and other languages/frameworks are as bad as you think): Truly. The only drawback is the lack of internationalization. However, in terms of ease of use, the language, setup and data access, Rails is superior in every way. Rails sets you up for testing (as well as performance testing) from the word go. The framework allows you to concentrate on coding while it does the heavy lifting.


This research article was co-authored by David Stites (dstites[at]uccs.edu) and Jonathan Snyder (jsynder4[at]uccs.edu)

Abstract—A common problem for many web sites is achieving a high ranking in search engines.  Many searchers make quick decisions about which result they will choose, and unless a web site appears within a certain threshold of the top rankings, the site may see very little traffic from search engines.  Search engine optimization, while widely regarded as a difficult art, provides a simple framework for improving the volume and/or quality of traffic to a web site that uses those techniques.  This paper is a survey of how search engines work broadly, SEO guidelines and a practical case study using the University of Colorado at Colorado Springs’ Engineering and Science department home page.

Index Terms—search engine optimization, search indexing, keyword discovery, web conversion cycle, optimized search, organic search, search marketing

1.  Introduction

Search engines are the main portal for finding information on the web.  Millions of users each day search the internet looking to purchase products or find information.  Additionally, users generally only look at the first few results.  If a website is not on the first couple of pages, it will rarely be visited.  All these factors give rise to many techniques employed to raise a website’s search engine ranking.

Search engine optimization (SEO) is the practice of optimizing a web site so that it will rank highly for particular queries on a search engine.  Effective SEO involves understanding how search engines operate, setting goals and measuring progress, and a fair amount of trial and error.

The following sections present a review of search engine technology, a step-by-step guide to SEO practices, lessons learned during the promotion of a website, and recommendations for the Engineering and Applied Science College of the University of Colorado at Colorado Springs.

In this survey, we focus our research on how SEO affects organic search results and do not consider paid inclusion.  For more information on paid inclusion SEO, see [1].

2.  A Review of Search Engine Technology

Search engine optimization techniques can be organized into two categories: white hat and black hat.  White hat practices seek to raise search engine rankings by providing good and useful content.  On the other hand, black hat techniques seek to trick the search engine into thinking that a website should be ranked high when in fact the page provides very little useful content.  These non-useful pages are called web spam because from the user’s perspective they are undesired web pages.

The goal of the search engine is to provide useful search results to a user.  The growth of web search has been an arms race where the search engine develops techniques to demote spam pages, and then websites again try to manipulate their pages to the top of popular queries.  Search engine optimization is a careful balance of working to convince the search engine that a page is relevant and worthwhile for certain searches while making sure that the page is not marked as web spam.

Accomplishing this objective requires an understanding of how a search engine ranks pages and how it marks pages as spam.  Although the exact algorithms of commercial search engines are unknown, many papers have been published giving the general ideas that search engines may employ.  Additionally, empirical evidence of a search engine’s rankings can give clues to its methods.  Search engine technology involves many systems such as crawling, indexing, authority evaluation, and query-time ranking.

2.1  Crawling

Most of the work of the search engine is done long before the user ever enters a query.  Search engines crawl the web continually, downloading content from millions of websites.  These crawlers start with seed pages and follow all the links that are contained on all the pages that they download.  For a URL to ever appear in a search result page, the URL must first be crawled.  Search engine crawlers present themselves in the HTTP traffic with a particular user-agent when they are crawling, which websites can log.
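For example, Google’s main crawler identifies itself with a user-agent string similar to the following (the exact string varies by crawler and version):

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)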

In the eyes of search engine crawlers, not all websites are created equal.  In fact, some sites are crawled more often than others.  Crawlers apply sophisticated algorithms to prioritize which sites to visit.  These algorithms can determine how often a webpage changes and how much is changing. [7]  Additionally, depending on how important a website is, the crawler may crawl some sites more deeply.  One goal of a search engine optimizer is to have web crawlers crawl the website often and to crawl all of the pages on the site.

Search crawlers look for special files in the root of the domain.  First they look for a “robots.txt” file, which tells the crawler which pages not to crawl.  This can be useful to tell the crawler not to crawl admin pages, because only certain people have access to those pages.  The robots.txt file may contain a reference to an XML site map.  The site map is an XML document which lists every page that is a part of the website.  This can be helpful for dynamically generated websites where pages may not necessarily be linked from anywhere.  An example of this is the Wikipedia site.  Wikipedia has a great search interface, but crawlers do not know how to use the search or what to search for.  Wikipedia has a site map which lists all the pages it contains.  This enables the crawler to know for sure that it has crawled every page.  When optimizing a website, it is important to make it easy for the site to be completely crawled.  Crawlers are wary of getting caught in infinite loops of pages or downloading content that will never be used in search result pages. [5]
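A minimal site map in the standard sitemaps.org XML format looks roughly like the following (the URLs and date are placeholders):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2011-04-01</lastmod>
  </url>
  <url>
    <loc>http://www.example.com/about</loc>
  </url>
</urlset>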

2.2  Indexing

After the search engine has crawled a particular page, it then analyzes the page.  In its simplest form, the search engine will throw away all the HTML tags and simply look at the text.  More sophisticated algorithms look more closely at the structure of the document to determine which sections relate to each other, which sections contain navigational content and which sections have large text, and weight the sections accordingly. [3]  Once the text is extracted, information retrieval techniques are used to create an index.  The index is a listing of every keyword on every webpage.  This index can be thought of as the index in a book.  A book’s index tells on which page a subject is mentioned; in this case, the index tells which website contains particular words.  The index also contains a score for how many times the word appears, normalized by the length of the document.  When creating the index, the words are first converted into their root form.  For example the word “tables” is converted to “table”, “running” is converted to “run”, and so forth.
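As a toy illustration (not any search engine’s actual code), the core of this indexing step can be sketched in a few lines of Javascript: crude stemming, an inverted index and counts normalized by document length.

var docs = { 1: "running tables and chairs", 2: "the table runs long" };

// Very crude suffix-stripping "stemmer"; real engines use something like Porter's algorithm.
function stem(word) {
  return word.replace(/(ning|ing|s)$/, '');
}

var index = {};  // term -> { docId: count normalized by document length }
for (var id in docs) {
  var words = docs[id].toLowerCase().split(/\s+/);
  for (var i = 0; i < words.length; i++) {
    var term = stem(words[i]);
    index[term] = index[term] || {};
    index[term][id] = (index[term][id] || 0) + 1 / words.length;
  }
}
// index["table"] is now { "1": 0.25, "2": 0.25 }: both documents mention "table(s)",
// each once out of four words.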

In order for a page to appear in the search engine results page, the page must contain the words that were searched for.  When promoting a website, particular queries should be targeted, and these words should be put on the page as much as possible.  However, this kind of promoting is abused.  Indeed, some sites are just pages and pages of keywords with advertisements designed to get a high search engine ranking.  Other tactics include putting large lists of keywords on the page, but making the keywords only visible to the search engine.  In fact some search engines actually parse the html and css of the page to find words that are being hidden to the user.  These kinds of tactics can easily flag a site as being a spam site.  Therefore one should make sure that the targeted keywords are not used in excess and text is not intentionally hidden.

One problem that many pages have is using images to display text.  Search engines do not bother trying to read the text on images.  This can be damaging to a site especially when the logo is an image.  A large portion of searches on the internet are navigational searches.  In these searches, people are looking for a particular page.  For these kinds of searches, it is hard to rank high in the search engine result page when the company brand name is not on the page except in an image.  One alternative is to provide text in the alt field of the image tag.  This may not be the best option however because the text in the alt field is text that the user does not normally see and is therefore prone to keyword abuse; hence, suspicious to the search engine.

2.3  Authority Evaluation

One of the major factors that sets Google apart from other search engines is its early use of PageRank as an additional factor when ranking search results. [3]  Google found that because the web is not a homogeneous collection of useful documents, many searches would yield spam pages, or pages that were not useful.  In an effort to combat this, Google extracted the link structure of all the documents.  The links between pages are viewed as a sort of endorsement from one page to another.  The idea is that if Page A links to Page B, then the quality of Page B is as high or higher than that of Page A.

The mathematics behind this model is that of a random web surfer.  The web surfer starts on any page on the internet.  With probability 85%, the random web surfer will follow a link on the page.  With probability 15%, the web surfer will jump to a random page on the internet.  The PageRank of a given page is the probability that the surfer will be visiting that page.
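Formally, with a damping factor of d = 0.85 and N pages on the web, the PageRank of a page p that is linked to by pages q, each having L(q) outgoing links, is commonly written as

PR(p) = \frac{1 - d}{N} + d \sum_{q \to p} \frac{PR(q)}{L(q)}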

One of the problems with this model is that any page on the internet will have some PageRank, even if it is never linked to.  This baseline page rank can be used to artificially boost the PageRank of another page.  Creating thousands of pages that all link to a target page can generate a high PageRank for the target page.  These pages are called link farms.

Many techniques have been proposed to combat link farms.  In the “Trust-rank” algorithm [4], a small number of pages are reviewed manually to determine which pages are good.  These pages are used as the pages that the web surfer will go to 15% of the time.  This effectively eliminates the ability to create link farms because a random page on the internet must have at least one page with trust-rank linking to it before it has any trust-rank.   Another technique is to use actual user data to determine which sites are spam.  By tracking which sites a user visits and how long the visits are, a search engine can determine which are the most useful pages for a given query. [6]

2.4  Result Ranking

With traditional information retrieval techniques, the results are ranked according to how often the query terms appear on a page.  Additionally, the query terms are weighted according to how often they occur in any document.  For example, the word “the” is so common in the English language that it is weighted very low.  On the other hand the word “infrequent” occurs much less often, so it would be given a higher weight.
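This weighting is essentially the classic tf-idf scheme from information retrieval.  One common form, where tf_{t,d} is the (length-normalized) frequency of term t in document d, N is the total number of documents and df_t is the number of documents containing t, is

w_{t,d} = tf_{t,d} \cdot \log \frac{N}{df_t}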
During the beginning of the World Wide Web, search engines simply ranked pages according to the words on the page compared to the page’s length and the relative frequency of the words.  This algorithm is relatively simple to fool.  One can optimize landing pages to contain high frequencies of the targeted query terms.  To combat this, the relevance score and the authority score are combined to determine the final ranking.  In order for a document to appear in the search engine result page it must match the keywords in the query, but the final ranking is largely determined by the authority score.

3. Guidelines to SEO

Equipped with the knowledge of how search engines index and rank web sites, one has to tailor content in a way that gives the best opportunity for being ranked highly.  There are many guidelines to SEO, but we attempt to distill the most important ones into our survey here.  For a more complete picture of SEO best practices and information, see [1].

3.1  Refine The Search Campaign

The first thing that the reader must consider is why they are going to do SEO on their web site.  In a broad sense, the purpose is to get more traffic volume or quality to one’s web site, but one needs to break this down into smaller subgoals.  We will look at defining the target audience to determine why people use search in the first place, and then we will examine how one can leverage the searcher’s intent to design a web site that will best serve the audience.

3.2  Define The Target Audience

Typically searchers aren’t really sure of what they are looking for.  After all, that is why they are performing a search in the first place.  When designing a web site, the webmaster must be cognizant of the searcher’s intent.

Generally, a searcher’s intent can be broken down into 3 different categories.  Knowing which category a particular searcher might fall into is important in deciding how one designs a web site and which keywords to choose.

  • Navigational searchers want to find a specific web site, such as JPMorgan Chase and might use keywords such as “jp morgan chase investments web site.”  Typically, navigational searchers are looking for very specific information “because they have visited in the past, someone has told them about it because they have heard of a company and they just assume the site exists.  Unlike other types of searchers, there is only one right answer.” [1]  It is important to note that typically navigational searchers just want to get to the home page of the web site – not deep information.  With navigational searches, it is possible to bring up multiple results with the same name (i.e. A Few Guys Painting and A Few Guys Coding).
  • Informational searchers want information on a particular subject to answer a question they have or to learn more about a particular subject.  A typical query for an informational searcher might be “how do I write iPhone applications.”  Unlike navigational searchers, informational searchers typically want deep information on a particular subject – they just don’t know where it exists.  Typically, “information queries don’t have a single right answer.” [1]  By far, this type of search dominates the types of searches that people perform, and therefore the key to having a high ranking in informational searches is to choose the right keywords for your web site.
  • Transactional searchers want to do something, whether it is purchase an item, sign up for a newsletter, etc. A sample transactional search query might be “colorado rockies tickets.”  Transactional queries are the hardest type of query to optimize for because “the queries are often related to specific products.” [1]  The fact that there are many retailers or companies that provide the same services only complicates matters.  In addition, it can be hard to decipher whether the searcher wants information or a transaction (i.e. “Canon EOS XSi”).

After understanding the target searcher, one has to consider how the searcher will consume the information they find.  According to Hunt et al., “nearly all users look at the first two or three organic search results, less time looking at results ranked below #3 and far less time scanning results that rank seventh and below.  Eighty-three percent report scrolling down only when they don’t find the result they want within the first three results.”

So what does this mean for people doing SEO?  Choosing the correct keywords and descriptions to go on your site is probably the most important action one can take.  Searchers choose a search result fairly quickly, and they do so by evaluating 4 different pieces of information included with every result: the URL, the title, the snippet, and “other” factors.


Graph 3.1 – Identifying the percentages of the 4 influencing factors of a search result. [1]

3.3  Identify Web Site Goals

Now that we have identified the types of searchers that are looking for information, let’s consider why one wants to attempt SEO in the first place.  For that, one needs to identify some goals, including deciding the purpose of the web site.  Generally, there are six types of goals:

  • Web sales.  This goal is selling goods or services online.  Note that this could be a purely online model (as in the case of Amazon.com or Buy.com) or it can be a mix between online and brick and mortar stores.  Web sales can be broken down further by specifying 1) whether the store is a retailer that sells many different products from different manufacturers or is a manufacturer’s site and 2) what type of delivery is available (instant or traditional shipping).  In all cases, the ultimate desire is to increase the number of sales online.  This type of site would benefit from optimizing with transactional searchers in mind.
  • Offline sales. This goal involves converting web visitors into sales at brick and mortar stores. While this type of goal would benefit from transactional optimization, the benefits from SEO are much harder to calculate because there isn’t as good a measure of sales success.  The most important concept that a webmaster of an offline sales site can remember is to always emphasize the “call-to-action”, which is the thing that you are trying to get someone to do – in this case, convert from a web visit to a physical sale.
  • Leads. Leads are conceptually the same as offline sales, however leads are defined by when the customer switches to the offline channel.  Customers who do some research online and then switch to offline are leads while customers who know model numbers, prices, etc. are offline sales.  With leads, the search optimization and marketing strategy needs to be different because customers who are leads are typically informational searchers instead of transactional searchers as in the case of web and offline sales.  Therefore, people who want to optimize for leads need to attract customers who are still deciding on what they want.
  • Market awareness. If a goal for one’s web site is market awareness, this is an instance where paid placement could help boost your page rank more quickly than organic results, simply because the product or service isn’t well known yet.  With market awareness, the web site mainly exists to raise awareness, so you would want to optimize the site for navigational and informational searchers.
  • Information & entertainment. These sites exist solely to disseminate information or provide entertainment.  They typically don’t sell anything, but might raise revenue through ad placement or premium content, such as ESPN Insider.  Sites that focus on information and entertainment should focus almost exclusively on optimizing their site for informational searchers.
  • Persuasion.  Persuasion websites are typically designed to influence public opinion or provide help to people.  They are usually not designed to make money.  In order to reach the most people, sites like these should be designed for an informational searcher.

3.4  Define SEO Success

A crucial element of undertaking SEO is to determine what should be used to measure the “success” of the efforts performed.  Depending on what type of goal the webmaster or marketing department has defined for the website (web or offline sales, leads, market awareness, information and entertainment, or persuasion), there are different ways to measure success.  For example, the natural way for a site that does web or offline sales to determine success is to count the conversions (i.e. the ratio of “lookers” to “buyers”).  If a baseline conversion ratio can be established prior to SEO, you can measure the difference between the old rate and the new rate after SEO.

This logic can be applied to any of the goals presented.  For example, with market awareness, you could measure conversion by sending out surveys to consumers or perhaps including some sort of promotion or “call-to-action” that would be unique to the SEO campaign.  If a consumer were to use that particular promotion or perform that action, it would give an indication of a conversion.

Another way to determine success is to analyze the traffic to the web site, and there are a couple of different metrics that can be used for this, including “page views and visits and visitors.” [1]  Using the most basic calculation with page views, you could determine a per-hour, per-day, per-week, per-month or even per-year average count of visitors to your website.  Using some tracking elements (such as cookies or JavaScript), you could determine the rise (or fall) of visitors to the site before and after SEO was performed.  The amount of information that this simple metric would provide would be invaluable because you could also determine peak visiting hours, which pages had the most hits at a specific time, visitor loyalty (unique vs. returning visitors), visitor demographics (location, browser and operating system, etc.) and even the time spent per visit.

3.5  Decide on Keywords

Keywords are the most important element of SEO.  It is important that during SEO, one focuses on the keywords that searchers are most likely to use while searching.  When choosing keywords, it is important to consider several different factors such as keyword variations, density, search volume and keyword “competition.”

Keyword variations play an important role in choosing keywords because one has to consider how an average searcher may try to find information.  For example, certain keywords such as “review” or “compare” might be used interchangeably in a search query.  In addition, some keywords are brand names that people automatically associate with products, such as “Kleenex” for tissues and “iPod” for MP3 players.  If someone is able to have more variations on a certain sequence of keywords, they have a higher likelihood of being found.

Search volume is also a big factor in determining which keywords to choose because one wants to choose keywords that people are actually searching for.  If a particular keyword only has 3,000 queries a month, it is much better to use a keyword that has 20,000 queries a month because 1) the searchers are associating the higher volume keyword with the subject instead of the lower volume keyword and 2) using the higher volume keyword will reach a larger audience.  However, this is not to say that low-volume keywords have no value.  In fact, the opposite is true.  Mega-volume keywords, such as brand names, are often fought over by companies competing for the highest ranking.  If one were able to achieve the same or nearly the same results with lower contention keywords, the SEO process would be much easier, because fewer companies target them, making it easier to achieve a high ranking with them.

Lastly, keyword density is an important factor.  When deciding where to rank a page, the search engines consider anywhere from 3%-12% of the page to be a good density for keywords, with 7% being optimal.  If it detects a higher percentage than this, the search spider might consider the page spam, simply because one is trying to stuff as many high volume keywords into the page as possible without any relevant context or content for the user.  Typically, this results in a lower ranking, no ranking or sometimes the page even being removed from the index or blocked.
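Keyword density here is simply the share of a page’s words accounted for by the keyword.  For example, a 700-word landing page that uses the targeted keyword 49 times has a density of

\text{density} = \frac{\text{keyword occurrences}}{\text{total words}} = \frac{49}{700} = 7\%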

3.5.1  Create Landing Pages for Keyword Combinations

The last step in assessing a site under consideration for SEO is to identify the pages on the site that specific keyword queries should lead to when a searcher enters that query.  For example, P&G might want “best laundry detergent” to lead to their Tide home page.  Landing pages are the pages that these queries lead to and are “designed to reinforce the searchers intent.” [1]  Each of the keywords or phrases identified in section 3.4 must lead to a landing page and those pages must be indexed.  If there isn’t a landing page already for some of those keywords, then one must be created.

3.6  Page Elements That Matter And Don’t Matter

Now that the target audience, goals, keywords and landing pages are identified, it is crucial to consider the design of the web page.

  • Eliminate popup windows.  The content in popup windows is not indexed by spiders.  If important content, navigation or links are contained inside a popup window, they will not be seen by the spider.  The content needs to be moved outside the popup window.
  • Pulldown navigation.  Pulldown navigation suffers from the same problem as popup windows.  Since spiders cannot interact with these elements (mouse-over or click on them), they cannot index the content, which creates a large problem if the navigation is done with pulldown menus.  Either the pulldown menu must be done in a compatible way or the site has to allow for some other alternative means of navigation.
  • Simplifying dynamic URLs. Pages that use dynamic content and URLs must be simplified for the spiders to crawl.  The nature of a dynamic URL means that a spider could spend an infinite amount of time attempting to crawl all the possible URLs, and that would produce a lot of duplicate content.  To deal with this, spiders will only crawl dynamic URLs if the URL has fewer than 2 dynamic parameters, is less than 1,000 characters long, does not contain a session identifier and is linked from another page.
  • Validating HTML.  Robots are very sensitive to correctly formed documents.  It is important that the HTML is valid for the spider to get a good indication of what the page is truly about.
  • Reduce dependencies.  Some technologies, such as Flash, make it impossible for the spider to index the content inside them.  The spider cannot view this particular content, so any important keywords or information inside it are lost.  The content should be moved outside to allow indexing to take place.
  • Slim down page content.  A spider’s time is valuable and typically spiders don’t crawl all the pages of a bloated web site.  Google and Yahoo! spiders stop at about 100,000 characters. [1]  The typical cause of HTML page bloat is embedded content such as styling and other content such as JavaScript.  A simple way to solve this is to link to an external Cascading Style Sheet (CSS) so that you are able to make use of re-usable styles.  Another way to reduce JavaScript bloat is to use a program to obfuscate long files, replacing long variable names such as “longVariableName” with much shorter versions such as “A.”
  • Use redirects.  From time to time, pages within a web site move.  It is important to make accurate use of the correct type of redirects within your site so that when spiders attempt to visit the old URL, they are redirected to the new URL.  If they find that the server returns a 404 (Not Found) for the old URL, the spider might remove that particular page from the index.  Instead, the proper way to indicate that the page has moved permanently is to use a server-side redirect, also known as a “301 redirect.”  This is returned to the spider when it attempts to navigate to the old URL and it is then able to update the index with the new URL.  A sample implementation might look like the following:

Redirect 301 /oldDirectory/oldName.html http://www.domain.com/newDirectory/newName.html

Note that spiders cannot follow JavaScript or Meta refresh directives. [1]  Additionally, one can use a “302 redirect” for temporarily moved URLs.  See Fielding et al. [2] for more information on the “302 redirect.”

  • Create site maps. Site maps are important for larger sites because “it not only allows spiders to access your site’s pages but they also serve as very powerful clues to the search engine as to the thematic content of your site.” [1]  The anchor text used for the link could provide some very good keywords to the spider.
  • Titles and snippets.  Together, the title and the snippet that the search spider extracts account for a large part of how they index a particular page.  The title, the most important clue to a spider on the particular subject of a page, is the most easily fixed element.  The title is a great place to use the keywords that were previously decided on.  For example, the title element for StubHub, a ticket brokerage site, is “Tickets at StubHub! Where Fans Buy and Sell Tickets.” Additionally, the snippet, or summary that the spider comes up with to describe its result, is important as well.  Typically, the spider uses the first block of text that it runs across for the snippet.  For example, for WebMD, Googlebot uses the following snippet: “The leading source for trustworthy and timely health and medical news and information. Providing credible health information, supportive community, …”  In both of these examples, it is clear that having essential keywords present is highly correlated with having high rankings.
  • Formatting heading elements.  Using traditional HTML subsection formatting elements, such as <h1>, <h2>, <h3>, etc. to denote important information can help give context clues to spiders on what text is important on a particular page.

3.6.1  The Importance of Links

Links, both internal and external, play a big role in SEO and in page ranking.  Search engines place a certain value on links because they can use these links to judge the value of the information.  Similar to how scientific papers have relied on citations to validate and confer status upon an author’s work, links apply status and value to a particular site.  “Inbound links act as a surrogate for the quality and “trustworthiness” of the content, which spiders cannot discern from merely looking at the words on the page.”  [1]

Several factors can influence how an algorithm ranks the link popularity of a particular page, including link quality, link quantity, anchor text and link relevancy.

Using the 4 link popularity factors from above, search engines use a theory of hub and authority pages to create link value.  Hub pages are web pages that link to other pages on a similar subject.  Authority pages are pages that are linked to by many other pages on a particular subject.  Therefore, search engines usually assign a high rank to these pages because they are most closely related to a searcher’s keywords. Using this model, it is easy to see why the harder an inbound link is to get, the more valuable it might be. For more information on inbound and outbound link importance and their value, see Hunt et al. [1]

3.7  Getting Robots To Crawl The Site

In addition to all of the goals and HTML elements above, it is crucial to have a “robots.txt” file that will allow the robot to crawl the site.  This file can give instructions to web robots using the Robots Exclusion Protocol.  Before any indexing occurs at a site, the robot will check the top-level directory for the “robots.txt” file and will index the site accordingly.  A sample “robots.txt” file might look like the following.

User-agent: *
Disallow:

These instructions apply to all robots, or “User-agents”, and they are to index the whole site because nothing is listed under the “Disallow” directive.  It is possible to tailor this file to individual robots and server content.  While the protocol is completely advisory, following it is highly recommended to improve the quality of what the robots index.  Note that a robot does not have to obey the “robots.txt” file; however, most non-malicious robots do obey the instructions.
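A slightly more tailored file might keep all robots out of an administrative area and advertise the XML site map (the path and domain here are illustrative):

User-agent: *
Disallow: /admin/
Sitemap: http://www.example.com/sitemap.xml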

3.8  Dispelling SEO Myths

  • SEO is a one time activity.  SEO is a continual activity that must be revisited from time to time to ensure that the most up-to-date keywords and content are being used.
  • SEO is a quick fix process for site traffic.  Generating high quality organic traffic that will help with conversion is a slow process.  For each change that one makes to the web page, spiders must re-index that page and then calculate the new rankings.  It could take several months to years depending on your goal of achieving a higher ranking, passing a competitor in rankings or achieving the top results spot.
  • META tags help with page ranking. Early in the development of search engines, META tags were abused by keyword spammers who packed in as many highly searched keywords as possible, and many spiders now give META tags little to no credence.

4.  Promotion Of AFewGuysCoding.com

In order to test the authors’ hypothesis, we chose to perform our SEO experiment with AFewGuysCoding.com.  A Few Guys Coding, LLC is a small, author-owned company that provides contract engineering services mainly for mobile platforms (iPhone, iPod Touch, iPad, Android) but also does web and desktop applications.   This web site has never had SEO performed on it and was not designed with any such considerations in mind.

4.1  Initial Investigation For AFewGuysCoding.com

In order to determine what keywords we should focus on, we created a survey and asked the following question to people who are not engineers: “Suppose you were a manager of a business that had a great idea for a mobile phone application. You knew that you had to hire an iPhone, iPod or iPad developer because you didn’t directly know anyone who could do this work for you. What words, terms or phrases might you consider searching for in Google to find this person?”  Table 4.1 represents the range of responses that were provided.  In analyzing this data, the words provided were stemmed to account for different endings.  In addition, stop words, or words that are ignored because they don’t provide any significance to the query, were also filtered out and not considered.

Looking at the results and given the question posed to the survey audience, the percentages of the top three results did not surprise the authors.  What did surprise the authors were the relatively high number of responses for the keywords “technician”, “help”, “inventor/invention”, “creator” and “market/marketing” and the relatively low number of responses for “mobile”, “phone”, “software” and “programmer.”

After careful consideration, the authors chose to incorporate some of the keywords suggested by the survey audience into the web page.

Keyword Count Percentage
Developer 81 19.33%
Application 72 17.18%
iPhone 51 12.17%
Apple 24 5.73%
Phone 21 5.01%
iPod 17 4.06%
Mobile 17 4.06%
iPad 17 4.06%
Programmer 17 4.06%
Technician 11 2.63%
Help 10 2.39%
Mac/Macintosh 9 2.15%
Technology 6 1.43%
Market/Marketing 6 1.43%
Software 7 1.67%
Inventor/Invention 5 1.19%
Creator 5 1.19%
Business 4 0.95%
iTunes 3 0.72%
Designer/Designing 3 0.72%
Hire 3 0.72%
Handset 3 0.72%
Top 3 0.72%
Computer 2 0.48%
Company 2 0.48%
Contractor 3 0.72%
AT&T 1 0.24%
Store 1 0.24%
3G 1 0.24%
OS 1 0.24%
Code 1 0.24%
File 1 0.24%
Resource 1 0.24%
Sale 1 0.24%
Science 1 0.24%
Graphics 1 0.24%
Analyst 1 0.24%
Creative 1 0.24%
Devoted 1 0.24%
Energetic 1 0.24%
Smart 1 0.24%
Engineer 1 0.24%
Objective-C 1 0.24%
Total 419 100.00%

Table 4.1 – Responses from a search audience regarding possible keywords for AFewGuysCoding.com

4.2  Baseline Rankings

Prior to performing SEO on AFewGuysCoding.com, the Google page rank was 1/10.  The website was indexed by major search engines, such as Google, Yahoo!, Bing, AOL and Ask, but was not crawled on a regular basis.

Before performing any SEO activities, the amount of traffic that was coming from search engines was a little over 7% overall.  Most traffic was coming from direct referrals, for example when the user entered the address from a business card they had gotten (see graph 5.1).

In addition, the titles and snippets of the page were not as good as they could be (i.e. they didn’t include keywords or readily extractable content for the snippet).  For example, the title of the home page for AFewGuysCoding.com was “Welcome to A Few Guys Coding, LLC.”

Moreover, the web site was a transition from an older web site, www.davidstites.com, so many of the links already in Google referred to old pages that no longer existed.  When those pages were clicked on in the Google results, the user was taken to a “404 Not Found” page.


Graph 5.1 – Sources of traffic for AFewGuysCoding.com before SEO activities

In addition to the work done with keyword densities and titles, the author also started a blog and Twitter account (http://blog.afewguyscoding.com, http://www.twitter.com/afewguyscoding) that addressed computer science and programming topics.  In the blog, we linked to other parts of the main site when a blog post referenced a topic that A Few Guys Coding dealt with, such as iPhone applications.  A summary feed of this blog was placed on the main page of the AFewGuysCoding.com web site, and the authors noticed an increase in crawler traffic after the spider determined the content was changing frequently enough to warrant additional visits to the site to re-calculate PageRank.

4.3  Rankings After SEO

After an initial pass of SEO was performed on the web site, the traffic from search engines increased dramatically and the page rank increased 1 point to 2/10.  The authors believe that this increase was largely from changing the titles of the web pages themselves, changing keyword densities and tying keywords to landing pages.   Using Google Analytics, over a two month period, traffic for AFewGuysCoding.com was up 81.24%.

The keyword densities are not quite as high as 3%-11%; however, this is a marked improvement from before, when the important keyword densities were all 1% and below.  See Table 5.2.

Keyword Count Density Page
iPhone 10 2.55% /services
iPad 6 1.53% /services
iPod 6 1.53% /services
Code 10 2.55% /services
Software 2 1.32% /
Application 10 2.55% /services
Develop/Developer 4 1.42% /services

Table 5.2 – Keyword densities for certain pages on AFewGuysCoding.com

Graph 5.2 – Sources of traffic for AFewGuysCoding.com after SEO activities (April 2010)

Pages Page Views % Pageviews Avg. Time On Page
/ 387 66.38% 1:42
/learnmore 59 10.12% 2:09
/services 41 7.03% 1:08
/portfolio 44 7.55% 1:19
/contact 35 6.00% 0:45
/services/iphone 8 1.37% 0:39
/getaquote 6 1.03% 0:15
/services/ipad 3 0.51% 0:27
Totals 583 100.00% 1:03

Table 5.3 – Page View Overview for top 8 most visited pages in April, 2010

Graph 5.3 – Number of visitors per day for April, 2010 against Time on Site goal achievement rate

Another action taken by the authors was to use Apache mod_rewrite and a redirect file: we were able to direct the spiders to update their index to the new pages (from the older site) using a “301 redirect”.  We were able to transform the URLs using mod_rewrite to match the current top-level domain.  This ensured that pages were not removed from the crawler’s index due to returning status 404.
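A sketch of the kind of mod_rewrite rule involved (the actual rules in the redirect file differ):

RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?davidstites\.com$ [NC]
RewriteRule ^(.*)$ http://www.afewguyscoding.com/$1 [R=301,L]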

Lastly, the authors set several goals for the web site (besides the increase in PageRank), including a “Time on Site” measure that would help gauge SEO success.  If a particular user stayed on the site for longer than 5 minutes and/or had 10 or more page views, we considered the goal met.  See Graph 5.3 for a comparison of visitors to goals met.

4.4  Conclusions

Clearly, making simple changes to content (such as keywords and titles) can have a large effect on search engine ranking and the amount and quality of traffic that is directed to a site that has had SEO.  The difference in search engine traffic over a month represented a 400% increase. It might be reasonable to infer that an additional increase of 3-4% in keyword density could push search engine referrals up to 50%-60%. The authors would like to see the effects of continuing SEO at the 3, 6, 9 and 12 month marks.

5.  Recommendations To UCCS

Based on the research that we performed, we would like to make some suggestions to the UCCS EAS webmaster that would allow the UCCS EAS site to be ranked higher than it currently is.  Currently, the URL http://eas.uccs.edu/ has a Google PageRank of 4/10. By following the suggestions below, it may be possible to raise the PageRank by 1 or 2 points to 5/10 or 6/10.  We have broken down our recommendations into a four-step process: define the scope and goals, understand the decision making process of a potential EAS student, create content, and build links.

5.1  Define the Scope and Goals

The first step is to define the scope and goals of the Search Engine Optimization efforts.  One question to ask is: Is this effort only related to getting more EAS students, or is it to promote UCCS as a whole?  How much effort can be put into this project?  The answers to these questions will determine the scope of the project.  One potential goal is to recruit more students to the college, but a new student may have attended even without the SEO campaign.  Consequently, a plan for measuring how students became interested in the college will be important.

5.2  Understanding the Decision Making Process

Key to understanding how to boost traffic through SEO practices is understanding the content target users are searching for.  This can be discovered through surveys or interviews with students that have gone through the process of choosing a college to attend.  These can be current UCCS students or students of other universities.  From my own experience, I would guess that a typical decision making process would involve finding answers to these questions: Should I go to college? What is the best school for me?  Why should I go to UCCS?  Which major should I pick? How do I apply?  Where can I find help with my application?  Each of these questions is a good area for creating good content.

5.3  Create Content

Once the decision making process is understood, content can be created to answer the questions that people are searching for.  Special attention should be paid to using keywords that people would search for when they have that question.  A quick look at the current EAS site reveals that there is little content related to recruitment.  Additionally, the current pages have few keywords that are searched frequently.  Additional recommendations are listed below that highlight some of the deficiencies of the current EAS pages.

  • Eliminate non-indexable content. The Adobe Flash content on the main landing page, http://www.eas.uccs.edu/, cannot be indexed by spiders and robots.  All the information contained in that Flash element is lost.  Additionally, any images that contain content are non-indexable as well.
  • Remove “expandable” navigation.  Spiders are unable to “click” on these individual sections to expand them and are therefore unable to crawl the linked pages.  The navigation should be reworked so that all links are accessible without needing to perform any special interface actions.
  • Choose better titles for individual pages.  Regardless of the actual page’s content or purpose, all pages under the main landing page have the title “UCCS | College of Engineering and Applied Science.”  These titles should change depending on the subject or content of the page so that spiders and robots are able to create a more accurate index of the page.
  • Better use of heading formatting.  Headings should use valid heading tags such as <h1>, <h2>, <h3>, etc. so that the robots that crawl the site can extract main ideas and content from the page for the index.
  • Check and adjust keywords and keyword densities.  The densities of the keywords on a page should reflect what the page is about.  For example, on the application page for EAS, the keywords concerning admission and applying are sparse.  In fact, out of the top 18 keywords on the page, only 4 or 5 have anything to do with admission, and their densities are low, ranging from 0.79% to 2.37%.
Keyword Count Density
Engineering 63 5.54%
Science 43 3.78%
Computer 39 3.43%
Department 39 3.43%
Admission 27 2.37%
Application 23 2.02%
Colorado 20 1.76%
Electrical 19 1.67%
UCCS 18 1.58%
Mechanical 18 1.58%
Form 15 1.32%
Aerospace 14 1.23%
Springs 12 1.05%
College 12 1.05%
Applied 12 1.05%
Student 11 0.97%
Application 10 0.88%
Financial 9 0.79%
  • Include a site map.  A site map would help ensure that all the pages that were meant to be accessible to a web visitor are also accessible to a spider crawling the content.

5.4  Build Links

Lastly, it is important to build the authority score by getting other pages to link to the content pages.  Links can be built within organizations that already have relationships with the university.  For example, the City of Colorado Springs, engineering organizations, and businesses that recruit from the EAS college could all be great sources of links.  Publishing press releases about new websites to news organizations could also be helpful in generating links.  Another form of link building could be to simply link to the target pages heavily from other pages on the EAS site.

Another thing to consider is that people understand that when content is on the EAS site, it is going to be biased towards the EAS college.  Prospective students are not going to trust the document as much as if it were coming from a third party site.  One way to overcome this is to write content for other websites.  This provides two benefits.  First, it creates a seemingly unbiased source of information that can be slanted towards recruiting for EAS.  Second, the content can link back to the EAS website, providing a good link for building the authority score of a page.

By following these guidelines, the EAS college can succeed in generating more traffic on its recruitment pages and therefore see more students attend the college.

6.  Conclusion

In this research project, we have investigated the implementation of search engines.  We have also presented different elements that affect search ranking.  Based on our research and case study with AFewGuysCoding.com, we have provided recommendations to the UCCS EAS department to improve their PageRank within Google and other major search engines.  Indeed, search engine optimization is an important technique that any webmaster must master so that their site can be ranked as high as possible.

References

  1. Hunt, B., Moran, M.  Search Engine Marketing, Inc.: Driving Search Traffic to Your Company’s Web Site, 2nd ed. IBM Press, 2008.
  2. Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., Berners-Lee., T.  RFC 2616 – HTTP/1.1 Status Code Definitions.  1999 [Online]. http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
  3. S. Brin and L. Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks and ISDN Systems. 1998. pp. 107-117.
  4. Z. Gyongyi, H. Garcia-Molina, and J. Pedersen. Combating Web Spam with TrustRank. Very Large Data Bases. Vol. 30, pp. 576-587. 2004.
  5. O. Brandman, J. Cho, H. Garcia-Molina, N. Shivakumar. Crawler-Friendly Web Servers. ACM SIGMETRICS Performance Evaluation Review. Vol. 28, No. 2, pp. 9-14. 2000.
  6. G. Rodriguez-Mula, H. Garcia-Molina, A. Paepcke.  Collaborative value filtering on the Web. Computer Networks and ISDN Systems. Vol. 30, No. 1-7, pp. 736-738. 1998.
  7. J. Cho, H. Garcia-Molina.  The Evolution of the Web and Implications for an Incremental Crawler.  Very Large Data Bases. pp. 200-209. 2000.
All code owned and written by David Stites and published on this blog is licensed under MIT/BSD.