

PREVENTING USER AND HARDWARE TRACKING IN MOBILE DEVICES
by 
David Robert Stites

B.S., Purdue University, 2007

Thesis directed by Professor Rory Lewis

A thesis submitted to the Graduate Faculty of the University of Colorado at Colorado Springs in partial fulfillment of the requirements for the degree of Master of Science
Department of Computer Science

© Copyright By David Robert Stites 2012 
All Rights Reserved

Mobile devices, such as smartphones or PDAs, have become increasingly popular with consumers and often provide essential functionality in their everyday life. Usually these mobile devices contain a great deal of sensitive information such as addresses, contacts, ingoing/outgoing call logs, SMS messages and, on the latest models, a calendar, emails and potentially the user's current location. A smartphone or mobile device today can be as powerful as a desktop or laptop in some respects and, while the latest models feature a complete OS, for many users these devices are "just phones," so there is an underestimation of the risk connected to mobile device privacy. There is a currently existing privacy problem associated with user and hardware tracking in mobile devices. Users can be tracked without their knowledge and consent, and rich profiles of their location and preferences can be built about them using their hardware interface address. This information can potentially be cross-correlated with other existing datasets to build advertising profiles for these users. The mitigation to this problem is a framework that supports randomly generated, disposable hardware addresses.
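As a rough illustration of that mitigation (a minimal sketch only, not the thesis framework itself), a device could adopt a freshly generated, locally administered MAC address for each association instead of exposing its factory-assigned one:

```python
import random

def random_disposable_mac() -> str:
    """Illustrative sketch: generate a random, locally administered, unicast MAC.

    Setting bit 1 of the first octet marks the address as locally administered;
    clearing bit 0 keeps it unicast, so it will not be mistaken for a
    vendor-assigned (and therefore trackable) hardware address.
    """
    first_octet = (random.randint(0x00, 0xFF) | 0x02) & 0xFE
    rest = [random.randint(0x00, 0xFF) for _ in range(5)]
    return ":".join(f"{octet:02x}" for octet in [first_octet] + rest)

if __name__ == "__main__":
    print(random_disposable_mac())   # e.g. "3a:1f:9c:42:7d:e0"
```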

Full text of the document can be found here: PDF

The author surveyed over 230 applications (the full list of applications can be found in A Survey Of Mobile Device Security: Threats, Vulnerabilities and Defenses), including applications in the "Top" categories on the iTunes store, to determine what type of information could be extracted from auditing packet streams. The results were quite surprising.

To perform this audit, the author launched one application at a time and used Wireshark to capture and analyze packets. The experiment was performed on an open network that the author created. The access point was a Cisco Small Business router (WAP4410N) configured with a hidden SSID and MAC address authentication to prevent outside users from associating with the access point and introducing extraneous packets. While the author realizes that hidden SSIDs and MAC address authentication are easily defeated mechanisms, they were sufficient to keep casual users off the access point. The mobile devices used were an Apple iPod Touch 4G, an Apple iPad 1G and an iPhone 4, all configured with iOS 5.0.1.
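A minimal sketch of this kind of payload audit, assuming Scapy is available and the capture has been exported as capture.pcap (the file name and keyword list are illustrative, not the author's actual tooling):

```python
from scapy.all import rdpcap, TCP, Raw  # pip install scapy

# Strings whose presence in an unencrypted payload is worth flagging.
SUSPICIOUS = [b"password", b"passwd", b"udid", b"uuid", b"lat=", b"lon="]

def audit_capture(path: str = "capture.pcap") -> None:
    """Flag plaintext TCP payloads that appear to carry sensitive data."""
    for pkt in rdpcap(path):
        if pkt.haslayer(TCP) and pkt.haslayer(Raw):
            payload = bytes(pkt[Raw].load).lower()
            hits = [kw.decode() for kw in SUSPICIOUS if kw in payload]
            if hits:
                print(f"{pkt[TCP].sport} -> {pkt[TCP].dport}: {hits}")

if __name__ == "__main__":
    audit_capture()
```

TLS-protected exchanges will not match any keyword, so only the unencrypted traffic of interest surfaces in the output.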

For reasons of classification, the authors created several different levels of potential security breaches. The levels are defined as:

  • None: This level is defined as having no potential security breaches and no exposure of confidential information.
  • Low: This level is defined as having a few potential security breaches or exposure of confidential information that could not directly affect the user, such as device IDs that could be used in tracking users (in iOS, these are called UUIDs).
  • Medium: This level is defined as having several potential security breaches or exposure of confidential information that is potentially serious or if information is exposed such that an attacker would be able to identify the user on an individual basis, such as addresses, latitudes or longitudes, etc.
  • High: This level is defined as having multiple potential security breaches or exposure of extremely confidential information, such as account numbers, PINs, and username/password combinations.

For more information on the specific application, including the version number of the application with the vulnerability, see Appendix A for a full listing.

Application | Level | Risks Found
GrubHub | Low | UUID
The Weather Channel | Low | Geocoded location
Path | Low | UUID
Handmade | Low | UUID
iHeartRadio | Low | Reverse geocoded location
TabbedOut | Low | UUID, Platform
Priceline | Low | UUID, Geocoded location, "Search" API is unencrypted
Free WiFi | Low | Geocoded location
Coupious | Medium | Geocoded location, UUID, coupon redemption codes
Delivery Status | Medium | UPS transmits reverse geocoded locations and tracking numbers
Color | Medium | Reverse geocoded location and photos taken and shared by users
Cloudette | Medium | Username in plaintext and password hashed with MD5
Gas Buddy | Medium | Username and password, hashed with MD5
Ness | Medium | Reverse geocoded location
Southwest Airlines | High | Username and password in plaintext
Minus | High | Username and password in plaintext
WordPress | High | Username and password in plaintext
Foodspotting | High | Username and password, Geocoded location
ustream | High | Username and password, UUID, geocoded location
Labelbox | High | Username and password, geocoded location

The majority of the applications that were surveyed encrypted the exchanges of confidential or sensitive information, such as usernames, passwords and account numbers via SSL/TLS.

However, many applications performed some sort of tracking or storing of analytic information, such as passing the UUID in a call to a web service. In some instances, this identifying information was not encrypted. While this is not dangerous in the sense that an attacker could use it to "identify" a particular person, none of the applications let users know that information such as the UUID, phone OS and model was being used or recorded, nor did they let the user "opt out."

The largest single potential security breach was with the Southwest Airlines application. Because the username and password were submitted to a web server via a POST operation in plaintext, an attacker could simply sniff for this data. If such an exchange were captured, the attacker could use those credentials to log into the account, book travel, use award miles and possibly change information in the victim's profile. This is worrisome not only because an attacker could fraudulently use a victim's account and credit card information, but also because of the possibility of terrorist threats to air travel.

For example, consider the possibility of a person who is currently (and rightfully) on the Department of Homeland Security’s “No-Fly” list.  If this person were able to capture a victim’s credentials and create a fake ID, he could pass through TSA security without being stopped.

Of the 253 applications surveyed, 91.7% had no risk found, 3.1% had a low risk, 2.3% had a medium risk and 2.3% had a high risk. While it would be desirable to have no applications in the "Medium" or "High" categories, the number of applications found to present a security risk was surprisingly high. There are over 500,000 applications on the iOS App Store, so extrapolating these results, there could be at least 15,500 applications in the "Low" risk category and roughly 11,500 applications each in the "Medium" and "High" risk categories.
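The extrapolation is straightforward arithmetic on the store size quoted above:

```python
TOTAL_APPS = 500_000          # approximate iOS App Store size cited above
SURVEY_RATES = {"low": 0.031, "medium": 0.023, "high": 0.023}

for level, rate in SURVEY_RATES.items():
    print(f"{level:>6}: ~{int(TOTAL_APPS * rate):,} applications")
# low: ~15,500    medium: ~11,500    high: ~11,500
```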

Overall, the number of applications with some sort of security risk is low.  This is not very surprising to the authors as many of these applications are in the “Top” applications list and any potential security flaws would have already been found.

Because iOS does not have a robust privilege system, there is no way for a user to know that their information is being used in a dangerous or insecure way. While applications can show users that there is network traffic by spinning the "network activity indicator", it is not mandatory for them to do so. In fact, a legitimate or malicious application on iOS could access the network interfaces, sending and receiving information, without ever alerting the user.

Developers typically do not follow the principle of least privilege. If an application needs a set of privileges for some functionality, developers will request them all up front rather than only when they are needed. This is particularly dangerous because it could provide an entry point for an attacker to compromise the application.

[19] surveyed 940 Android applications and found that more than 50% requested at least one unnecessary permission and 6% requested more than four unnecessary permissions. Developers may request more permissions than necessary because 1) they don't understand the importance of security and least privilege, 2) they are planning future releases that will require those privileges, or 3) they don't fully understand how to work with the platform and make the code function correctly.
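The over-privilege check from [19] can be sketched as a simple set comparison between the permissions an application requests and the permissions its API usage actually needs (the permission names and the example mapping below are hypothetical, not the study's data):

```python
# Toy over-privilege check: compare requested permissions against those the
# app's API usage actually needs. The sets below are made-up illustrations.
REQUESTED = {"INTERNET", "READ_CONTACTS", "ACCESS_FINE_LOCATION", "SEND_SMS"}
NEEDED = {"INTERNET", "ACCESS_FINE_LOCATION"}   # e.g. derived from API calls used

unnecessary = REQUESTED - NEEDED
if unnecessary:
    print(f"Over-privileged: {len(unnecessary)} unnecessary permission(s): "
          f"{sorted(unnecessary)}")
```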

Since mobile devices and smartphones are unique in that they have a built-in billing system, there must be ongoing education of developers with emphasis on security and privacy or additional built-in measures in the OS to enforce security over code the developers write or the permissions for which they ask.

Here is the full list of applications tested.

Bold applications represent applications bundled with iOS from Apple.

Application  Version            Application  Version

Messages 5.0.1 RedLaser Classic 2.9.8
Calendar 5.0.1 eBay 2.4.0
App Store 5.0.1 Craigslist 3.033
Settings 5.0.1 Key Ring 5.4
Spotify 0.4.21 Coupious 1.4.1
Contacts 5.0.1 Cars 1.6.1
Notes 5.0.1 Amazon PriceCheck 1.2.1
Newstand 5.0.1 Linode 1.0.6
Reminders 5.0.1 Unfuddle 1.1.1
Find My Friends 1.0 MiniBooks 1.0.2
Videos 5.0.1 iTC Mobile 2.4
Vlingo 2.1.1 Blueprint viewer 1.7
Photos 5.0.1 Square 2.2
Camera 5.0.1 WordPress 2.9.2
Instagram 2.0.5 Maps 5.0.1
iMovie 1.2.2 FlightTrack 4.2.2
DashOfColor 3.1 Kayak 19.0.6
ColorSplash 1.7.2 Southwest 1.8.2
UStream Broadcaster 2.1 American 1.3.3
TiltShiftGen 2.02 Fly Delta 1.6
Gorillacam 1.2.2 Flysmart 2.5.25
CameraPlus 2.4 Priceline Negotiator 5.6
PS Express 2.03 Free WiFi 1.1.2
Dropcam 1.4.3 Google Earth 3.2
Chase 2.14.5799 Translator 3.1
Citibank 3.7 Phone 5.0.1
Discover 2.1 Mail 5.0.1
Fidelity 1.6.1 Safari 5.0.1
TD Trader 115.12 Music 5.0.1
PayPal 3.6 Flixster 5.02
Mint.com 2.0 Boxee 1.2.1
Stock 5.0.1 redbox 2.3.1
thinkorswim 115.12 Youtube 5.0.1
Geico 2.0.2 Fandango 4.5
Dropbox 1.4.6 XFINITY TV 1.8
1Password 3.6.1 IMDb 2.3.1
Alarm Clock 1.1 i.TV 3.4.1
Planets 3.1 MobiTV 1.0
Dictation 1.1 Netflix 1.4
Inrix Traffic 3.5.1 VNC 3.2.1
Adobe Ideas 1.2 RDP 2.8
IP-Relay 1.2 TouchTerm 2.1
iLlumination 1.0.1 Scorekeeper 4.1
Fake-a-call 5.05 Statware 1.0.3
HeyTell 2.3.2 NIKE+ GPS 3.2.1
Weather 5.0.1 MiLB Triple A 1.1.0
The Weather Channel 2.1.1 Pandora 3.1.16
Calculator 5.0.1 Shazam 4.8.4
Clock 5.0.1 Soundhound 4.1.1
Compass 5.0.1 iHeartRadio 4.0.1
Voice Memos 5.0.1 Last.fm 3.2.0
AroundMe 5.1.0 Songify 1.0.6
myAT&T 2.1.2 iTunes 5.0.1
WeddingWire 3.1 Virtuoso 1.0
LogTen 3.3.1 I Am T-Pain 2.0.0
French 1.0 Scrabble 1.13.78
Binary Calc 1.4 Harbor Master 2.1
Amazon 1.8.0 Zombie Duck 4.1
Groupon 1.5.7 Zombieville 1.7
LivingSocial 3.2.2 Table Tennis 4.1.0
Yowza 2.5 iFighter 1.9
Coupons Hired Gun 1.8
Airport Utility 1.0 Lock n’ Roll 3.0
Walgreens 3.0.2 Sneezies Lite 1.3
MyHumana 3.0.2 Pad Racer 1.1
Nike + iPod Uno 2.0.0
Gold’s Gym Spotter 1.2 CamWow 2.2
Lose It! 3.7.2 Labelbox 1.3.1
FitnessTrack 1.5.5 Photosynth 1.1.2
LIVESTRONG 1.0.1 Color Effects 3.1
MyFitnessPal Saturation 1.0
Nutrisystem 2.3 Peppermint 1.1
Kindle 2.8.5 FlickStackrXP 1.9.6
Instapaper 4.0.1 Minus 2.1.3
iBooks 5.0.1 Gallery 2.0.1
Zinio 2.2 Handmade 1.1
Twitter 4.0 StubHub 2.4.1
Facebook 4.0.3 Pushpins 2.0.1
Google+ 1.0.7.2940 Black Friday 2.0
foursquare 4.1.2 Sam’s Club 2.1.1
LinkedIn 4.2 Cyber Monday 2.1.0
Meebo 1.95 Words With Friends 4.1
Yelp 5.5.0 Ultimate Free Word Finder 1.01
PingChat Mad Gab 2.0
Bump 2.5.6 Metal Storm 4.0.2
Color 1.1 Need For Speed 1.0.11
Cloudette 1.0.1 Madden NFL 2010
soundtracking 2.0.2 Shizzlr 3.2.1
Free RSS 3.4 Flashlight 5.1
NetNewsWire 2.0.5 Tip Calculator 1.3.1
FOX News 1.2.4 PCalc Lite 2.4.3
OpenTable 3.4.2 Fake Call 1.1
Urbanspoon 1.17 To Do 3.2
Epicurious 3.0.1 Google 1.0.0.8117
WinePhD 1.2 Evernote 4.1.6
TabbedOut 2.3.3 Coin Flip 2.2
Foodspotting 2.7 Grades 2 2.03
GrubHub 2.20 Sundry Notes 3.2
RecipeGrazer 1.3 OneNote 1.2
Starbucks 2.1.1 Enigmo 4.1
Starbucks Mobile Card Angry Birds 1.6.3
Ness 1.1 JellyCar 1.5.4
iDisk 1.2.1 Runway 1.6
Remote 2.2 RockBand Free 1.3.49
Apple Store 2.0 Game Center 5.0.1
Find iPhone 1.3 App For Cats 1.1
Pages 1.5 PadRacer 1.1
Places 1.31 Implode 2.2.4
TripAdvisor 5.9 Astronut 1.0.1
Google Latitude 2.2.1 Monopoly 1.2.9
Gas Buddy 1.10 Deliveries 4.5
Maplets 2.2.2 Skype 3.5.454
iTranslate 5.1 Units 2.1.2
Translate 1.6.2 NCAA Football 2011
KG Free ESPN ScoreCenter 2.2.2
Wikipedia 2.2 Ski Report 2.2.1
White Noise 5.0.3 EpicMix 2.0.1
Sleep Machine Lite 2.0.1 MLB At Bat 4.6.1
Inception 1.6 Purdue 3.0
Sleep 2.0.1 NASA 1.43
Night Stand 1.0.4 80,000 Wallpapers 1.98
Geico BroStache 1.0.1 Wedding 911 1.06
CamCard 2.6.0.4 Path 2.0.2
Offline Pages 1.5.2 Facebook Messenger 1.5.2
GPS Tracker 1.2.2 Quora 1.1
TextPics Free 2.2 Big Button Box 3.0
Peel 2.0

By David Stites and Anitha Tadimalla {dstites, atadimalla}@uccs.edu
University of Colorado at Colorado Springs

Abstract— Mobile devices, such as smartphones and PDAs, have become increasingly popular with consumers and often provide essential functionality in our everyday life. Usually these mobile devices contain a great deal of sensitive information, such as addresses, contacts, ingoing/outgoing call logs, SMS messages and, on the latest models, a calendar, emails and potentially our current position. A smartphone or mobile device today is as powerful as a desktop or laptop and, while the latest models feature a complete OS, for many users these devices are "just phones", so there is an underestimation of the risk connected to mobile device security. This makes mobile devices an interesting target for malicious users. Damages that a user can sustain include financial loss, loss of privacy and confidentiality, reduced processing speed and reduced battery life.

Index Terms—mobile, security, malware, defense

  1. INTRODUCTION

Mobile devices, especially cellphones, have changed a great deal from their counterparts of the 1990s. Gone are the days of brick-sized phones with 1-line displays, 9 analog buttons and several KB of memory. In recent years there has been an explosion [2] of powerful mobile computing devices. These new smartphones and tablets, small enough to fit in your pocket or backpack, hold an immense amount of computing power. Information is available at a simple touch or finger flick, and many users use these devices to access all sorts of data or services such as email, personal contacts and websites, and even to perform tasks normally reserved for a desktop system, such as video conferencing, watching movies or listening to music.

These devices are able to access the internet, download additional software, send and receive email, browse websites and exchange SMS messages with other users. In addition, many subscribers use their cell phone as a primary method of communication, storing their personal contact information (addresses, email addresses, phone numbers, etc.) as well as photographs they have taken. Many of these devices also have a built-in GPS that allows the user to "geo-tag" photographs and use location-based services, such as FourSquare, Twitter and Facebook, in addition to the basic mapping and GPS functionality.

Devices that run iOS (iPhone, iPad and iPod Touch), Android and Windows Mobile (which together represent the majority of the market share) present a brand new computing paradigm in terms of availability, user interface and security. These devices are being targeted as never before by attackers [1]. Today more than 300 kinds of malware – among them worms, Trojan horses and other viruses and spyware – have been unleashed against these devices [1]. Although desktop systems remain the most widely targeted platform, as mobile computing becomes more ubiquitous and powerful, the lines between a traditional desktop system and a mobile system will blur and these devices will gradually enter the virtual battlefield.

Clearly, these new capabilities, combined with the fact that users store personal information on the devices, make them a prime target for attackers. There are three basic categories of attacks that can be carried out against mobile devices:

  • Confidentiality attacks: Data theft, data harvesting [3]
  • Integrity attacks: Phone hijacking [2]
  • Availability attacks: Protocol based Denial-of-Service attacks, battery draining [1]

These three categories represent a wide spectrum of security issues and there are many different attacks that an attacker could carry out. Any of the aforementioned attacks could range in severity from “low” to “high”, which makes this particular research area a wide open problem.

2. Survey Organization

This research paper is organized in the following manner: section III discusses notable previous work in this field, section IV discusses threats, vulnerabilities and defenses of mobile device security, and sections V and VI present original work by the authors showing current and potential security threats from mobile device applications.

3. Previous Work

There has been a significant amount of previous work done in the area of mobile security. In [2], the authors show that since many of the mobile devices are based on similar codebases as their desktop counterparts (as in the case of Windows Mobile and iOS), rootkits, or stealthy malware that affects system programs and files, can equally affect the mobile version of these systems. In addition to showing that the systems can be exploited, they also show the implications of being able to compromise these devices.

In [4], the authors demonstrated the ability to deploy "a Trojan with few and innocuous permissions, that can extract a small amount of targeted private information from the audio sensor of the phone. Using targeted profiles for context-aware analysis, Soundcomber intelligently pulls out sensitive data such as credit card and PIN numbers from both tone- and speech-based interaction with phone menu systems."

In [5], the authors show that even though Android has an advanced permissions system with a sandboxed execution environment, a “genuine application [can be] exploited at runtime or a malicious application can escalate granted permissions and imply that Android’s security model cannot deal with a transitive permission usage attack and Android’s sandbox model fails as a last resort against malware and sophisticated runtime attacks.”

In [3], the author shows that many of the attacks that occur on iOS have to do with theft or procurement of personal or private information. For example, iOS applications have access to a user's address book, which includes phone numbers, addresses and email addresses. Additionally, an application can also access other data such as call history, carrier information and photographs.

4. Threats, Vulnerabilities and Defenses to Mobile Device Security

A) Attacks and Vulnerabilities

There has been explosive growth in the mobile device market, and the demand for smartphones, tablets and other integrated devices has increased dramatically. According to Nielsen [5], "in the third quarter of 2009, smartphones accounted for 40% of new phones sold in the period, up from 25% in the prior quarter. And in the third quarter, for the first time, more people accessed the Internet from smartphones than regular phones. Assuming that 150 million people will be using smartphones by mid-2011, that means 120 million will be on the mobile Internet and 90 million, or 60%, will be watching video" according to Nielsen projections based on current data trends.

In addition to an enormous user base, people are spending more time on their mobile devices than ever before. "As of the second quarter, Nielsen has previously reported that some 15 million U.S. mobile subscribers watch video on their phones for an average of three hours, 15 minutes each month" [5].

Attacks such as the ones detailed later in this section had not been seen until recently. Securing mobile devices presents unique challenges to researchers, attackers, defenders and users. This uniqueness is due to the fact that computers don't typically have some of the interfaces that mobile devices do, such as GPS hardware, or similar data storage paradigms, such as storage in the cloud. One reason mobile malware and attacks are becoming more popular is that people are using their mobile devices for more day-to-day tasks such as banking, email and web surfing. This leads users to store more "valuable" information on these devices, which makes them a prime target for attackers because of the wealth of information that could be obtained. In addition, while users may have good security habits when it comes to more traditional systems, they may not realize that their mobile device could be just as vulnerable as their servers, desktops and laptops.

I) History and Current Day Data

The first known cell phone virus, Cabir (EPOC.cabir and Symbian/Cabir), occurred in 2004, written by the group 29A [1, 8]. It affected devices that had Bluetooth modules and spread using Bluetooth via OBEX (object exchange). While it was originally designed as a proof-of-concept that would affect Symbian OS devices, it could spread to any Bluetooth enabled device such as desktop computers and printers.

The virus would activate each time the phone was turned on and immediately start looking for other hosts to infect. However, Cabir required the victim user to accept the transfer before any transfer could start. If an attacker were able to masquerade as someone the victim trusted, then they could easily socially engineer their way into infecting the victim’s phone.

Since the virus was released as a proof-of-concept to shock the security community into focusing on the importance of mobile malware, it did nothing other than drain the battery of the infected device while it constantly scanned for new targets to which to send itself. A variant of Cabir, Mabir, was released that infected phones not only via Bluetooth but also via SMS, so it could use the carrier as an attack vector. This increased the infection potential exponentially, since a victim no longer had to be within Bluetooth's 10 meter range to be infected.

Other historic, notable mobile malware includes Duts (a virus for the PocketPC platform), Skulls (a trojan horse that infects all applications, including SMS/MMS) and Commwarrior (a worm that used MMS messages and Bluetooth to spread to other devices). A more complete list of mobile viruses, trojan horses, worms and malware exists at [1, 7, 8].

Clearly, since mobile devices represent the future of computing, the fact that mobile malware is becoming more prolific shouldn’t be a surprise. In fact, F-Secure reported an almost 400% increase in mobile malware within a two year period from 2005-2007 [2].

Additionally, [9] reports that the amount of Android malware jumped 37% in the third quarter of 2011 and that Android is now the “exclusive platform for all new mobile malware. While the Symbian OS remains the platform with the all-time greatest number of malware, Android is clearly today’s target.” Today, there are currently over 1,200 known malware samples [9].

In [10], the authors performed a survey of mobile malware in the wild. They determined that between 2009 and 2011, there were 46 pieces of malware released, including 4 for iOS, 24 for Symbian and 18 for Android. They also determined that the most common malicious activities were confidentiality attacks (collecting user information, 61%) and integrity attacks (sending premium-rate SMS messages, 52%).

II) Reasons for an Increase in Attacks

There are a number of reasons that the community is now experiencing an increase in malware including:

  • Increased computing power and storage capabilities: While many consumers may not recognize mobile devices as being equivalent in power to their larger counterparts (laptops and desktop PCs) due to their size, many smartphones, tablets and other PDA-type devices have a rich set of hardware interfaces. Many smartphones, such as iPhones and Android-based phones, have powerful dual-core processors and a large amount of storage space to accommodate music, movies, documents and other types of media that can be consumed on the go.

In addition, the software applications that come pre-installed on the device by the manufacturer, such as web browsers, email clients and messaging applications, allow the user to interact much more with the physical and virtual world than previous mobile devices did. Furthermore, additional software applications can be downloaded from the Internet, installed and run by the user. These third-party applications are able to access this advanced hardware as well as GPS and network interfaces (3G, WiFi and Bluetooth).

This provides mobile malware and crimeware authors a much larger array of possibilities to carry out their attacks. In addition, more sophisticated hardware and software could make it easier for these attackers to “hide” their attack by ensuring that it only consumes a small portion of the resources, thereby modeling a legitimate application.

  • Increased network connectivity: There is a widespread availability of 802.11 WLANs and high-speed broadband data access (3G, WiMAX). These services allow users to stay constantly connected to services such as email and messaging at home, at work and in foreign places, such as coffee shops. In addition, many applications utilize network connections to either request or send data. For example, a game application might transmit a user’s high score to a web server for storage.

Additionally, some applications rely on the fact that the phone will have a network connection to receive and send data, such as the Amazon.com shopping application. Lastly, many recent applications utilize location-based services, such as Facebook, Twitter, and FourSquare. These applications provide additional functionality if they are able to access the network and a user’s current location.

Lastly, many applications can make use of social data, such as the friends one might have on Facebook. This Facebook data could be stored within the application. While Facebook might maintain rigorous security standards on who and what can access the user’s data, other third party applications might not be so careful with the user’s data.

  • Standardization of OS and interfaces: The OS is consistent on all the same family of devices, so malware applications would have more effect, being able to exploit the same security vulnerability across many devices. In addition, many manufacturers give third party developers access to the system to write applications for the platform. For example, one can freely download the Android and iOS SDK. Using this provided SDK, one could craft a virus or some other piece of malware and then submit it for inclusion in the respective application storefronts.
  • Enterprise integration: Many mobile platforms, such as Android, iOS and Blackberry, support standards for integration into an enterprise environment. For example, many of these devices have support for Virtual Private Networks (VPNs) as well as Exchange server integration. Thieves and malware authors recognize that this greatly enhances the infection potential if a mobile virus, trojan horse or worm is able to spread from a mobile device to a corporate environment.

Consider the case of an employee with a mobile device becoming infected while at a coffee shop and taking that device back to the corporate environment where it could spread throughout the organization. Where previously an attack might have only stolen information pertaining to the particular victim, now the attacker could have potentially obtained information on many different people as well as corporate information.

  • Other reasons, social engineering and hacktivism: Socially engineered attacks are becoming more prevalent and more sophisticated. [9] reports "that targeting content works based on cultural and sociological differences between geographic regions. Hacktivism [has] become part of the mainstream in 2011 due to groups such as Anonymous and LulzSec."

III) Attack Vectors, Motivations and Types of Attacks

There are a number of attack vectors that exist for compromising confidentiality, integrity or availability. In fact, many of the attack vectors are the same ones that are available to desktop applications. Mobile attacks often spread via these shared interfaces and services as well as via interfaces unique to smartphones, including SMS and MMS.

While the motivation behind such attacks can be varied and numerous, as well as being outside the scope of this paper, [10] identifies some current and future incentives including:

  • Novelty and amusement
  • Financial Gain
  • Political Gain
  • Damage resources

In addition to numerous attack vectors and motivations, there are many different types of attacks that a malicious individual could attempt to carry out. [10] defined three main categories of attacks: 1) malware attacks, 2) grayware attacks and 3) spyware attacks.

Malware attacks: “Malware attacks are attacks that gain access to a device for the purpose of stealing data, damaging the device, annoying the user, etc. The attacker defrauds the user into installing the malicious application or gains unauthorized remote access by taking advantage of device vulnerabilities. These particular types of threats provide no notice to the user and typically includes worms, Trojan horses and viruses [10].”

  • RF attacks: In this type of attack, an attacker could compromise confidentiality, integrity and availability. For example, with the correct equipment, an attacker could sniff the air (WLAN and RF) for unencrypted user data to steal, such as usernames, passwords and account numbers. This type of attack is made easy if 1) the network the user is on doesn't utilize encryption and 2) the application transmits confidential information in plaintext.

As we'll see later on in section V, there are a number of iOS applications today that transmit private information without encryption. Additionally, if an attacker were able to successfully carry out a "mis-association" attack, where the mobile device joins the wrong access point, the attacker could attack the integrity of communications by performing a "man-in-the-middle" attack if the data were not validated or signed. Consider an example where a victim user sent an SMS message that said "Transfer $1000 to account X." The attacker could alter the SMS to say "Transfer $10,000 to account Y", where Y is his account.

Lastly, an attacker could also compromise the availability of RF services. Consider an example where an aircraft was WiFi enabled (passengers could access the WiFi during flight). An attacker could carry out a “disassociation” attack, where he transmits 802.11 management frames that cause the wireless clients to disassociate from the access point. The denial-of-service would continue until the attacker stopped transmitting the packets.

  • Bluetooth attacks: This class of attack includes many different possible attacks, similar to RF attacks, that can compromise the confidentiality and integrity of data. In "blue-jacking", an attacker's malware application could insert contacts or SMS messages into a victim's mobile device. In "blue-snarfing", the user's data is again under attack, as the attacker can steal or transfer the victim's data. Yet another type of Bluetooth attack is "blue-tracking", in which an attacker can follow the victim's movements. Lastly, in "blue-bugging", an attacker activates the attack software and has the phone call him back; when the attacker answers that call, he can listen in on a conversation.
  • SMS attacks: “SMS spam is used for commercial advertising and spreading phishing links. Commercial spammers are incentivized to use malware to send SMS spam because sending SMS spam is illegal in most countries” [10]. In addition to sending and receiving regular-rate SMS messages, SMS can also be used as an attack vector by exploiting vulnerabilities in the software stack, such as performing SMS fuzzing.
  • GPS/Location attacks: In this type of attack, the attacker can access the GPS hardware to monitor the user's movement and current location. This data can be used to create a profile of a particular user. In addition, this information could be sold to other companies for purposes of marketing and sending advertisements to the user. A more insidious use of a GPS/location attack would allow a criminal to track when a victim leaves their residence so that the attacker could rob it while they are gone.
  • Application masquerading and personal data attacks: This particular type of attack is as simple as accessing a user’s private data and saving, sending or using it in an unauthorized manner. For example, in [3], the author shows multiple examples of information that an application could access that could potentially be sensitive or confidential, such as a user’s phonebook, keyboard and location cache and photo albums.

In addition, [3] also showed that it would be relatively trivial for an application to traverse the file system of a mobile device, recording information that it could find valuable. This can all occur in the background and the user would never know that it is happening. This data could be sent off to a server and stored for later misuse.

  • Phone "Jailbreaking" and 3rd party application stores: While not encouraged by the manufacturer (and in some cases against the carrier terms and agreements), many users prefer to "jailbreak" their mobile devices. Jailbreaking is the process of removing the security limitations that are imposed by the operating system, such as the ability to run only signed applications, and installing additional extensions and themes. This also allows the user to bypass the application sandboxing mechanism and install applications from unofficial application stores. Users find this desirable because they can add functionality that didn't previously exist or could not be obtained from any existing application on the official application stores. However, users may not realize that this could be a problem if they were to install a malicious application that would normally be blocked by the OS on an un-jailbroken device. For example, one of the available exploits for iOS occurs when a user jailbreaks their iDevice, installs the OpenSSH package and doesn't change the root password for the device, which is "alpine". This allows a hacker to log in to the device as the root user and exploit the system, because they would have full access to read, write and execute any command.
  • Premium-rate attacks: Premium-rate services can deliver valuable content to a user's mobile device. When used in a legitimate manner, a user could receive financial information, technical support or even adult services. These services can cost as much as several dollars per message or minute. [10] identified 24 of the 46 pieces of malware surveyed as sending premium-rate SMS messages. In one case, [10] found that an application purporting to be a Russian adult video player sent premium-rate SMS messages to an adult service.

Another piece of malware, Geinimi, sent premium SMS messages to numbers specified by the remote command servers. This is potentially a very large security concern, because these premium SMS messages don’t require a user’s permission to send, so they potentially could go unnoticed until an attacker has racked up hundreds or even thousands of dollars on a victim user’s phone bill. In addition to premium-rate SMS message attacks, [10] also found that 2 of the 46 malicious applications contained premium-rate phone call attacks.

  • Power management attacks: [11] describes three different classes of power management attacks. A power management attack is a form of a denial-of-service attack that affects the availability of the mobile device by draining the battery more quickly than it would under normal operation. The attacker’s goal is to maximize the difference in power consumption between active and sleep states and keep the device from sleeping.

The three classes of attacks are 1) service request attacks, 2) benign power attacks and 3) malignant power attacks. In service request attacks, repeated requests are made to the victim for services, typically over a network. Even if the request is ultimately not granted, the device must expend power to determine whether or not to grant the service request. In benign power attacks, the mobile system executes valid but power-hungry tasks (such as displaying a hidden animated GIF or executing JavaScript). In malignant power attacks, attackers create or modify binary executables that force the mobile system to consume more power than it normally would. For example, consider an application that plays a silent audio track in the background while the application is running. This type of attack might be difficult for the user to detect because they may think that their battery can no longer hold a charge.

  • Time-activated and location-activated attacks: An attacker may choose to activate attacks at certain locations or at a pre-determined or random time in the future. When the victim arrives at the location and uses the software, it could activate the intended malignant function. This would be fairly easy to do, as many applications have access to the GPS. This type of attack may also sneak past any analysis of the application, as it wouldn't run every time the application was launched, but rather only under certain conditions.

In addition to malware attacks, [10] also defines several other types of “attacks” including:

Grayware attacks: In [10], the authors classified grayware attacks as “some legitimate applications collect user data for the purpose of marketing or user profiling. Grayware spies on users, but the companies that distribute grayware do not aim to harm users. Pieces of grayware provide real functionality and value to the users. The companies that distribute grayware may disclose their collection habits in their privacy policies, with varying degrees of clarity. Grayware sits at the edge of legality; its behavior may be legal or illegal depending on the jurisdiction of the complaint and the wording of its privacy policy. Unlike malware or personal spyware, illegal grayware is punished with corporate fines rather than personal sentences. Even when the activity of grayware is legal, users may object to the data collection if they discover it. Application markets may choose to remove or allow grayware when detected on a case-by-case basis.”

For example, in 2009, bloggers raised concern over the PinchMedia LLC analytics framework. This framework provided third party application developers usage information about their users. The users were not informed and they were not given an option to opt-out [3]. In addition, Storm8 and MogoRoad also faced legal issues when it was discovered that they were collecting users’ contact information without informing them and then transmitting the collected information in plaintext [3].

Spyware attacks: The last category that [10] detailed was “Spyware attacks.” “Spyware collects personal information such as location or text message history over a period of time. With personal spyware, the attacker has physical access to the device and installs the software without the user’s knowledge. Personal spyware sends the victim’s information to the person who installed the application onto the victim’s device, rather than to the author of the application. For example, a person might install personal spyware onto a spouse’s phone. It is legal to sell personal spyware in the U.S. because it does not defraud the purchaser (i.e., the attacker). Personal spyware is honest about its purpose to the person who purchases and installs the application. However, it may be illegal to install personal spyware on another person’s smartphone without his or her authorization [10].”

For more on platform-specific exploits in the papers we surveyed, please reference [1, 2, 3, 4, 6, 10, 11, 18, 20].

B) Defenses

When defending against mobile phone security threats and mobile malware, there are two main categories of defense: prevention, and detection and recovery. The majority of the work that has already been implemented falls into the class of prevention. However, detection and recovery is becoming a more popular research topic. We discuss some of the major defenses below:

  • Code analysis (static and dynamic): There are two main techniques of determining an application’s characteristics: statically and dynamically. Both have certain advantages and disadvantages.

In static analysis, many techniques may be used to determine how the program works, such as decompilation, decryption, pattern matching and static system call analysis. The central idea behind static analysis is finding signatures of malicious code in a fast and easy manner. In [3], the author describes part of the App Store review process for applications and details that, using static analysis, one can dump the strings in a binary file and check them against a blacklist of forbidden classes, method names and file paths. However, static analysis can be circumvented with obfuscation techniques. Additionally, some languages, such as Objective-C, allow developers to look up classes and methods by name at runtime. This feature of the language adds additional opportunity to "hide" malware functionality.

The primary disadvantage of static code analysis is that malicious code patterns have to be known in advance, making it impossible to automatically detect new malware or malicious polymorphic code without an intervention of a human expert [23].

Dynamic analysis is a set of techniques which involve running an application in a controlled environment and monitoring its behavior. Various heuristics can be used to capture behavior of the application such as monitoring file changes, network activity, processes, threads and system call tracing [23].

One large difference between the iOS App Store and the Android Marketplace is that applications that are released to the iOS App Store are reviewed by a human. This accomplishes an important goal in that it rejects applications that are in a "legal gray zone such as casino gambling or collecting personal data." The Android Marketplace allows an author to post apps without any a priori review; rather, it relies on "crowd-sourcing" to review the applications and post comments, either positive or negative.
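As a toy version of the string-dump blacklist check described above (the blacklist entries and binary path are illustrative; Apple's actual review tooling is not public):

```python
import re

# Hypothetical blacklist of forbidden symbols an App Store-style static
# check might look for in a submitted binary.
BLACKLIST = [b"_CTSettingCopyMyPhoneNumber", b"lockdownd", b"/var/mobile/Library/SMS"]

def dump_strings(path: str, min_len: int = 4) -> list[bytes]:
    """Extract printable ASCII runs from a binary, like the `strings` tool."""
    with open(path, "rb") as fh:
        data = fh.read()
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)

def check_binary(path: str) -> list[bytes]:
    """Return every extracted string that references a blacklisted symbol."""
    return [s for s in dump_strings(path) if any(b in s for b in BLACKLIST)]

if __name__ == "__main__":
    for hit in check_binary("SomeApp"):       # binary path is illustrative
        print("forbidden symbol reference:", hit)
```

As noted above, runtime class and method lookup would let an application keep such names out of its string table entirely, which is exactly why obfuscation defeats this kind of check.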

  • ASLR and DEP: ASLR (address space layout randomization) is a computer security method that involves re-arranging the positions of key data areas, including the position of libraries, the heap and the stack, in the process's address space. This technique makes it more difficult for an attacker to predict target addresses [21]. For example, an attacker who is attempting to overflow the stack would first have to locate the address of the stack before they could overflow it. If the attacker were to guess incorrectly and the program terminated, the next time the program is launched the stack address will have moved again, and the attacker would have to start over in their attempt to find it. While the full description of how ASLR is implemented and its effectiveness is outside the scope of this paper, the reader may learn more at [21].

In addition to ASLR, DEP (data execution prevention) is another security feature that prevents an application or service from executing code from a non-executable memory-region. This protection can be enforced in hardware and/or software. Again, while the full description of how DEP is implemented and its effectiveness is outside the scope of this paper, the reader may learn more at [22].

The platforms that support ASLR and DEP include iOS and Windows Mobile. Android has plans to support ASLR, but the Blackberry platform has neither.
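The effect of address randomization can be observed directly on a desktop Linux host (a general illustration only, not the iOS or Windows Mobile implementation): the stack mapping printed below changes on every run, which is what forces an attacker to re-guess addresses after each crash.

```python
# Print this process's stack mapping; under ASLR the address differs each run.
# Linux-only sketch: relies on /proc/self/maps.
def stack_mapping() -> str:
    with open("/proc/self/maps") as fh:
        return next(line for line in fh if "[stack]" in line)

if __name__ == "__main__":
    print(stack_mapping().strip())
    # Run the script twice and compare the start addresses: an attacker
    # guessing a stack address has to start over after every restart.
```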

  • Application sandboxing: “Sandboxing consists of running mobile code in a restricted environment called a sandbox [17].” A sandbox can be characterized by two different mechanisms: 1) confining code, either through type checking, language properties or the use of protection domains to prevent the subversion of trusted code and 2) enforcing a fixed policy for the execution of code [17].

When applied to the real world, each application has its own environment. Other applications should not be able to interfere with another application's environment, nor should that application be able to interfere with other applications' environments. Both iOS and Android implement application sandboxes. In the iOS model, each application only has read-write access to a few directories (the application's own directory and a temporary directory), and applications cannot read or write to any directories (including other applications' directories and system directories) for which they are not authorized.

In the iOS sandbox, all applications share the same sandbox rules and they’re allowed any action any application could ever need [18]. Compared to the Android sandbox, “applications must explicitly share resources and data and do this by declaring permissions they need for additional capabilities not provided by the basic sandbox. These additional permissions are granted or denied by the user at install time only. [13]”

  • Permission systems: In Android, there is the additional protection of a permission system. This permission system “treats all applications as potentially buggy or malicious so they are assigned a low-privilege ID that can only access their own files [19].” Depending on what the application wants to do, it can request an elevation of permissions from the user, at install-time, which the user will ultimately grant or deny.

For example, there are several different levels of permissions including Normal (permissions that protect access that could annoy but not harm the user, such as setting the wallpaper), Dangerous (permissions that protect access that could potentially harm the user, such as gathering private information), and System (permissions that need access to the most dangerous privileges, such as deleting applications).

In addition to the permissions system, Android also has the Intents system. Intents are typed interprocess messages that are directed to particular applications or system services, or broadcast to applications subscribing to a particular intent type [20]. Access to the Intents system is funneled through ActivityManagerService, which restricts intents to being sent only by applications with the appropriate permissions or by processes with a UID that matches the system's [19].

iOS does have a rudimentary concept of a permissions system, but not one that is as in-depth as the Android model. For example, when an application attempts to access the user's location via the GPS hardware, the system will confirm with the user that this is an acceptable action, which the user can allow or deny. However, for the most part, Apple relies on a number of other techniques to handle security issues, such as sandboxing, ASLR/DEP and code analysis.

  • ACLs and capability lists: Both iOS and Android implement standard UNIX-type permissions with users and groups. This allows the systems to implement the concept of "least privilege", where accounts and users are only able to access data to which they are properly privileged. In iOS, files have permissions and third party applications run as the user "mobile" instead of root. In addition, certain operations on Android require the proper permissions, such as accessing the network interfaces. Typically, when a user "roots" or "jailbreaks" their device, they are elevating the permissions of applications to those of a more privileged account to bypass certain security features such as sandboxing.
  • Code signing: Code signing is a process where executable binaries are signed digitally by software authors to guarantee that the code has not been altered or corrupted [15]. This is an important part of the Android, Blackberry, Windows Mobile and iOS security models, and it is used extensively for validating third party applications. An important distinction between iOS and Android is that with iOS code signing, the code must be signed with a certificate validated by a certificate authority or CA (in this case, Apple), whereas with Android, self-signed certificates are acceptable [13, 14].

In addition to validating applications, iOS also uses code signing when booting up. When a device running iOS boots, the first significant piece of code that runs is the BootROM or "SecureROM", which is read-only [16]. Within this BootROM, an Apple root certificate is embedded so that the firmware can be validated as being official and secure. Once the RSA signature has been checked, control is passed to the low level bootloader, or "LLB." This module runs several setup routines and checks the signature of iBoot, the stage 2 bootloader for all devices running iOS. Once iBoot starts, it allows the device to go into a "recovery" or "DFU" mode that allows devices to be restored from any state. Firmware images from Apple are signed and checked when an upgrade or downgrade occurs. If the device boots normally instead of going into DFU mode, control is passed to launchd, CommCenter and Springboard [11].

In addition to validating system software, iOS also signs the code directory structure with SHA-1 hashes of memory pages, and a PKCS#7 signature is embedded in all binaries that are downloaded from the App Store [11]. Binaries that do not validate at run time are killed by the OS, preventing any binary that has been altered from running. Lastly, to provide a defense against rootkits, code directory hashes for system binaries are cached in the kernel [11].

While regular users may not know or care whether a binary is signed properly, this particular process creates accountability and increased trust on the platform if an application can be traced back to a known source.

The astute reader will recognize that a signed binary may not necessarily contain safe or bug-free code – just code verified as coming from a particular known source and not having been tampered with. Also, if the system does not strictly enforce running only validated binaries, users could be tricked (through social engineering, perhaps) into running code that fails to validate. Lastly, one potential flaw in Google's scheme is that Google does not require that a CA sign the application signing certificate – it only records the signatures for bookkeeping purposes. The additional level of CA validation that Apple introduces could, if adopted, potentially reduce the amount of malware and spyware on Android.
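A simplified sketch of the page-hash idea behind the code directory (4 KB pages and SHA-1 are chosen to mirror the description above; the real format, including the embedded PKCS#7 signature, is considerably more involved):

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size for illustration

def code_directory_hashes(path: str) -> list[str]:
    """SHA-1 hash each page of a binary, like a simplified code directory."""
    hashes = []
    with open(path, "rb") as fh:
        while page := fh.read(PAGE_SIZE):
            hashes.append(hashlib.sha1(page).hexdigest())
    return hashes

def verify(path: str, expected: list[str]) -> bool:
    """A binary whose pages no longer match the signed hashes is rejected."""
    return code_directory_hashes(path) == expected
```

Any single-byte modification to the binary changes at least one page hash, which is what lets the OS kill an altered binary at run time.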

  • Data encryption: Data encryption and decryption is a very CPU- and power-intensive activity. Battery life on a mobile device is a precious commodity because the usage model for a mobile device is such that the devices are meant to be used out in the field, where a constant power source isn't always available. As such, there haven't been any commercial implementations of automatic whole-disk encryption. However, certain OSes, such as iOS, encrypt particular items such as the Keychain (a password manager and storage framework) and provide the ability to encrypt individual files. Another option for third party applications is to provide their own encryption services framework, using known models, such as PKI, and known programs, such as OpenSSL.

One interesting study of mobile device encryption was done by the authors of [28]. In their research, they explored the use of Field Programmable Gate Arrays or FPGAs, processors and ASIC hardware in the context of finding a framework for encryption on hand-held communication units. They used the IDEA encryption algorithm to show the tradeoffs in the suggested technologies. They measured their results using three different metrics: 1) performance, 2) programmability and 3) power consumption. They determined that since power consumption is directly related to frequency, FPGAs provided the highest performance (MOPS/watt).
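Returning to the application-level option mentioned above, a minimal per-file encryption sketch using the widely available cryptography package (an illustration of third-party encryption, not of the iOS Keychain or Data Protection APIs):

```python
from cryptography.fernet import Fernet  # pip install cryptography

def encrypt_file(path: str, key: bytes) -> bytes:
    """Read a file and return an authenticated, encrypted token."""
    with open(path, "rb") as fh:
        return Fernet(key).encrypt(fh.read())

def decrypt_blob(token: bytes, key: bytes) -> bytes:
    """Recover the plaintext; raises InvalidToken if the data was tampered with."""
    return Fernet(key).decrypt(token)

if __name__ == "__main__":
    key = Fernet.generate_key()              # in practice, protect this key
    token = encrypt_file("notes.txt", key)   # file name is illustrative
    print(len(decrypt_blob(token, key)), "plaintext bytes recovered")
```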

  • Detection and recovery defenses: There has been a lot of notable work done in this category of mobile malware defense. For example, [24] presented various approaches for mitigating malware on mobile devices. The authors implemented and evaluated the suggested approaches on Google Android. The work is divided into the following three segments: a host-based intrusion detection framework; an implementation of SELinux in Android; and static analysis of Android application files.

[24] determined that to provide well-rounded protection, a security suite for mobile devices or smart phones (especially open-source ones such as Android) should include a collection of tools blending various capabilities that operate in synergistic fashion.

In [24], the authors' first approach was an innovative host-based intrusion detection system (IDS) for detecting malware on mobile devices. This framework relies on a lightweight agent (in terms of CPU, memory and battery consumption) that continuously samples various features on a device, analyzes the collected data using machine learning and temporal reasoning methods, and infers the state of the device. Features belonging to groups such as Messaging, Phone Calls and Applications belong to the Application Framework category and were extracted through APIs provided by the framework; features belonging to groups such as Keyboard, Touch Screen, Scheduling and Memory belong to the Linux Kernel category.

This study on anomaly detection was based on various detection algorithms. The purpose of this study is to understand how a detection algorithm, a particular feature selection method and number of top features can be combined to differentiate between benign and malicious applications which are not included in the training set, when training and testing are performed on different devices and to find specific features that yield maximum detection accuracy. Empirical results suggest that the proposed framework is effective in detecting malware on mobile devices in general and on Android in particular (accuracy of 87.4% with false positive rate of 12.6%).
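A hedged sketch of the feature-sampling-plus-classifier idea (the feature names and training vectors below are synthetic; the actual feature set and algorithms are those described in [24]):

```python
# pip install scikit-learn
from sklearn.ensemble import RandomForestClassifier

# Each row: [sms_sent_per_min, cpu_load, net_bytes_out, battery_drain_rate]
# Labels: 0 = benign behavior, 1 = malicious behavior (synthetic examples).
X_train = [
    [0.1, 0.20, 1_000, 0.5],
    [0.0, 0.15, 500, 0.4],
    [5.0, 0.90, 50_000, 3.0],
    [8.0, 0.85, 80_000, 2.5],
]
y_train = [0, 0, 1, 1]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

sample = [[6.0, 0.80, 60_000, 2.8]]   # a freshly sampled feature vector
print("malicious" if clf.predict(sample)[0] else "benign")
```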

The authors' second study examined the applicability of detecting malware instances using a light version of the Knowledge-based Temporal Abstraction (KBTA) method that can be activated on resource-limited devices. The new approach was applied to detecting malware on Google Android-powered devices. Evaluation results demonstrated the effectiveness of the new approach in detecting malicious applications on mobile devices (detection rate above 94% in most scenarios) and the feasibility of running such a system on mobile devices (CPU consumption was 3% on average).

This study also proposed the implementation of SELinux in Android in order to harden the Android system and to enforce low-level access control on critical Linux processes that run under privileged users. By choosing this route, the system can be better protected from scenarios in which an attacker exploits a vulnerability in one of the highly privileged processes.

[25] is another interesting study using malware behavior detection. This paper proposed a behavior-based malware detection system for the Windows Mobile platform called WMMD (Windows Mobile Malware Detection system). WMMD uses API interception techniques to dynamically analyze an application's behavior and compare it with a malicious behavior characteristics library using model checking.

The architecture of the proposed framework consists of two modules: a Dynamic Analysis Module and a Behavior Detection Module. Both modules use the API interception technique to obtain information about the running software. The Dynamic Analysis Module is responsible for analyzing the program's behavior. The Behavior Detection Module monitors the process's real-time information and compares it with a behavior signature library. Once it detects mal-behavior, it alerts the user and offers feedback to construct a new behavior model.

All the experiments were first performed in the Windows Mobile Emulator on a PC and then verified on a real mobile phone. The emulator ran Windows Mobile 6.0 Professional and the real phone was an HTC PPC6800 running Windows Mobile 6.0.

To test WMMD’s effectiveness against obfuscation and packing techniques, they used the UPX packer to pack five Windows Mobile viruses and compared WMMD against six other anti-virus packages (for Windows Mobile 6.0), each with an updated virus signature database. The results revealed that all of the other anti-virus products could only detect the viruses before packing and were useless once the viruses were packed. WMMD, however, could still detect the viruses after packing, since it checks the API calls made during real execution, which packing cannot change. For example, WinCE.Infojack.A is a trojan that binds to popular software installation files; it extracts a file named mservice.exe to the \Windows directory, creates an mservice.lnk file in the \Windows\StartUp directory and then starts the mservice.exe process. Other variants of WinCE.Infojack.A share the same behavior, so it is enough to monitor the CreateFile() and CreateProcess() APIs and check whether their arguments satisfy this specific behavior.

This evaluation on real-world mobile malware shows that behavioral detection can successfully detect malware variants whose behavior matches existing patterns in the database, while other anti-virus products cannot.
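
The WMMD source is not published in [25], but the behavior-matching step can be sketched in C. The record layout and the simplified WinCE.Infojack.A rule below are illustrative assumptions, not the actual WMMD data structures; a real system would receive these records from its API interception layer.

#include <stdio.h>
#include <string.h>

/* Simplified record for one intercepted API call (illustration only). */
struct api_call {
    const char *api;   /* e.g. "CreateFile", "CreateProcess" */
    const char *arg;   /* first string argument of the call  */
};

/* Returns 1 when the observed call sequence matches the (simplified)
 * WinCE.Infojack.A behavior: drop \Windows\mservice.exe, then start it. */
static int matches_infojack(const struct api_call *calls, int n) {
    int dropped = 0;
    for (int i = 0; i < n; i++) {
        if (!strcmp(calls[i].api, "CreateFile") &&
            strstr(calls[i].arg, "\\Windows\\mservice.exe"))
            dropped = 1;
        if (dropped &&
            !strcmp(calls[i].api, "CreateProcess") &&
            strstr(calls[i].arg, "mservice.exe"))
            return 1;
    }
    return 0;
}

int main(void) {
    struct api_call trace[] = {
        { "CreateFile",    "\\Windows\\mservice.exe" },
        { "CreateFile",    "\\Windows\\StartUp\\mservice.lnk" },
        { "CreateProcess", "mservice.exe" },
    };
    printf("trace %s the Infojack behavior signature\n",
           matches_infojack(trace, 3) ? "matches" : "does not match");
    return 0;
}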

In [26], the authors consider defending mobile devices against “proximity malware,” which does not need the provider network or an application store to reach a phone but instead spreads directly between nearby devices. The dynamics of proximity propagation inherently depend upon the mobility dynamics of a user population in a given geographic region. Unfortunately, there is no ideal methodology for modeling user mobility. Traces of mobile user contacts reflect actual behavior, but they are difficult to generalize and only capture a subset of all contacts due to a lack of geographic coverage. Analytic epidemiological models are efficient to compute and scale well, but simplify many details. Synthetic models are flexible and provide the necessary geographic coverage, but lack the full authenticity of user mobility traces.

[26] assumed that devices have a trusted defense software component that can examine messages and files transferred between devices, securely record persistent information about these transfers, and control device hardware when necessary (e.g., disable radio communication). These assumptions may be strong, but not unreasonable given the increasing prevalence of trusted computing modules. However, if malware has the ability to disable defense software, we can predict what the result will be: unchecked propagation through a population.

The first strategy described in [26] simply uses local evidence to detect malware and prevent further dissemination by the device, such as by disabling the Bluetooth or WiFi radio. Preventing further propagation by disabling communication may inconvenience the user, but voice and messaging with the provider network remain possible. Disabling the malware prevents further propagation but makes no attempt to notify other devices or the network about the presence of malware. It serves as a useful baseline for comparison.

The second strategy described in [26] extends local detection with an active mitigation component. In this strategy, each device maintains a table S of signatures of malware files, such as an MD5 hash over the file content. After a device X infers that it is infected, it disables the malware and warns subsequent devices about it. Device X computes a content-based signature s over the file(s) that triggered the infection recognition (e.g., the hash it has used to track file transfers in the first place). When X comes into proximity contact with another device Y, X disseminates the signature s to Y. If Y is infected, it immediately disables the malware. Y then adds s to its signature table S. Whenever another device shares a file with Y, Y will check the file against the signatures in S. The device can then either delete the file, or warn the user about the file.
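
A minimal sketch of this per-device signature table is shown below. The fixed-size table and the FNV-1a stand-in hash are assumptions made to keep the example self-contained; the strategy in [26] uses an MD5 digest over the file content, which could be computed with CommonCrypto on iOS or OpenSSL elsewhere.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Stand-in content hash (FNV-1a). The strategy in [26] uses an MD5 digest
 * over the file content; any collision-resistant digest fits the sketch. */
static uint64_t content_hash(const unsigned char *data, size_t len) {
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

#define MAX_SIGS 128
static uint64_t sig_table[MAX_SIGS];   /* table S of known-malware signatures */
static int      sig_count = 0;

static void add_signature(uint64_t s) {
    if (sig_count < MAX_SIGS)
        sig_table[sig_count++] = s;
}

/* Called when another device shares a file: 1 = known malware, 0 = unknown. */
static int is_known_malware(const unsigned char *file, size_t len) {
    uint64_t s = content_hash(file, len);
    for (int i = 0; i < sig_count; i++)
        if (sig_table[i] == s)
            return 1;
    return 0;
}

int main(void) {
    unsigned char malware[] = "payload-that-triggered-local-detection";
    add_signature(content_hash(malware, sizeof(malware)));  /* warned by device X */
    printf("incoming file is %s\n",
           is_known_malware(malware, sizeof(malware)) ? "blocked" : "accepted");
    return 0;
}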

The third strategy described in [26] relies upon the network provider to disseminate signatures using a broadcast mechanism. In addition to standard unicast messaging, providers are also able to send data packets over broadcast at a low cost. In this strategy, whenever a device decides that it is infected, it sends the malware content to an anti-virus server in the provider network (using MMS). The server, which presumably has far greater processing power than the mobile devices, can compute a better-quality signature. With access to anti-virus experts, the server may also be able to compute a patch that contains information on how devices may “cure” themselves and remove the infection. Manual involvement in generating patches is also a possibility.

Lastly, in [27], the authors used data mining techniques to detect malware behavior. This paper proposed an ontology-based behavioral analysis technique to develop a detection method for smartphone malware. In this experiment, a mobile environment was constructed in the laboratory. An HTC HD2 smartphone running the Windows Mobile 6.5 operating system was adopted as the main test platform, and the authors’ mobile malware detection system (MMDS) was installed and run on it. Other smartphones then sent files or messages to the HTC HD2 through MMS or SMS. The MMDS automatically filters all files and messages by extracting their behavioral characteristics, and the system determines the degree of danger of these behaviors. Once the user confirms that a message poses an intrusion risk, the system rejects the MMS or SMS.

The experiment in [27] showed that the proposed FPN model can detect most new mobile malware; however, two pieces of mobile malware could not be detected by the FPN model. Because collecting mobile malware is difficult, the authors could not obtain a wide variety of samples to test. Thus, the FPN model, which performs behavior analysis of mobile malware based on ontology theory, may not detect malware of those types.

I) Future Defense Work

The authors would also like to explore the feasibility of implementing other security features such as:

  • Enhanced permissions systems: It would be worthwhile to spend time researching how one could detect over-privileged applications and adjust their privilege level to only the privileges that are needed. Additionally, research into permission models that allow finer-grained control over application permissions would let developers and users control exactly what information or modules could be used or accessed. Lastly, research into dynamic analysis models for determining malicious activity would be useful, as mobile platforms and applications are becoming more complicated and future attacks could target push notification systems and in-application purchasing systems.
  • Trusted computing modules: Trusted computing is a technology that ensures that a computer will consistently behave in an expected way and that those behaviors will be strictly enforced by hardware and software. If these modules could be included as hardware on the phone, we could ensure that hackers couldn’t deploy rootkits on phones because the boot process would be verified and secure.
  • Encryption modules: Many desktop operating systems implement some form of full hard drive encryption, such as Windows BitLocker and Mac OS X FileVault. Using a hardware encryption module, it would be possible to encrypt the entire hard drive without consuming a large amount of battery power. For example, the user application space could be encrypted in a separate volume that is mounted and decrypted at boot time. In addition to encrypting the entire user application space, applications could also provide their own separate encryption keys to encrypt application-specific data.
  • Firewall modules: Since smartphones and mobile devices are now as powerful as desktop computers, and the trend of consumers using them as a replacement for desktop and laptop systems will continue, more personal and confidential data will be stored on mobile platforms. It would be valuable to invest research time into mobile firewalls and packet filtering in an attempt to detect whether data harvesting is occurring on the device and being transferred across a network interface.

5. Applications Survey

The author surveyed over 230 applications (the full list of applications can be found in Appendix A), including applications in the “Top” categories on the iTunes store, to determine what type of information could be extracted from auditing packet streams. The results were quite surprising.

To perform this audit, the author launched one application at a time and used WireShark to capture and analyze packets. The experiment was performed on an open network that the author created. The access point was a Cisco Small Business router (WAP4410N) and was configured using a hidden SSID and MAC address authentication to prevent outside users from associating with the access point and introducing outside, extra packets. While the author realizes that hidden SSIDs and MAC address authentication are easily defeated mechanisms, it was used to prevent casual users from using the access point. The mobile devices used were an Apple iPod Touch 4G, an Apple iPad 1G and an iPhone 4, configured with iOS 5.0.1.

For reasons of classification, the authors created several different levels of potential security breaches. The levels are defined as:

  • None: This level is defined as having no potential security breaches and no exposure of confidential information.
  • Low: This level is defined as having a few potential security breaches or exposure of confidential information that could not directly affect the user, such as device IDs that could be used in tracking users (in iOS, these are called UUIDs).
  • Medium: This level is defined as having several potential security breaches or exposure of confidential information that is potentially serious or if information is exposed such that an attacker would be able to identify the user on an individual basis, such as addresses, latitudes or longitudes, etc.
  • High: This level is defined as having multiple potential security breaches or exposure of extremely confidential information, such as account numbers, PINs, and username/password combinations.

For more information on the specific application, including the version number of the application with the vulnerability, see Appendix A for a full listing.

Application           Level    Risks Found
GrubHub               Low      UUID
The Weather Channel   Low      Geocoded location
Path                  Low      UUID
Handmade              Low      UUID
iHeartRadio           Low      Reverse geocoded location
TabbedOut             Low      UUID, Platform
Priceline             Low      UUID, Geocoded location, “Search” API is unencrypted
Free WiFi             Low      Geocoded location
Coupious              Medium   Geocoded location, UUID, coupon redemption codes
Delivery Status       Medium   UPS transmits reverse geocoded locations and tracking numbers
Color                 Medium   Reverse geocoded location and photos taken and shared by users
Cloudette             Medium   Username in plaintext and password, hashed with MD5
Gas Buddy             Medium   Username and password, hashed with MD5
Ness                  Medium   Reverse geocoded location
Southwest Airlines    High     Username and password in plaintext
Minus                 High     Username and password in plaintext
WordPress             High     Username and password in plaintext
Foodspotting          High     Username and password, Geocoded location
ustream               High     Username and password, UUID, geocoded location
Labelbox              High     Username and password, geocoded location

The majority of the applications that were surveyed encrypted the exchanges of confidential or sensitive information, such as usernames, passwords and account numbers via SSL/TLS.

However, many applications performed some sort of tracking or storing of analytic information, such as passing the UUID in a call to a web service. In some instances, this identifying information was not encrypted. While this is not dangerous in the sense that an attacker could use it to “identify” a particular person, none of the applications let users know that information such as the UUID, phone OS and model was being used or recorded, nor did they let the user “opt out.”

The largest single potential security breach was with the Southwest Airlines application. Because the username and password were submitted to a web server via a POST operation in plaintext, an attacker could simply sniff for this data. If such an exchange were captured, one could use those credentials to log into a particular account and book travel, use award miles and possibly change information in the victim’s profile. This is worrisome not only because an attacker could fraudulently use a victim’s account and credit card information, but also because of the possibility of terrorist threats to air travel.

For example, consider the possibility of a person who is currently (and rightfully) on the Department of Homeland Security’s “No-Fly” list. If this person were able to capture a victim’s credentials and create a fake ID, he could pass through TSA security without being stopped.

Of the 253 applications surveyed, 91.7% had no risk found, 3.1% had a low risk, 2.3% had a medium risk and 2.3% had a high risk. While it would be desirable to have no applications in the “Medium” or “High” category, the number of applications found to present a security risk was both surprising and far too high. There are over 500,000 applications on the iOS App Store, so extrapolating these results, there could be at least 15,500 applications in the “Low” risk category and roughly 11,500 applications in each of the “Medium” and “High” risk categories.

Overall, the number of applications with some sort of security risk is low. This is not very surprising to the authors as many of these applications are in the “Top” applications list and any potential security flaws would have already been found.

Due to the fact that iOS does not have a robust privilege system, there is no way a user could know that their information was being used in a dangerous or insecure way. While applications can indicate network traffic to the user by spinning the “network activity indicator”, they are certainly not required to do so. In fact, a legitimate or malware application on iOS could access the network interfaces, sending and receiving information, and never alert the user.

Developers typically do not follow the principle of least privilege. If an application needs a set of privileges for some functionality, developers will request them all up front, not just when they are needed. This is particularly dangerous because it could be an entry point for an attacker to compromise the application.

[19] performed research surveying 940 Android applications and found that more than 50% required 1 extra unnecessary permission and 6% required more than 4 unnecessary permissions. The reasons developers may request more permissions than are necessary could be that 1) they don’t understand the importance of security and least privilege, 2) they are planning future releases that will require these privileges or 3) they don’t fully understand how to work with the platform and make the code function correctly.

Since mobile devices and smartphones are unique in that they have a built-in billing system, there must be ongoing education of developers with an emphasis on security and privacy, or additional built-in measures in the OS to enforce security over the code developers write and the permissions they request.

6. Using Mobile Devices as Network Monitors

We also researched the possibility of using WiFi sniffing and cracking utilities on the iPhone, as well as the feasibility of releasing a spyware application into the iTunes App Store and collecting user information. As a side note, all the techniques used here could easily be ported to other mobile platforms, such as Android, but this particular research focused on iOS devices.

To be able to perform basic packet sniffing, several critical steps must be performed. The first is to put the particular interface into “promiscuous” mode, in the case of wired network interfaces, or “monitor” mode, in the case of wireless network interfaces.

In normal network interface operation, the kernel will discard any packets that are not destined for the specific node. Using this “monitor” mode, we are able to capture and further analyze these packets that would normally be discarded. This was a particularly easy piece of code (see Appendix B) to write (if one understands the BSD subsystem and C). In our particular implementation, we have a utility function that toggles the promiscuous bit (IFF_PROMISC) in the interface flags, read with the SIOCGIFFLAGS ioctl and written back with SIOCSIFFLAGS, as well as a function to list all the interfaces that the OS knows about.

In addition to being able to manipulate the firmware of the network card, we also need to be able to access the bpf files, or Berkeley Packet Filter devices, that are located in /dev. These bpf devices “provide a raw interface to data link layers in a protocol independent fashion. All packets, even those destined for other hosts, are accessible through this mechanism [29].”

Access to these files is restricted to root only. This presents a problem, as root access is not available to applications that will be available in the App Store. Therefore, to install and distribute this application, one must have a jailbroken iOS device.
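
To make this concrete, the following is a minimal sketch of reading raw frames directly from a bpf device. It assumes a BSD-style system (such as a jailbroken iOS device or Mac OS X) where /dev/bpf0 exists, the process runs as root and en0 is the WiFi interface; error handling is abbreviated.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <net/bpf.h>

int main(void) {
    int fd = open("/dev/bpf0", O_RDONLY);       /* requires root */
    if (fd < 0) { perror("open(/dev/bpf0)"); return -1; }

    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "en0", sizeof(ifr.ifr_name) - 1);
    if (ioctl(fd, BIOCSETIF, &ifr) < 0) {       /* bind the bpf device to en0 */
        perror("ioctl(BIOCSETIF)");
        return -1;
    }
    unsigned int buflen = 0;
    ioctl(fd, BIOCGBLEN, &buflen);              /* required read buffer size */
    ioctl(fd, BIOCPROMISC, NULL);               /* also force promiscuous mode */

    char *buf = malloc(buflen);
    ssize_t n = read(fd, buf, buflen);          /* one buffer may hold many frames */
    for (char *p = buf; n > 0 && p < buf + n; ) {
        struct bpf_hdr *hdr = (struct bpf_hdr *)p;
        printf("captured frame: %u bytes\n", hdr->bh_caplen);
        p += BPF_WORDALIGN(hdr->bh_hdrlen + hdr->bh_caplen);
    }
    free(buf);
    close(fd);
    return 0;
}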

To give the programmer friendlier access to the raw data from the bpf device and the packets, a user-space library, libpcap (Unix or Linux) or WinPcap (Windows), is layered on top of these kernel-only devices. Interestingly, the authors discovered that the SDK for Mac OS includes a pre-built library, libpcap.dylib. Unfortunately, that library is not available natively for iOS, but it can be cross-compiled for the ARM architecture by downloading the source from www.tcpdump.org.

Lastly, to perform useful functions with this data, one must create an interface to analyze, extract and filter packet information at the application level. While a .pcap file could be created for later analysis, live analysis is often more useful, and either can be done with a user-space program such as tcpdump or Wireshark.
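
Once libpcap is available, live capture at the application level takes only a few calls. The snippet below is a hedged example of the standard libpcap API (link with -lpcap), not the authors’ actual tool; en0 is again assumed to be the WiFi interface.

#include <stdio.h>
#include <pcap.h>

/* Called by pcap_loop() once per captured packet. */
static void on_packet(u_char *user, const struct pcap_pkthdr *h,
                      const u_char *bytes) {
    (void)user; (void)bytes;
    printf("packet: %u bytes captured (%u on the wire)\n", h->caplen, h->len);
}

int main(void) {
    char errbuf[PCAP_ERRBUF_SIZE];
    /* 65535 = snapshot length, 1 = promiscuous mode, 1000 ms read timeout. */
    pcap_t *p = pcap_open_live("en0", 65535, 1, 1000, errbuf);
    if (p == NULL) {
        fprintf(stderr, "pcap_open_live: %s\n", errbuf);
        return -1;
    }
    pcap_loop(p, 10, on_packet, NULL);   /* analyze ten packets, then stop */
    pcap_close(p);
    return 0;
}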

This particular research shows there could be a major potential data confidentiality problem with mobile devices. Assuming the device is jailbroken, if we were able to release an app into the Cydia App Store that a user could download, we could silently harvest and store personal and sensitive information from that user, as well as from any other device connected to the same open wireless network, without alerting anyone. In addition to being able to sniff packets, since we are in the Cydia App Store, we would have full use of all APIs, including Apple private APIs, to harvest personal information.

One question that the authors would like to explore in future research is whether we could also use the aircrack-ng suite on mobile devices.

7. Conclusion

In this paper, we examined the history of mobile threats and vulnerabilities as well as current threats and vulnerabilities against mobile devices. We also researched and examined defense mechanisms that currently exist and proposed future research topics. Additionally, we performed two experiments: one an audit of mobile application security, and the other a feasibility study of turning a mobile device into an RF sniffing and data collection device.

Appendix A: Full Application List

Bold applications represent applications bundled with iOS from Apple.

Application Version Application Version
Messages 5.0.1 RedLaser Classic 2.9.8
Calendar 5.0.1 eBay 2.4.0
App Store 5.0.1 Craigslist 3.033
Settings 5.0.1 Key Ring 5.4
Spotify 0.4.21 Coupious 1.4.1
Contacts 5.0.1 Cars 1.6.1
Notes 5.0.1 Amazon PriceCheck 1.2.1
Newstand 5.0.1 Linode 1.0.6
Reminders 5.0.1 Unfuddle 1.1.1
Find My Friends 1.0 MiniBooks 1.0.2
Videos 5.0.1 iTC Mobile 2.4
Vlingo 2.1.1 Blueprint viewer 1.7
Photos 5.0.1 Square 2.2
Camera 5.0.1 WordPress 2.9.2
Instagram 2.0.5 Maps 5.0.1
iMovie 1.2.2 FlightTrack 4.2.2
DashOfColor 3.1 Kayak 19.0.6
ColorSplash 1.7.2 Southwest 1.8.2
UStream Broadcaster 2.1 American 1.3.3
TiltShiftGen 2.02 Fly Delta 1.6
Gorillacam 1.2.2 Flysmart 2.5.25
CameraPlus 2.4 Priceline Negotiator 5.6
PS Express 2.03 Free WiFi 1.1.2
Dropcam 1.4.3 Google Earth 3.2
Chase 2.14.5799 Translator 3.1
Citibank 3.7 Phone 5.0.1
Discover 2.1 Mail 5.0.1
Fidelity 1.6.1 Safari 5.0.1
TD Trader 115.12 Music 5.0.1
PayPal 3.6 Flixster 5.02
Mint.com 2.0 Boxee 1.2.1
Stock 5.0.1 redbox 2.3.1
thinkorswim 115.12 Youtube 5.0.1
Geico 2.0.2 Fandango 4.5
Dropbox 1.4.6 XFINITY TV 1.8
1Password 3.6.1 IMDb 2.3.1
Alarm Clock 1.1 i.TV 3.4.1
Planets 3.1 MobiTV 1.0
Dictation 1.1 Netflix 1.4
Inrix Traffic 3.5.1 VNC 3.2.1
Adobe Ideas 1.2 RDP 2.8
IP-Relay 1.2 TouchTerm 2.1
iLlumination 1.0.1 Scorekeeper 4.1
Fake-a-call 5.05 Statware 1.0.3
HeyTell 2.3.2 NIKE+ GPS 3.2.1
Weather 5.0.1 MiLB Triple A 1.1.0
The Weather Channel 2.1.1 Pandora 3.1.16
Calculator 5.0.1 Shazam 4.8.4
Clock 5.0.1 Soundhound 4.1.1
Compass 5.0.1 iHeartRadio 4.0.1
Voice Memos 5.0.1 Last.fm 3.2.0
AroundMe 5.1.0 Songify 1.0.6
myAT&T 2.1.2 iTunes 5.0.1
WeddingWire 3.1 Virtuoso 1.0
LogTen 3.3.1 I Am T-Pain 2.0.0
French 1.0 Scrabble 1.13.78
Binary Calc 1.4 Harbor Master 2.1
Amazon 1.8.0 Zombie Duck 4.1
Groupon 1.5.7 Zombieville 1.7
LivingSocial 3.2.2 Table Tennis 4.1.0
Yowza 2.5 iFighter 1.9
Coupons Hired Gun 1.8
Airport Utility 1.0 Lock n’ Roll 3.0
Walgreens 3.0.2 Sneezies Lite 1.3
MyHumana 3.0.2 Pad Racer 1.1
Nike + iPod Uno 2.0.0
Gold’s Gym Spotter 1.2 CamWow 2.2
Lose It! 3.7.2 Labelbox 1.3.1
FitnessTrack 1.5.5 Photosynth 1.1.2
LIVESTRONG 1.0.1 Color Effects 3.1
MyFitnessPal Saturation 1.0
Nutrisystem 2.3 Peppermint 1.1
Kindle 2.8.5 FlickStackrXP 1.9.6
Instapaper 4.0.1 Minus 2.1.3
iBooks 5.0.1 Gallery 2.0.1
Zinio 2.2 Handmade 1.1
Twitter 4.0 StubHub 2.4.1
Facebook 4.0.3 Pushpins 2.0.1
Google+ 1.0.7.2940 Black Friday 2.0
foursquare 4.1.2 Sam’s Club 2.1.1
LinkedIn 4.2 Cyber Monday 2.1.0
Meebo 1.95 Words With Friends 4.1
Yelp 5.5.0 Ultimate Free Word Finder 1.01
PingChat Mad Gab 2.0
Bump 2.5.6 Metal Storm 4.0.2
Color 1.1 Need For Speed 1.0.11
Cloudette 1.0.1 Madden NFL 2010
soundtracking 2.0.2 Shizzlr 3.2.1
Free RSS 3.4 Flashlight 5.1
NetNewsWire 2.0.5 Tip Calculator 1.3.1
FOX News 1.2.4 PCalc Lite 2.4.3
OpenTable 3.4.2 Fake Call 1.1
Urbanspoon 1.17 To Do 3.2
Epicurious 3.0.1 Google 1.0.0.8117
WinePhD 1.2 Evernote 4.1.6
TabbedOut 2.3.3 Coin Flip 2.2
Foodspotting 2.7 Grades 2 2.03
GrubHub 2.20 Sundry Notes 3.2
RecipeGrazer 1.3 OneNote 1.2
Starbucks 2.1.1 Enigmo 4.1
Starbucks Mobile Card Angry Birds 1.6.3
Ness 1.1 JellyCar 1.5.4
iDisk 1.2.1 Runway 1.6
Remote 2.2 RockBand Free 1.3.49
Apple Store 2.0 Game Center 5.0.1
Find iPhone 1.3 App For Cats 1.1
Pages 1.5 PadRacer 1.1
Places 1.31 Implode 2.2.4
TripAdvisor 5.9 Astronut 1.0.1
Google Latitude 2.2.1 Monopoly 1.2.9
Gas Buddy 1.10 Deliveries 4.5
Maplets 2.2.2 Skype 3.5.454
iTranslate 5.1 Units 2.1.2
Translate 1.6.2 NCAA Football 2011
KG Free ESPN ScoreCenter 2.2.2
Wikipedia 2.2 Ski Report 2.2.1
White Noise 5.0.3 EpicMix 2.0.1
Sleep Machine Lite 2.0.1 MLB At Bat 4.6.1
Inception 1.6 Purdue 3.0
Sleep 2.0.1 NASA 1.43
Night Stand 1.0.4 80,000 Wallpapers 1.98
Geico BroStache 1.0.1 Wedding 911 1.06
CamCard 2.6.0.4 Path 2.0.2
Offline Pages 1.5.2 Facebook Messenger 1.5.2
GPS Tracker 1.2.2 Quora 1.1
TextPics Free 2.2 Big Button Box 3.0
Peel 2.0

Appendix B

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <netinet/in.h>

int go_promisc(int);
int get_iface_list(struct ifconf *);
int get_iface_names(void);

/* Toggle promiscuous (and all-multicast) mode on the en0 interface. */
int go_promisc(int on) {
    int fd;
    struct ifreq *ifreq = (struct ifreq *)malloc(sizeof(struct ifreq));
    get_iface_names();
    strcpy(ifreq->ifr_name, "en0");
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("opening socket");
        free(ifreq);
        return -1;
    }
    /* Read the current interface flags. */
    int status = ioctl(fd, SIOCGIFFLAGS, ifreq);
    if (status < 0) {
        perror("ioctl(SIOCGIFFLAGS)");
        status = -1;
    }
    if (on) {
        ifreq->ifr_flags |= IFF_PROMISC;
        ifreq->ifr_flags |= IFF_ALLMULTI;
    } else {
        ifreq->ifr_flags &= ~IFF_PROMISC;
        ifreq->ifr_flags &= ~IFF_ALLMULTI;
    }
    /* Write the modified flags back to the interface. */
    status = ioctl(fd, SIOCSIFFLAGS, ifreq);
    if (status < 0) {
        perror("ioctl(SIOCSIFFLAGS)");
        status = -1;
    }
    close(fd);
    free(ifreq);
    return status;
}

/* Fill ifconf with the list of configured interfaces. */
int get_iface_list(struct ifconf *ifconf) {
    int sock, rval;
    sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) {
        perror("opening socket");
        return -1;
    }
    if ((rval = ioctl(sock, SIOCGIFCONF, (char *)ifconf)) < 0) {
        perror("ioctl(SIOCGIFCONF)");
    }
    close(sock);
    return rval;
}

/* Print the name of every interface the OS knows about. */
int get_iface_names(void) {
    static struct ifreq ifreqs[20];
    struct ifconf ifconf;
    int nifaces, i;
    memset(&ifconf, 0, sizeof(ifconf));
    ifconf.ifc_buf = (char *)(ifreqs);
    ifconf.ifc_len = sizeof(ifreqs);
    if (get_iface_list(&ifconf) < 0) {
        return -1;
    }
    nifaces = ifconf.ifc_len / sizeof(struct ifreq);
    printf("Interfaces (count = %d)\n", nifaces);
    for (i = 0; i < nifaces; i++) {
        printf("\t%-10s\n", ifreqs[i].ifr_name);
    }
    return 0;
}

int main(int argc, char *argv[]) {
    if (argc != 2) {
        printf("usage: promisc [ON | OFF]\n");
        return -1;
    }
    get_iface_names();
    if (strcmp(argv[1], "ON") == 0) {
        return go_promisc(1);
    } else if (strcmp(argv[1], "OFF") == 0) {
        return go_promisc(0);
    } else {
        printf("usage: promisc [ON | OFF]\n");
        return -1;
    }
}

References

[1] M. Hypponen, “Malware Goes Mobile”, November 2006, Scientific American Magazine. Pages 70–77. http://www.cs.virginia.edu/~robins/Malware_Goes_Mobile.pdf

[2] J. Bickford, O. O'Hare, A. Baliga, V. Ganapathy, and L. Iftode, “Rootkits on Smartphones: Attacks, Implications and Opportunities”. ACM, In the Workshop on Mobile Computing Systems and Applications. Annapolis, MD. Feb. 2010.

[3] N. Seriot, “iPhone Privacy”, Black Hat DC 2010. Arlington, Virginia, USA. http://seriot.ch

[4] R. Schlegel, K. Zhang, X. Zhou, M. Intwala, A. Kapadia, X. Wang, "Soundminer: A Stealthy and Context-Aware Sound Trojan for Smartphones”. In Proceedings of the 18th Annual Network & Distributed System Security Symposium (NDSS) Feb. 2011.

[5] J. Rocha. “The Droid: Is this the smartphone consumers are looking for?”. Nov. 2011. http://blog.nielsen.com/nielsenwire/consumer/the-droid-is-this-the-smartphone-consumers-are-looking-for/

[6] A. Bose. “Propagation, detection and containment of mobile malware”. 2008. http://hdl.handle.net/2027.42/60849

[7] http://www.symbianpoint.com/types-latest-list-mobile-viruses.html

[8] A. Gostev. “Mobile Malware Evolution: An Overview, Part 1”. Sep. 2006. http://www.securelist.com/en/analysis?pubid=200119916

[9] McAfee. “McAfee Threats Report, Third Quarter 2011”. 2011. www.mcafee.com/us/resources/reports/rp-quarterly-threat-q3-2011.pdf

[10] A. Felt, M. Finifter, E. Chin, S. Hanna, D. Wagner. “A Survey of Mobile Malware in the Wild.” ACM CCS Workshop on Security and Privacy in Smartphones and Mobile Devices (SPSM). Oct. 2011.

[11] T. Martin, M. Hsiao, D. Ha, J. Krishnawami. “Denial-of-Service Attacks on Battery Powered Mobile Computers”. Proc. of Second IEEE Annual Conference on Pervasive Computing and Communications (PERCOM). 2004.

[12] K. Aras. “Jailbreaking iOS - How an iPhone breaks free.” Stuttgart Media University.

[13] Google. “Application Signing”. 2011. http://developer.android.com/guide/publishing/app-signing.html

[14] Apple. “Configuring Development Assets”. 2011. http://developer.apple.com/library/IOs/#documentation/Xcode/Conceptual/ios_development_workflow/100-Configuring_Your_Developmet_Assets/identities_and_devices.html

[15] “Code Signing.” 2011. http://en.wikipedia.org/wiki/Code_signing

[16] The iPhone Wiki. 2011. http://www.theiphonewiki.com/wiki/index.php

[17] S. Loueiro, R. Molva, Y. Roudier. “Mobile Code Security”. Institut Eurecom. 2011. http://www.eurecom.fr/~nsteam/Papers/mcs5.pdf

[18] C. Miller, “Mobile Attacks and Defenses”, IEEE Security and Privacy. Vol. 9, Issue 4, Pages 68-70, July-Aug 2011

[19] A. Felt, E. Chin, S. Hanna, D. Song, D. Wagner. “Android Permissions Demystified”. Proceedings of the ACM Conference on Computer and Communications Security (CCS 2011). 2011.

[20] W. Enck, D. Octeau, P. McDaniel, S. Chaudhuri. “A Study of Android Application Security.” Proceedings of the 20th USENIX Security Symposium. 2011.

[21] H. Shacham, M. Page, B. Pfaff, E. Goh, N. Modadugu, D. Boneh, “On the effectiveness of address-space randomization”. Proceedings of the ACM Conference on Computer and Communications Security (CCS’04). New York. 2004.

[22] R. Hund, T. Holz, F. Freiling. “Return-oriented rootkits: bypassing kernel code integrity protection mechanisms”. Proceedings of ACM 18th Conference on USENIX Security Symposium (SSYM’09). Berkeley, CA. 2009.

[23] T. Blasing, L. Batyuk, A.D. Schmidt, S. A. Camtepe. S. Albayrak. “An Android Application Sandbox System for Suspicious Software Detection.”

[24] A. Shabtai. “Malware Detection on Mobile Devices”. Proceedings of IEEE on Mobile Data Management. Pages 289-290. 2010.

[25] S. Dai, Y. Liu, T. Wang, T. Wei, W. Zou. “Behavior based malware detection on mobile phones”. IEEE Wireless Communications Networking and Mobile Computing (WiCOM). Pages 1-4. 2010.

[26] G. Zyba, G. Voelker, M. Liljenstam, A. Mehes, P. Johansson. “Defending mobile phones from proximity malware”. Proceedings of IEEE INFOCOM. 2009.

[27] H.S Chiang, W.J. Tsaur. “Identifying smartphone malware using data-mining technology”. Proceedings of IEEE Computer Communications and Networks (ICCCN). Pages 1-6. 2011.

[28] O. Mencer, M. Mar, M.J. Flynn. “Hardware Software Tri-Design of Encryption for Mobile Communications Units.” Proceedings of IEEE Acoustics, Speech and Signal Processing. 1998.

Normally, I think that simpler is better. I enjoy simple things, and not just things like the smell of cut grass, an ice-cold beer at a friend’s BBQ, the first snowfall of winter or reading a book. After all, Occam was on to something (the law of succinctness is a principle which generally recommends selecting the competing hypothesis that makes the fewest new assumptions, when the hypotheses are equal in other respects). Simple is good – fewer moving parts, fewer things to break.

I also enjoy simple software and hardware interfaces. Things that I can just pick up and know instantly how to use without reading an instruction manual. Something so simple that Grandma could use it. Apple products always have incredibly simple, unified interfaces: “One Interface to rule them all, One Interface to find them, One Interface to bring them all, and in the darkness bind them.” Simple people living in a simple world. However, simple does NOT imply intuitive.

Ever since the dawn of the electronics age, electronics have had buttons or some sort of interface that had something you could push, pull, or manipulate in some other fashion. Now, there are rumors that iPad2 and iPhone5 will not have home buttons.  In Johnathan Geller’s post:

“We just got some pretty wild information … while it’s hard to believe at first, it does make sense. … The iPad will be losing the home button. … Instead of button taps, you will use new multitouch gestures to navigate to the home screen and … launch the app switcher.
We’re told that this change will make its way over to the iPhone as well. … Apple employees are already testing iPads and iPhones with no home buttons on the Apple campus. … Steve Jobs didn’t want any physical buttons on the original iPhone … it looks like he may soon get his wish.”

So what will be replacing the home button?  In iOS 4.3, which was seeded to developers yesterday, there are new gestures that can be done with four or five fingers.  This is not a particularly big stretch, since previous iOS releases supported UIGestureRecognizer to simplify recognizing things like taps, pinch-in and pinch-out.  You could already customize the number of fingers needed to perform a gesture, and this is simply an extension of existing functionality.  In iOS 4.3, a four or five finger pinch brings you to the home screen, a swipe up or down reveals the multi-tasking bar and a swipe left or right allows switching between apps.

I am going to do what a devoted Apple-phile would never do and disagree with what Steve wants.  I think that this is a bad decision because, unlike Steve, regular users actually like buttons!  They like the tactile feel of pushing something, the affirmative “click” that something was done or something will happen.  Consider some software that would process information for whole minutes at a time without providing feedback in the form of a spinner or a progress bar.  Most users would get fed up and try to quit the application.  Without the feedback, users may not know what to do.  While these gestures may be fine for people who have been around technology their whole life (think GenY and younger), there are a lot of users who would have problems picking up the device and straight away being able to use it.  I would even go so far as to say that they may never discover some of the features without reading the manual or browsing forums for tips.  Apple has always prided itself on simple AND intuitive, easy-to-use software.  I don’t think regular users are ready for this.

To make my point even clearer, there are still people I know with iOS 4.2 who don’t know that double-tapping the home button brings up the task bar.  They literally have over 75 applications “running” and, if it weren’t for me, they would have continued on in their own blissful world, not knowing of that functionality.  And that’s WITH a button.  These gestures should be included in the software in addition to the button, so users can choose how they want to interact with the device.

Usability aside, how would we perform operations such as recovering crashed phones?

Personally, I would rather see support for a 4G network and WiFi-less FaceTime support.  Then again, who knows what the new feature set will be, because apparently iOS 4.3 drops support for ARMv6 (the iPhone 3G).

This research article was co-authored by David Stites (dstites[at]uccs.edu) and Jonathan Snyder (jsynder4[at]uccs.edu)

Abstract—A common problem for many web sites is achieving a high ranking in search engines.  Many searchers make quick decisions about which result they will choose, and unless a web site appears within a certain threshold of the top rankings, the site may receive little traffic from search engines. Search engine optimization, while widely regarded as a difficult art, provides a simple framework for improving the volume and/or quality of traffic to a web site that uses those techniques.  This paper is a survey of how search engines work broadly, SEO guidelines and a practical case study using the University of Colorado at Colorado Springs’ Engineering and Science department home page.

Index Terms—search engine optimization, search indexing, keyword discovery, web conversion cycle, optimized search, organic search, search marketing

1.  Introduction

Search Engines are the main portal for finding information on the web.  Millions of users each day make searches on the internet looking to purchase products or find information.  Additionally, users generally only look at the first few results.  If a website is not on the first couple pages it will rarely be visited.  All these factors give rise to many techniques employed to raise a website’s search engine ranking.

Search engine optimization (SEO) is the practice of optimizing a web site so that the website will rank high for particular queries on a search engine.  Effective SEO involves understanding how search engines operate, making goals and measuring progress, and a fair amount of trial and error.

Discussed in the following sections are a review of search engine technology, a step-by-step guide to SEO practices, lessons learned during the promotion of a website, and recommendations for the Engineering and Applied Science College of the University of Colorado at Colorado Springs.

In this survey, we spend our research efforts determining how SEO affects organic search results and do not consider paid inclusion.  For more information on paid inclusion SEO, see [1].

2.  A Review of Search Engine Technology

Search engine optimization techniques can be organized into two categories: white hat and black hat.  White hat practices seek to raise search engine rankings by providing good and useful content.  On the other hand, black hat techniques seek to trick the search engine into thinking that a website should be ranked high when in fact the page provides very little useful content.  These non-useful pages are called web spam because from the user’s perspective they are undesired web pages.

The goal of the search engine is to provide useful search results to a user.  The growth of web search has been an arms race where the search engine develops techniques to demote spam pages, and then websites again try to manipulate their pages to the top of popular queries.  Search engine optimization is a careful balance of working to convince the search engine that a page is relevant and worthwhile for certain searches while making sure that the page is not marked as web spam.

The accomplishment of this objective requires an understanding of how a search engine ranks pages and how a search engine marks pages as spam.  Although the exact algorithms of commercial search engines are unknown, many papers have been published giving the general ideas that search engines may employ.  Additionally, empirical evidence of a search engine’s rankings can give clues to its methods.  Search engine technology involves many systems such as crawling, indexing, authority evaluation, and query time ranking.

2.1  Crawling

Most of the work of the search engine is done long before the user ever enters a query.  Search engines crawl the web continually, downloading content from millions of websites.  These crawlers start with seed pages and follow all the links that are contained on all the pages that they download.  For a URL to ever appear in a search result page, the URL must first be crawled.  Search engine crawlers identify themselves in HTTP traffic with a particular user-agent when they are crawling, which websites can log.

In the eyes of search engine crawlers, not all websites are created equal.  In fact, some sites are crawled more often than others.  Crawlers apply sophisticated algorithms to prioritize which sites to visit.  These algorithms can determine how often a webpage changes, and how much is changing. [7]  Additionally, depending on how important a website is, the crawler may crawl some sites more deeply.  One goal of a search engine optimizer is to have web crawlers crawl the website often and to crawl all of the pages on the site.

Search crawlers look for special files in the root of the domain.  First they look for a “robots.txt” file, which tells the crawler which parts of the site not to crawl.  This can be useful to tell the crawler not to crawl admin pages, because only certain people have access to those pages.  The robots.txt file may also contain a reference to an XML site map.  The site map is an XML document which lists every page that is a part of the website.  This can be helpful for dynamically generated websites where the pages may not all necessarily be linked.  An example of this is the Wikipedia site.  Wikipedia has a great search interface, but crawlers do not know how to use the search or what to search for.  Wikipedia has a site map which lists all the pages it contains.  This enables the crawler to know for sure that it has crawled every page.  When optimizing a website, it is important to make it easy for the site to be completely crawled.  Crawlers are wary of getting caught in infinite loops of pages or downloading content that will never be used in search result pages. [5]
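
As a concrete, hypothetical illustration (example.com and the paths are invented), a robots.txt that keeps crawlers out of an admin area and points them at the site map could look like this:

User-agent: *
Disallow: /admin/
Sitemap: http://www.example.com/sitemap.xml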

2.2  Indexing

After the search engine has crawled a particular page, it then analyzes the page.  In its simplest form, the search engine will throw away all the HTML tags and simply look at the text.  More sophisticated algorithms look more closely at the structure of the document to determine which sections relate to each other, which sections contain navigational content, which sections have large text, and weight the sections accordingly. [3]  Once the text is extracted, information retrieval techniques are used to create an index.  The index is a listing of every keyword on every webpage.  This index can be thought of like the index in a book.  A book’s index tells on which page a subject is mentioned; in this case, the index tells which websites contain particular words.  The index also contains a score for how many times each word appears, normalized by the length of the document.  When creating the index, the words are first converted into their root form.  For example, the word “tables” is converted to “table”, “running” is converted to “run”, and so forth.
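
One common way to express this normalized score (a generic textbook formulation, not necessarily the exact formula any commercial engine uses) is the term frequency of a word t in a document d:

\[ tf(t, d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}} \]

where n_{t,d} is the number of times t appears in d and the denominator is the total number of words in d.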

In order for a page to appear in the search engine results page, the page must contain the words that were searched for.  When promoting a website, particular queries should be targeted, and these words should be put on the page as much as possible.  However, this kind of promoting is abused.  Indeed, some sites are just pages and pages of keywords with advertisements designed to get a high search engine ranking.  Other tactics include putting large lists of keywords on the page, but making the keywords only visible to the search engine.  In fact some search engines actually parse the html and css of the page to find words that are being hidden to the user.  These kinds of tactics can easily flag a site as being a spam site.  Therefore one should make sure that the targeted keywords are not used in excess and text is not intentionally hidden.

One problem that many pages have is using images to display text.  Search engines do not bother trying to read the text on images.  This can be damaging to a site especially when the logo is an image.  A large portion of searches on the internet are navigational searches.  In these searches, people are looking for a particular page.  For these kinds of searches, it is hard to rank high in the search engine result page when the company brand name is not on the page except in an image.  One alternative is to provide text in the alt field of the image tag.  This may not be the best option however because the text in the alt field is text that the user does not normally see and is therefore prone to keyword abuse; hence, suspicious to the search engine.

2.3  Authority Evaluation

One of the major factors that sets Google apart from other search engines is its early use of PageRank as an additional factor when ranking search results. [3]  Google found that because the web is not a homogeneous collection of useful documents, many searches would yield spam pages, or pages that were not useful.  In an effort to combat this, Google extracted the link structure of all the documents.  The links between pages are viewed as a sort of endorsement from one page to another.  The idea is that if Page A links to Page B, then the quality of Page B is as high or higher than that of Page A.

The mathematics behind PageRank is based on a model of a random web surfer.  The web surfer starts on any page on the internet.  With probability 85%, the random web surfer follows a link on the current page.  With probability 15%, the web surfer jumps to a random page on the internet.  The PageRank of a given page is the probability that the surfer is visiting that page.
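
In its commonly published form, this random-surfer model assigns each page p_i the score

\[ PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)} \]

where d = 0.85 is the probability of following a link, N is the total number of pages, M(p_i) is the set of pages that link to p_i and L(p_j) is the number of outbound links on page p_j.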

One of the problems with this model is that any page on the internet will have some PageRank, even if it is never linked to.  This baseline page rank can be used to artificially boost the PageRank of another page.  Creating thousands of pages that all link to a target page can generate a high PageRank for the target page.  These pages are called link farms.

Many techniques have been proposed to combat link farms.  In the “Trust-rank” algorithm [4], a small number of pages are reviewed manually to determine which pages are good.  These pages are used as the pages that the web surfer will go to 15% of the time.  This effectively eliminates the ability to create link farms because a random page on the internet must have at least one page with trust-rank linking to it before it has any trust-rank.   Another technique is to use actual user data to determine which sites are spam.  By tracking which sites a user visits and how long the visits are, a search engine can determine which are the most useful pages for a given query. [6]

2.4  Result Ranking

With traditional information retrieval techniques, the results are ranked according to how often the query terms appear on a page.  Additionally, the query terms are weighted according to how often they occur in any document.  For example, the word “the” is so common in the English language that it is weighted very low.  On the other hand the word “infrequent” occurs much less often, so it would be given a higher weight.
During the beginning of the World Wide Web, search engines simply ranked pages according to the words on the page compared to the page’s length and the relative frequency of the words.  This algorithm is relatively simple to fool.  One can optimize landing pages to contain high frequencies of the targeted query terms.  To combat this, the relevance score and the authority score are combined to determine the final ranking.  In order for a document to appear in the search engine result page it must match the keywords in the query, but the final ranking is largely determined by the authority score.

3. Guidelines to SEO

Equipped with the knowledge of how search engines index and rank web sites, one has to tailor content in such a way that gives the best opportunity for being ranked highly.  There are many guidelines to SEO, but we attempt to distill the most important ones into our survey here.  For a more complete picture of SEO best practices and information, see [1].

3.1  Refine The Search Campaign

The first thing that the reader must consider is why they are going to do SEO on their web site.  In a broad sense, the purpose is to get more traffic volume or quality to one’s web site, but one needs to break this down into smaller subgoals.  We will look at defining the target audience to determine why people use search in the first place and then we will examine how one can leverage the searcher’s intent to design a web site that will best serve the audience.

3.2  Define The Target Audience

Typically, searchers aren’t really sure what they are looking for.  After all, that is why they are performing a search in the first place.  When designing a web site, the webmaster must be cognizant of the searcher’s intent.

Generally, a searcher’s intent can be broken down into 3 different categories.  Knowing which category a particular searcher might fall into is important in deciding how one designs a web site and which keywords to choose.

  • Navigational searchers want to find a specific web site, such as JPMorgan Chase and might use keywords such as “jp morgan chase investments web site.”  Typically, navigational searchers are looking for very specific information “because they have visited in the past, someone has told them about it because they have heard of a company and they just assume the site exists.  Unlike other types of searchers, there is only one right answer.” [1]  It is important to note that typically navigational searchers just want to get to the home page of the web site – not deep information.  With navigational searches, it is possible to bring up multiple results with the same name (i.e. A Few Guys Painting and A Few Guys Coding).
  • Informational searchers want information on a particular subject to answer a question they have or to learn more about a particular subject.  A typical query for an informational searcher might be “how do I write iPhone applications.”  Unlike navigational searchers, informational searchers typically want deep information on a particular subject – they just don’t know where it exists.  Typically, “information queries don’t have a single right answer.” [1]  By far, this type of search dominates the types of searches that people perform, and therefore the key to having a high ranking in informational searches is to choose the right keywords for your web site.
  • Transactional searchers want to do something, whether it is purchase an item, sign up for a newsletter, etc. A sample transactional search query might be “colorado rockies tickets.”  Transactional queries are the hardest type of query to SEO for because “the queries are often related to specific products.” [1]  The fact that there are many retailers or companies that provide the same services only complicates matters.  In addition, it can be hard to decipher whether the searcher wants information or a transaction (i.e. “Canon EOS XSi”).

After understanding the target searcher, one has to consider how the searcher will consume the information they find.  According to Hunt et al., “nearly all users look at the first two or three organic search results, less time looking at results ranked below #3 and far less time scanning results that rank seventh and below.  Eighty-three percent report scrolling down only when they don’t find the result they want within the first three results.”

So what does this mean for people doing SEO?  Choosing the correct keywords and descriptions to go on your site is probably the most important action one can take.  Searchers choose a search result fairly quickly, and they do so by evaluating four different pieces of information included with every result: the URL, the title, the snippet, and “other” factors.


Graph 3.1 – Identifying the percentages of the 4 influencing factors of a search result. [1]

3.3  Identify Web Site Goals

Now that we have identified the types of searchers that are looking for information, let’s consider why one wants to attempt SEO in the first place.  For that, one needs to identify some goals, including deciding the purpose of your web site.  Generally, there are six types of goals:

  • Web sales.  This goal is selling goods or services online.  Note that this could be a purely online model (as in the case of Amazon.com or Buy.com) or it can be a mix between online and brick and mortar stores.  Web sales can be broken down further by specifying 1) whether the store is a retailer that sells many different products from different manufacturers or is a manufacturer’s own site and 2) what type of delivery is available (instant or traditional shipping).  In all cases, the ultimate desire is to increase the number of sales online.  This type of site would benefit from optimizing with transactional searchers in mind.
  • Offline sales. This goal involves converting web visitors into sales at brick and mortar stores. While this type of goal would benefit from transactional optimization, the benefits from SEO are much harder to calculate because there isn’t as good a measure of sales success.  The most important concept that a webmaster of an offline sales site can remember is to always emphasize the “call-to-action”, which is the thing that you are trying to get someone to do; in this case, convert from a web visit to a physical sale.
  • Leads. Leads are conceptually similar to offline sales; however, leads are defined by when the customer switches to the offline channel.  Customers who do some research online and then switch to offline are leads, while customers who already know model numbers, prices, etc. are offline sales.  With leads, the search optimization and marketing strategy needs to be different because customers who are leads are typically informational searchers instead of transactional searchers as in the case of web and offline sales.  Therefore, people who want to optimize for leads need to attract customers who are still deciding on what they want.
  • Market awareness. If a goal for one’s web site is market awareness, this is an instance where paid placement could help boost your page rank quicker than organic results, simply due to the fact that the product or service isn’t well known yet.  With market awareness, the web site mainly exists to raise awareness, so you would want to optimize the site for navigational and informational searchers.
  • Information & entertainment. These sites exist solely to disseminate information or provide entertainment.  They typically don’t sell anything, but might raise revenue through ad placement or premium content, such as ESPN Insider.  Sites that focus on information and entertainment should focus almost exclusively on optimizing their site for informational searchers.
  • Persuasion.  Persuasion websites are typically designed to influence public opinion or provide help to people.  They are usually not designed to make money.  In order to reach the most people, sites like these should be designed for an informational searcher.

3.4  Define SEO Success

A crucial element of undertaking SEO is to determine what should be used to measure the “success” of the efforts performed.  Depending on what type of goal the webmaster or marketing department has defined for the website (web or offline sales, leads, market awareness, information and entertainment, or persuasion), there are different ways to measure success.  For example, the natural way for a site that does web or offline sales to determine success is to count the conversions (i.e. the ratio of “lookers” to “buyers”).  If a baseline conversion ratio can be established prior to SEO, you can measure the difference between the old rate and the new rate after SEO.

This logic can be applied to any of the goals presented.  For example, with market awareness, you could measure conversion by sending out surveys to consumers or perhaps including some sort of promotion or “call-to-action” that would be unique to the SEO campaign.  If a consumer were to use that particular promotion or perform that action, it would give an indication of a conversion.

Another way to determine success is to analyze the traffic to the web site, and there are a couple of different metrics that can be used for this, including “page views and visits and visitors.” [1]  Using the most basic calculation with page views, you could determine a per-hour, per-day, per-week, per-month or even per-year average count of visitors to your website.  Using some tracking elements (such as cookies or JavaScript), you could determine the rise (or fall) of visitors to the site before and after SEO was performed.  The amount of information that this simple metric would provide would be invaluable because you could also determine peak visiting hours, which pages had the most hits at a specific time, visitor loyalty (unique vs. returning visitors), visitor demographics (location, browser and operating system, etc.) and even the time spent per visit.

3.5  Decide on Keywords

Keywords are the most important element of SEO.  It is important that during SEO, one focuses on the keywords that searchers are most likely to use while searching.  When choosing keywords, it is important to consider several different factors such as keyword variations, density, search volume and keyword “competition.”

Keyword variations play an important role in choosing keywords because one has to consider how an average searcher may try to find information.  For example, certain keywords such as “review” or “compare” might be used interchangeably in a search query.  In addition, some keywords are brand names that people automatically associate with products, such as “Kleenex” for tissues and “iPod” for MP3 player.  If someone is able to have more variations on a certain sequence of keywords, they have a higher likelihood of being found.

Search volume is also a big factor in determining which keywords to choose, because one wants to choose keywords that people are actually searching for.  If a particular keyword only has 3,000 queries a month, it is much better to use a keyword that has 20,000 queries a month because 1) the searchers are associating the higher-volume keyword with whatever the subject is instead of the lower-volume keyword and 2) using the higher-volume keyword will reach a larger audience.  However, this is not to say that low-volume keywords have no value.  In fact, the opposite is true.  Mega-volume keywords, such as brand names, are often fought over by companies competing for the highest ranking.  If one can achieve the same or nearly the same results with lower-contention keywords, the SEO process becomes much easier: fewer companies target them, so a high ranking is easier to achieve.

Lastly, keyword density is an important factor.  When deciding where to rank a page, search engines treat anywhere from 3%-12% of the page as a good density for keywords, with 7% being optimal.  If a spider detects a higher percentage than this, it might consider the page spam, since a higher density suggests that one is trying to stuff as many high-volume keywords into the page as possible without any relevant context or content for the user.  Typically, this results in a lower ranking, no ranking or sometimes the page even being removed from the index or blocked.
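To make the density calculation concrete, the following sketch computes a naive keyword density for a block of page text.  Real tools also strip markup, stem words and ignore stop words, so this is only an approximation of how a spider might measure density.

# Minimal sketch: naive keyword density for a page's visible text.
import re

def keyword_density(text: str, keyword: str) -> float:
    words = re.findall(r"[a-z0-9']+", text.lower())
    hits = sum(1 for w in words if w == keyword.lower())
    return hits / len(words) if words else 0.0

page_text = "iPhone development by A Few Guys Coding. We build iPhone apps."
print(f"{keyword_density(page_text, 'iphone'):.1%}")  # ~18% -- well above the band described above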

3.5.1  Create Landing Pages for Keyword Combinations

The last step in assessing a site under consideration for SEO is to identify the pages on your site that specific keyword queries should lead to when a searcher enters them.  For example, P&G might want “best laundry detergent” to lead to their Tide home page.  Landing pages are the pages that these queries lead to and are “designed to reinforce the searchers intent.” [1]  Each of the keywords or phrases identified in section 3.4 must lead to a landing page, and those pages must be indexed.  If there isn’t a landing page already for some of those keywords, then one must be created.
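One simple way to track this mapping is sketched below; the keyword phrases and URLs are hypothetical examples used only to show the bookkeeping, not the actual landing pages discussed in this paper.

# Minimal sketch: map target keyword phrases to landing pages and flag
# phrases that still need one. Phrases and URLs are hypothetical examples.
landing_pages = {
    "best laundry detergent": "http://www.example.com/tide",
    "iphone developer":       "http://www.afewguyscoding.com/services/iphone",
    "hire ipad programmer":   None,   # no landing page yet -- must be created
}

missing = [kw for kw, url in landing_pages.items() if url is None]
if missing:
    print("Landing pages still needed for:", ", ".join(missing))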

3.6  Page Elements That Matter And Don’t Matter

Now that the target audience, goals, keywords and landing pages are identified, it is crucial to consider the design of the web page.

  • Eliminate popup windows.  The content in popup windows is not indexed by spiders.  If important content, navigation or links are contained inside a popup window, they will not be seen by the spider.  Such content needs to be moved outside the popup window.
  • Pulldown navigation.  Pulldown navigation suffers from the same problem as popup windows.  Since spiders cannot interact with these elements (mouse over or click on them), they cannot index the content, which creates a large problem if the site’s navigation is done with pulldown menus.  Either the pulldown menu must be implemented in a compatible way or the site has to provide some alternative means of navigation.
  • Simplifying dynamic URLs. Pages that use dynamic content and URLs must be simplified for the spiders to crawl.  The nature of a dynamic URL means that a spider could spend an infinite amount of time attempting to crawl all the possible URLs, and doing so would produce a lot of duplicate content.  To deal with this, spiders will only crawl a dynamic URL if it has fewer than two dynamic parameters, is less than 1,000 characters long, does not contain a session identifier and is linked from another page (a minimal check along these lines is sketched after this list).
  • Validating HTML.  Robots are very sensitive to correctly formed documents.  It is important that the HTML is valid so the spider can get a good indication of what the page is truly about.
  • Reduce dependencies.  Some technologies, such as Flash, make it impossible for the spider to index the content inside them.  The spider cannot view this particular content, so any important keywords or information inside it is lost.  Such content should be moved outside to allow indexing to take place.
  • Slim down page content.  A spider’s time is valuable, and typically spiders don’t crawl all the pages of a bloated web site.  Google and Yahoo! spiders stop at about 100,000 characters. [1]  The typical cause of HTML page bloat is embedded content such as styling and JavaScript.  A simple way to solve this is to link to external Cascading Style Sheet (CSS) files so that reusable styles are shared across pages.  Another way to reduce JavaScript bloat is to use a program to obfuscate long files, which replaces long variable names such as “longVariableName” with much shorter versions such as “A.”
  • Use redirects.  From time to time, pages within a web site move.  It is important to use the correct type of redirect within your site so that when spiders attempt to visit the old URL, they are redirected to the new URL.  If the server returns a 404 (Not Found) for the old URL, the spider might remove that particular page from the index.  Instead, the proper way to indicate that the page has moved permanently is to use a server-side redirect, also known as a “301 redirect.”  This is returned to the spider when it attempts to navigate to the old URL, and the spider is then able to update the index with the new URL.  A sample implementation might look like the following:

Redirect 301 /oldDirectory/oldName.html http://www.domain.com/newDirectory/newName.html

Note that spiders cannot follow JavaScript or Meta refresh directives. [1]  Additionally, one can use a “302 redirect” for temporarily moved URLs.  See Fielding et al. [2] for more information on the “302 redirect.”

  • Create site maps. Site maps are important for larger sites because “it not only allows spiders to access your site’s pages but they also serve as very powerful clues to the search engine as to the thematic content of your site.” [1]  The anchor text used for each link can also provide very good keywords to the spider.
  • Titles and snippets.  Together, the title and the snippet that the search spider extracts account for a large part of how it indexes a particular page.  The title, the most important clue to a spider about the subject of a page, is also the most easily fixed element.  The title is a great place to use the keywords that were previously decided on.  For example, the title element for StubHub, a ticket brokerage site, is “Tickets at StubHub! Where Fans Buy and Sell Tickets.” Additionally, the snippet, or summary that the spider comes up with to describe the result, is important as well.  Typically, the spider uses the first block of text that it runs across as the snippet.  For example, for WebMD, Googlebot uses the following snippet: “The leading source for trustworthy and timely health and medical news and information. Providing credible health information, supportive community, …”  In both of these examples, it is clear that having essential keywords present in the title and the snippet is highly correlated with a high page rank.
  • Formatting heading elements.  Using traditional HTML subsection formatting elements, such as <h1>, <h2>, <h3>, etc. to denote important information can help give context clues to spiders on what text is important on a particular page.
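As a concrete illustration of the dynamic-URL guidance in the list above, the following sketch checks a URL against those heuristics (fewer than two dynamic parameters, under 1,000 characters, no session identifier).  The list of session-parameter names is an assumption for illustration, and the “linked from another page” criterion cannot be checked from the URL alone.

# Minimal sketch of the dynamic-URL crawlability heuristics listed above.
# The set of session-id parameter names is assumed for illustration.
from urllib.parse import urlparse, parse_qs

SESSION_KEYS = {"sessionid", "sid", "phpsessid", "jsessionid"}

def likely_crawlable(url: str) -> bool:
    if len(url) >= 1000:                       # must be under 1,000 characters
        return False
    params = parse_qs(urlparse(url).query)
    if len(params) >= 2:                       # fewer than two dynamic parameters
        return False
    return not (SESSION_KEYS & {k.lower() for k in params})

print(likely_crawlable("http://www.example.com/item?id=42"))                   # True
print(likely_crawlable("http://www.example.com/item?id=42&sessionid=abc123"))  # False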

3.6.1  The Importance of Links

Links, both internal and external, play a big role in SEO and in page ranking.  Search engines place a certain value on links because they can use these links to judge the value of the information.  Similar to how scientific papers have relied on citations to validate and confer status upon an author’s work, links confer status and value on a particular site.  “Inbound links act as a surrogate for the quality and “trustworthiness” of the content, which spiders cannot discern from merely looking at the words on the page.”  [1]

Several factors can influence how an algorithm ranks the link popularity of a particular page, including link quality, link quantity, anchor text and link relevancy.

Using the four link-popularity factors above, search engines use a theory of hub and authority pages to assign link value.  Hub pages are web pages that link to other pages on a similar subject.  Authority pages are pages that are linked to by many other pages on a particular subject.  Search engines usually assign a high rank to these pages because they are most closely related to a searcher’s keywords. Using this model, it is easy to see why the harder an inbound link is to get, the more valuable it is likely to be.  For more information on inbound and outbound link importance and their value, see Hunt et al. [1]
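The following minimal sketch illustrates the hub/authority intuition (in the spirit of Kleinberg’s HITS algorithm) on a tiny, made-up link graph; actual search engines combine this kind of link signal with many others, so this is a conceptual illustration rather than any engine’s real ranking code.

# Minimal hub/authority sketch (HITS-style) on a made-up four-page link graph.
links = {                       # page -> pages it links to
    "hub1": ["authA", "authB"],
    "hub2": ["authA", "authB"],
    "authA": [],
    "authB": ["authA"],
}

hub = {p: 1.0 for p in links}
auth = {p: 1.0 for p in links}

for _ in range(20):             # simple power iteration
    auth = {p: sum(hub[q] for q in links if p in links[q]) for p in links}
    hub = {p: sum(auth[q] for q in links[p]) for p in links}
    a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
    h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
    auth = {p: v / a_norm for p, v in auth.items()}
    hub = {p: v / h_norm for p, v in hub.items()}

print(sorted(auth.items(), key=lambda kv: -kv[1]))  # authA, then authB, rank highest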

3.7  Getting Robots To Crawl The Site

In addition to all of the goals and HTML elements above, it is crucial to have a “robots.txt” file that will allow the robot to crawl the site.  This file can give instructions to web robots using the Robots Exclusion Protocol.  Before any indexing occurs at a site, the robot will check the top-level directory for the “robots.txt” file and will index the site accordingly.  A sample “robots.txt” file might look like the following.

User-agent: *
Disallow:

These instructions apply to all robots, or “User-agents,” and tell them to index the whole site because nothing is listed under the “Disallow” directive.  It is possible to tailor this file to individual robots and to specific server content.  While the protocol is purely advisory, following it is highly recommended because it improves the quality of what the robots index.  Note that a robot does not have to obey the “robots.txt” file; however, most non-malicious robots do obey the instructions.
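For completeness, the sketch below shows how a well-behaved crawler might honor this file using Python’s standard-library robots.txt parser.  The domain is the one examined later in this paper and is used only as an example; the file may or may not actually be served there.

# Minimal sketch: a polite crawler consulting robots.txt before fetching.
# The URL is an example; whether that site serves a robots.txt is not assumed.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://www.afewguyscoding.com/robots.txt")
rp.read()   # fetches and parses the file over the network

page = "http://www.afewguyscoding.com/services"
if rp.can_fetch("*", page):
    print("Allowed to crawl", page)
else:
    print("robots.txt disallows", page)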

3.8  Dispelling SEO Myths

  • SEO is a one-time activity.  SEO is a continual activity that must be revisited from time to time to ensure that the most up-to-date keywords and content are being used.
  • SEO is a quick fix process for site traffic.  Generating high-quality organic traffic that will help with conversion is a slow process.  For each change that one makes to a web page, spiders must re-index that page and then calculate the new rankings.  It could take several months to years, depending on whether the goal is achieving a higher ranking, passing a competitor in the rankings or achieving the top result spot.
  • META tags help with page ranking. Early in the development of search engines, META tags were abused by keyword spammers who packed in as many highly searched keywords as possible, so many spiders now give META tags little to no credence.

4.  Promotion Of AFewGuysCoding.com

In order to test the author’s hypothesis, we chose to perform our SEO experiment on AFewGuysCoding.com.  A Few Guys Coding, LLC is a small, author-owned company that provides contract engineering services, mainly for mobile platforms (iPhone, iPod Touch, iPad, Android), but also builds web and desktop applications.   This web site had never had SEO performed on it and was not designed with any such considerations in mind.

4.1  Initial Investigation For AFewGuysCoding.com

In order to determine what keywords we should focus on, we created a survey and asked the following question of people who are not engineers: “Suppose you were a manager of a business that had a great idea for a mobile phone application. You knew that you had to hire an iPhone, iPod or iPad developer because you didn’t directly know anyone who could do this work for you. What words, terms or phrases might you consider searching for in Google to find this person?”  Table 4.1 shows the range of responses that were provided.  In analyzing this data, the words provided were stemmed to account for different endings.  In addition, stop words (words that are ignored because they do not add any significance to the query) were filtered out and not considered.
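The tally itself can be reproduced with a short script along the following lines; the stop-word list and the crude suffix stemmer are simplified assumptions for illustration, not the exact procedure used to produce Table 4.1.

# Minimal sketch: lower-case survey answers, drop stop words, apply a very
# crude suffix stemmer and count the remaining words. Stop words and stemmer
# are simplified assumptions.
import re
from collections import Counter

STOP_WORDS = {"a", "an", "and", "the", "for", "to", "i", "of", "in", "my"}

def crude_stem(word: str) -> str:
    for suffix in ("ing", "ers", "er", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def tally(responses):
    counts = Counter()
    for response in responses:
        for w in re.findall(r"[a-z0-9&\-]+", response.lower()):
            if w not in STOP_WORDS:
                counts[crude_stem(w)] += 1
    return counts

print(tally(["iPhone developer for my app", "hire an iPad programmer"]))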

Looking at the results and given the question posed to the survey audience, the percentages of the top three results did not surprise the authors.  What did surprise the authors was the relatively high number of results for the keywords “technician”, “help”, “inventor/invention”, “creator” and “market/marketing” and the relatively low number of results for “mobile”, “phone”, “software” and “programmer.”

After careful consideration, the authors chose to incorporate some of the keywords suggested by the survey audience into the web page.

Keyword Count Percentage
Developer 81 19.33%
Application 72 17.18%
iPhone 51 12.17%
Apple 24 5.73%
Phone 21 5.01%
iPod 17 4.06%
Mobile 17 4.06%
iPad 17 4.06%
Programmer 17 4.06%
Technician 11 2.63%
Help 10 2.39%
Mac/Macintosh 9 2.15%
Technology 6 1.43%
Market/Marketing 6 1.43%
Software 7 1.67%
Inventor/Invention 5 1.19%
Creator 5 1.19%
Business 4 0.95%
iTunes 3 0.72%
Designer/Designing 3 0.72%
Hire 3 0.72%
Handset 3 0.72%
Top 3 0.72%
Computer 2 0.48%
Company 2 0.48%
Contractor 3 0.72%
AT&T 1 0.24%
Store 1 0.24%
3G 1 0.24%
OS 1 0.24%
Code 1 0.24%
File 1 0.24%
Resource 1 0.24%
Sale 1 0.24%
Science 1 0.24%
Graphics 1 0.24%
Analyst 1 0.24%
Creative 1 0.24%
Devoted 1 0.24%
Energetic 1 0.24%
Smart 1 0.24%
Engineer 1 0.24%
Objective-C 1 0.24%
Total 419 100.00%

Table 4.1 – Responses from a search audience regarding possible keywords for AFewGuysCoding.com

4.2  Baseline Rankings

Prior to performing SEO on AFewGuysCoding.com, the Google page rank was 1/10.  The website was indexed by major search engines, such as Google, Yahoo!, Bing, AOL and Ask, but was not crawled on a regular basis.

Before performing any SEO activities, the amount of traffic that was coming from search engines was a little over 7% overall.  Most traffic was coming from direct referrals, for example when a user entered the address from a business card they had received (see Graph 5.1).

In addition, the titles and snippets of the pages were not as good as they could be (i.e. they did not include keywords or readily extractable content for the snippet).  For example, the title of the home page for AFewGuysCoding.com was “Welcome to A Few Guys Coding, LLC.”

Moreover, the web site was a transition from an older web site, www.davidstites.com, so many of the links already in Google referred to old pages that no longer existed.  When those results were clicked on in Google, the user was taken to a “404 Not Found” page.


Graph 5.1 – Sources of traffic for AFewGuysCoding.com before SEO activities

In addition to the work done with keyword densities and titles, the author also started a blog and a Twitter account (http://blog.afewguyscoding.com, http://www.twitter.com/afewguyscoding) that addressed computer science and programming topics.  In the blog, we linked to other parts of the main site whenever a post referenced a topic that A Few Guys Coding dealt with, such as iPhone applications.  A summary feed of this blog was placed on the main page of the AFewGuysCoding.com web site, and the authors noticed an increase in crawler traffic after the spider determined that the content was changing frequently enough to warrant additional visits to the site to re-calculate PageRank.

4.3  Rankings After SEO

After an initial pass of SEO was performed on the web site, the traffic from search engines increased dramatically and the page rank increased one point to 2/10.  The authors believe that this increase came largely from changing the titles of the web pages, changing keyword densities and tying keywords to landing pages.   According to Google Analytics, over a two-month period, traffic for AFewGuysCoding.com was up 81.24%.

The keyword densities are not yet within the optimal range discussed earlier; however, this is a marked improvement from before, when the important keyword densities were all 1% and below.  See Table 5.2.

Keyword Count Density Page
iPhone 10 2.55% /services
iPad 6 1.53% /services
iPod 6 1.53% /services
Code 10 2.55% /services
Software 2 1.32% /
Application 10 2.55% /services
Develop/Developer 4 1.42% /services

Table 5.2 – Keyword densities for certain pages on AFewGuysCoding.com

Graph 5.2 – Sources of traffic for AFewGuysCoding.com after SEO activities (April 2010)

Pages Page Views % Pageviews Avg. Time On Page
/ 387 66.38% 1:42
/learnmore 59 10.12% 2:09
/services 41 7.03% 1:08
/portfolio 44 7.55% 1:19
/contact 35 6.00% 0:45
/services/iphone 8 1.37% 0:39
/getaquote 6 1.03% 0:15
/services/ipad 3 0.51% 0:27
Totals 583 100.00% 1:03

Table 5.3 – Page View Overview for top 8 most visited pages in April, 2010

Graph 5.3 – Number of visitors per day for April, 2010 against Time on Site goal achievement rate

Another action taken by the authors was to use Apache mod_rewrite and a redirect file to direct the spider to update its index to the new pages (from the older site) using a “301 redirect”.  We were able to transform the old URLs with mod_rewrite to match the current top-level domain.  This ensured that pages were not removed from the crawler’s index for returning status 404.

Lastly, the authors set several goals for the web site (besides the increase in PageRank), including a “Time on Site” measure that would help gauge SEO success.  If a particular user stayed on the site for longer than 5 minutes and/or viewed 10 or more pages, we considered the goal met.  See Graph 5.3 for a comparison of visitors to goals met.
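A sketch of how such a goal check might be expressed is shown below; the Visit structure and the sample visits are assumptions for illustration, not data exported from the site’s analytics.

# Minimal sketch of the "Time on Site" goal: a visit counts toward the goal
# if it lasted longer than five minutes or included ten or more page views.
# The Visit structure and sample data are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Visit:
    duration_seconds: int
    page_views: int

def meets_goal(visit: Visit) -> bool:
    return visit.duration_seconds > 5 * 60 or visit.page_views >= 10

visits = [Visit(90, 3), Visit(420, 4), Visit(200, 12)]
rate = sum(meets_goal(v) for v in visits) / len(visits)
print(f"Goal achievement rate: {rate:.0%}")   # 67% for this made-up sample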

4.4  Conclusions

Clearly, making simple changes to content (such as keywords and titles) can have a large effect on search engine ranking and on the amount and quality of traffic directed to a site that has had SEO.  The difference in search engine traffic over a month represented a 400% increase. It might be reasonable to infer that an additional increase of 3-4% in keyword density could push search engine referrals up to 50%-60%. The authors would like to see the effects of continuing SEO at the 3, 6, 9 and 12 month marks.

5.  Recommendations To UCCS

Based on the research that we performed, we would like to make some suggestions to the UCCS EAS webmaster that would allow the UCCS EAS site to be ranked higher than it currently is.  Currently, the URL http://eas.uccs.edu/ has a Google PageRank of 4/10. By following the suggestions below, it may be possible to raise the PageRank one or two points to 5/10 or 6/10.  We have broken our recommendations down into a four-step process: define the scope and goals, understand the decision making process of a potential EAS student, create content, and build links.

5.1  Define the Scope and Goals

The first step is to define the scope and goals of the Search Engine Optimization efforts.  Questions to ask include: Is this effort only related to recruiting more EAS students, or is it to promote UCCS as a whole?  How much effort can be put into this project?  The answers to these questions will determine the scope of the project.  One potential goal is to recruit more students to the college, but a new student may have attended even without the SEO campaign.  Consequently, a plan for measuring how students became interested in the college will be important.

5.2  Understanding the Decision Making Process

Key to understanding how to boost traffic through SEO practices is understanding the content target users are searching for.  This can be discovered through surveys or interviews with students who have gone through the process of choosing a college to attend.  These can be current UCCS students or students of other universities.  From the authors’ own experience, a typical decision making process would involve finding answers to these questions: Should I go to college? What is the best school for me?  Why should I go to UCCS?  Which major should I pick? How do I apply?  Where can I find help with my application?  Each of these questions is a good area for creating good content.

5.3  Create Content

Once the decision making process is understood, content can be created to answer the questions that people are searching for.  Special attention should be paid to using keywords that people would search for when they have that question.  A quick look at the current EAS site reveals that there is little content related to recruitment.  Additionally, the current pages contain few keywords that are searched frequently.  Additional recommendations are listed below that highlight some of the deficiencies of the current EAS pages.

  • Eliminate non-indexable content. The Adobe Flash content on the main landing page, http://www.eas.uccs.edu/, cannot be indexed by spiders and robots, so all the information contained in that Flash element is lost.  Additionally, any images that contain content are non-indexable as well.
  • Remove “expandable” navigation.  Spiders are unable to “click” on these individual sections to expand them and are therefore unable to crawl the linked pages.  The navigation should be reworked so that all links are accessible without needing to perform any special interface actions.
  • Choose better titles for individual pages.  Regardless of each page’s content or purpose, all pages under the main landing page have the title “UCCS | College of Engineering and Applied Science.”  These titles should change depending on the subject or content of the page so that spiders and robots are able to create a more accurate index of the page.
  • Better use of heading formatting.  Headings should use valid heading tags such as <h1>, <h2>, <h3>, etc. so that the robots that crawl the site can extract main ideas and content from the page for the index.
  • Check and adjust keywords and keyword densities.  The densities of the keywords on each page are low and should better reflect what the page is about.  For example, on the application page for EAS, the keywords concerning admission and applying are sparse: out of the top 18 keywords on the page, only 4 or 5 have anything to do with admission, and their densities are low, ranging from 0.79% to 2.37% (see the counts below).
Keyword Count Density
Engineering 63 5.54%
Science 43 3.78%
Computer 39 3.43%
Department 39 3.43%
Admission 27 2.37%
Application 23 2.02%
Colorado 20 1.76%
Electrical 19 1.67%
UCCS 18 1.58%
Mechanical 18 1.58%
Form 15 1.32%
Aerospace 14 1.23%
Springs 12 1.05%
College 12 1.05%
Applied 12 1.05%
Student 11 0.97%
Application 10 0.88%
Financial 9 0.79%
  • Include a site map.  A site map would help ensure that all the pages that were meant to be accessible to a web visitor are also accessible to a spider crawling the content.

5.4  Build Links

Lastly, it is important to build the authority score by getting other pages to link to the content pages.  Links can be built within organizations that already have relationships with the university.  For example, the City of Colorado Springs, engineering organizations and businesses that recruit from the EAS college could all be great sources of links.  Publishing press releases about new websites to news organizations could also be helpful in generating links.  Another form of link building could be to simply link to the target pages heavily from other pages on the EAS site.

Another thing to consider is that people understand that content on the EAS site is going to be biased towards the EAS college.  Prospective students are not going to trust the content as much as if it were coming from a third-party site.  One way to overcome this is to write content for other websites.  This provides two benefits.  First, it creates a seemingly unbiased source of information that can be slanted towards recruiting for EAS.  Second, the content can link back to the EAS website, providing a good link for building the authority score of a page.

By following these guidelines, the EAS college can succeed in generating more traffic on its recruitment pages and, as a result, see more students attend the college.

6.  Conclusion

In this research project, we have investigated the implementation of search engines.  We have also presented different elements that affect search ranking.  Based on our research and the case study with AFewGuysCoding.com, we have provided recommendations to the UCCS EAS department to improve its PageRank within Google and other major search engines.  Indeed, search engine optimization is an important technique that any webmaster must master so that their site can rank as high as possible.

References

  1. Hunt, B. and Moran, M.  Search Engine Marketing, Inc.: Driving Search Traffic to Your Company’s Web Site, 2nd ed.  IBM Press, 2008.
  2. Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., Berners-Lee, T.  RFC 2616 – HTTP/1.1 Status Code Definitions.  1999.  [Online].  http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
  3. S. Brin and L. Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks and ISDN Systems. 1998. pp. 107-117.
  4. Z. Gyongyi, H. Garcia-Molina, and J. Pedersen. Combating Web Spam with TrustRank. Very Large Data Bases. Vol. 30, pp. 576-587. 2004.
  5. O. Brandman, J. Cho, H. Garcia-Molina, N. Shivakumar. Crawler-Friendly Web Servers. ACM SIGMETRICS Performance Evaluation Review. Vol. 28, No. 2, pp. 9-14. 2000.
  6. G. Rodriguez-Mula, H. Garcia-Molina, A. Paepcke.  Collaborative value filtering on the Web. Computer Networks and ISDN Systems. Vol. 30, No. 1-7, pp. 736-738. 1998.
  7. J. Cho, H. Garcia-Molina.  The Evolution of the Web and Implications for an Incremental Crawler.  Very Large Data Bases. pp. 200-209. 2000.