Web Tracking: Key Mechanisms and How You Can Defend Against Them

Jan 24, 2018 4:35:45 PM / by Valentín Carela

Organizations are constantly collecting our data as we browse the internet. In some cases, users consent to handing over this information, for example when they choose to fill in an online form. In other cases, it happens without the user’s agreement. In fact, the large-scale collection and analysis of personal information is the core business of many companies, including service providers, content providers and other third parties, each of which uses it for commercial purposes.

This data collection allows businesses to target ads at particular customer profiles or demographics, to apply price discrimination, or to infer details about customers such as their financial standing, age, or health. Governments might also use this data for surveillance, and identity thieves attempt to access it too.

However, the majority of users don’t want to be tracked in this way, or to share their personal data with corporations without prior consent. Some basic defense practices have been developed, such as using ad blockers and clearing the browser cache. However, as online tracking techniques keep evolving and becoming more sophisticated, these methods simply aren’t enough to defend against all threats. Indeed, only a very small number of internet users are fully aware of the sheer number of web tracking mechanisms that gather data about them as they browse. So, let’s take a deeper look at this growing practice. What are the different web tracking mechanisms? And how can users defend against them?

Web Tracking Mechanisms

The results of collaborative research on web tracking, conducted by Talaia researchers and UPC BarcelonaTech, have been published in the Proceedings of the IEEE. The comprehensive survey presents more than 25 different web tracking methods, divided into 5 main groups: session-only, storage-based, cache-based, fingerprinting and other methods. Here’s an overview of the different approaches in use, and what you can do to protect yourself and your personal information against them:

Cookies

A commonly used method of identifying users and gaining information about them is the use of cookies. HTTP cookies are small pieces of data (each limited to 4 KB) which the web server places in the browser’s storage. When a user visits a website for the first time, a cookie with a unique user identifier can be stored on the user’s computer. From then on, the website can retrieve this identifier every time the user visits it. Since a law introduced in 2011, companies have been required to inform users about their use of cookies, which users can then choose to accept or refuse.
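The identifier round trip described above can be sketched in a few lines of Python using the standard library's http.cookies module. This is only an illustrative sketch, not how any particular site implements it: the cookie name visitor_id, the one-year lifetime and the function name identify_visitor are assumptions made for the example.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "visitor_id"  # hypothetical cookie name, chosen for this example


def identify_visitor(cookie_header: str) -> Tuple[str, Optional[str]]:
    """Return (visitor id, Set-Cookie value or None).

    cookie_header is the raw value of the request's Cookie header
    (an empty string on a first visit).
    """
    jar = SimpleCookie()
    jar.load(cookie_header)
    if COOKIE_NAME in jar:
        # Returning visitor: the browser sent the identifier back.
        return jar[COOKIE_NAME].value, None

    # First visit: mint a random identifier and ask the browser to store it.
    visitor_id = uuid.uuid4().hex
    reply = SimpleCookie()
    reply[COOKIE_NAME] = visitor_id
    reply[COOKIE_NAME]["max-age"] = 365 * 24 * 3600  # assumed one-year lifetime
    reply[COOKIE_NAME]["path"] = "/"
    return visitor_id, reply[COOKIE_NAME].OutputString()
```

On the first request the function returns a fresh identifier together with the value for a Set-Cookie response header. On later requests the browser sends the cookie back, so the same identifier is recognized and no new cookie is issued.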

Flash cookies are used by Adobe Flash to store data on a user’s computer. They can be up to 100 KB in size, and since they are stored in a directory on disk (as .sol or .sor files), they don’t expire and are more difficult for the average user to remove. Additionally, as Flash plugins share the same directory regardless of browser, they allow a user to be tracked across all browsers.

How can I defend against it?

To minimize tracking via cookies, users can refuse HTTP cookies when visiting a new site (by simply pressing “no” when the website asks for consent). To avoid tracking via Flash cookies, users can block Flash from placing data in their browser storage or disallow Flash cookies altogether in the browser or Flash manager settings.

Clickjacking

Clickjacking is a method of tricking users into clicking a sensitive website element, which can lead to user data being compromised. It takes many forms, mostly by disguising the attacker’s link among other site elements, or by using fake cursors that trick users into thinking their cursor is at a different location on the page. Clickjacking can compromise users’ anonymity and lead to the theft of users’ email and other private data, the seizure of PayPal credentials, or spying on a user through a webcam.

Because it disguises links, input controls and other website elements, clickjacking is difficult for users to spot.

How can I defend against it?

Being cautious about where you click on a page is the best defense against clickjacking, though users may also want to minimize tracking by disabling JavaScript and Flash. Websites have started to require users to confirm their clicks, as in the case of Facebook Like buttons, and randomized user interface layouts are making it increasingly difficult to hide links behind other elements.

Cooperation via CAPTCHA  

Other techniques rely on the unwitting cooperation of users. One of them presents an image that poses as a CAPTCHA, in which users type a code into a field to confirm they’re not a robot. However, links to websites are concealed within the words, letters or symbols of the CAPTCHA. Because browsers show visited and unvisited links in different colors, the symbols that users see, and therefore type into the field, disclose which destinations they’ve previously visited.

Visible words and characters represent links to websites that the user hasn’t visited. Blank spaces are hidden links that the user has visited before. By typing the words and characters they can see into the box, users reveal part of their browsing history to attackers.
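To see why the typed answer leaks information, here is a minimal Python sketch of the attacker's side of the trick. It assumes a simplified challenge in which every word is distinct and is linked to one URL. The function name infer_visited and the example URLs are hypothetical.

```python
from typing import List, Tuple


def infer_visited(challenge: List[Tuple[str, str]], typed: str) -> List[str]:
    """Return the URLs whose words the user did NOT type.

    challenge is a list of (word, url) pairs shown in the fake CAPTCHA.
    A word linked to a visited URL is rendered invisibly, so the user
    cannot see it and leaves it out of the answer.
    Assumes every word in the challenge is distinct.
    """
    typed_words = {word.lower() for word in typed.split()}
    return [url for word, url in challenge if word.lower() not in typed_words]


challenge = [
    ("maple", "https://bank.example"),
    ("river", "https://news.example"),
    ("stone", "https://shop.example"),
]
# The user only sees (and types) "maple" and "stone".
leaked = infer_visited(challenge, "maple stone")
```

In this example, leaked contains only https://news.example, the one destination whose word was missing from the answer.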

How can I defend against it?

Unfortunately, there are few protections against this method. Tracking can be minimized by disabling JavaScript and Flash. Users can also prevent pages from choosing their own fonts and be careful when filling in website CAPTCHAs.

Fingerprinting

A fingerprint is a unique identifier of a device, operating system, browser version or browser instance, made up of values that a web service can obtain when a user browses a website. Fingerprinting obtains these values through a variety of methods. It permits tracking across multiple websites without storing any cookies, and is completely invisible to users.
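As a rough illustration of how separate values become one identifier, the Python sketch below serializes a set of collected attributes in a canonical order and hashes the result. The attribute names and values are made up for the example, and hashing with SHA-256 is just one plausible choice, not a method taken from the survey.

```python
import hashlib
import json
from typing import Dict


def fingerprint(attributes: Dict[str, str]) -> str:
    """Combine collected attributes into a single stable identifier."""
    # Sorting the keys makes the result independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical values a site might observe for one visitor.
visitor = {
    "user_agent": "ExampleBrowser/1.0",
    "language": "es-ES",
    "timezone": "Europe/Madrid",
    "screen": "1920x1080",
}
visitor_id = fingerprint(visitor)
```

No cookie is stored: as long as the same values are observed on the next visit, the same identifier is computed again. Changing any single attribute produces a different identifier.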

Device Fingerprinting

Device fingerprinting is a method which allows companies to identify a computing device - be it a laptop, tablet, PC or mobile device. This is particularly relevant when customers are using multiple devices to connect to the internet.

Location Fingerprinting

Location and network fingerprinting involves determining the global network and IP-based geographical location of a user. This is one of the most straightforward elements of a fingerprint, since it can be determined from the headers of incoming HTTP requests. By using network tools, the service can also identify the domain name and the user’s Internet Service Provider.

Browser Fingerprinting

Browser fingerprinting, such as via CSS and HTML5 fingerprinting, allows sites to recognize the family and version of web browser being used.

Cross-browser fingerprinting

Cross-browser fingerprinting is more powerful and harder to evade than browser fingerprinting, since it lets people track you even if you use multiple browsers. The underlying technology is based on code that instructs browsers to perform a variety of tasks which draw on operating system and hardware resources such as graphics cards, CPU cores, installed fonts and audio cards. Since these resources differ from computer to computer, users can be identified easily.
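A back-of-the-envelope way to see why a handful of such differences is enough to single out a machine is to measure each attribute in bits of identifying information: an attribute value shared by a fraction p of all users contributes -log2(p) bits. The Python sketch below adds these up under the simplifying assumption that the attributes are statistically independent, which tends to overestimate the total in practice. The shares used are invented for illustration.

```python
import math
from typing import Iterable


def identifying_bits(share: float) -> float:
    """Bits of information revealed by a value that a given share of users have."""
    if not 0 < share <= 1:
        raise ValueError("share must be in the interval (0, 1]")
    return -math.log2(share)


def combined_bits(shares: Iterable[float]) -> float:
    """Total bits for several attributes, assuming they are independent."""
    return sum(identifying_bits(share) for share in shares)


# Hypothetical shares: GPU model (1 in 8), font set (1 in 64), audio stack (1 in 4).
total = combined_bits([1 / 8, 1 / 64, 1 / 4])
```

Here total is 11 bits, which already narrows a visitor down to roughly one in 2,048 users. Around 33 bits would be enough to distinguish every person on Earth.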

OS Instance Fingerprinting

In addition, the version and architecture of a user’s operating system can be identified by both JavaScript and Flash. Other features which can be tracked include system language, time zone, local date and time, and whether the user has camera and microphone access enabled.

Passive Client Fingerprinting

Passive client fingerprinting collects attributes from network-connecting clients or servers. For example, TCP properties, TLS capabilities or characteristics of the HTTP implementation can be collected from the transport, session or application layers. Based on these attributes, the operating system, system uptime and browser type can be identified. The three most popular passive fingerprinting techniques are TCP/IP fingerprinting, TLS fingerprinting and HTTP fingerprinting.
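One classic TCP/IP signal is the IP time-to-live (TTL) value of an incoming packet. Operating systems start from different default TTLs, and each router on the path subtracts one, so rounding the observed value up to the nearest common default hints at the sender's OS family. The Python sketch below uses a deliberately small lookup table based on commonly cited defaults. Real tools combine many more signals, and the function name guess_from_ttl is an assumption made for this example.

```python
from typing import Tuple

# Commonly cited default initial TTLs. This is a rough heuristic,
# not an exhaustive or authoritative table.
OS_HINTS = {
    64: "Linux, macOS or another Unix-like system",
    128: "Windows",
    255: "network equipment or some older Unix systems",
}


def guess_from_ttl(observed_ttl: int) -> Tuple[str, int]:
    """Return (OS family hint, estimated number of router hops)."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    # The initial TTL is assumed to be the smallest common default
    # that is not below the observed value.
    initial = next(ttl for ttl in sorted(OS_HINTS) if ttl >= observed_ttl)
    return OS_HINTS[initial], initial - observed_ttl


hint, hops = guess_from_ttl(118)
```

For an observed TTL of 118, the sketch suggests a Windows sender about 10 hops away. The client does nothing unusual to reveal this, which is what makes the technique passive.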

Canvas Fingerprinting

Canvas fingerprinting is a commonly used tracking technique which relies on using the browser canvas API to draw invisible graphics. Since each browser instance will draw the graphics in a unique way, this identifies the browser being used.

How can I defend against it?

To defend against fingerprinting, users are advised to clear their browser cache regularly. The Tor browser can provide greater anonymity, in particular against canvas fingerprinting, since each time a canvas function is invoked it asks the user for permission. Using a VPN to make your device appear to be connecting from another machine can also be effective in reducing fingerprinting. Users can also partly protect themselves by blocking JavaScript and Flash execution and by using anonymous web proxies. Finally, OS instance fingerprinting can be partially countered by blocking Java and ActiveX execution.

The Future Of Web Tracking

In the coming years, web tracking mechanisms will continue to evolve and users will be exposed to new types of data privacy threats. In response, several projects and initiatives are emerging to defend users against web tracking. A good example is the Data Transparency Lab (DTL), a community of technology and industry experts, researchers and policymakers working to advance online personal data transparency through scientific research and design. It is backed by big tech companies like Mozilla and Telefónica. Another organization that defends digital privacy, free speech, and innovation is the nonprofit Electronic Frontier Foundation (EFF). One of its research projects, Panopticlick, is particularly useful for internet users: its website includes a free tool that lets you check how good your defense against web tracking really is. In addition, to improve users' privacy defenses, the EFF developed the browser add-on Privacy Badger, which stops advertisers and other third-party trackers from secretly tracking the user.

Conclusion

Over time, methods of web tracking have continuously evolved to become more and more invasive. Whilst in the past it was simple for users to prevent tracking, for example by removing HTTP cookies they didn’t want stored, recent technologies are significantly more difficult to avoid. Indeed, many users have very limited awareness that they are being tracked online, which is particularly dangerous given the potentially harmful nature of modern web tracking. Thankfully, users can follow the tips outlined in this article to better protect themselves and their personal data online.

If you’d like to experience how Talaia’s network monitoring solutions can increase your security and privacy online, try our free 7-day trial here.

Topics: Cyber-Security

Written by Valentín Carela

Valentín is currently Product Manager and Lead Researcher at Talaia. He is also an external collaborator at the Broadband Communications Research Group (CBA) from UPC BarcelonaTech, where he obtained his Ph.D. in Computer Science about network traffic analysis and classification using Deep Packet Inspection and Machine Learning techniques.