
Advanced Application Identification with Talaia's Engine

Jan 11, 2018 2:33:03 PM / by Valentín Carela



In recent years, network traffic classification has become crucial to understanding the Internet. Consequently, many solutions have been built to help accurately identify and classify network traffic.

Traffic classification is not an easy process, though. The continuous evolution of Internet applications, and of the techniques they use to avoid detection, makes identifying them a very challenging task.

So, where should you start? Detailed knowledge of exactly which applications are driving traffic is the key to providing the best service in your network. Some applications are more demanding in terms of bandwidth or performance, whereas others can withstand temporary performance compromises. Either way, managing network issues relies on visibility and on the capability to look at network performance impairments before and after they occur.

This means that network monitoring platforms must be designed to include application identification, for example to identify Netflix traffic and better understand the impact it may be having on your network, or to pinpoint the number of BitTorrent peers in your network. This process needs to provide accurate results, be scalable, and have sustainable costs for the business. But how can platforms accurately identify applications? What are the main approaches to application classification, and which one is the most reliable, sustainable and cost-efficient? Let's find out.


Identifying traffic from apps like Netflix allows us to better understand the impact apps may be having on the network.

3 Main Approaches To Real-Time Application Classification

Application classification can be carried out by a variety of hardware and software solutions. Typically, real-time application classification has been addressed using three main approaches: port numbers, DPI and Data Mining. Let’s take a look at these approaches in detail:

Port Numbers

The traditional approach to application classification relied on protocol ports, which network monitoring solutions used to identify the applications. This now largely obsolete technique is fast, cheap and does not require inspecting the content of the traffic, because it relies only on the port numbers in the packet headers. This is an important point, because these solutions do not raise privacy concerns. In addition, the technique is computationally lightweight and simple to understand and program: it just requires a matching function against, for example, the ports in the IANA registry. On the other hand, classification using ports is currently fairly inaccurate, given that new applications have started using other applications’ ports (e.g., BitTorrent over port 80), or ephemeral and dynamic ports. Also, port-based classification does not provide classification at different granularities. As a result, port-based classification lacks accuracy and completeness. Nonetheless, there are still some solutions that base their application classification on this approach.
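The matching function described above can be sketched in a few lines. This is only an illustration: the table is a tiny hand-picked excerpt of well-known ports rather than the full IANA registry, and the function name `classify_by_port` is a hypothetical one chosen for this example.

```python
# Tiny illustrative excerpt of well-known ports (assumption: a real tool
# would load the complete IANA registry instead of this hand-written table).
WELL_KNOWN_PORTS = {
    22: "ssh",
    25: "smtp",
    53: "dns",
    80: "http",
    443: "https",
}


def classify_by_port(src_port: int, dst_port: int) -> str:
    """Label a flow using only the port numbers found in packet headers."""
    # Try the destination port first, then the source port, so that both
    # directions of a conversation receive the same label.
    for port in (dst_port, src_port):
        label = WELL_KNOWN_PORTS.get(port)
        if label is not None:
            return label
    return "unknown"


print(classify_by_port(51514, 443))    # a client talking to port 443
print(classify_by_port(51514, 80))     # BitTorrent over port 80 would get this label too
print(classify_by_port(51514, 51413))  # dynamic ports on both sides
```

The second call shows the weakness discussed above: any application that chooses port 80 is labelled as HTTP, and traffic on dynamic ports cannot be labelled at all.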

Deep Packet Inspection

With the evolution of applications, and the consequent difficulty of detecting most of them with port-based solutions, Deep Packet Inspection (DPI) based techniques have surfaced. These approaches work by accessing the content of the traffic (i.e., packet payloads) and inspecting it in order to locate specific, identifiable patterns of the applications generating the traffic. This technique has limitations with encrypted traffic and raises some privacy concerns, given that it accesses the content that users are sending. In addition, performing this classification in real time requires very expensive hardware, and these high-end computing resources can be challenging to deploy and maintain. This means that DPI-based approaches can have limitations in terms of scalability, given that you would need to buy a device for each link to be monitored, and change or replace them if you’re upgrading the bandwidth within your network. Though this approach can produce accurate results, it has some clear limitations.
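As a rough illustration of payload inspection, the sketch below matches the first bytes of a payload against a handful of signatures. The signature list is a small assumption made for this example (real DPI engines use far larger and more elaborate rule sets), and `classify_payload` is a hypothetical name.

```python
import re

# Illustrative signatures anchored at the start of the payload.
SIGNATURES = [
    # BitTorrent handshake: length byte 0x13 followed by the protocol string.
    ("bittorrent", re.compile(rb"^\x13BitTorrent protocol")),
    # Plain-text HTTP/1.x request line.
    ("http", re.compile(rb"^(GET|POST|HEAD|PUT|DELETE|OPTIONS) \S+ HTTP/1\.[01]")),
    # SSH identification string, e.g. "SSH-2.0-...".
    ("ssh", re.compile(rb"^SSH-\d\.\d+-")),
    # TLS record header: handshake content type (0x16) and version major 0x03.
    ("tls", re.compile(rb"^\x16\x03[\x00-\x04]")),
]


def classify_payload(payload: bytes) -> str:
    """Return the label of the first signature that matches the payload."""
    for label, pattern in SIGNATURES:
        if pattern.match(payload):
            return label
    return "unknown"


print(classify_payload(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"))
print(classify_payload(b"\x13BitTorrent protocol" + bytes(8)))
print(classify_payload(b"\x8f\xa2\x11\x07\xd4\x90\x3c\x5e"))  # opaque, encrypted-looking bytes
```

The last call hints at the encryption limitation mentioned above: once the payload is opaque, there is no pattern left to match. Note also that every packet payload has to be read, which is where both the privacy concern and the processing cost come from.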

Data Mining

Similarly to what happened with the port-based classification approach, where application developers started using dynamic ports in order to prevent applications from being classified, many network applications started to implement encryption in order to bypass DPI-based classification. As a result, the research community started to explore data mining approaches. These innovative approaches rely on the particular behavior of the traffic, its statistics and characteristics (such as the number of packets, bytes, ports, flags). Whilst some applications, such as BitTorrent, have begun obfuscating their traffic by adding random data, this practice has only been adopted in very few cases, and therefore data mining techniques are not only the most innovative, but also the most effective ones at this time.

Data mining approaches can also be a better solution for businesses seeking application classification that can be flexibly and dynamically distributed according to the monitoring needs of the network. These approaches are based on machine learning and behavioral analysis techniques, which can study offline the correlation between particular traffic features and each application. Following this, the set of features can be leveraged to build a model, which is then used to identify the network traffic online. Compared to DPI approaches, this software-based approach is less costly, is fast enough to allow real-time application classification, and can be as accurate as hardware DPI-based solutions. It also avoids the privacy and scalability concerns involved in DPI solutions. For these reasons, Talaia has utilized this approach for the development of its advanced application identification engine.
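The offline-training and online-classification split described above can be illustrated with a deliberately simple stand-in. The sketch below uses a nearest-centroid classifier over two features derived from flow counters; this is not Talaia's method, the feature choice is an assumption, and the training flows are made-up numbers used only to make the example runnable.

```python
import math
from collections import defaultdict


def features(packets: int, total_bytes: int, duration_s: float) -> tuple:
    """Derive behavioral features from flow counters only (no payload)."""
    mean_pkt_size = total_bytes / packets
    pkts_per_sec = packets / max(duration_s, 1e-3)
    # Scale both features to comparable ranges before measuring distances.
    return (mean_pkt_size / 1500.0, math.log10(1 + pkts_per_sec))


def train(labelled_flows):
    """Offline step: compute one feature centroid per application label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for label, flow in labelled_flows:
        x, y = features(*flow)
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def classify(model, flow) -> str:
    """Online step: assign the label of the nearest centroid."""
    point = features(*flow)
    return min(model, key=lambda label: math.dist(point, model[label]))


# Hypothetical labelled flows: (packets, bytes, duration in seconds).
TRAINING = [
    ("voip", (3000, 480_000, 60.0)),
    ("voip", (1500, 255_000, 30.0)),
    ("video", (40_000, 56_000_000, 120.0)),
    ("video", (20_000, 27_000_000, 60.0)),
    ("web", (40, 30_000, 2.0)),
    ("web", (60, 48_000, 3.0)),
]

model = train(TRAINING)
print(classify(model, (2000, 330_000, 40.0)))
print(classify(model, (30_000, 41_000_000, 90.0)))
```

The point of the sketch is the workflow rather than the algorithm: the model is built once from labelled traffic, and classifying a new flow then needs only a few arithmetic operations on counters that flow exporters already provide.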

Talaia’s Application Classification Engine


Talaia's application identification engine is based on data mining and machine learning techniques

Talaia's application identification engine is based on a compendium of data mining techniques. The engine has been built thanks to an intensive research collaboration with UPC BarcelonaTech, as well as Talaia’s own research. It combines the latest research innovations with proprietary techniques in order to obtain a level of accuracy otherwise only achievable using expensive and unscalable DPI solutions. Talaia’s engine achieves this by relying exclusively on flow metadata (e.g., NetFlow), which means it can be easily deployed in a matter of minutes in almost any network.

Talaia's engine has the ability to classify traffic at three different layers. First, it identifies the traffic by application groups (e.g., Web, P2P, VoIP). Second, it classifies the traffic by the application itself (e.g., BitTorrent, WhatsApp, Netflix, Tor). Finally, although it relies only on flow metadata, Talaia's engine is able to classify the traffic at the domain level, even with encrypted traffic (e.g., www.linkedin.com, www.twitter.com). Talaia is continuously updating the application classification engine, and a new version with cutting-edge application identification capabilities has recently been released.
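To make the three layers concrete, here is a minimal sketch of how labelled flows could be summarized at each granularity. The `FlowLabel` record, the `bytes_by` helper and the byte counts are all hypothetical and are not taken from Talaia's product or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowLabel:
    """Hypothetical classification result for one flow."""
    group: str        # application group, e.g. "Web", "P2P", "VoIP"
    application: str  # application, e.g. "BitTorrent", "Netflix"
    domain: str       # domain, e.g. "www.linkedin.com"; empty if none applies
    total_bytes: int


def bytes_by(labels, level: str) -> Counter:
    """Sum traffic volume at one layer: 'group', 'application' or 'domain'."""
    totals = Counter()
    for lbl in labels:
        totals[getattr(lbl, level)] += lbl.total_bytes
    return totals


# Made-up sample flows for illustration only.
flows = [
    FlowLabel("Web", "LinkedIn", "www.linkedin.com", 120_000),
    FlowLabel("Web", "Twitter", "www.twitter.com", 80_000),
    FlowLabel("P2P", "BitTorrent", "", 900_000),
    FlowLabel("Web", "LinkedIn", "www.linkedin.com", 30_000),
]

print(bytes_by(flows, "group"))
print(bytes_by(flows, "application"))
```

The same set of labelled flows answers a coarse question (how much of the traffic is P2P?) and a fine one (how much goes to a specific domain?) simply by changing the layer used for aggregation.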

Conclusion

We can envision the development of application classification approaches as a race between researchers, network operators and application developers. The reactions of developers to application identification approaches have made port-based solutions largely ineffective and led to DPI-based techniques being compromised. This triggered the rise of data mining solutions. Although some applications have tried to avoid classification by obfuscating their traffic, data mining solutions remain the most innovative approach.

All in all, data mining approaches are superior to the others because they provide the ability to classify applications in real time with high accuracy and without the high costs of DPI-based solutions. Therefore, as a pioneer in network visibility research and innovation, Talaia has chosen to leverage innovative data mining approaches for the development of its advanced application classification engine.

If you’d like a first-hand experience of how Talaia’s network monitoring solutions can increase the visibility and security of your network, start your free 7-day trial now.


Topics: Cyber-Security

Written by Valentín Carela

Valentín is currently Product Manager and Lead Researcher at Talaia. He is also an external collaborator at the Broadband Communications Research Group (CBA) at UPC BarcelonaTech, where he obtained his Ph.D. in Computer Science, focusing on network traffic analysis and classification using Deep Packet Inspection and Machine Learning techniques.