At Talaia, we often receive questions about the mechanics of traffic classification engines and the accuracy of deep packet inspection products. Traffic classification refers to the identification of the layer 7 application responsible for network traffic. It is traditionally achieved using deep packet inspection (DPI). In essence, this means matching packet contents against traffic patterns known to belong to certain applications. That is, capturing data packets, examining their contents, and checking whether they conform to a known network protocol.
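To make the idea of pattern matching concrete, here is a minimal sketch of signature-based classification. The signature table, the function name `classify_payload` and the choice of regular expressions are illustrative assumptions, not taken from any real DPI engine; real engines use far richer, often stateful rules that look at several packets of a flow.

```python
import re
from typing import Optional

# Hypothetical, deliberately simplified signatures: each maps an application
# label to a regular expression over the first bytes of a flow's payload.
SIGNATURES = {
    # HTTP/1.x request line, e.g. "GET /index.html HTTP/1.1\r\n"
    "HTTP": re.compile(rb"^(GET|POST|HEAD|PUT|DELETE|OPTIONS) \S+ HTTP/1\.[01]\r\n"),
    # SSH version banner sent at the start of the connection
    "SSH": re.compile(rb"^SSH-2\.0-"),
    # BitTorrent handshake: length byte 19 followed by the protocol string
    "BitTorrent": re.compile(rb"^\x13BitTorrent protocol"),
}


def classify_payload(payload: bytes) -> Optional[str]:
    """Return the first application whose signature matches, else None."""
    for app, pattern in SIGNATURES.items():
        if pattern.match(payload):
            return app
    return None  # no known pattern matched: the flow stays unclassified


if __name__ == "__main__":
    print(classify_payload(b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"))  # HTTP
    print(classify_payload(b"SSH-2.0-OpenSSH_9.6\r\n"))  # SSH
    print(classify_payload(b"\x00\x01random bytes"))  # None
```

The last call shows the weakness discussed below: anything without a matching signature simply goes unclassified.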
DPI engines are packaged in various forms, from software libraries to network appliances, and are the product of researchers and engineers who devote significant amounts of time to a) manually analyzing network traffic and b) extracting the patterns that DPI engines then use to match packets to applications.
Therefore, a DPI engine is only as good as its creators make it. In particular, a DPI engine is never complete (new applications appear all the time!) and there is a great deal of uncertainty about its accuracy and completeness. To make matters worse, it is not easy to check how good a DPI engine is at classifying traffic, as there is no accurate source of truth (ground truth) to systematically compare results against. Beyond the most popular or trivially detectable applications, each DPI engine tends to be good at detecting its own particular set of applications.
Researchers from Aalborg University and UPC-BarcelonaTech have recently embarked on an effort to compare the performance of several DPI engines. I was very fortunate to collaborate in this interesting study. We summarized our findings in a paper published in the January 2015 issue of Computer Networks titled “Independent comparison of popular DPI tools for traffic classification” (also available on my personal website).
The paper reviews several DPI products, both popular open-source tools and a few commercial ones, such as nDPI, libprotoident, Cisco’s NBAR and ipoque’s PACE, and reveals their actual performance.
Until now, the relative performance of these DPI tools was unknown, and the only accuracy figures available were those provided directly by their vendors, which one should always take with a grain of salt. The paper analyzes 6 DPI tools: 2 commercial (PACE and NBAR) and 4 open-source (OpenDPI, L7-filter, nDPI and Libprotoident).
The evaluation uses a reference dataset that was built with the Volunteer-Based System (VBS). VBS is a tool for collecting network traffic for academic research, developed by our colleagues at Aalborg University. VBS can be installed by volunteers and researchers and, besides collecting network traffic traces, it also logs the application behind each connection. With VBS, then, a dataset can be built for (among other uses) testing the performance of DPI products.
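Once every flow carries a ground-truth label, scoring a DPI tool boils down to comparing its verdict with that label, flow by flow. The sketch below is a simplified illustration, not the paper's exact methodology: it assumes hypothetical flow identifiers and application names, and counts each flow as correct, wrong or unclassified. A real study may also grade results at several levels (for instance, reporting a YouTube flow as plain "HTTP" is counted as wrong here, but would be right at the protocol level).

```python
from collections import Counter
from typing import Dict, Optional


def evaluate(ground_truth: Dict[str, str],
             dpi_output: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Return the share of flows a DPI tool labels correctly, wrongly, or not at all."""
    counts: Counter = Counter()
    for flow_id, true_app in ground_truth.items():
        predicted = dpi_output.get(flow_id)  # missing flow = tool gave no answer
        if predicted is None:
            counts["unclassified"] += 1
        elif predicted == true_app:
            counts["correct"] += 1
        else:
            counts["wrong"] += 1
    total = len(ground_truth)
    if total == 0:
        return {"correct": 0.0, "wrong": 0.0, "unclassified": 0.0}
    return {k: counts[k] / total for k in ("correct", "wrong", "unclassified")}


# Hypothetical example: labels as logged by the ground-truth collector vs. one DPI tool.
truth = {"f1": "Skype", "f2": "YouTube", "f3": "DNS", "f4": "Dropbox"}
verdicts = {"f1": "Skype", "f2": "HTTP", "f3": "DNS"}
print(evaluate(truth, verdicts))  # {'correct': 0.5, 'wrong': 0.25, 'unclassified': 0.25}
```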
This reference dataset includes more than 750K traffic flows that belong to 70 popular applications, including YouTube, Facebook, Twitter, Gmail, Skype, Spotify, Dropbox and BitTorrent, to name a few. Interestingly, this dataset (including packet payloads) has been made available to the research community and can be obtained here for academic use.
The study finds PACE to be the most accurate of all the DPI products it tested. Interestingly, the paper also shows that the performance of the open-source tools nDPI and Libprotoident is quite high, well above some of the commercial tools, such as Cisco NBAR, although not as precise as PACE. A combination of nDPI and Libprotoident would be able to identify most applications and protocols in the dataset with an accuracy above 99%.
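The post does not say how nDPI and Libprotoident would be combined. One simple, hypothetical rule is sketched below: if one tool gives no answer, trust the other; if one tool only reports a generic protocol such as HTTP while the other names a specific application, keep the specific one; otherwise fall back to nDPI's verdict. The `combine` function, the `GENERIC_LABELS` set and the tie-break are assumptions for illustration only, and a real merge would first have to normalize the two tools' different label names.

```python
from typing import Optional

# Hypothetical set of generic, container-level labels that say little about
# the actual application (e.g., a YouTube flow reported only as "HTTP").
GENERIC_LABELS = {"HTTP", "HTTPS"}


def combine(ndpi_label: Optional[str], lpi_label: Optional[str]) -> Optional[str]:
    """Merge two DPI verdicts for the same flow, preferring the more specific one."""
    # If one tool gave up on the flow, trust the other one.
    if ndpi_label is None:
        return lpi_label
    if lpi_label is None:
        return ndpi_label
    # If nDPI only reports a generic protocol, keep Libprotoident's more specific label.
    if ndpi_label in GENERIC_LABELS and lpi_label not in GENERIC_LABELS:
        return lpi_label
    # Agreement, or an unresolved conflict: default to nDPI (an arbitrary tie-break).
    return ndpi_label


print(combine(None, "BitTorrent"))   # BitTorrent
print(combine("HTTP", "YouTube"))    # YouTube
print(combine("Skype", "Skype"))     # Skype
```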
On to the bad news: NBAR was especially disappointing, since it identified only 4 of the evaluated applications and obtained acceptable results only at the application protocol level (e.g., DNS, HTTP or IMAP/SMTP); it was unable, for example, to identify the services running over HTTP. It was also surprising that PACE had difficulties detecting FTP traffic, although it was the most accurate overall and able to detect more applications than the others. Among the open-source tools, L7-filter showed disappointing results. For example, it misclassified most HTTP traffic as Finger (?).
However, all these tools, including the commercial ones, struggled to identify the web services behind HTTP and HTTPS traffic. Only PACE and nDPI were able to recognize some of the 34 evaluated HTTP services (17 and 7, respectively), while NBAR could identify none. This shows the limits of current DPI engines, and is why Talaia relies on complementary techniques to identify web services, as discussed in this earlier post in our blog.
Talaia does not rely directly on DPI to perform traffic classification: its main data input is flow-level data (NetFlow, IPFIX) or packet samples (sFlow). So, why is this study relevant to us? Traffic classification with the limited information that NetFlow provides is far from trivial (an issue that other vendors carefully avoid discussing openly). The way Talaia does it is quite unique: besides other techniques that are part of our secret sauce, it uses machine learning algorithms to mimic the results of DPI engines using only flow-level data. For this reason, there is a training phase in which we use the combined output of several DPI engines as a reference. So, the accuracy and completeness of DPI engines are of great interest to us.
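The general idea of learning DPI labels from flow-level data can be illustrated with a toy supervised classifier. This is emphatically not Talaia's method, which the post does not disclose: it is a minimal nearest-centroid sketch that assumes each flow is summarized as a hypothetical `(packets, bytes, duration)` tuple, of the kind a NetFlow/IPFIX record could provide, and that the training labels come from DPI. The labels and numbers in the example are made up.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical flow record: (packets, bytes, duration in seconds).
Flow = Tuple[int, int, float]


def features(flow: Flow) -> List[float]:
    packets, octets, duration = flow
    mean_size = octets / packets if packets else 0.0
    # log1p compresses heavy-tailed traffic counters onto comparable scales.
    return [math.log1p(packets), math.log1p(mean_size), math.log1p(duration)]


def train(flows: List[Flow], dpi_labels: List[str]) -> Dict[str, List[float]]:
    """Compute one centroid per DPI-assigned label (nearest-centroid classifier)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for flow, label in zip(flows, dpi_labels):
        vec = features(flow)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(model: Dict[str, List[float]], flow: Flow) -> str:
    """Assign the label whose centroid is closest in feature space."""
    vec = features(flow)
    return min(model, key=lambda label: math.dist(vec, model[label]))


# Training phase: flow records labelled by DPI (made-up values).
model = train(
    [(2, 180, 0.02), (2, 200, 0.03), (5000, 6_000_000, 120.0), (4000, 5_000_000, 90.0)],
    ["DNS", "DNS", "YouTube", "YouTube"],
)
print(predict(model, (3, 250, 0.05)))            # DNS
print(predict(model, (4500, 5_500_000, 100.0)))  # YouTube
```

Because the model only ever sees labels produced by DPI, it can at best be as accurate and complete as those labels, which is exactly why the quality of the DPI engines used in training matters.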