
Implementation

Michael Bolli edited this page Oct 28, 2018 · 2 revisions

Back end

Understanding how NfSen works

Before implementing the back end, a lot of time was invested in getting NfSen running. This took longer than anticipated, particularly because of outdated and incomplete documentation. NfSen uses RRDtool as its database, and the team discussed this choice at length before understanding it: time-series databases are optimized for handling series of time-indexed data.

NfSen consists mainly of Perl code. It was necessary to read large parts of the code to understand how the software is built and how the writing to the RRD database is done. Because no one in the team was familiar with Perl, additional time was spent understanding the existing solution.

Create the database structure

Some tutorials were followed to create an RRD database and to insert data. Using the rrdgraph command, it was then quite easy to save a diagram of the inserted data as an image. This experimentation gave a better overview of what RRDtool is for and how to work with it.

To manipulate RRD databases from within PHP, the PECL RRD library (PECL: PHP bindings to rrd tool system, 2016) was installed. The previously learned RRDtool shell commands were then translated into valid PHP statements, and the implementation of the RRD class started. The RRD structure was taken from NfSen and modified so that three years of data can be kept instead of one.
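To give an idea of what such a database definition looks like, here is a sketch of an `rrdtool create` invocation with consolidation archives (RRAs) sized to keep roughly three years of data. It is written in Python for brevity (nfsen-ng itself is PHP), and the data-source names, step, and RRA sizes are illustrative, not nfsen-ng's exact layout.

```python
# Sketch: build the argument list for `rrdtool create`, with RRAs sized to
# keep about three years of consolidated data. Names and sizes are
# illustrative, not nfsen-ng's exact structure.
def build_create_command(rrd_file, step=300):
    # One data source per counter; ABSOLUTE resets on every read.
    dses = [f"DS:{name}:ABSOLUTE:{step * 2}:0:U"
            for name in ("flows", "packets", "bytes")]
    rras = [
        "RRA:AVERAGE:0.5:1:2880",    # 5-min resolution, ~10 days
        "RRA:AVERAGE:0.5:24:2190",   # 2-hour resolution, ~6 months
        "RRA:AVERAGE:0.5:288:1100",  # 1-day resolution, ~3 years
    ]
    return ["rrdtool", "create", rrd_file, "--step", str(step)] + dses + rras

cmd = build_create_command("gateway.rrd")
print(" ".join(cmd))
```

The last RRA is what extends the retention: 288 steps of 300 seconds make one consolidated point per day, and 1100 rows cover about three years.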

As the image files produced by the built-in rrdgraph command are not responsive, this command could not be used. Instead, RRDtool provides a command called rrdxport, which accepts parameters similar to those of rrdgraph but outputs values averaged over a time frame as raw text.
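An rrdxport invocation is assembled from the same DEF building blocks as rrdgraph, plus XPORT statements naming what to emit. A minimal sketch of such an assembly, in Python with hypothetical file and data-source names:

```python
# Sketch: assemble an `rrdtool xport` command that returns averaged values
# for a time window as text instead of a rendered image. File and DS names
# are hypothetical.
def build_export_command(rrd_file, start, end, ds_names):
    cmd = ["rrdtool", "xport", "--start", str(start), "--end", str(end)]
    for ds in ds_names:
        cmd += [
            f"DEF:{ds}={rrd_file}:{ds}:AVERAGE",  # read the DS, averaged
            f"XPORT:{ds}:{ds}",                   # emit it under its name
        ]
    return cmd

cmd = build_export_command("gateway.rrd", 1490000000, 1490086400, ["flows"])
```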

Implement the nfdump interface

A class was implemented to execute nfdump commands from within PHP. It makes it possible to assemble the needed options for a command and then execute it. All inputs are sanitized and escaped to provide basic security. Error return codes are caught and converted into more meaningful error messages.
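The core idea of the wrapper, collecting options and escaping every user-supplied value before the command string is built, can be sketched as follows (in Python rather than the project's PHP; the class name, option, and filter are illustrative):

```python
import shlex

# Sketch of the nfdump wrapper idea: collect options, escape every
# user-supplied value, and only then build the final command string.
class NfdumpCommand:
    def __init__(self, binary="nfdump"):
        self.binary = binary
        self.options = []

    def set_option(self, flag, value):
        self.options.append((flag, str(value)))
        return self  # allow chaining

    def build(self, flt=""):
        parts = [self.binary]
        for flag, value in self.options:
            parts += [flag, shlex.quote(value)]  # escape user input
        if flt:
            parts.append(shlex.quote(flt))       # escape the filter too
        return " ".join(parts)

cmd = NfdumpCommand().set_option("-M", "/data/profiles/live/gateway").build("proto tcp")
```

In PHP, the equivalent escaping would be done with escapeshellarg() before the command is passed to the shell.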

NfSen uses nfdump's “-I” switch to get a summary of the data present in a capture file. A look at the output of this command made it clear why NfSen does not support more protocols than TCP, UDP, ICMP, and “other”: the command only returns values for these protocols. The initial plan to support an arbitrary number of protocols was therefore quickly abandoned.

Inserting the first nfdump output into the RRD turned out to be interesting. nfdump has a switch to return CSV-formatted data, but NfSen does not use it for populating its RRD databases. The reason is that this particular command is the only one that does not support CSV output. As in NfSen, there was no choice but to parse the output line by line.
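Since the summary is plain "Key: value" text, the line-by-line parsing amounts to splitting each line on the first colon. A Python sketch; the sample output below is abridged and illustrative, not verbatim nfdump output:

```python
# Sketch: parse a "Key: value" summary line by line. The sample is
# abridged and illustrative, not verbatim nfdump -I output.
SAMPLE = """\
Ident: gateway
Flows: 1034
Flows_tcp: 800
Flows_udp: 200
Flows_icmp: 30
Flows_other: 4
"""

def parse_summary(text):
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip anything that is not a key/value pair
        key, _, value = line.partition(":")
        stats[key.strip().lower()] = value.strip()
    return stats

stats = parse_summary(SAMPLE)
```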

Back end configuration and data import

After this worked for one capture file, an import class was implemented that can import all available data from a defined start date up to the current date. While implementing this class, the need for a global configuration arose. At first an ini file was tried, but its functionality quickly proved very limited: ini files do not support arrays (or any type of list). As a result, a plain PHP file defining an array of key-value pairs was used instead.
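The same limitation can be demonstrated in Python, whose configparser also treats every value as a plain string, so lists have to be smuggled in and split manually, whereas a config written in the host language holds structured data directly (keys and values below are made up for illustration):

```python
import configparser

# An ini value is always a plain string; a list must be encoded and split
# by hand.
ini = configparser.ConfigParser()
ini.read_string("[general]\nsources = gw1,gw2\n")
sources = ini["general"]["sources"].split(",")  # manual list handling

# A config file written in the host language can hold structured data
# directly -- the reason nfsen-ng ended up with a PHP file returning an
# array. Keys and values here are illustrative.
CONFIG = {
    "general": {"sources": ["gw1", "gw2"], "ports": [80, 443]},
}
```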

For debugging, a debug class was implemented, which made it easier to print values, measure time intervals, and log events to the syslog facility.
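A minimal sketch of what such a helper boils down to, in Python with an invented class name and API:

```python
import logging
import time

# Sketch of a debug helper: print values with a timestamp relative to
# start-up, and log through a standard logger. Name and API are invented;
# in production a syslog handler (logging.handlers.SysLogHandler) would be
# attached to the logger.
class Debug:
    def __init__(self):
        self.start = time.monotonic()
        self.log = logging.getLogger("nfsen-ng")

    def elapsed(self):
        return time.monotonic() - self.start

    def dbg(self, *values):
        self.log.debug("[%8.3fs] %s", self.elapsed(),
                       " ".join(map(str, values)))

d = Debug()
d.dbg("import started", 42)
```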

API

During a team meeting, the team had to decide how the front end should access data on the back end. Consequently, a paper draft of an API was created, discussed, and at a later stage implemented.

The first working endpoint was /api/config, which provides the front end with information about the environment (e.g. which sources it should show in the dropdown list). The API was then extended with more endpoints to provide graph, flows, and statistics data. To ensure some minimal security, only GET requests with the right number of parameters (matching the method) are allowed, and each parameter must match the actual parameter name and data type of the method. The endpoints return JSON. The API is described in the readme file of the nfsen-ng project (Bolli & Gallucci, 2017).
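The validation rule described above (GET only, known endpoint, exact parameter names, matching types) can be sketched like this; the endpoint table and parameter names below are simplified examples, not the project's actual API surface:

```python
# Sketch of the request validation: only GET, the endpoint must exist, and
# the request must carry exactly the expected parameters with the right
# types. Endpoint signatures are illustrative.
ENDPOINTS = {
    "config": {},
    "graph":  {"datestart": int, "dateend": int, "type": str},
}

def validate(method, endpoint, params):
    if method != "GET" or endpoint not in ENDPOINTS:
        return False
    expected = ENDPOINTS[endpoint]
    if set(params) != set(expected):       # exact parameter names
        return False
    return all(isinstance(params[name], typ)   # matching data types
               for name, typ in expected.items())

ok = validate("GET", "graph",
              {"datestart": 0, "dateend": 100, "type": "flows"})
```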

Autoloader

While putting the classes together, the need for autoloader functionality arose. Instead of including multiple class files at the top of every other class, an autoloader makes it possible to load classes dynamically when they are needed. It took a bit of fiddling to make it work across multiple directories and namespaces. In the end, as in many situations, the solution was simpler than assumed: place the classes in different directories, and define their namespaces to match the directory names.
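The heart of such an autoloader is just a mapping from a fully qualified class name to a file path. The real thing is a PHP callback registered with spl_autoload_register(); the path computation it performs can be sketched as follows (namespace and root directory are hypothetical):

```python
from pathlib import PurePosixPath

# Sketch: derive the file path from a fully qualified PHP class name,
# assuming the namespace mirrors the directory layout. The namespace and
# root used here are hypothetical.
def class_to_path(fqcn, root="src"):
    parts = fqcn.strip("\\").split("\\")
    return str(PurePosixPath(root, *parts[:-1], parts[-1] + ".php"))

path = class_to_path("nfsen\\datasources\\RRD")
```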

Database interface for more flexibility

As mentioned before, the initial idea was to support the Akumuli time-series database. Instead, support for RRD was implemented, mainly for time reasons. To still make it possible to add support for other databases in the future, an interface class named datasource was created, which defines the methods a data source needs to implement.
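The idea can be sketched with an abstract base class; the method names below are approximations for illustration, not nfsen-ng's exact interface:

```python
from abc import ABC, abstractmethod

# Sketch: an abstract data source that every backend (RRD today, perhaps
# Akumuli later) must implement. Method names are illustrative.
class Datasource(ABC):
    @abstractmethod
    def write(self, data):
        """Store one set of values for a source."""

    @abstractmethod
    def get_graph_data(self, start, end, sources, protocols):
        """Return averaged data points for the graph view."""

    @abstractmethod
    def date_boundaries(self):
        """Return the first and last timestamp available."""

class RRD(Datasource):
    def write(self, data): return True
    def get_graph_data(self, start, end, sources, protocols): return []
    def date_boundaries(self): return (0, 0)
```

Any backend that fails to implement one of these methods cannot even be instantiated, which keeps future database support honest.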

CLI

A command-line interface was implemented. To provide an approximate ETA until import completion, the php-cli-progress-bar (Dawson, 2014) script was included. This script makes it possible to show a progress bar, although integrating it was somewhat difficult.
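What such a progress bar boils down to is extrapolating the remaining time from the average time per processed item. A self-contained Python sketch (the format is invented, not php-cli-progress-bar's output):

```python
# Sketch: render a progress bar line with an ETA extrapolated from the
# average time per processed item. The format is invented.
def progress_line(done, total, started, now, width=30):
    frac = done / total
    filled = int(frac * width)
    # time per item so far, multiplied by the remaining items
    eta = (now - started) / done * (total - done) if done else float("inf")
    return "[{}{}] {:3.0f}% ETA {:4.0f}s".format(
        "#" * filled, "-" * (width - filled), frac * 100, eta)

print(progress_line(25, 100, started=0, now=50))
```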

Display data for ports over time

To support the display of port-specific data (like the PortTracker add-on for NfSen), some changes were necessary. The import class was updated to support port data, but due to the structure of RRD databases it was not possible to incorporate this data into the existing RRD files. For example, to visualize data for port 22 on source A, a new RRD file A_22.rrd has to be generated, with the same structure as the source RRDs. To avoid having to delete the RRDs manually, an option -f (force) was added to the CLI, which enables overwriting any existing data.
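The naming scheme and the force flag together amount to very little code. A sketch, assuming a data directory path that is purely illustrative:

```python
from pathlib import Path

# Sketch: one extra RRD file per (source, port) pair, e.g. A_22.rrd, and a
# force flag that decides whether an existing file may be overwritten.
# The data directory is illustrative.
def port_rrd_path(datadir, source, port):
    return Path(datadir) / f"{source}_{port}.rrd"

def may_create(path, force=False):
    return force or not Path(path).exists()

p = port_rrd_path("/var/nfsen-ng", "A", 22)   # -> .../A_22.rrd
```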

To keep the system responsive even while a resource-intensive statistic is being generated (reading many nfcapd files), another config setting has been introduced. If max-processes is bigger than 1, the nfdump class may execute another nfdump command even if one process is already running. As additional information for the user, the actually executed command is sent to the front end as well.
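The gate controlled by max-processes can be sketched as a small bookkeeping class (the class name and structure are illustrative, not nfsen-ng's code):

```python
# Sketch of the max-processes gate: a new nfdump run is only allowed while
# fewer than max_processes are still active. Class name is illustrative.
class ProcessGate:
    def __init__(self, max_processes=1):
        self.max_processes = max(1, int(max_processes))
        self.running = []  # e.g. subprocess.Popen handles

    def prune(self):
        # drop processes that have finished (poll() returns an exit code)
        self.running = [p for p in self.running if p.poll() is None]

    def may_spawn(self):
        self.prune()
        return len(self.running) < self.max_processes

gate = ProcessGate(max_processes=2)
```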

Front end

Display a graph

The team searched for a suitable JavaScript charting library to use on the client side to render graphs. The criteria for the selection were:

  • Examples: how does it look with huge data sets? What if there are gaps in the data?
  • Callbacks: is it possible to add one's own functionality?
  • Responsiveness: can a diagram adapt to any width?
  • Documentation: is everything up to date, clear, and complete?
  • Comments from other users/developers

The first library tested (Dygraphs) gave quick results and matched all the requirements out of the box. Its ease of use and the quantity of available examples and ideas from the community convinced the team to use it.

Add options to the graph

Some buttons were added to the front end to change the graph scale (linear or logarithmic) and the display mode (stacked or line). Two elements were created: one to host buttons to hide/show curves, and a second to host their values together with the curve colours. The last step was to find a mobile-friendly way to provide a zoom feature for zooming in and out.

Data for the graph

In the beginning, a CSV file containing only the data from one exporter (source) was used. At that point, the graph showed one source and “all” protocols (TCP, UDP, ICMP, and “other”). NfSen, however, always displays curves for sources rather than protocols. This led to a discussion within the team: would it make sense to keep this “point of view” on the data and offer a switch between a “protocols” and a “sources” display? This made complete sense to the team, which decided to keep both possibilities. Even though the ability to display ports over time was not part of the requirements, the team decided that, as it was also desired by the project sponsor, it could be implemented and integrated into the display selection without major delays.

The CSV file used for testing contained all data points available in the database. The advantage was the maximum precision obtained from these data points; the drawback was that more data had to be transferred between client and server. How many points should be made available to the front end? The team shared this question with the project sponsor and proposed an elegant solution: for long time ranges (years or months) it is not necessary to have maximum detail. As a result, the more the user zooms in, the more data points they get for small time ranges, allowing them to better correlate precise data with time. To dynamically load more data points as the user zooms in (or pans around), some ideas from Sanderson (2017) were used.
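The step-selection behind this behaviour can be sketched as follows: for a requested window, pick the coarsest step that still stays under a point budget, so short windows get fine resolution and long windows get averaged data. The budget and base step below are illustrative values:

```python
# Sketch: pick the coarsest step that keeps the number of points for a
# requested window under a budget. Budget and base step are illustrative.
def pick_step(start, end, max_points=400, base_step=300):
    span = end - start          # seconds in the requested window
    step = base_step
    while span // step > max_points:
        step *= 2               # coarser data for wider windows
    return step

day_step = pick_step(0, 86400)        # a day fits at the 5-minute base step
year_step = pick_step(0, 365 * 86400) # a year needs a much coarser step
```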

GUI structure

As the front end does not consist of a graph alone, the web application needed a structure defining where the various components (filters, content views, graphs, tables, etc.) are placed. Mock-ups were drawn, based on the requirements, with Pencil (Pencil Project, 2017), an open-source GUI prototyping tool.

The web application structure was realized with Bootstrap, based on the proposed and approved mock-ups. Bootstrap uses its own grid system, which spans 12 columns horizontally, to lay out the whole page. All Bootstrap visual effects are applied by setting values of the “class” attribute on HTML elements. Using predefined class values, it is possible to style the application for the target devices based on their width (xs, sm, md, lg, xl). For example, a label element that encapsulates an input element can easily be turned into a clickable button by adding the value “btn” to the label's class attribute, which also provides out-of-the-box visual effects when the button is clicked. The application consists of three parts: the navigation (header), the filter, and the content.

Filters for different modes

The three modes share some input fields. Rather than duplicating them (e.g. sources, time selection) for each mode, a custom attribute is added to the HTML elements so that they can be shown or hidden depending on which mode is selected. For example, HTML elements carrying the attribute and value combination data-view="flows statistics" are visible in the flows and statistics modes, but not in the graph mode.

All parameters necessary to get the correct graph (source selection, protocol selection, data type selection, and time range selection) have been integrated into the front end. When the graph view (which covers most of the requirements) was almost done, the filter forms for flows and statistics were completed. In the meantime, the client-side functions that call the back-end API were implemented and integrated.

Time selection and navigation

Ion.RangeSlider (Ineshin, 2017) has been used to fulfil the MUST requirements “move and display next or previous time slot” and “display by defining a time window”. It allows seeing the entire range (date/time) of selectable data (for the graph) at a glance and navigating quickly.

Four buttons were created to select time slots (24 hours, week, month, year), and a user can switch at any moment to time-window selection by moving the two sliders. Additionally, two arrow buttons (left and right) allow moving from one slot to the previous or next one. To dig deeper into an anomaly, a “copy from graph” button has been made available: when the time range is restricted on the graph itself, it can be copied over as the new range for the Ion.RangeSlider bar and thus reused in the other modes (flows, statistics) as well, rather than having to be redefined in the Ion.RangeSlider bar again.
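The slot navigation reduces to simple date arithmetic: each button defines a window length, and the arrows shift the whole window by its own length. A sketch with illustrative slot definitions and dates:

```python
from datetime import datetime, timedelta

# Sketch: slot buttons define a window length; previous/next shift the
# whole window by its own length. Slot set and dates are illustrative.
SLOTS = {"24h": timedelta(hours=24), "week": timedelta(weeks=1)}

def shift(start, end, direction):
    """direction: -1 for previous slot, +1 for next slot."""
    span = end - start
    return (start + direction * span, end + direction * span)

start = datetime(2017, 5, 1)
end = start + SLOTS["24h"]
prev_start, prev_end = shift(start, end, -1)
```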

Display flows and statistics

The team decided that taking the output of nfdump and displaying it as text on one very long page, as NfSen does, was not acceptable for a modern solution. As a result, FooTable has been used instead. FooTable is a jQuery plugin designed to make tables responsive and feature-rich; it can read data directly from JSON or CSV and supports paging, filtering, and sorting. To keep the flows or statistics results, or to manipulate them further, an additional feature was implemented that allows requesting the results as a CSV file.

Warning concerning the system resources usage

While developing and testing, the team realized that nfdump needs a lot of system resources to compute statistics or aggregate flow listings over extended periods of time. For this reason, it was decided to warn the user accordingly. The first idea was to give an estimated execution time, but this depends on so many variables (RAM, operating system, CPUs, the size of the collected nfcapd files, the nfdump program itself) that a simple warning message is displayed instead.