This repository has been archived by the owner on Aug 31, 2018. It is now read-only.

Data Sources

Rob edited this page Apr 17, 2015 · 3 revisions


Back to The FIDO Pipeline

Data sources are key to making FIDO successful. If detectors and threat feeds give you external information about a threat, internal data sources give you the internal context you need to assess the situation properly. Gathering internal information lets you categorize both the user and the machine associated with an alert. Examples of internal data sources include Active Directory, inventory systems or CMDBs, systems management applications such as SCCM or LANDesk, and even other client software such as AV or Bit9.

FIDO supports the following internal data sources:

  • Active Directory
  • LANDesk
  • JAMF

The following data sources are planned or in progress:

  • SCCM
  • OpenLDAP
  • Bit9
  • Workday

The goal of this phase of the FIDO assembly line is to convert the source IP address from the alert into a known, managed machine and then identify the user to whom that system belongs. Failing to convert the source IP can be valuable too: in our environment, 'unknown' systems have turned out to be personal machines or vendors who plug in infected laptops. For systems you can identify, gathering detailed information starts with the machine, by converting the source IP into a hostname.
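A minimal sketch of the IP-to-machine-to-user chain described above, in Python. The dictionaries stand in for real data sources, and the names `HostContext` and `build_host_context` are hypothetical, not part of FIDO's code. An IP with no matching machine is kept as an explicit 'unknown' result rather than dropped, since that outcome is itself informative.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HostContext:
    source_ip: str
    hostname: Optional[str]  # None when the IP maps to no managed machine
    owner: Optional[str]     # primary owner's username, if known

    @property
    def is_unknown(self) -> bool:
        # An unmapped source IP is a signal in its own right
        # (e.g. a personal machine or a vendor laptop).
        return self.hostname is None


def build_host_context(source_ip: str,
                       ip_to_host: Dict[str, str],
                       host_to_owner: Dict[str, str]) -> HostContext:
    """Map a source IP to a managed machine, then to that machine's owner."""
    hostname = ip_to_host.get(source_ip)
    owner = host_to_owner.get(hostname) if hostname else None
    return HostContext(source_ip, hostname, owner)
```

For example, `build_host_context("10.1.2.3", {"10.1.2.3": "LT-1234"}, {"LT-1234": "jdoe"})` yields hostname `LT-1234` and owner `jdoe`, while an IP missing from the inventory comes back with `is_unknown` set.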

The primary piece of information FIDO looks for initially is the 'last update time' from the machine-specific data sources. If reaching out to the system directly fails, you have to rely on the software stack installed on the endpoint. The 'last update time' value in the database that the clients report to lets you determine which host is behind the source IP of an alert. Most of these clients report their IP address to the server in near real time. For example, if I move my laptop from one location to another and its IP address changes, most of these clients will detect the change and send an update to the server with the new IP address. Clients do this for various reasons of their own; FIDO leverages this behavior to determine the machine behind the source IP address.
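The 'last update time' lookup can be sketched as picking, among client check-in records that report the alert's source IP, the one updated most recently. `ClientRecord` and `host_for_ip` are hypothetical names. The optional `alert_time` cutoff is an added assumption, not something described above, meant to avoid attributing an alert to a host that only picked up the same DHCP lease afterward.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class ClientRecord:
    source: str          # which data source reported it (hypothetical label)
    hostname: str
    ip_address: str
    last_update: datetime


def host_for_ip(ip: str,
                records: Iterable[ClientRecord],
                alert_time: Optional[datetime] = None) -> Optional[ClientRecord]:
    """Return the record whose client most recently reported `ip`, or None.

    Assumption: if alert_time is given, check-ins after the alert are ignored,
    since a later check-in may belong to a host that got the lease afterward.
    """
    candidates = [
        r for r in records
        if r.ip_address == ip and (alert_time is None or r.last_update <= alert_time)
    ]
    # The newest check-in for this IP is the best guess at who held it.
    return max(candidates, key=lambda r: r.last_update, default=None)
```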

The two paths into the data sources phase are therefore:

  • FIDO was able to reach out and verify with certainty the host behind the source IP, and looks up machine-specific information by its actual hostname
  • FIDO was unable to verify the host, and uses the source IP to query the data sources to identify the host
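The two paths can be expressed as a small dispatch function. This is a hypothetical sketch: `gather_machine_info` and the two lookup callables stand in for whatever data-source queries a deployment actually uses, and the returned path label is only there to make the decision auditable.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical lookup signature: returns machine attributes, or None if not found.
Lookup = Callable[[str], Optional[Dict[str, str]]]


def gather_machine_info(source_ip: str,
                        verified_hostname: Optional[str],
                        by_hostname: Lookup,
                        by_ip: Lookup) -> Tuple[str, Optional[Dict[str, str]]]:
    """Return (path_taken, machine_info) for one alert."""
    if verified_hostname:
        # Path 1: the host was reached and verified, so query by its real hostname.
        return "hostname", by_hostname(verified_hostname)
    # Path 2: no direct verification; let the data sources identify the host by IP.
    return "ip", by_ip(source_ip)
```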

At Netflix, the first version of FIDO simply used DNS to resolve the source IP address. We found that DNS was accurate only about 75% of the time because of the dynamic nature of modern networks. Once we moved to the current logic, our accuracy rate climbed to around 97-98%, with the anomalies coming from unmanaged and/or third-party systems. This approach also gave us detailed information about the host that we could feed into the scoring algorithm. All of this applies mostly to DHCP-controlled networks. If you use static IP addresses, as many datacenters do for servers, or something like MAC address reservations, then DNS does become a valid internal data source. That said, this FIDO process gets even better results on systems with predictable IP addresses.
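For environments that mix DHCP and static ranges, one way to apply this is to trust reverse DNS only inside known static networks and fall back to the client check-in data everywhere else. This is a sketch under that assumption; `resolve_hostname`, `is_static`, and the `client_lookup` callable are hypothetical. `socket.gethostbyaddr` is the standard-library reverse lookup, injected as a parameter so it can be swapped out for testing.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Optional, Tuple

# Same shape as socket.gethostbyaddr: (hostname, aliases, addresses).
Resolver = Callable[[str], Tuple[str, list, list]]


def is_static(ip: str, static_networks: Iterable[str]) -> bool:
    """True if ip falls inside one of the known static (non-DHCP) networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(net) for net in static_networks)


def reverse_dns(ip: str, resolver: Resolver = socket.gethostbyaddr) -> Optional[str]:
    """Return the hostname from a reverse (PTR) lookup, or None on failure."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None
    return hostname


def resolve_hostname(ip: str,
                     static_networks: Iterable[str],
                     client_lookup: Callable[[str], Optional[str]],
                     resolver: Resolver = socket.gethostbyaddr) -> Optional[str]:
    """Use DNS for static ranges; otherwise ask the endpoint-client data."""
    if is_static(ip, static_networks):
        hostname = reverse_dns(ip, resolver)
        if hostname:
            return hostname
    # DHCP range, or DNS had no answer: fall back to client check-in data.
    return client_lookup(ip)
```

Falling back to client data even inside static ranges keeps a missing PTR record from turning a managed server into an 'unknown' system.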

Once the host has been verified, detailed information can be gathered. FIDO currently looks at the status of the security stack (is it installed? is it running? is it updated?), patch status, and operating system. This information is used to build the machine posture of the system, from which a machine score is derived.
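A toy version of turning that posture into a number might look like the following. The field names mirror the checks listed above, but the point values, and the choice to score risk upward from 0, are illustrative assumptions rather than FIDO's actual scoring.

```python
from dataclasses import dataclass


@dataclass
class MachinePosture:
    security_stack_installed: bool
    security_stack_running: bool
    security_stack_updated: bool
    patched: bool
    os_supported: bool


def machine_risk(p: MachinePosture) -> int:
    """Hypothetical risk points: 0 means every check passed, higher is worse."""
    risk = 0
    # Only the most severe security-stack failure counts, so a missing
    # agent is not also penalized for being "not running" and "not updated".
    if not p.security_stack_installed:
        risk += 40
    elif not p.security_stack_running:
        risk += 30
    elif not p.security_stack_updated:
        risk += 15
    if not p.patched:
        risk += 20
    if not p.os_supported:
        risk += 20
    return risk
```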

The last piece of information, which is important for the next step, is the username of the system's primary owner. The username is used to query user-specific information such as title, access, and department membership, which in turn is used to build a user posture and score. Since most corporations run some type of LDAP-based directory, we simply bind to it and pass the username. Although FIDO retrieves a lot of information about a user, the primary values for scoring are title and department. Other values might be used in the future, but title and department reveal a lot about a user's access and priority.
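Two pieces of this step lend themselves to a short sketch: building the LDAP search filter for the owner's username, and turning title and department into a user priority. The filter escaping follows RFC 4515, so characters such as `*` and `(` in a username cannot alter the query. The `sAMAccountName` default fits Active Directory (OpenLDAP deployments commonly use `uid`), while the keyword and department weights are hypothetical placeholders each company would replace.

```python
from typing import Dict

# Characters RFC 4515 requires to be escaped inside a filter assertion value.
_LDAP_ESCAPES: Dict[str, str] = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\0": r"\00",
}


def ldap_user_filter(username: str, attribute: str = "sAMAccountName") -> str:
    """Build an equality filter such as (sAMAccountName=jdoe)."""
    escaped = "".join(_LDAP_ESCAPES.get(ch, ch) for ch in username)
    return f"({attribute}={escaped})"


# Hypothetical weights: higher means a compromise of this user matters more.
TITLE_KEYWORDS: Dict[str, int] = {
    "chief": 40,
    "vice president": 30,
    "administrator": 25,
    "director": 20,
}
DEPARTMENT_WEIGHTS: Dict[str, int] = {"finance": 30, "legal": 25, "engineering": 20}


def user_priority(title: str, department: str) -> int:
    """Take the strongest matching title keyword plus the department weight."""
    t = title.lower()
    title_score = max((w for kw, w in TITLE_KEYWORDS.items() if kw in t), default=0)
    return title_score + DEPARTMENT_WEIGHTS.get(department.lower(), 0)
```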

Finally, data sources offer plenty of opportunity to build on in this part of the FIDO pipeline, whether by adding modules to support other vendors' data sources or by starting to look at the network layer. Internal discovery is the neglected but highly important piece of threat analysis, and our intention is to build it out in a customizable way so that each company can score to its own needs.
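One way to leave room for new vendor modules is a small plugin interface that every data source implements, with a registry the pipeline queries. The `DataSource` protocol, `register`, `query_all_by_ip`, and the dict-backed `StaticInventory` module below are all hypothetical illustrations, not FIDO's actual module layout.

```python
from typing import Dict, Iterable, Optional, Protocol

HostInfo = Dict[str, str]


class DataSource(Protocol):
    """Interface a vendor module (SCCM, Bit9, a CMDB, ...) would implement."""
    name: str

    def host_by_ip(self, ip: str) -> Optional[HostInfo]: ...

    def host_by_name(self, hostname: str) -> Optional[HostInfo]: ...


_REGISTRY: Dict[str, DataSource] = {}


def register(source: DataSource) -> None:
    _REGISTRY[source.name] = source


def query_all_by_ip(ip: str, enabled: Iterable[str]) -> Dict[str, HostInfo]:
    """Ask every enabled, registered source about ip; keep only the hits."""
    results: Dict[str, HostInfo] = {}
    for name in enabled:
        source = _REGISTRY.get(name)
        if source is None:
            continue  # configured but not installed: skip quietly
        hit = source.host_by_ip(ip)
        if hit is not None:
            results[name] = hit
    return results


class StaticInventory:
    """Toy module backed by a dict, standing in for a real connector."""

    def __init__(self, name: str, hosts_by_ip: Dict[str, HostInfo]) -> None:
        self.name = name
        self._by_ip = hosts_by_ip

    def host_by_ip(self, ip: str) -> Optional[HostInfo]:
        return self._by_ip.get(ip)

    def host_by_name(self, hostname: str) -> Optional[HostInfo]:
        return next((h for h in self._by_ip.values()
                     if h.get("hostname") == hostname), None)
```

Keeping each vendor behind the same two lookups means the rest of the pipeline never needs to know which products a given company runs.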

On to the next step... Threat Feeds.
