Pandas magic extensions (#46)
* NB magics mp_magics.py for base64unpack and IocExtract

Tidying the code up a bit in base64unpack
Adding tests for test_tiprovider_kql and test_tiproviders.py
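
The base64unpack module itself isn't shown in this diff. As a rough illustration of the general technique only (not msticpy's implementation), here is a stdlib sketch that finds base64-looking substrings and decodes them. The 16-character threshold and the `unpack_b64` name are assumptions:

```python
import base64
import binascii
import re

# Assumed heuristic: runs of 16+ base64-alphabet characters, optional padding.
B64_PATTERN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def unpack_b64(text: str) -> list:
    """Return decoded UTF-8 strings for base64-looking substrings of text."""
    results = []
    for match in B64_PATTERN.finditer(text):
        candidate = match.group(0)
        try:
            decoded = base64.b64decode(candidate, validate=True)
            results.append(decoded.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            # Bad padding/alphabet, or bytes that aren't UTF-8 text: skip it.
            continue
    return results
```

A real unpacker would also need to handle nested encodings and binary payloads, which this sketch simply skips.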

* Added arguments checker to help with timeline and other functions that have a lot of kwargs.

Added requirements-dev.txt
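
The arguments checker isn't included in the visible diff. A minimal sketch of the idea, assuming a hypothetical `check_kwargs` helper that rejects keyword names a function doesn't recognise (the helper's name, signature, and the `display_timeline` caller are illustrative only):

```python
from typing import Any, Dict, Iterable


def check_kwargs(supplied_args: Dict[str, Any], legal_args: Iterable[str]) -> None:
    """Raise TypeError if any supplied keyword is not in legal_args."""
    legal = set(legal_args)
    unknown = sorted(set(supplied_args) - legal)
    if unknown:
        raise TypeError(
            f"Unknown keyword argument(s): {', '.join(unknown)}. "
            f"Valid options are: {', '.join(sorted(legal))}"
        )


def display_timeline(data, **kwargs):  # hypothetical caller with many kwargs
    check_kwargs(kwargs, ["title", "height", "width", "group_by"])
    return kwargs.get("title", "Timeline")
```

Raising `TypeError` mirrors what Python does for an unexpected keyword argument; listing the valid options is what makes a function with a long `**kwargs` list easier to debug.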

* Fixing missing Dict import in utility.py

Fixing mypy errors in base64unpack.py

* Fixing parameters issue in timeline

Add parameter checks to timeline.py
Removing and deprecating some functions
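
`.ci_config/coverage.ini` in this commit excludes lines matching `@deprecated`, so deprecated code is kept out of the coverage report. The diff doesn't show which decorator is used (the third-party `deprecated` package provides one with that name); here is a stdlib-only sketch of an equivalent decorator, with a hypothetical `old_timeline` function:

```python
import functools
import warnings


def deprecated(reason: str):
    """Decorator factory: emit a DeprecationWarning each time func is called."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {reason}",
                category=DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not this wrapper
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated(reason="use display_timeline instead")  # hypothetical replacement
def old_timeline(data):
    return data
```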

* Removing erroneous legend_column parameter.

Adding unit tests for utility.py

* Added more tests for utility.py

Fixed some errors and typos

* Fixing some tests and incorrect parameters used in notebooks

Fixing AttributeError for pandas datetime value (seemingly caused by the update to pandas 1.0)
Changed tld_index and ssl_bl attributes to properties that auto-load on first use (prevents a remote HTTP request for data on class instantiation)
Changed the environment variable that controls test skipping to the more generic MSTICPY_TEST_NOSKIP
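
A sketch of the lazy-loading pattern described above, assuming a hypothetical class with a stubbed loader in place of the real remote request (`IocExtractSketch` and `_load_tld_index` are illustrative names). The data is fetched on first property access rather than in `__init__`:

```python
from typing import Optional, Set


class IocExtractSketch:
    """Hypothetical class showing lazy loading of remote reference data."""

    def __init__(self):
        self._tld_index: Optional[Set[str]] = None
        self.load_count = 0  # only here to demonstrate when loading happens

    def _load_tld_index(self) -> Set[str]:
        # Stand-in for the remote HTTP request; a real loader would
        # download and parse a TLD list here.
        self.load_count += 1
        return {"com", "net", "org"}

    @property
    def tld_index(self) -> Set[str]:
        """Load the TLD index on first access, then reuse the cached copy."""
        if self._tld_index is None:
            self._tld_index = self._load_tld_index()
        return self._tld_index
```

On Python 3.8+, `functools.cached_property` gives the same behaviour with less code.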

* Moved mp_magics to sectools_magics to avoid a circular import problem

Added new location centering logic to foliummap
Add a closure in pkg_config.py to preserve the config file name, plus a function to return the filename
process_tree - added pandas extension and changed the main function so that it returns the plot figure and layout
timeline - added pandas extension; added support for DateTime columns in tooltips (displayed as date/time rather than a number)
wsconfig - added method to display available workspaces
base64unpack - added pandas extension
iocextract - added pandas extension
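
`pkg_config.py` isn't part of the visible diff. Here is a minimal sketch of using a closure to hold the config file name. The function names are hypothetical, and the default of `msticpyconfig.yaml` is an assumption for illustration:

```python
from typing import Callable, Tuple


def _make_config_path_accessors(
    default_path: str,
) -> Tuple[Callable[[], str], Callable[[str], None]]:
    """Return getter/setter functions that share one config path via a closure."""
    config_path = default_path  # captured by both inner functions

    def get_config_path() -> str:
        return config_path

    def set_config_path(new_path: str) -> None:
        nonlocal config_path
        config_path = new_path

    return get_config_path, set_config_path


# Assumed default file name, for illustration only.
get_config_path, set_config_path = _make_config_path_accessors("msticpyconfig.yaml")
```

Because the name lives in the enclosing function's scope, it isn't exposed as a mutable module-level global; callers read it through the getter.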

* Fixed bug in GeoIP DB downloader

Add doc of magic and pandas extension to IoCExtract.ipynb
Changed foliummap center functions to use median by default
Removed largely redundant os_family param from iocextract.py functions
Fixed sectools_magics iocextract class
Update test_ioc_extractor for new parameters
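
Using the median rather than the mean keeps a single distant point from dragging the map center into empty space. Here is a sketch that assumes points arrive as `(latitude, longitude)` tuples; `map_center` is a hypothetical name, not the foliummap API:

```python
from statistics import mean, median
from typing import Iterable, Tuple


def map_center(
    coords: Iterable[Tuple[float, float]], mode: str = "median"
) -> Tuple[float, float]:
    """Return a (latitude, longitude) center for a set of points."""
    points = list(coords)
    if not points:
        raise ValueError("No coordinates supplied")
    agg = median if mode == "median" else mean
    lats = [lat for lat, _ in points]
    longs = [lon for _, lon in points]
    return agg(lats), agg(longs)
```

This naive version ignores longitude wraparound at ±180°, so points on either side of the antimeridian would need special handling.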

* Adding FoliumMap.ipynb notebook

Updates to GeoIPLookups.ipynb
Added unit test test_folium.py
Fixed a few errors in foliummap.py

* Removed failing cell from end of GeoIPLookups notebook

* Missing test data file

* Another missing file

* And another!

* Updating docs for new usage.

Suppressing credscan error in AzureData.rst

* Removing notebook with misleading content

* Adding suppression file for credscan

* Credscan suppression for Sphinx-generated docs\build\html\_sources\data_acquisition\AzureData.rst.txt

* Trying to clean up pytest coverage report.

* Adding GeoIP tests.

Removing deprecated lines from coverage reports.

* Excluding test_geoip from local tests

* Spelling fixes for AzureData.rst

* Adding better help if someone tries to use a query that doesn't exist
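
One way to give better help for a missing query is to suggest near matches. Here is a sketch using `difflib.get_close_matches`, assuming queries are held in a plain name-to-text dict; `get_query` and `QueryNotFoundError` are hypothetical names, not msticpy's query provider API:

```python
import difflib
from typing import Dict


class QueryNotFoundError(LookupError):
    """Hypothetical error raised when a named query does not exist."""


def get_query(queries: Dict[str, str], name: str) -> str:
    """Return the named query, or raise with up to three close suggestions."""
    if name in queries:
        return queries[name]
    suggestions = difflib.get_close_matches(name, list(queries), n=3)
    message = f"Query '{name}' not found."
    if suggestions:
        message += " Closest matches: " + ", ".join(suggestions)
    raise QueryNotFoundError(message)
```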

* Review changes for foliummap

Experiment with image in README.md

* Adding a couple more graphics to README.md

* Fixing misspelling of "Azure" in AzureData.rst
ianhelle authored Feb 14, 2020
1 parent 3d30d9f commit 7ed69a3
Showing 53 changed files with 5,321 additions and 14,141 deletions.
8 changes: 8 additions & 0 deletions .ci_config/coverage.ini
@@ -0,0 +1,8 @@
[run]
omit =
    */hostedtoolcache.windows.Python/*
    *.site-packages.msticpy*

[report]
exclude_lines =
    @deprecated
13 changes: 13 additions & 0 deletions .ci_config/credscan.json
@@ -0,0 +1,13 @@
{
    "tool": "Credential Scanner",
    "suppressions": [
        {
            "placeholder": ", secret=secret)",
            "_justification": "This is code usage example and does not contain a secret."
        },
        {
            "file": "AzureData.rst.txt",
            "_justification": "This is code usage example and does not contain a secret."
        }
    ]
}
22 changes: 18 additions & 4 deletions README.md
@@ -5,7 +5,12 @@ Microsoft Threat Intelligence Python Security Tools.
The **msticpy** package was initially developed to support [Jupyter Notebook](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/What%20is%20the%20Jupyter%20Notebook.html)
authoring for [Azure Sentinel](https://azure.microsoft.com/en-us/services/azure-sentinel/).
Many of the included tools can be used in other security scenarios for threat hunting
-and threat investigation. There are three main sub-packages:
+and threat investigation.
+
+<img src="https://github.com/microsoft/msticpy/blob/master/docs/source/visualization/_static/Timeline-08.png"
+alt="Timeline" title="Msticpy Timeline Control" width="300" height="200" />
+
+There are three main sub-packages:

- **sectools** - Python security tools to help with data enrichment,
analysis or investigation.
@@ -52,7 +57,7 @@ Output is to a decoded string (for single string input) or a DataFrame (for data

### iocextract

-Uses a set of builtin regular expressions to look for Indicator of Compromise (IoC) patterns.
+Uses a set of built-in regular expressions to look for Indicator of Compromise (IoC) patterns.
Input can be a single string or a pandas dataframe with one or more columns specified as input.

The following types are built-in:
@@ -62,7 +67,7 @@ The following types are built-in:
- DNS domain
- Hashes (MD5, SHA1, SHA256)
- Windows file paths
-- Linux file paths (this is kind of noisy because a legal linux file path can have almost any character)
+- Linux file paths (this is kind of noisy because a legal Linux file path can have almost any character)

You can modify or add to the regular expressions used at runtime.

@@ -72,7 +77,7 @@ Output is a dictionary of matches (for single string input) or a DataFrame (for

### tiproviders

-The TILookup class can lookup IoCs across multiple TI providers. builtin
+The TILookup class can lookup IoCs across multiple TI providers. built-in
providers include AlienVault OTX, IBM XForce, VirusTotal and Azure Sentinel.

The input can be a single IoC observable or a pandas DataFrame containing
@@ -101,6 +106,11 @@ Support IoC Types:
### geoip

Geographic location lookup for IP addresses.
+
+<img src="https://github.com/microsoft/msticpy/blob/PandasMagicExtensions/docs/source/visualization/_static/FoliumMap-01.png"
+alt="Folium map"
+title="Plotting Geo IP Location" width="150" height="100" />
+
This module has two classes for different services:

- GeoLiteLookup - Maxmind Geolite (see <https://www.maxmind.com>)
@@ -119,6 +129,10 @@ This module is intended to be used to summarize large numbers of
events into clusters of different patterns. High volume repeating
events can often make it difficult to see unique and interesting items.

+<img src="https://github.com/microsoft/msticpy/blob/PandasMagicExtensions/docs/source/data_analysis/_static/EventClustering_2a.png"
+alt="Clustering"
+title="Clustering based on command-line variability" width="150" height="200" />
+
This is an unsupervised learning module implemented using SciKit Learn DBScan.

The module contains functions to generate clusterable features from
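
The iocextract description in the README diff above (built-in regular expressions that can be modified or added to at runtime) can be sketched with a dict of compiled patterns. These patterns are deliberately simplified assumptions, not msticpy's built-in set, and `extract_iocs` is a hypothetical name:

```python
import re
from typing import Dict, List, Optional

# Simplified, assumed patterns keyed by IoC type.
IOC_PATTERNS: Dict[str, re.Pattern] = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "md5_hash": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "url": re.compile(r"https?://[^\s\"'<>]+"),
}


def extract_iocs(
    text: str, patterns: Optional[Dict[str, re.Pattern]] = None
) -> Dict[str, List[str]]:
    """Return IoC type -> unique matches, in first-seen order."""
    patterns = IOC_PATTERNS if patterns is None else patterns
    results: Dict[str, List[str]] = {}
    for ioc_type, pattern in patterns.items():
        # dict.fromkeys de-duplicates while preserving order.
        matches = list(dict.fromkeys(pattern.findall(text)))
        if matches:
            results[ioc_type] = matches
    return results


# Patterns can be added or replaced at runtime.
IOC_PATTERNS["email"] = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
```

Keeping the patterns in a plain dict makes runtime changes a simple assignment, at the cost of no validation of the regexes that users add.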
413 changes: 336 additions & 77 deletions docs/notebooks/Base64Unpack.ipynb

Large diffs are not rendered by default.

3,551 changes: 83 additions & 3,468 deletions docs/notebooks/EventClustering.ipynb

654 changes: 328 additions & 326 deletions docs/notebooks/EventTimeline.ipynb

677 changes: 677 additions & 0 deletions docs/notebooks/FoliumMap.ipynb
