LocalDataDriver for using CSV and pickled DF files as a QueryProvider (#64)

* LocalDataDriver for using CSV and pickled DF files as a QueryProvider

Removed deprecated kql.py, query_builtin_queries, query_mgr.py, query_schema.py
Changed location of query_defns.py and made pkg reference updates in several modules and notebooks.
Some fixes to support local_data_driver in query_store.py, driver_base.py and data_providers.py
Unit test - test_localdata_queries.yaml and supporting data and query files.
Fixed test in test_utils.py to work on Linux
Add documentation for LocalDataDriver to DataProviders.rst and updated section on creating query files.
Reduced warnings produced during pytest run to something more reasonable.
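
The idea behind a local-data driver (named queries that return the contents of local files instead of calling a remote service) can be sketched roughly as follows. This is a standard-library illustration only, not the msticpy implementation: `LocalCsvProvider`, `list_queries`, `run`, and `demo` are hypothetical names, and rows are returned as plain dicts rather than DataFrames.

```python
import csv
import tempfile
from pathlib import Path


class LocalCsvProvider:
    """Hypothetical stand-in for a local-file query provider (not msticpy code)."""

    def __init__(self, data_path, query_map):
        self._data_path = Path(data_path)
        # Maps a query name to the file that holds that query's results.
        self._query_map = dict(query_map)

    def list_queries(self):
        """Return the known query names in sorted order."""
        return sorted(self._query_map)

    def run(self, query_name):
        """Return the rows stored for query_name as a list of dicts."""
        if query_name not in self._query_map:
            raise KeyError(f"Unknown query: {query_name}")
        file_path = self._data_path / self._query_map[query_name]
        with file_path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))


def demo():
    """Write a small CSV file, then read it back through the provider."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        csv_path = Path(tmp_dir) / "alerts.csv"
        csv_path.write_text(
            "AlertName,Severity\nSuspicious logon,High\n", encoding="utf-8"
        )
        provider = LocalCsvProvider(tmp_dir, {"list_alerts": "alerts.csv"})
        return provider.list_queries(), provider.run("list_alerts")
```

Keeping the query-name-to-file mapping separate from the data folder means the same query definitions can be pointed at different sample data sets, which is convenient for unit tests and demos.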

* Added "AzureSentinel" alias for LogAnalytics DataEnvironment

Changed tilookup and kql_base/kql_driver so that handling failure to load is a bit friendlier.
E.g. running TILookup in a non-IPython environment (with ASTI provider) will now just cause a warning, not an exception.
kql_driver.py also updated to check for get_ipython() returning None and output friendlier message.
Changed driver_base.py and its derived classes so that the query() method takes an additional QuerySource parameter. It is not yet used,
but is required so that we can implement driver-specific checks on query parameters.
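
The "warn instead of raise" behaviour described above for non-IPython environments could look something like the sketch below. It is an assumption-laden illustration rather than the code in kql_driver.py: `try_load_notebook_driver` and the injectable `shell_getter` argument are hypothetical, and the only real API relied on is `IPython.get_ipython()`, which returns None when no interactive shell is running.

```python
import warnings


def _default_shell_getter():
    """Return the active IPython shell, or None if there is none."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so there cannot be a shell.
        return None
    return get_ipython()


def try_load_notebook_driver(shell_getter=_default_shell_getter):
    """Return True if an IPython shell is available, else warn and return False."""
    shell = shell_getter()
    if shell is None:
        # Hypothetical message: a warning is friendlier than an exception here.
        warnings.warn(
            "No IPython shell found: notebook-only features are disabled.",
            RuntimeWarning,
        )
        return False
    return True
```

Passing the shell lookup in as a parameter is a choice made for this sketch so that both branches can be exercised without starting a notebook.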

* Fixing PR comments for docs (plus a few other things I saw)

Updated DataQueries.rst with new queries
Checked in notebook to create DataQueries.rst
Removed deprecated class from query_defns.py

* Typo in warning

* Missing parenthesis in DataProviders.rst
ianhelle authored May 21, 2020
1 parent 43ad997 commit 478b5bb
Showing 58 changed files with 3,498 additions and 5,151 deletions.
3,980 changes: 2,005 additions & 1,975 deletions docs/notebooks/Data_Queries.ipynb

Large diffs are not rendered by default.

840 changes: 6 additions & 834 deletions docs/notebooks/TIProviders.ipynb

Large diffs are not rendered by default.

25 changes: 25 additions & 0 deletions docs/notebooks/msticpyconfig.yaml
@@ -0,0 +1,25 @@
QueryDefinitions:

TIProviders:
OTX:
Args:
AuthKey: "***REMOVED***"
Primary: True
Provider: "OTX" # Explicitly name provider to override
VirusTotal:
Args:
AuthKey: "***REMOVED***"
Primary: True
Provider: "VirusTotal"
XForce:
Args:
ApiID: "d99c9637-3049-4c1e-b608-18c3bad769f9"
AuthKey: "f3531662-7849-4080-9e79-b728daadc2e8"
Primary: True
Provider: "XForce"
AzureSentinel:
Args:
WorkspaceID: "a927809c-8142-43e1-96b3-4ad87cfe95a3"
TenantID: "35a9e601-82db-42da-b521-efc4a2f6783c"
Primary: False
Provider: "AzSTI"
620 changes: 480 additions & 140 deletions docs/source/data_acquisition/DataProviders.rst

Large diffs are not rendered by default.

161 changes: 91 additions & 70 deletions docs/source/data_acquisition/DataQueries.rst

Large diffs are not rendered by default.

16 changes: 8 additions & 8 deletions docs/source/data_acquisition/GeoIPLookups.rst
@@ -126,8 +126,8 @@ not work reliably cross-platform.
iplocation = GeoLiteLookup(api_key="mykey", db_folder="/tmp/mmdb")
Usage
^^^^^
GeoLite Usage
^^^^^^^^^^^^^

Creating an instance of the GeoLiteLookup class
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -227,8 +227,8 @@ entity (see :py:class:`IpAddress<msticpy.nbtools.entityschema.IpAddress>`)
'Count...)
Looking up a list of IP Addresses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Looking up a list of IP Addresses with GeoLiteLookup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. code:: ipython3
@@ -325,8 +325,8 @@ environment variable holding the key value, as shown in the example.
Provider: "IPStackLookup"
Usage
^^^^^
IPStack Usage
^^^^^^^^^^^^^

Manually Entering the IPStack Key
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -395,8 +395,8 @@ Lookup IP location from IPStack
{"Address": "90.156.201.97", "Location": {"CountryCode": "RU", "CountryName": "Russia", "Longitude": 37.6068, "Latitude": 55.7386, "Type": "geolocation"}, "Type": "ipaddress"}
Looking up a list of IP Addresses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Looking up a list of IP Addresses with IPStackLookup
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: ipython3
32 changes: 14 additions & 18 deletions docs/source/msticpy.data.rst
@@ -94,6 +94,13 @@ msticpy.data.drivers.kql\_driver module
:undoc-members:
:show-inheritance:

msticpy.data.drivers.local\_data\_driver module
-----------------------------------------------
.. automodule:: msticpy.data.drivers.local_data_driver
:members:
:undoc-members:
:show-inheritance:

msticpy.data.drivers.security\_graph\_driver module
---------------------------------------------------
.. automodule:: msticpy.data.drivers.security_graph_driver
@@ -102,38 +109,27 @@ msticpy.data.drivers.security\_graph\_driver module
:show-inheritance:

msticpy.data.drivers.odata\_driver module
---------------------------------------------------
-----------------------------------------
.. automodule:: msticpy.data.drivers.odata_driver
:members:
:undoc-members:
:show-inheritance:

msticpy.data.drivers.mdatp\_driver module
---------------------------------------------------
-----------------------------------------
.. automodule:: msticpy.data.drivers.mdatp_driver
:members:
:undoc-members:
:show-inheritance:

msticpy.nbtools.kql module
--------------------------

.. deprecated:: version 0.2.0
Use :py:class:`msticpy.data.QueryProvider` instead.


msticpy.nbtools.query\_builtin\_queries module
----------------------------------------------

.. deprecated:: version 0.2.0
Use :py:class:`msticpy.data.QueryProvider` instead.


msticpy.nbtools.query\_defns module
msticpy.data.query\_defns module
-----------------------------------

.. deprecated:: version 0.2.0
Use :py:class:`msticpy.data.QueryProvider` instead.
.. automodule:: msticpy.data.query_defns
:members:
:undoc-members:
:show-inheritance:


msticpy.nbtools.query\_mgr module
10 changes: 9 additions & 1 deletion docs/source/msticpy.nbtools.rst
@@ -13,13 +13,21 @@ msticpy.nbtools.foliummap module
:show-inheritance:

msticpy.nbtools.morph_charts module
--------------------------------
-----------------------------------

.. automodule:: msticpy.nbtools.morph_charts
:members:
:undoc-members:
:show-inheritance:

msticpy.nbtools.nbinit module
-----------------------------

.. automodule:: msticpy.nbtools.nbinit
:members:
:undoc-members:
:show-inheritance:

msticpy.nbtools.nbdisplay module
--------------------------------

70 changes: 35 additions & 35 deletions docs/source/visualization/TimeSeriesAnomalies.rst
@@ -6,15 +6,15 @@ visualization built using the `Bokeh
library <https://bokeh.pydata.org>`__ as well as using built-in native
KQL operators.

Time Series analysis generally involves below steps
- Generating TimeSeries Data
- Use Time Series Analysis functions to discover anomalies
Time Series analysis generally involves below steps
- Generating TimeSeries Data
- Use Time Series Analysis functions to discover anomalies
- Visualize Time Series anomalies

Read more about time series analysis in detail from reference microsoft
TechCommunity blog posts

**Reference Blog Posts:**
**Reference Blog Posts:**

- `Looking for unknown anomalies - what is normal? Time Series analysis & its applications in Security <https://techcommunity.microsoft.com/t5/azure-sentinel/looking-for-unknown-anomalies-what-is-normal-time-series/ba-p/555052>`__

@@ -25,38 +25,38 @@ TechCommunity blog posts
# Imports
import sys
import warnings
from msticpy.nbtools.utility import check_py_version
from msticpy.common.utility import check_py_version
MIN_REQ_PYTHON = (3, 6)
check_py_version(MIN_REQ_PYTHON)
from IPython import get_ipython
from IPython.display import display, HTML, Markdown
import ipywidgets as widgets
import pandas as pd
#setting pandas display options for dataframe
pd.set_option("display.max_rows", 100)
pd.set_option("display.max_columns", 50)
pd.set_option("display.max_colwidth", 100)
# msticpy imports
from msticpy.data import QueryProvider
from msticpy.nbtools import *
from msticpy.sectools import *
from msticpy.nbtools.wsconfig import WorkspaceConfig
from msticpy.nbtools.timeseries import display_timeseries_anomolies
WIDGET_DEFAULTS = {
"layout": widgets.Layout(width="95%"),
"style": {"description_width": "initial"},
}
#Adjusting width of the screen
display(HTML("<style>.container { width:80% !important; }</style>"))
ws_config = WorkspaceConfig()
@@ -180,11 +180,11 @@ Query, data source, parameters and parameterized raw KQL query
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
@@ -275,7 +275,7 @@ the similar details
where_clause: str (optional)
Optional additional filter clauses
Query:
{table} {where_clause} | project {timestampcolumn},{aggregatecolumn},{groupbycolumn} | where {timestampcolumn} >= datetime({start}) | where {timestampcolumn} <= datetime({end}) | make-series {aggregatecolumn}={aggregatefunction} on {timestampcolumn} from datetime({start}) to datetime({end}) step {timeframe} by {groupbycolumn} | extend (baseline,seasonal,trend,residual) = series_decompose({aggregatecolumn}) | mv-expand {aggregatecolumn} to typeof(double), {timestampcolumn} to typeof(datetime), baseline to typeof(long), seasonal to typeof(long), trend to typeof(long), residual to typeof(long) | project {timestampcolumn}, {aggregatecolumn}, baseline | render timechart with (title="Time Series Decomposition - Baseline vs Observed TimeChart")
{table} {where_clause} | project {timestampcolumn},{aggregatecolumn},{groupbycolumn} | where {timestampcolumn} >= datetime({start}) | where {timestampcolumn} <= datetime({end}) | make-series {aggregatecolumn}={aggregatefunction} on {timestampcolumn} from datetime({start}) to datetime({end}) step {timeframe} by {groupbycolumn} | extend (baseline,seasonal,trend,residual) = series_decompose({aggregatecolumn}) | mv-expand {aggregatecolumn} to typeof(double), {timestampcolumn} to typeof(datetime), baseline to typeof(long), seasonal to typeof(long), trend to typeof(long), residual to typeof(long) | project {timestampcolumn}, {aggregatecolumn}, baseline | render timechart with (title="Time Series Decomposition - Baseline vs Observed TimeChart")

.. code:: ipython3
@@ -292,11 +292,11 @@ the similar details
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
@@ -364,7 +364,7 @@ KQL Time series functions such as ``series_decompose_anomalies()``.
.. code:: ipython3
timeseriesdemo = pd.read_csv('TimeSeriesDemo.csv',
parse_dates=["TimeGenerated"],
parse_dates=["TimeGenerated"],
infer_datetime_format=True)
timeseriesdemo.head()
@@ -378,11 +378,11 @@ KQL Time series functions such as ``series_decompose_anomalies()``.
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
@@ -443,7 +443,7 @@ KQL Time series functions such as ``series_decompose_anomalies()``.
</table>
</div>


Displaying Time Series anomaly alerts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -513,11 +513,11 @@ details
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
@@ -570,11 +570,11 @@ other suspicious activity from other datasources.
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
@@ -693,7 +693,7 @@ Documentation for display_timeseries_anomalies
ygrid : bool, optional
Whether to show the yaxis grid (default is False)
color : list, optional
List of colors to use in 3 plots as specified in order
List of colors to use in 3 plots as specified in order
3 plots- line(observed), circle(baseline), circle_x/user specified(anomalies).
(the default is ["navy", "green", "firebrick"])

@@ -710,7 +710,7 @@ Documentation for display_timeseries_anomalies
.. raw:: html


<div class="bk-root">
<a href="https://bokeh.org" target="_blank" class="bk-logo bk-logo-small bk-logo-notebook"></a>
<span id="1001">Loading BokehJS ...</span>
@@ -746,14 +746,14 @@ function.
from bokeh.io import export_png
from IPython.display import Image
# Create a plot
timeseries_anomaly_plot = display_timeseries_anomolies(data=timeseriesdemo, y= 'TotalBytesSent')
# Export
# Export
file_name = "plot.png"
export_png(timeseries_anomaly_plot, filename=file_name)
# Read it and show it
display(Markdown(f"## Here is our saved plot: {file_name}"))
Image(filename=file_name)
@@ -762,7 +762,7 @@ function.
.. raw:: html


<div class="bk-root">
<a href="https://bokeh.org" target="_blank" class="bk-logo bk-logo-small bk-logo-notebook"></a>
<span id="1407">Loading BokehJS ...</span>
@@ -791,7 +791,7 @@ will display linegraph.
timechartquery = """
let TimeSeriesData = PaloAltoTimeSeriesDemo_CL
| extend TimeGenerated = todatetime(EventTime_s), TotalBytesSent = todouble(TotalBytesSent_s)
| extend TimeGenerated = todatetime(EventTime_s), TotalBytesSent = todouble(TotalBytesSent_s)
| summarize TimeGenerated=make_list(TimeGenerated, 10000),TotalBytesSent=make_list(TotalBytesSent, 10000) by deviceVendor_s
| project TimeGenerated, TotalBytesSent;
TimeSeriesData
1 change: 1 addition & 0 deletions msticpy/__init__.py
@@ -6,5 +6,6 @@

from ._version import VERSION
from .common import pkg_config as settings
from .nbtools.nbinit import init_notebook

__version__ = VERSION
2 changes: 1 addition & 1 deletion msticpy/_version.py
@@ -1,2 +1,2 @@
"""Version file."""
VERSION = "0.5.0"
VERSION = "0.5.1"
1 change: 1 addition & 0 deletions msticpy/data/__init__.py
@@ -2,6 +2,7 @@
# flake8: noqa: F403
from .data_providers import QueryProvider
from .azure_data import AzureData
from .query_defns import DataEnvironment, DataFamily

from .._version import VERSION
