Fixing some grammatical errors
ianhelle committed Oct 20, 2024
1 parent a48ec1e commit 62b06ec
Showing 3 changed files with 100 additions and 103 deletions.
6 changes: 3 additions & 3 deletions docs/source/data_acquisition/GeoIPLookups.rst

@@ -82,14 +82,14 @@

The example shown here shows part of the ``OtherProviders`` section of
msticpyconfig.yaml. You can specify an API key in the ``AuthKey`` setting.
For example, ``AuthKey: abcd424246789`` or use a reference to an
environment variable holding the key value.
The API key you need to specify in the ``AuthKey`` setting is your MaxMind
License Key, which can be found on the MaxMind website under Account/Services.
Set the ``AccountID`` field to your MaxMind Account ID (this is typically
not a secret value, but you can opt to store it in an environment variable
or Azure Key Vault).

The ``DBFolder`` setting specifies a folder where the downloaded MaxMind
database files will be stored and referenced from. The folder path
can be prefixed with "~" to specify a path relative to the current
user's home directory (this works cross-platform).

155 changes: 78 additions & 77 deletions docs/source/data_acquisition/UploadData.rst
@@ -1,12 +1,12 @@
Data Uploaders
==============

As well as retrieving data from a data source, you may wish to upload a data set to a data source.
This may be a local data file you want to add to your centralized data source, or findings
from your investigation that you want to store long-term.
MSTICpy contains data uploader functions for both Azure Sentinel/Log Analytics and Splunk data sources.
Data can be provided to both uploaders as a Pandas DataFrame, value-separated file (e.g., csv, tsv),
or a folder path of value-separated files.

Uploading data to Azure Sentinel/Log Analytics
----------------------------------------------

@@ -15,62 +15,63 @@

Instantiating the Azure Sentinel uploader
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The first step in uploading data is to instantiate an uploader for the location we wish to upload data to.
For Azure Sentinel, there are two parameters that need to be passed at this stage:
the workspace ID of the workspace to upload data to, and the workspace key.

.. note:: These are different from the details required to query data from Log Analytics using the DataProvider.
   Your workspace key can be found under the Advanced settings tab of your Log Analytics workspace.

.. code:: ipython3

    from msticpy.data.uploaders.loganalytics_uploader import LAUploader
    laup = LAUploader(workspace=WORKSPACE_ID, workspace_secret=WORKSPACE_KEY)

You can also set a ``debug`` flag when instantiating, which will provide additional progress messages during the upload process.
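
For example, a minimal sketch with the ``debug`` flag enabled (the workspace values are placeholders):

.. code:: ipython3

    # Placeholder credentials - substitute your own workspace ID and key
    laup = LAUploader(
        workspace=WORKSPACE_ID,
        workspace_secret=WORKSPACE_KEY,
        debug=True,  # emit extra progress messages during uploads
    )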

Uploading a DataFrame to Azure Sentinel
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To upload a Pandas DataFrame to Log Analytics, you simply pass the DataFrame to ``.upload_df()`` along with the name of a table
you wish the data to be uploaded to. If that table exists, the data will be appended to it; otherwise, the table will be created.
Note that all tables fall under the Custom Log category, so any name you provide will be appended with _CL
(i.e., table_name will be table_name_CL).
Log Analytics will parse each column in the DataFrame into a column in the resulting table.

.. note:: table_name cannot contain any special characters except ``_``; all other characters will be removed.

.. code:: ipython3

    laup.upload_df(data=DATAFRAME, table_name=TABLE_NAME)
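
As a concrete sketch, a small hypothetical findings DataFrame uploaded to a custom table (the column and table names are illustrative):

.. code:: ipython3

    import pandas as pd

    # Illustrative findings to store - any DataFrame will do
    findings_df = pd.DataFrame(
        {"Host": ["host1", "host2"], "Verdict": ["suspicious", "benign"]}
    )

    # Uploads to the custom table "InvestigationFindings_CL"
    laup.upload_df(data=findings_df, table_name="InvestigationFindings")
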
Uploading a File to Azure Sentinel
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To upload a file to Log Analytics, pass the path to the file to ``.upload_file()``. By default, a comma-separated
value file is expected, but if you have some other separator value, you can pass this with the ``delim`` parameter.
You can specify a table name to upload the data to with the ``table_name`` parameter, but by default, the uploader
will upload to a table with the same name as the file.

.. code:: ipython3

    laup.upload_file(file_path=FILE_PATH)
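
For instance, a sketch that uploads a tab-separated file to an explicitly named table (the file path and table name are placeholders):

.. code:: ipython3

    # Tab-separated input, uploaded to "ProxyLogs_CL" rather than
    # a table named after the file
    laup.upload_file(
        file_path="./exports/proxy_logs.tsv",
        delim="\t",
        table_name="ProxyLogs",
    )
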
Uploading a Folder to Azure Sentinel
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can also upload a whole folder of files. To do this, simply pass the folder path to ``.upload_folder()``.
By default, this will upload all csv files in that folder to the Log Analytics workspace, with each file being
uploaded to a table with a name corresponding to the file name. Alternatively, you can also specify a single table
name under which all files will be uploaded. If you have some other separated value file type, you can pass ``delim``
and the specified delimiter value; however, currently there is only support for a single delim type across files.
By default, this method attempts to upload all files in the specified folder. If you want to only process certain file
extensions, you can pass the ``glob`` keyword parameter with a pattern for files to attempt to upload. The
pattern format required follows the ``pathlib.glob()`` pattern - more details are available `here <https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob>`_.

.. code:: ipython3

    laup.upload_folder(folder_path=FOLDER_PATH, glob="*.csv")

During upload, a progress bar will be displayed showing the upload progress of the files within the folder.
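
If you want every file in a folder to go to one table, a sketch along the following lines should work; this assumes the single-table option is exposed through the ``table_name`` parameter, and the path, table name, and delimiter are illustrative:

.. code:: ipython3

    # Assumes the single-table option is the ``table_name`` parameter;
    # all tsv files in the folder are appended to "ArchivedAlerts_CL"
    laup.upload_folder(
        folder_path="./exports",
        glob="*.tsv",
        delim="\t",
        table_name="ArchivedAlerts",
    )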

Uploading data to Splunk
------------------------

@@ -79,98 +79,98 @@

Instantiating the Splunk uploader
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The first step in uploading data is to instantiate an uploader for the location we wish to upload data to.
For Splunk, there are three parameters that need to be passed at this stage: the Splunk host name, a username,
and a password. You can also pass a parameter for ``port``; by default, this value is 8089.
In addition, a security auth token can be passed via ``bearer_token``
instead of a username and password, as with the Splunk QueryProvider.

.. code:: ipython3

    from msticpy.data.uploaders.splunk_uploader import SplunkUploader
    spup = SplunkUploader(username=USERNAME, host=HOST, password=PASSWORD)

You can also set a ``debug`` flag when instantiating, which will provide additional progress messages during the upload process.
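
If you prefer token authentication, a minimal sketch (``HOST`` and ``BEARER_TOKEN`` are placeholders for your own values):

.. code:: ipython3

    # Token-based authentication instead of username/password
    spup = SplunkUploader(host=HOST, bearer_token=BEARER_TOKEN)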

Alternatively, you can use the stored credentials in msticpyconfig.yaml with SplunkUploader.

.. code:: ipython3

    from msticpy.data.uploaders.splunk_uploader import SplunkUploader
    spup = SplunkUploader()

.. note:: Due to the way Splunk APIs work, the time taken to upload a file to
   Splunk can be significantly longer than with Log Analytics.

Uploading a DataFrame to Splunk
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To upload a Pandas DataFrame to Splunk, you simply pass the DataFrame to ``.upload_df()``
along with the name of the index you wish the data to be uploaded to.
The ``source_type`` parameter accepts csv, json, or another value; the data is serialized with
``df.to_csv()``, ``df.to_json()``, or ``df.to_string()`` respectively, with **json** as the default.
The ``table_name`` parameter remains for backward compatibility.
If the index provided does not exist and you want it to be created,
you can pass the parameter ``create_index=True``.

.. note:: Table name for Splunk refers to source type.

.. code:: ipython3

    spup.upload_df(data=DATAFRAME, index_name=INDEX_NAME)

During upload, a progress bar will be displayed showing the progress of the upload.
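
For example, a sketch that serializes the DataFrame as csv and creates the index if it does not already exist (the index name is illustrative):

.. code:: ipython3

    # Serialize the DataFrame as csv and create the target index if needed
    spup.upload_df(
        data=DATAFRAME,
        index_name="investigation_findings",
        source_type="csv",
        create_index=True,
    )
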

Uploading a File to Splunk
^^^^^^^^^^^^^^^^^^^^^^^^^^

To upload a file to Splunk, pass the path to the file to ``.upload_file()`` along with the name of
the index you want the data uploaded to.
By default, a comma-separated value file is expected, but if your file has
some other separator value, you can pass this with the ``delim`` parameter.
You can specify the source type to upload the data to with the ``source_type`` parameter,
but by default, the uploader will upload to the sourcetype with the same name as the file.
The ``source_type`` parameter accepts csv, json, or another value; the data is serialized with
``df.to_csv()``, ``df.to_json()``, or ``df.to_string()`` respectively.

The default is **json** if the ``table_name`` parameter is not supplied, because ``table_name`` remains
only for backward compatibility.

As with uploading a DataFrame,
if the index provided does not exist and you want it to be created, you can pass
the parameter ``create_index=True``.

.. code:: ipython3

    spup.upload_file(file_path=FILE_PATH, index_name=INDEX_NAME)
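
As a sketch, uploading a tab-separated file and creating the index if needed (the file path is illustrative):

.. code:: ipython3

    # Read a tab-separated file; create the index if it does not exist
    spup.upload_file(
        file_path="./exports/proxy_logs.tsv",
        index_name=INDEX_NAME,
        delim="\t",
        create_index=True,
    )
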
Uploading a Folder to Splunk
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can also upload a whole folder of files. To do this, simply pass the folder path to
``.upload_folder()`` along with the
name of the index you want the data uploaded to. By default,
this will upload all csv files in that folder to Splunk,
with each file being uploaded to a sourcetype with a name corresponding to the file name.

Alternatively, you can also
specify a single source type under which all files will be uploaded with the ``source_type`` parameter.
The ``source_type`` parameter accepts csv, json, or another value; the data is serialized with
``df.to_csv()``, ``df.to_json()``, or ``df.to_string()`` respectively.
The default is **json** if the ``table_name`` parameter is not supplied, because ``table_name`` remains
only for backward compatibility.

If your files have some
other separated value file type, you can pass ``delim`` and the specified delimiter value; however, currently there is
only support for a single delim type across files. By default, this method attempts to upload all files in the specified
folder. If you want to only process certain file extensions, you can pass the ``glob`` keyword parameter
with a pattern for files to attempt to upload.
The pattern format required follows the ``pathlib.glob()`` pattern - more details are
available `here <https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob>`_.
As with the other methods, if the index provided does not exist and you want it to be created,
you can pass the parameter ``create_index=True``.

.. code:: ipython3

    spup.upload_folder(folder_path=FOLDER_PATH, index_name=INDEX_NAME)

During upload, a progress bar will be displayed showing the upload progress of the files within the folder.
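
For instance, a sketch restricted to csv files that creates the index if needed (the folder path and index name are illustrative):

.. code:: ipython3

    # Upload only the csv files in the folder, creating the index if needed
    spup.upload_folder(
        folder_path="./exports",
        index_name="investigation_archive",
        glob="*.csv",
        create_index=True,
    )
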
42 changes: 19 additions & 23 deletions docs/source/getting_started/UserSessionConfig.rst

@@ -4,19 +4,19 @@

Automatic Loading of Query Providers and Components
====================================================

The ``mp_user_session.py`` module is designed to load and initialize query providers and other
components based on configuration provided in a YAML file.

This allows you to load multiple providers and components in a single step, avoiding having to
write a lot of repetitive code in your notebooks.

The user is expected to supply the path to the YAML file to the ``load_user_config`` function.
Each key in the ``QueryProviders`` and ``Components`` sections of the YAML file will be the name
of the component variable in the local namespace.
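
A minimal sketch of loading such a file is shown below; the YAML path is a placeholder, and the import path is an assumption, so adjust it to wherever ``mp_user_session.py`` lives in your msticpy installation:

.. code-block:: python

    from msticpy.init.mp_user_session import load_user_config

    # Creates the providers and components named in the YAML file
    # (e.g. qry_prov_sent) in the notebook namespace
    load_user_config("./mp_user_session.yaml")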

Example YAML Configuration
--------------------------

Here is an example of a YAML configuration file that defines query providers and components:

.. code-block:: yaml

QueryProviders:
qry_prov_sent:
@@ -44,28 +44,24 @@
workspace: CyberSecuritySoc
auth_methods: ['cli', 'device_code']
Each key in the ``QueryProviders`` and ``Components`` sections is the name of the instance of the
component created in your notebook environment. For example, the ``qry_prov_md`` entry is
equivalent to the code:

.. code-block:: python

    import msticpy as mp
    qry_prov_md = mp.QueryProvider("M365D")

You can also specify initialization arguments. The ``qry_prov_sent`` entry adds ``debug=True`` to
the parameters given to the query provider.

You can also ask the user session manager to call the ``connect`` method for the provider with the
``Connect`` property, and supply parameters to the ``connect`` call with the ``ConnectArgs``
property.

The ``Components`` section allows you to define non-query components and works in a similar way to
the ``QueryProviders`` section. The main difference here is that you need to specify the module and
class of the component that you want to load. In the example above, we are loading the
``MicrosoftSentinel`` class from the ``msticpy.context.azure`` module and requesting that the
``connect`` method is called with the parameters specified in the ``ConnectArgs`` property.
