added docs

Matteo Lodi · Matteo Lodi · commit 7a1e8e36e023 · 2019-12-31T16:24:50.000+01:00
diff --git a/docs/Contribute.md b/docs/Contribute.md
@@ -0,0 +1,43 @@
+# Contribute
+
+Intel Owl is designed to make adding new analyzers easy. With a simple Python script you can quickly integrate your own engine or an external service.
+
+
+## How to add a new analyzer
+To get started, look at the following analyzer template: [Example](https://github.com/certego/IntelOwl/blob/master/api_app/script_analyzers/analyzer_template.py)
+
+After writing the new Python module, remember to:
+* Put the module in the `file_analyzers` or `observable_analyzers` directory, depending on what it analyzes
+* Add the new module as a celery task in [tasks.py](https://github.com/certego/IntelOwl/blob/master/intel_owl/tasks.py)
+* Add a new entry in the [analyzer configuration](https://github.com/certego/IntelOwl/blob/master/configuration/analyzer_config.json):
+  
+  Example 
+  ```
+    "Analyzer_Name": {
+        "type": "file",
+        "external_service": true,
+        "leaks_info": true,
+        "run_hash": true,
+        "supported_filetypes": ["application/javascript"],
+        "python_module": "haget_run",
+        "additional_config_params": {
+            "custom_required_param": "ANALYZER_SPECIAL_KEY"
+        }
+    },
+  ```
+  
+  Remember to set at least:
+  * `type`: can be `file` or `observable`. It specifies what the analyzer should analyze
+  * `python_module`: name of the task that the analyzer must launch
+  
+  Use `additional_config_params` when the new analyzer requires additional configuration.
+  This lets you create more than one analyzer for the same Python module, each with a different configuration.
+  The MISP and Yara analyzers are good examples of this use case: for instance, you can define a separate analyzer for each MISP instance.
+
+* If possible, add a new basic unit test in [tests.py](https://github.com/certego/IntelOwl/blob/master/api_app/tests.py)
+ 
+  Then follow the [Test](https://github.com/certego/IntelOwl/blob/master/docs/Tests.md) guide to start testing.
+
+* Finally, add the new analyzer(s) to the list in the [Usage](https://github.com/certego/IntelOwl/blob/master/docs/Usage.md) doc
+
+If everything is working, you can submit your pull request!
diff --git a/docs/Installation.md b/docs/Installation.md
@@ -0,0 +1,125 @@
+# Installation
+
+## Deployment options:
+* Docker-compose for classic server deployment
+* (Future) Docker for serverless deployment (e.g. AWS Fargate)
+
+We suggest cloning the project, configuring the required environment variables, and running `docker-compose up` with the docker-compose file included in the project.
+
+That file uses a public Docker image available on Docker Hub: `certego/intelowl`
+
+## Deployment Info
+Main components of the web application:
+* Django
+* RabbitMQ
+* Celery (for async calls and cron jobs)
+* Nginx
+* uWSGI
+* Flower (optional)
+
+All these components are managed by docker-compose.
+
+Database: PostgreSQL
+
+## Deployment preparation
+### Environment configuration
+Before running the project, you must populate some environment variables in a file to provide the required configuration.
+In the project you can find a template file named `env_file_app_template`.
+You have to create a new file named `env_file_app` from that template and modify it with your own configuration.
+
+Required variables to run the image:
+* DJANGO_SECRET: a unique, randomly generated 50-character key
+* DB_HOST, DB_PORT, DB_USER, DB_PASSWORD: PostgreSQL configuration
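+
+A minimal sketch for generating a suitable `DJANGO_SECRET` value with the Python standard library. The function name and the alphabet are assumptions made for illustration; any sufficiently random 50-character string should work.
+
+```python
+import secrets
+import string
+
+# Assumed alphabet: letters and digits only, to avoid characters
+# that might need quoting in an environment file.
+ALPHABET = string.ascii_letters + string.digits
+
+
+def generate_django_secret(length=50):
+    """Return a random key drawn from a cryptographically secure source."""
+    return "".join(secrets.choice(ALPHABET) for _ in range(length))
+
+
+if __name__ == "__main__":
+    print(generate_django_secret())
+```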
+
+Optional variables needed to enable specific analyzers:
+* ABUSEIPDB_KEY: AbuseIPDB API key
+* GSF_KEY: Google Safe Browsing API key
+* OTX_KEY: AlienVault OTX API key
+* CIRCL_CREDENTIALS: CIRCL PDNS credentials in the format: `user|pass`
+* VT_KEY: VirusTotal API key
+* HA_KEY: HybridAnalysis API key
+* INTEZER_KEY: Intezer API key
+* FIRST_MISP_API: FIRST MISP API key
+* FIRST_MISP_URL: FIRST MISP URL
+* CUCKOO_URL: URL of your Cuckoo instance
+
+### Database configuration
+Before running the project, you must populate the basic configuration for PostgreSQL.
+In the project you can find a template file named `env_file_postgres_template`.
+You have to create a new file named `env_file_postgres` from that template and modify it with your own configuration.
+
+Required variables (some must match the values set in the previous configuration):
+* POSTGRES_PASSWORD (same as DB_PASSWORD)
+* POSTGRES_USER (same as DB_USER)
+* POSTGRES_DB -> default `intel_owl_db`
+
+If you prefer to use an external PostgreSQL instance, remove the corresponding image from the `docker-compose.yml` file and provide the configuration to connect to your own instance(s).
+
+### Web server configuration
+By default Intel Owl provides basic configuration for:
+* Nginx (`intel_owl_nginx_http` or `intel_owl_nginx_https`)
+* uWSGI (`intel_owl.ini`)
+
+You can find them in the `configuration` directory.
+
+By default, the project uses the default deployment configuration with HTTP only.
+
+I suggest changing these configuration files to suit your needs and mounting them as volumes by editing the `docker-compose.yml` file.
+
+If you enable HTTPS, remember to set the environment variable `HTTPS_ENABLED` to "enabled" to increase the security of the application.
+
+### Analyzers configuration
+The file `analyzer_config.json` contains the configuration for all the available analyzers you can run.
+For a complete list of all currently available analyzers, see: [Usage](https://github.com/certego/IntelOwl/blob/master/docs/Usage.md)
+
+You may want to change this configuration to add new analyzers or to change the configuration of some of them.
+
+You can rename the analyzers at any time.
+Just make sure that each analyzer dictionary keeps at least the following keys, so that the analyzer runs correctly:
+* `type`: can be `file` or `observable`. It specifies what the analyzer should analyze
+* `python_module`: name of the task that the analyzer must launch
+
+For a full description of the available keys, check the [Usage](https://github.com/certego/IntelOwl/blob/master/docs/Usage.md) page.
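+
+As a rough illustration, the two required keys can be checked before starting the project. This is a sketch only: the helper name is hypothetical and is not part of Intel Owl, and the file path assumes the configuration file sits in the `configuration` directory.
+
+```python
+import json
+
+REQUIRED_KEYS = ("type", "python_module")
+VALID_TYPES = ("file", "observable")
+
+
+def find_config_problems(config):
+    """Return a list of human-readable problems found in the analyzer config."""
+    problems = []
+    for name, entry in config.items():
+        for key in REQUIRED_KEYS:
+            if key not in entry:
+                problems.append(f"{name}: missing required key '{key}'")
+        if "type" in entry and entry["type"] not in VALID_TYPES:
+            problems.append(f"{name}: invalid type '{entry['type']}'")
+    return problems
+
+
+if __name__ == "__main__":
+    # Assumed path, relative to the project directory.
+    with open("configuration/analyzer_config.json") as f:
+        for problem in find_config_problems(json.load(f)):
+            print(problem)
+```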
+
+### Rebuilding the project
+If you make code changes and want to rebuild the project, run the following command from the project directory:
+
+`docker build --tag=<your_tag> .`
+
+Then reference your own image in the `docker-compose.yml` file.
+
+
+## AWS support
+At the moment there is basic support for some AWS services. More is planned for the future.
+
+### Secrets
+If you would like to run this project on AWS, I suggest using AWS Secrets Manager to store your credentials, so that your secrets are better protected.
+
+This project supports this kind of configuration. Instead of adding the variables to the environment file, add them with the same names in AWS Secrets Manager and Intel Owl will fetch them transparently.
+
+You must set up the required permissions in AWS in advance, according to your infrastructure requirements.
+
+Also, you need to set the environment variable `AWS_SECRETS` to `True` to enable this mode.
+
+You can customize the AWS Region by changing the environment variable `AWS_REGION`.
+
+### SQS
+If you prefer, you can use AWS SQS instead of RabbitMQ to manage your queues.
+In that case, change the parameter `CELERY_BROKER_URL` to `sqs://` and give your instances on AWS the proper permissions to access it.
+
+Also, you need to set the environment variable `AWS_SQS` to `True` to activate the additional required settings.
+
+### ... More coming
+
+
+## Run
+After configuring the environment files as described above, you can run the image.
+The project uses `docker-compose`. Run it from the project's main directory:
+
+`docker-compose up`
+
+
+## After deployment
+### Users creation
+You may want to run `docker exec -ti intel_owl_uwsgi python3 manage.py createsuperuser` after the first run to create a superuser.
+Then you can add other users directly from the Django Admin Interface after logging in with the superuser account.
diff --git a/docs/Tests.md b/docs/Tests.md
@@ -0,0 +1,25 @@
+# Tests
+
+If you want to run tests, you should:
+
+- Change the following environment variables in the environment file based on the data you would like to test
+    * TEST_TOKEN -> API key for testing purposes
+    * TEST_JOB_ID
+    * TEST_MD5
+    * TEST_URL
+    * TEST_IP
+    * TEST_DOMAIN
+    
+- The test files are not included in the project because they are malware samples and I do not want them to be misused.
+So, if you want to test files, you should add the following files to the folder `test_files`:
+    * documento.pdf (to test PDF engines)
+    * documento.rtf (to test RTF engines)
+    * documento.doc (to test DOC engines)
+    * file.exe (to test PE engines)
+    * file.dll (to test DLL engines)
+    * non_valid_pe.exe (to test non-valid PE)
+
+    
+Run:
+`docker exec -ti intel_owl_uwsgi python3 manage.py test`
+    
diff --git a/docs/Usage.md b/docs/Usage.md
@@ -0,0 +1,89 @@
+# Usage
+
+## Client
+Intel Owl's main objective is to provide a single API to query in order to retrieve threat intelligence at scale.
+
+To interact with the Intel Owl APIs, you may want to use the dedicated client available at: https://github.com/mlodic/pyintelowl
+
+We suggest downloading the client, reading the instructions, and using it as a library in your own Python projects or directly from the command line.
+
+### Tokens creation
+Server authentication is managed through API keys. To interact with Intel Owl, create one or more unprivileged users from the Django Admin Interface and then generate a token for each of them.
+Afterwards you can use the generated tokens with the Intel Owl client.
+
+
+## Available Analyzers
+
+The following is the list of the available analyzers you can run out-of-the-box:
+
+### File analyzers:
+* File_Info: static generic File analysis
+* PDF_Info: static PDF analysis
+* Rtf_Info: static RTF analysis
+* Doc_Info: static generic document analysis
+* PE_Info: static PE analysis
+* Signature_Info: PE signature extractor
+* Strings_Info_Classic: strings extraction
+* Strings_Info_ML: strings extraction plus strings ranking based on Machine Learning
+* VirusTotal_v3_Get_File: check file hash on VirusTotal
+* VirusTotal_v3_Get_File_And_Scan: check file hash on VirusTotal. If not already available, perform a scan
+* VirusTotal_v3_Scan_File: scan a file on VirusTotal
+* VirusTotal_v2_Get_File: check file hash on VirusTotal using old API endpoints
+* VirusTotal_v2_Scan_File: scan a file on VirusTotal using old API endpoints
+* Intezer Scan: scan a file on Intezer
+* Cuckoo_Scan: scan a file on Cuckoo
+* HybridAnalysis_Get_File: check file hash on HybridAnalysis
+* OTX_Check_Hash: check file hash on AlienVault OTX
+* MISP_Check_Hash: check a file hash on a MISP instance
+* MISPFIRST_Check_Hash: check a file hash on the FIRST MISP instance
+* Yara_Scan_Community: scan a file with community yara rules
+* Yara_Scan_Florian: scan a file with Neo23x0 yara rules
+* Yara_Scan_Intezer: scan a file with Intezer yara rules
+* Yara_Scan_Custom_Signatures: scan a file with your own added signatures
+
+
+### Observable analyzers (ip, domain, url, hash)
+* VirusTotal_v3_Get_Observable: search an observable in the VirusTotal DB
+* VirusTotal_v2_Get_Observable: search an observable in the VirusTotal DB using the old API endpoints
+* HybridAnalysis_Get_Observable: search an observable in the HybridAnalysis DB
+* OTXQuery: scan an observable on AlienVault OTX
+* TalosReputation: check an IP reputation from Talos
+* Robtex_Forward_PDNS_Query: scan a domain against the Robtex Passive DNS DB
+* Robtex_Reverse_PDNS_Query: scan an IP against the Robtex Passive DNS DB
+* Robtex_IP_Query: get IP info from Robtex
+* GoogleSafebrowsing: scan an observable against the Google Safe Browsing DB
+* GreyNoiseAlpha: scan an IP against the Alpha GreyNoise API
+* CIRCLPassiveDNS: scan an observable against the CIRCL Passive DNS DB
+* CIRCLPassiveSSL: scan an observable against the CIRCL Passive SSL DB
+* MaxMindGeoIP: extract GeoIP info for an observable
+* AbuseIPDB: scan an IP against the AbuseIPDB
+* Fortiguard: scan an observable with the Fortiguard URL Analyzer
+* TorProject: check if an IP is a Tor Exit Node
+* MISP: scan an observable on a MISP instance
+* MISPFIRST: scan an observable on the FIRST MISP instance
+* Shodan_Search: _not yet available_
+
+## Analyzers customization
+You can create new analyzers based on already existing modules by changing the configuration values (`analyzer_config.json`).
+
+The following are all the keys that you can change without touching the source code:
+* `leaks_info`: if set, the analyzer is not executed when you specify via the API that a resource is sensitive
+* `external_service`: if set, the analyzer is not executed when you specify via the API that external services must be excluded
+* `supported_filetypes`: a list of mimetypes. If set, the analyzer is not executed for files whose mimetype is not in the list
+* `not_supported_filetypes`: a list of mimetypes. If set, the analyzer is not executed for files whose mimetype is in the list
+* `observable_supported`: a list of observable types. If set, the analyzer is not executed for observables whose type is not in the list. Valid values are: `ip`, `domain`, `url`, `hash`
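+
+The rules above can be sketched as a single decision function. This is only an illustration of the documented behavior, not Intel Owl's actual code: the function name and its parameters are hypothetical.
+
+```python
+def should_run(config, *, is_file, mimetype=None, observable_type=None,
+               sensitive=False, exclude_external=False):
+    """Apply the documented rules to one analyzer configuration entry."""
+    # Skip analyzers that may leak data when the resource is sensitive.
+    if sensitive and config.get("leaks_info"):
+        return False
+    # Skip analyzers that call external services when those are excluded.
+    if exclude_external and config.get("external_service"):
+        return False
+    if is_file:
+        supported = config.get("supported_filetypes")
+        if supported and mimetype not in supported:
+            return False
+        if mimetype in config.get("not_supported_filetypes", []):
+            return False
+    else:
+        supported = config.get("observable_supported")
+        if supported and observable_type not in supported:
+            return False
+    return True
+```
+
+For example, an entry with `"leaks_info": true` would be skipped whenever the request is flagged as sensitive, regardless of the file type.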
+
+You can also rename any available analyzer as you wish.
+
+Changing other keys will break the analyzer. In that case, consider creating a new Python module or modifying an existing one.
+
+To contribute to the project, see [Contribute](https://github.com/certego/IntelOwl/blob/master/docs/Contribute.md)
+
+
+
+
+
+
+
+
+