[DOC]: Can I use Morpheus without BlueField, DPU, DOCA, etc.? #1210
Comments
Hi @nyck33! Thanks for submitting this issue - our team has been notified and we'll get back to you as soon as we can!
Hi @nyck33 Morpheus integration with the BlueField DPU is one of several use cases for the framework. However, there are many others that are not dependent at all on that integration. Please take a look at some of the examples in the repo.
@pdmack thanks for pointing those out. I saw this: https://github.com/nv-morpheus/Morpheus/blob/branch-23.11/examples/ransomware_detection/common/feature_extractor.py so now it's basically up to me to step through some of these examples to understand how the inputs are made. But I have one question. Take for example https://github.com/nv-morpheus/Morpheus/blob/branch-23.11/examples/ransomware_detection/models/ransomw-model-short-rf/config.pbtxt: it's expecting a 297×1 input vector, which I'm guessing can be made by the create_features.py script after extract_features.py makes the JSON files found here: https://github.com/nv-morpheus/Morpheus/tree/branch-23.11/examples/data/appshield/Heur/snapshot-1 based on the telemetry it measures, right? But I still have no idea what this model is, whether there was any transfer learning, what dataset it was trained on, how it can be fine-tuned further, etc.
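As a quick sanity check on that 297×1 input, you could send a dummy feature vector to the model once Triton is serving it. The sketch below is only illustrative: the server URL, the model name, and the input__0/output__0 tensor names are my assumptions, so verify all of them against what config.pbtxt actually declares.

```python
# Hypothetical probe of the ransomware model's expected input shape.
# URL, model name, and tensor names are placeholders; take the real
# values from the model's config.pbtxt.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

dummy = np.zeros((1, 297), dtype=np.float32)  # adjust to the dims in config.pbtxt
infer_input = grpcclient.InferInput("input__0", list(dummy.shape), "FP32")
infer_input.set_data_from_numpy(dummy)

result = client.infer(model_name="ransomw-model-short-rf", inputs=[infer_input])
print(result.as_numpy("output__0"))  # assumed output tensor name
```

If the shapes or names are wrong, Triton's error message will quote the expected ones, which is itself a handy way to discover them.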
Are some of these models proprietary? Am I wrong in assuming they would be on Hugging Face or some other publicly well-known model depot?
Also, with telemetry collection on Windows, I was asking ChatGPT about this: https://chat.openai.com/share/425a2879-ccfd-48f4-a359-8c39e073a256 But would doing something like the below be sufficient?

ChatGPT: Since you're requesting an Ubuntu version, the collection of some specific telemetry data (like […]) will differ. Let's refine the scripts more closely to your needs:

Step 1: Telemetry Data Collection

Here's a refined version of the Ubuntu Telemetry Data Collection script. This script will collect data similar to what was observed in the sample snapshot data:

```python
import os
import json
import time

import psutil


class TelemetryCollector:
    def __init__(self, base_dir='raw_telemetry_data'):
        self.base_dir = base_dir
        os.makedirs(self.base_dir, exist_ok=True)

    def collect_data(self):
        data = {
            # Use p.info rather than p.cmdline() so a process exiting
            # mid-iteration doesn't raise NoSuchProcess.
            'cmdline': [p.info['cmdline'] for p in psutil.process_iter(attrs=['cmdline'])],
            'envars': dict(os.environ),
            'pslist': [p.info for p in psutil.process_iter(attrs=['pid', 'name'])],
            # laddr/raddr are namedtuples (raddr may be empty); convert to
            # plain lists so they serialize cleanly to JSON.
            'netscan': [list(c.laddr) + list(c.raddr) for c in psutil.net_connections(kind='inet')],
            # Add other required telemetry data collection methods here
        }
        return data

    def run(self, interval=60):
        while True:
            data = self.collect_data()
            timestamp = time.strftime("%Y-%m-%d_%H-%M-%S")
            for data_type, content in data.items():
                file_path = os.path.join(self.base_dir, f'{data_type}_{timestamp}.json')
                with open(file_path, 'w') as f:
                    json.dump(content, f)
            time.sleep(interval)


if __name__ == "__main__":
    collector = TelemetryCollector()
    collector.run()
```

Step 2: Feature Extraction and Structuring

This step will involve reading the collected raw telemetry data and applying the feature extraction logic from the provided feature_extractor.py:

```python
import glob
import json
import os
import time

from feature_extractor import FeatureExtractor  # Importing from the provided Python file


class FeatureExtractorDriver:
    def __init__(self, raw_data_dir='raw_telemetry_data', structured_data_dir='structured_data'):
        self.raw_data_dir = raw_data_dir
        self.structured_data_dir = structured_data_dir
        os.makedirs(self.structured_data_dir, exist_ok=True)
        self.feature_extractor = FeatureExtractor()  # Initialize FeatureExtractor

    def structure_data(self):
        # Example to process raw data files in raw_data_dir.
        # Adjust the logic as per the actual methods of
        # feature_extractor.py and create_features.py.
        for file_path in glob.glob(os.path.join(self.raw_data_dir, '*.json')):
            with open(file_path, 'r') as f:
                raw_data = json.load(f)
            # Process raw_data using feature_extractor and create_features.py.
            # Save the processed data in structured_data_dir in snapshot-x directories.
            # Implement the details based on the actual methods in the provided Python files.

    def run(self, interval=60):
        while True:
            self.structure_data()
            time.sleep(interval)


if __name__ == "__main__":
    driver = FeatureExtractorDriver()
    driver.run()
```

Note: […]

Would you like any specific refinements or additions to these scripts based on the provided Python files and the specific feature extraction methods they contain?
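For the "snapshot-x directories" part that the driver above leaves as a TODO, a minimal sketch could look like the following. The per-plugin file naming here (cmdline.json, pslist.json, ...) is my guess at the layout; check it against the sample data under examples/data/appshield/Heur/snapshot-1 in the repo before relying on it.

```python
# Hypothetical sketch: write structured output as snapshot-N directories,
# loosely mirroring examples/data/appshield/<source>/snapshot-1.
# File names and layout are assumptions; verify against the sample data.
import json
import os


def write_snapshot(structured_data_dir, snapshot_id, plugin_outputs):
    """plugin_outputs maps a plugin name ('cmdline', 'pslist', ...) to its records."""
    snapshot_dir = os.path.join(structured_data_dir, f'snapshot-{snapshot_id}')
    os.makedirs(snapshot_dir, exist_ok=True)
    for plugin, records in plugin_outputs.items():
        with open(os.path.join(snapshot_dir, f'{plugin}.json'), 'w') as f:
            json.dump(records, f)


# Example usage with dummy data:
write_snapshot('structured_data', 1, {'pslist': [{'pid': 1234, 'name': 'bash'}]})
```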
or for Windows:

ChatGPT: Here's a refined version tailored for Windows:

Step 1: Telemetry Data Collection

```python
import os
import json
import time
import subprocess  # To run Windows-specific commands if needed

import psutil


class TelemetryCollector:
    def __init__(self, base_dir='raw_telemetry_data'):
        self.base_dir = base_dir
        os.makedirs(self.base_dir, exist_ok=True)

    def collect_cmdline(self):
        # Use proc.info rather than proc.cmdline() so a process exiting
        # mid-iteration doesn't raise NoSuchProcess.
        return {proc.info['pid']: proc.info['cmdline']
                for proc in psutil.process_iter(attrs=['pid', 'cmdline'])}

    def collect_envars(self):
        # Collecting environment variables. Specific method may vary.
        return dict(os.environ)

    def collect_pslist(self):
        return [p.info for p in psutil.process_iter(attrs=['pid', 'name'])]

    def collect_netscan(self):
        # Convert the laddr/raddr namedtuples to lists so they serialize to JSON.
        return [list(conn.laddr) + list(conn.raddr) for conn in psutil.net_connections(kind='inet')]

    # Collect additional Windows-specific data like handles, ldrmodules, etc.

    def collect_data(self):
        data = {
            'cmdline': self.collect_cmdline(),
            'envars': self.collect_envars(),
            'pslist': self.collect_pslist(),
            'netscan': self.collect_netscan(),
            # Add other required telemetry data collection methods here
        }
        return data

    def run(self, interval=60):
        while True:
            data = self.collect_data()
            timestamp = time.strftime("%Y-%m-%d_%H-%M-%S")
            for data_type, content in data.items():
                file_path = os.path.join(self.base_dir, f'{data_type}_{timestamp}.json')
                with open(file_path, 'w') as f:
                    json.dump(content, f)
            time.sleep(interval)


if __name__ == "__main__":
    collector = TelemetryCollector()
    collector.run()
```

Step 2: Feature Extraction and Structuring

This step remains largely similar to the Ubuntu version, with the focus being on reading the raw telemetry data, applying the feature extraction logic, and structuring the results:

```python
import glob
import json
import os
import time

from feature_extractor import FeatureExtractor  # Importing from the provided Python file


class FeatureExtractorDriver:
    def __init__(self, raw_data_dir='raw_telemetry_data', structured_data_dir='structured_data'):
        self.raw_data_dir = raw_data_dir
        self.structured_data_dir = structured_data_dir
        os.makedirs(self.structured_data_dir, exist_ok=True)
        self.feature_extractor = FeatureExtractor()  # Initialize FeatureExtractor

    def structure_data(self):
        # Example to process raw data files in raw_data_dir.
        # Adjust the logic as per the actual methods of
        # feature_extractor.py and create_features.py.
        for file_path in glob.glob(os.path.join(self.raw_data_dir, '*.json')):
            with open(file_path, 'r') as f:
                raw_data = json.load(f)
            # Process raw_data using feature_extractor and create_features.py.
            # Save the processed data in structured_data_dir in snapshot-x directories.
            # Implement the details based on the actual methods in the provided Python files.

    def run(self, interval=60):
        while True:
            self.structure_data()
            time.sleep(interval)


if __name__ == "__main__":
    driver = FeatureExtractorDriver()
    driver.run()
```

Note: Please review and modify these scripts according to your exact needs, especially focusing on the collection of Windows-specific telemetry data and the details of feature extraction and structuring based on the provided Python files. If you need assistance with a specific part of these scripts or have additional requirements, feel free to ask!
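One caveat on the psutil approach: plugins like handles or ldrmodules in the App Shield data come from memory introspection, which psutil cannot provide. On a test box, one possible stand-in is running Volatility 3 against a memory image. The sketch below assumes volatility3 is installed with its vol entry point and that your build supports the JSON renderer (-r json); both are assumptions to verify for your version.

```python
# Hypothetical stand-in for DOCA App Shield introspection: extract
# pslist-style records from a Windows memory image with Volatility 3.
# The 'vol' entry point and '-r json' renderer flag are assumptions;
# adjust for your installed version.
import json
import subprocess


def run_vol_plugin(image_path, plugin='windows.pslist.PsList'):
    result = subprocess.run(
        ['vol', '-r', 'json', '-f', image_path, plugin],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


# records = run_vol_plugin('memdump.raw')
```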
https://github.com/aquasecurity/appshield AppShield is mentioned in the documentation somewhere, but I noticed that it's an archived repo, so it's not maintained, right? That makes NVIDIA Morpheus seem ever-so-slightly outdated or not well maintained, since it references an archived repo. Can @pdmack or someone else comment?
Hey @nyck33 - the models and examples here are just that: example use cases of the Morpheus SDK. They're not meant to be the final models or workflow that you implement; they're more of a starter for you to see how you may choose to make Morpheus fit within your system. On the AppShield repo: that's not the App Shield referenced in our workflow, it just shares the same name. Per our notebook, we specifically note that App Shield means DOCA App Shield. I think this issue will be better served as a Discussion thread, so I'm going to convert it. If you have a specific documentation change you'd like to see, please file an issue we can action on; otherwise we can continue to discuss here.
This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
How would you describe the priority of this documentation request
Medium
Describe the future/missing documentation
https://docs.nvidia.com/doca/sdk/app-shield-programming-guide/index.html#abstract
The whole time I believed it would be possible to use Morpheus without the DPU, but is it not? Or is it possible, and I'd just have to figure out the telemetry part on my own? I mean, if I want to use Morpheus but DPUs are not available on GCP, Azure, or AWS, or are only available in limited quantities, then I'm in trouble, right?
Can you clarify this point?
Where have you looked?
https://github.com/nyck33/Morpheus_nkim_fork/blob/branch-23.11/docs/source/models_and_datasets.md
https://docs.nvidia.com/doca/sdk/app-shield-programming-guide/index.html#abstract
https://developer.nvidia.com/networking/doca
https://docs.nvidia.com/doca/sdk/developer-qsg/index.html