Testing this model ? #12

RohithYogi · 2024-04-03T04:31:03Z

How do I test the model by loading one of the saved checkpoints? For example, if I have a PE file, can I feed it to the model and get a result (true/false)?

RohithYogi · 2024-04-03T08:55:12Z

python -m src.bin.malshare - NOT WORKING
python train.py --benign_dir="../../raw/dll/" --malware_dir="../../raw/dasmalwerk/" - ERROR [Unpickling](https://github.com/jaketae/deep-malware-detection/issues/2)

python -m src.bin.dll
python extract_header.py --input_dir="../../raw/dll/" --output_dir="../../data/dll/"

python -m src.bin.dasmalwerk
python extract_header.py --input_dir="../../raw/dasmalwerk/" --output_dir="../../data/dasmalwerk/"

python train.py --benign_dir="/workspaces/deep-malware-detection/data/dll/" --malware_dir="/workspaces/deep-malware-detection/data/dasmalwerk/"

After resolving these issues, I am now able to train the model, but that's as far as I got.
If I give it a new sample, it should predict whether the sample is malware or benign (a boolean).

How can I test this?

jaketae · 2024-04-28T16:54:35Z

Hi, I think what you're looking for is an evaluation/inference pipeline. I don't have a script specifically for that purpose, but you should be able to write one pretty quickly by importing these functions:

deep-malware-detection/src/deep_malware_detection/utils.py

Lines 76 to 100 in 8c45fc0

    
    def predict(model, data_loader, device, apply_sigmoid=False, to_numpy=True):
        model.eval()
        y_true = []
        y_pred = []
        for inputs, labels in tqdm(data_loader, leave=False):
            inputs = inputs.to(device)
            outputs = model(inputs)
            y_true.append(labels)
            y_pred.append(outputs)
        y_true = torch.cat(y_true).to(int)
        if apply_sigmoid:
            y_pred = sigmoid(torch.cat(y_pred))
        else:
            y_pred = (torch.cat(y_pred) > 0).to(int)
        if to_numpy:
            y_true = y_true.cpu().numpy()
            y_pred = y_pred.cpu().numpy()
        assert y_true.shape == y_pred.shape
        model.train()
        return y_true, y_pred


    def get_accuracy(model, data_loader, device):
        y_true, y_pred = predict(model, data_loader, device, to_numpy=False)
        return 100 * (y_true == y_pred).to(float).mean().item()

Specifically, you can load your model and create a data loader for your dataset to measure model accuracy.
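
As a rough illustration of that workflow, here is a minimal sketch of loading weights and measuring accuracy over a data loader. Several things in it are assumptions rather than facts about this repository: `TinyModel` is a hypothetical stand-in for the real network, `load_checkpoint` assumes the checkpoint was written with `torch.save(model.state_dict(), path)`, and the model is assumed to emit one logit per sample. Adjust these to match how your checkpoints and datasets are actually stored.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class TinyModel(nn.Module):
    """Hypothetical stand-in for the real model: 4 features in, 1 logit out."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 1)

    def forward(self, x):
        return self.linear(x.float())


def load_checkpoint(model, path, device):
    """Load weights, assuming the file holds a state_dict (an assumption)."""
    state_dict = torch.load(path, map_location=device)
    model.load_state_dict(state_dict)
    return model.to(device)


@torch.no_grad()
def accuracy(model, data_loader, device):
    """Return accuracy in percent, thresholding logits at 0 like predict()."""
    model.eval()
    correct, total = 0, 0
    for inputs, labels in data_loader:
        logits = model(inputs.to(device)).squeeze(-1)
        preds = (logits > 0).long().cpu()
        correct += (preds == labels.long()).sum().item()
        total += labels.numel()
    return 100.0 * correct / total
```

To use it, build a `DataLoader` over your held-out samples (for instance a `TensorDataset` of preprocessed inputs and 0/1 labels) and call `accuracy(model, loader, torch.device("cpu"))`. Unlike `predict` above, this sketch leaves the model in eval mode afterwards.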

To get a prediction for a single file, take a look at the predict function to see how a boolean label is derived from an input. Basically, the model outputs a raw score (logit): if it is greater than 0, the label is 1; otherwise, the label is 0. If you prefer a probabilistic interpretation, you can also apply a sigmoid to the output.
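
The decision rule itself can be sketched in plain Python, independent of the model. The helper names `logit_to_label` and `logit_to_probability` are hypothetical and only illustrate the thresholding used in `predict`; which class label 1 stands for depends on how the training labels were assigned, which this thread does not state.

```python
import math


def logit_to_label(logit: float) -> bool:
    """Mirror the `> 0` threshold in predict(): positive logit -> label 1."""
    return logit > 0


def logit_to_probability(logit: float) -> float:
    """Numerically stable sigmoid, giving the probability of label 1."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


# Thresholding the logit at 0 is equivalent to thresholding the
# sigmoid probability at 0.5, so both views give the same label.
for raw in (-3.0, -0.1, 0.0, 0.1, 3.0):
    assert logit_to_label(raw) == (logit_to_probability(raw) > 0.5)
```

In practice you would pass a single preprocessed sample through the model, take the scalar output (for example with `.item()` on a PyTorch tensor), and feed it to either helper.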

Hope this helps!

jaketae mentioned this issue Apr 28, 2024

Further analysis #11

Open
