Testing this model ? #12

Open
RohithYogi opened this issue Apr 3, 2024 · 2 comments

Comments

@RohithYogi

How do I test the model by loading one of the saved checkpoints? For example, if I have a PE file, can I feed it to the model and get a result (true/false)?

@RohithYogi
Author

RohithYogi commented Apr 3, 2024

python -m src.bin.malshare - NOT WORKING
python train.py --benign_dir="../../raw/dll/" --malware_dir="../../raw/dasmalwerk/" - ERROR [Unpickling](https://github.com/jaketae/deep-malware-detection/issues/2)

python -m src.bin.dll
python extract_header.py --input_dir="../../raw/dll/" --output_dir="../../data/dll/"

python -m src.bin.dasmalwerk
python extract_header.py --input_dir="../../raw/dasmalwerk/" --output_dir="../../data/dasmalwerk/"

python train.py --benign_dir="/workspaces/deep-malware-detection/data/dll/" --malware_dir="/workspaces/deep-malware-detection/data/dasmalwerk/"

After I resolved these issues, I am now able to train the model, but that's as far as I got.
If I give it a new sample (malware or benign), it should predict the correct label (a boolean).

How can I test this?

@jaketae
Owner

jaketae commented Apr 28, 2024

Hi, I think what you're looking for is an evaluation/inference pipeline. I don't have a script specifically for that purpose, but you should be able to write one pretty quickly by importing these functions:

    def predict(model, data_loader, device, apply_sigmoid=False, to_numpy=True):
        model.eval()
        y_true = []
        y_pred = []
        for inputs, labels in tqdm(data_loader, leave=False):
            inputs = inputs.to(device)
            outputs = model(inputs)
            y_true.append(labels)
            y_pred.append(outputs)
        y_true = torch.cat(y_true).to(int)
        if apply_sigmoid:
            y_pred = sigmoid(torch.cat(y_pred))
        else:
            y_pred = (torch.cat(y_pred) > 0).to(int)
        if to_numpy:
            y_true = y_true.cpu().numpy()
            y_pred = y_pred.cpu().numpy()
        assert y_true.shape == y_pred.shape
        model.train()
        return y_true, y_pred


    def get_accuracy(model, data_loader, device):
        y_true, y_pred = predict(model, data_loader, device, to_numpy=False)
        return 100 * (y_true == y_pred).to(float).mean().item()

Specifically, you can load your model and create a data loader for your dataset to measure model accuracy.

To get a prediction for a single file, take a look at the predict function to see how to get a boolean label from an input. Basically, if the output of the model is greater than 0, the label is 1; otherwise, it is 0. If you prefer a probabilistic interpretation, you can also apply a sigmoid to the output.
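As a rough illustration of that decision rule, here is a minimal sketch in plain Python (no PyTorch needed). The helper names `logit_to_probability` and `logit_to_label` are hypothetical and not part of the repository; the sketch assumes the model returns a single raw score (logit) per file, which you could obtain with something like `model(x).item()`.

```python
import math


def logit_to_probability(logit: float) -> float:
    """Map a raw model output (logit) to a probability via the sigmoid function.

    Written in a numerically stable form so large negative logits do not overflow.
    """
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def logit_to_label(logit: float) -> int:
    """Return 1 if the logit is strictly positive, else 0 (the rule used in `predict`)."""
    return int(logit > 0)


# Hypothetical example logits, not real model outputs.
for logit in (-2.0, 0.0, 1.5):
    prob = logit_to_probability(logit)
    print(f"logit={logit:+.1f}  probability={prob:.3f}  label={logit_to_label(logit)}")
```

Thresholding the logit at 0 is equivalent to thresholding the sigmoid probability at 0.5, so the two views agree on the label whenever the logit is nonzero.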

Hope this helps!
