Could you provide the data used in the project? Thanks #1

Closed
ForeverRuri opened this issue Oct 12, 2017 · 5 comments

Comments

@ForeverRuri

I would like the dataset, please!

@mprhode
Owner

mprhode commented Oct 12, 2017

@ForeverZH0204 We hope to release the dataset, but we are waiting for a review of the paper before release. I will post here when it's out.

@ForeverRuri
Author

ForeverRuri commented Oct 13, 2017

Thanks a lot!

I also read the related paper. I have several questions, listed here for whenever you are available:

1. How should I understand the phrase 'a particular time into file execution'?

If I have a record 20 seconds long and sample it at an interval of 5 seconds, what is the value of 'a particular time into file execution'? I suppose it is 4, in other words the actual amount of data used in the model. Is that right?

2. Confusion about Figure 7, Table X and Table XI

As I read in readme.md and the code, do Figure 7 and Table X come from a setting that trains on the whole training set and then tests on a test set with one or more features omitted? And does Table XI come from omitting features from the whole dataset, to then explore the difference in the impact score of total processes?
If so, the conclusion that the

'impact score increases relative to others as more features are
omitted, this may indicate that total processes are combined
with other inputs to create discriminating features, though the
input is not highly impactful alone.'

is really hard for me to accept.
I hope that I have made a mistake.

Thanks for your reply!
Wish you a good day.

@mprhode
Owner

mprhode commented Oct 16, 2017

Hi @ForeverZH0204 - in answer to your questions:

  1. By "a particular time into file execution" we mean the real time since the start of the execution of the sample. We are arguing that the number of snapshots (i.e. the amount of data) correlates more strongly with accuracy than the real time since the file began executing.

  2. You are right: Fig 7 and Table X look at omission of data in the test set, and Table XI looks at omission during training and testing. We are looking at the impact of all the features, but in the discussion of the total processes feature we argue that its average impact score grows as more features are omitted (the impact score is the fall in accuracy / number of features omitted). For some features, the impact does not really change whether just this single feature is omitted, this feature + one other feature, or this feature + 2 other features. This implies that for these features the impact of their omission is not really affected by co-omission of other features. Because the impact score of "total processes" increases with the number of features omitted at the same time, we believe this indicates that total processes is combined with other features in the RNN to give distinguishing representations between malicious and benign samples. In Table XI the omission of total processes sees one of the biggest falls in accuracy, so we think it is a useful feature for the model but that its usefulness is realised when combined with other data. We can train further models with different combinations of inputs to test this (but we have not yet done so for this paper).
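
To make the impact score definition in point 2 concrete, here is a minimal sketch of the arithmetic (fall in accuracy divided by the number of features omitted). The function name `impact_score` and all accuracy values are hypothetical, invented purely for illustration; they are not taken from the paper or the repository, and the averaging over feature combinations is not reproduced here.

```python
def impact_score(baseline_accuracy: float, ablated_accuracy: float, n_omitted: int) -> float:
    """Fall in accuracy per omitted feature (hypothetical helper)."""
    if n_omitted < 1:
        raise ValueError("n_omitted must be at least 1")
    return (baseline_accuracy - ablated_accuracy) / n_omitted


# Hypothetical numbers, NOT results from the paper.
baseline = 0.90
# Accuracy when the feature of interest is omitted alone (1),
# with one other feature (2), or with two other features (3).
ablated = {1: 0.89, 2: 0.86, 3: 0.81}

scores = {n: impact_score(baseline, acc, n) for n, acc in ablated.items()}

for n, score in scores.items():
    print(f"{n} feature(s) omitted: impact score = {score:.3f}")
```

With these made-up numbers the per-feature score grows as more features are omitted together, which is the pattern described above for total processes. A feature whose omission is unaffected by co-omission would instead show a roughly constant score.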

Thank you for your questions and I hope that has made it a little more clear - I will work on a presentation of the work which explains these points more clearly.

@ForeverRuri
Author

Thanks for your reply!
But for question 2, if we want to explore the relationship between the difference and a certain variable, I think we need to keep the other conditions unchanged.

@vinayakumarr

When exactly will the dataset be released for further research?

@mprhode mprhode closed this as completed Dec 15, 2021