Could you provide the data used in the project? Thanks #1

ForeverRuri · 2017-10-12T09:56:19Z

I'd like the dataset, please!

mprhode · 2017-10-12T20:48:38Z

@ForeverZH0204 We hope to release the dataset but are waiting for a review of the paper before release. I will post here when it's out.

ForeverRuri · 2017-10-13T01:51:51Z

Thanks a lot!

I also read the related paper. I have several questions, listed here for whenever you are available:

1. How should I understand the phrase 'a particular time into file execution'?

If I have a record 20 seconds long and sample it at 5-second intervals, what is the value of 'a particular time into file execution'? I suppose it would be 4, in other words the actual amount of data used in the model. Is that right?

2. Confusion about Figure 7, Table X and Table XI

As I understand from readme.md and the code, Figure 7 and Table X come from a setting that trains on the whole training set and then tests on a test set with one or more features omitted. Table XI comes from omitting features from the whole dataset, and you then explore the difference in the impact score of total processes. Is that right?

If so, the conclusion that

'impact score increases relative to others as more features are omitted, this may indicate that total processes are combined with other inputs to create discriminating features, though the input is not highly impactful alone.'

is really hard for me to accept. I hope that I have made a mistake.

Thanks for your reply!
Wish you a good day.

@mprhode

mprhode · 2017-10-16T09:38:05Z

Hi @ForeverZH0204 - in answer to your questions:

1. By "a particular time into file execution" we mean the real time since the start of the execution of the sample. We argue that the number of snapshots (i.e. the amount of data) correlates more strongly with accuracy than the real time since the file began executing.
2. You are right: Fig 7 and Table X look at omission of data in the test set, and Table XI looks at omission during both training and testing. We look at the impact of all the features, but in the discussion of the total processes feature we argue that its average impact score grows as more features are omitted (the impact score is the fall in accuracy divided by the number of features omitted). For some features, the impact does not really change whether the feature is omitted alone, together with one other feature, or together with two other features. This implies that for these features the impact of their omission is not really affected by co-omission of other features. Because the impact score of "total processes" increases with the number of features omitted at the same time, we believe this indicates that total processes is combined with other features in the RNN to give distinguishing representations between malicious and benign samples. In Table XI the omission of total processes sees one of the biggest falls in accuracy, so we think it is a useful feature for the model but that its usefulness is realised when it is combined with other data. We can train further models with different combinations of inputs to test this (but we have not yet done so for this paper).
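To make the impact score definition above concrete, here is a minimal Python sketch. It is not the code from the repository: the function name `impact_score`, the placeholder feature names `feature_b` and `feature_c`, and all accuracy values are hypothetical and chosen only to illustrate the pattern of a score that grows as more features are omitted together.

```python
def impact_score(baseline_accuracy, omitted_accuracy, n_omitted):
    """Fall in accuracy divided by the number of features omitted."""
    if n_omitted < 1:
        raise ValueError("n_omitted must be at least 1")
    return (baseline_accuracy - omitted_accuracy) / n_omitted


# Hypothetical accuracies, NOT results from the paper.
baseline = 0.90
runs = {
    ("total_processes",): 0.89,
    ("total_processes", "feature_b"): 0.86,
    ("total_processes", "feature_b", "feature_c"): 0.81,
}

# One score per omission set; the key is the tuple of omitted features.
scores = {
    omitted: impact_score(baseline, accuracy, len(omitted))
    for omitted, accuracy in runs.items()
}

for omitted, score in scores.items():
    print(len(omitted), "omitted:", round(score, 4))
```

With these made-up numbers the per-feature score rises with each additional omitted feature, which is the pattern described for total processes. A feature whose score stayed flat across the three runs would correspond to the "not affected by co-omission" case.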

Thank you for your questions. I hope that has made things a little clearer. I will work on a presentation of the work that explains these points more clearly.

ForeverRuri · 2017-10-16T13:12:58Z

Thanks for your reply!
But for question 2, if we want to explore the relationship between the difference and a certain variable, I think we need to keep the other conditions unchanged.

vinayakumarr · 2018-01-25T05:29:59Z

When exactly will the dataset be released for further research?

mprhode closed this as completed Dec 15, 2021
