Timesketch_importer duplicates jsonl events (imports twice) #2796

Open
boingomw opened this issue Jun 15, 2023 · 11 comments · May be fixed by #3115

@boingomw

Describe the bug
timesketch_importer runs twice when executed on json_line files, resulting in double events.

To Reproduce
Steps to reproduce the behavior:

  1. Create a sample timeline:
    log2timeline.py --storage_file example.plaso /usr/bin
  2. Verify how many events are in the file:
    pinfo.py example.plaso
  3. Import into Timesketch:
    timesketch_importer --host http://127.0.0.1:81 -u examiner -p xxxxxx --sketch_id 19 example.plaso
  4. Verify in the GUI that you have 1 import and the correct number of events.
  5. Convert the Plaso file to json_line (normally this is done for a reason, such as slicing):
    psort.py -o json_line -w example.jsonl example.plaso
  6. Get a line count to verify that the number of events hasn't changed:
    wc example.jsonl
  7. Import this into Timesketch:
    timesketch_importer --host http://127.0.0.1:81 -u examiner -p xxxxx --sketch_id 19 example.jsonl
  8. Check the GUI:
    example-jsonl
    14.1K events (2 imports: details)
    example
    7K events (imported with CLI importer tool)
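Before blaming the importer, it can help to rule out duplicates in the JSONL file itself. The following is a minimal sketch, not part of Timesketch or Plaso: `count_jsonl_events` is a hypothetical helper that counts non-empty lines and reports how many events are exact repeats of an earlier one.

```python
import json
from collections import Counter


def count_jsonl_events(lines):
    """Return (total_events, surplus_duplicates) for an iterable of JSONL lines.

    Hypothetical helper for illustration; two events count as duplicates
    only if every key and value is identical.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, which wc would still count
        event = json.loads(line)
        # Normalize key order so formatting differences do not matter.
        counts[json.dumps(event, sort_keys=True)] += 1
    total = sum(counts.values())
    return total, total - len(counts)


# Small in-memory sample; for a real file, pass an open file object instead.
sample = ['{"message": "a"}', '{"message": "b"}', '', '{"message": "a"}']
print(count_jsonl_events(sample))
```

For the file from step 5, something like `count_jsonl_events(open("example.jsonl"))` would give the totals to compare against the event count shown in the GUI.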

Expected behavior
Expected the JSONL file to be imported once, without duplicated events.

Screenshots
image

Desktop (please complete the following information):

  • OS: ubuntu
  • Browser chrome
  • Version 22

Latest Docker install, as of 6/15/2023.

@S-Nicholas

Same issue as #2334

@berggren
Contributor

@boingomw I see that you have 2 imports reported. This indicates that you ran the importer twice with the same timeline name. That will put the events in the same timeline.

Can you confirm that this isn't the case? I can't reproduce this issue on my end.

@berggren berggren self-assigned this Sep 19, 2023
@boingomw
Author

I did run it twice: once for the .json file and once for the .plaso file. The issue is that both files had the same number of lines, but when I imported the .json file, it ended up with twice the number of events.

So when you follow the process above, do you end up with 7k for the .json file and 7k for the .plaso file?

@arisjr

arisjr commented Sep 25, 2023

I can confirm that timesketch_importer is also creating doubled sources for JSONL imports, which doubles the events in searches. The same does not apply to web imports, which seem to import correctly.

# timesketch_importer --version
API Client Version: 20230721
Importer Client Version: 20230721

If you need a sample JSONL file, I can supply one.

Regards,

@jaegeral
Collaborator

jaegeral commented Sep 25, 2023

I just tried it with timesketch --sketch 1 import /usr/local/src/timesketch/temp/sigma_temp.jsonl (the CLI tool), with the file containing the following (https://github.com/google/timesketch/blob/master/test_tools/test_events/sigma_events.jsonl):

{"message": "A message","timestamp": 123456789,"datetime": "2015-07-24T19:01:01+00:00","timestamp_desc": "Write time","extra_field_1": "foo"}
{"message": "Another message","timestamp": 123456790,"datetime": "2015-07-24T19:01:02+00:00","timestamp_desc": "Write time","extra_field_1": "bar"}
{"message": "Yet more messages","timestamp": 123456791,"datetime": "2015-07-24T19:01:03+00:00","timestamp_desc": "Write time","extra_field_1": "baz"}
{"message": "Install: zmap:amd64 (1.1.0-1) [Commandline: apt-get install zmap]","timestamp": 123456791,"datetime": "2015-07-24T19:01:03+00:00","timestamp_desc": "foo","command":"Commandline: apt-get install zmap","data_type":"apt:history:line","display_name":"GZIP:/var/log/apt/history.log.1.gz","filename":"/var/log/apt/history.log.1.gz","packages":"Install: zmap:amd64 (1.1.0-1)","parser":"apt_history"}
{"message": "[11 / 0x000b] Source Name: Microsoft-Windows-Sysmon Strings: ['DLL', '2022-01-22 23:07:43.492', '{C784477D-8DE8-61EC-AAAA-000000003C00}', '7812', 'C:\\Windows\\tifubjdl\\lysjbpb.exe', 'C:\\Windows\\itfnduuui\\Corporate\\mimilib.dll', '2022-01-22 23:07:43.492'] Computer Name: DESKTOP-B0TAAAA Record Number: 913 Event Level: 4","computer_name":"DESKTOP-B0TAAAA","data_type":"windows:evtx:record","datetime":"2022-01-22T23:07:43.502205+00:00","display_name":"OS:/data/input/Microsoft-Windows-Sysmon%4Operational.evtx","event_identifier":"11","event_level":"4","message_identifier":"11","parser":"winevtx","source_name":"Microsoft-Windows-Sysmon","timestamp":"1642892863502205","timestamp_desc":"Creation Time" }

And I got a new timeline with 5 events.
Screenshot 2023-09-25 at 15 10 28

@boingomw
Author

Maybe it's volume-related and 5 lines aren't enough to trigger it.

@arisjr

arisjr commented Sep 25, 2023

sample.zip

@jaegeral, try this. Password: sample123
It has 484 events, but they are doubled on import.

Regards

@arisjr

arisjr commented Sep 25, 2023

@jaegeral I just realized that you're using the timesketch CLI instead of timesketch-import-client (timesketch_importer). Is there any difference between the two approaches?

@jaegeral
Collaborator

Hm, indeed, it is importing them twice.

@jaegeral
Collaborator

FWIW, I am still working on this; it seems my e2e tests in #2976 do not trigger it.

@mari0d

mari0d commented Jun 27, 2024

Still seeing this bug in the latest version of TS. Looking at the code, this flush call isn't needed, since the stream's close method already calls flush(). We are seeing duplicates because flush() is called twice, and the _data_lines buffer isn't cleared directly by flush(), which makes the method name a bit misleading.
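The mechanism described above can be illustrated with a simplified sketch. This is not the actual Timesketch importer code: `BufferedStreamer`, `add_dict`, `uploaded`, and `import_events` are hypothetical stand-ins, and only the `_data_lines` name and the flush/close relationship are taken from the comment. The assumption is that `flush()` uploads the buffer without emptying it and that `close()` calls `flush()` again.

```python
class BufferedStreamer:
    """Hypothetical stand-in for a buffered import stream (not the real class)."""

    def __init__(self):
        self._data_lines = []  # buffered events awaiting upload
        self.uploaded = []     # stands in for the server-side timeline

    def add_dict(self, event):
        self._data_lines.append(event)

    def flush(self):
        # Uploads the buffer but, as described in the comment, does not clear it.
        self.uploaded.extend(self._data_lines)

    def close(self):
        # close() flushes on its own, so an explicit flush() beforehand
        # sends the same buffered events a second time.
        self.flush()


def import_events(events, extra_flush):
    """Return how many events reach the timeline for a given call pattern."""
    streamer = BufferedStreamer()
    for event in events:
        streamer.add_dict(event)
    if extra_flush:
        streamer.flush()  # the redundant call
    streamer.close()
    return len(streamer.uploaded)


events = [{"message": f"event {i}"} for i in range(484)]
print(import_events(events, extra_flush=True))   # 968
print(import_events(events, extra_flush=False))  # 484
```

Under these assumptions, either dropping the redundant flush() call or having flush() clear the buffer after uploading would avoid the duplication.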

@mari0d mari0d linked a pull request Jun 27, 2024 that will close this issue
3 tasks
6 participants