Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[taxii2] Infinite loop in handling "next" value #1683

Closed
netbroom opened this issue Jan 10, 2024 · 21 comments
Closed

[taxii2] Infinite loop in handling "next" value #1683

netbroom opened this issue Jan 10, 2024 · 21 comments
Labels
bug use for describing something not working as expected solved use to identify issue that has been solved (must be linked to the solving PR)

Comments

@netbroom
Copy link
Contributor

Description

This pull request adds code to check "next" value in TAXII 2.1 against a list of available collections.

However, "next" is not a collection, it's a value that the server provides for pagination based on each server's individual implementation.

From the TAXII 2.1 docs:

This property identifies the server provided value of the next record or set of records in the paginated data set. This property MAY be populated if the more property is set to true.

This value is opaque to the client and represents something that the server knows how to deal with and process.

For example, for a relational database this could be the index autoID, for elastic search it could be the Scroll ID, for other systems it could be a cursor ID, or it could be any string (or int represented as a string) depending on the requirements of the server and what it is doing in the background.

This causes the connector to enter an infinite loop:

# Assuming newer versions will support next
while True:
    objects = process_response(objects, response, version)

    # Check if "more" exists in response and its value is True
    if "more" in response and response["more"] == True:
        filters["next"] = response["next"]

        ###
        ### checks for "next" value in list of collections
        ###
        if (
            response["next"] in self.available_collections
            and len(self.available_collections) > 0
        ):
            response = collection.get_objects(**filters)

        ###
        ### "next" value is not a collection so it will continue in the loop
        ###

    ###
    ### "response" is not updated, meaning "more" is still True, so this will never execute to break the loop
    ###
    else:
        # "more" doesn't exist or is not True, exit the loop
        break

Environment

Reproduced logic with samples of raw taxii2.py code.

Reproducible Steps

Add TAXII2.1 feed using connector.

@annoyingapt
Copy link
Contributor

I think the spirit of the pr was to have a check for next. I would probably recommend rolling back the code and changing it to something like..

if "more" in response and "next" in response and response["more"] == True:

@annoyingapt
Copy link
Contributor

@daniele2010 what are your thoughts on this? I haven't come across next not being in root, did you have an example response I could review?

@daniele2010
Copy link
Contributor

Hi, at the time of the PR the problem happened while I was trying to integrate https://osint.digitalside.it/taxiiserver.html

I've debugged the code, below an extract of the response in the PR part:

{'more': True, 'next': 'a73b346e-33df-4b13-b878-c5538aba8d7c', 'objects': [...]

that next key is checked against a list of id:

self.available_collections.extend([c.id for c in api_root.collections])

maybe @netbroom has a different output from the server, but this is the case where I was working at the time and it seems to work.

@netbroom
Copy link
Contributor Author

@annoyingapt I think the safest and quickest option at this time is to roll back the commit before adding additional code as

  • This check is not needed, as "next" can be any value determined by the server, according to the specification; it does not need to be a collection ID.
  • This may be impacting or preventing other parts of the platform from running, or causing performance issues, because this connector gets caught in an infinite loop, processing the same response objects each iteration.

Once the issue is fixed though I agree the connector should check if "next" has a value. I believe "next" should also be different than the current "next" value but not totally sure about that. Otherwise the same request will be made in a loop.

@daniele2010 What was the reasoning that "next" must be a collection ID and the job must abort if it isn't?

@annoyingapt
Copy link
Contributor

I'm not near a laptop for another couple of weeks, so someone else will need to raise a pr to fix this. I haven't looked at the digital side feed yet, but it might be that it more appropriate to use the misp feed connector to consume their feed. Note to whoever fixes this, if you roll back the code, can you merge the change I made here #1657

netbroom added a commit to netbroom/connectors that referenced this issue Jan 11, 2024
@netbroom
Copy link
Contributor Author

@annoyingapt I created #1686 to apply on top of the latest code in master in the interest of time and so your additional changes would not be lost.

@netbroom
Copy link
Contributor Author

#1686 merged into master, closing as completed.

@SamuelHassine SamuelHassine added bug use for describing something not working as expected solved use to identify issue that has been solved (must be linked to the solving PR) labels Jan 12, 2024
@daniele2010
Copy link
Contributor

@netbroom the problem is that DigitalSide provides a next string that is a collection id but it is not valid:

The response contains this id:

response["next"]
'89b239cf-949b-42f1-b66f-90c7ca5f92eb'

while the root server has only these two collections:

self.available_collections
0: 'e98d6c94-fbce-11ed-b5dd-3bad2ffe9ebf'
1: 'c1f43330-103b-11ee-9ee3-4b022e286589'

so the next instruction would fail because there is no 89b239cf-949b-42f1-b66f-90c7ca5f92eb in the server:

response = collection.get_objects(**filters)

I know this is a problem with DigitalSide server but some sort of control on the parameter is still needed, maybe a control on the validity of the UUID?
It could be implemented a check if the next parameter is a UUID, something like: "if the next is a UUID enter if block in my PR otherwise continue normally"

@netbroom
Copy link
Contributor Author

@daniele2010 I'm not familiar with DigitalSide but it's possible that is a UUID for an object in the collection, not a collection ID.

The next value is determined purely by the server to specify to the client what value should be provided to retrieve the next objects in the collection.

next can be any value. It does not need to be a UUID.

For example if the server uses an integer for indicator IDs stored in the database, the server may provide the last indicator ID in the response so that when the client requests the next page of objects, the server knows where it left off.

In our TAXII feed next is a combination of a timestamp and an indicator ID, for example: 2023-11-16 22:40:05|23070925

The format of next should be validated by the TAXII server, not the client. The client should send back whatever next was provided by the server. If the next value is invalid for whatever reason, the server should respond with an error which the connector should then handle.

I think the only validation that should be done on next is to check if it's the same value as the current next. If the value is the same, the connector and server may get caught in a loop requesting the same objects.

@annoyingapt
Copy link
Contributor

@daniele2010 can you try the misp-feed connector instead? https://osint.digitalside.it/Threat-Intel/digitalside-misp-feed/

@daniele2010
Copy link
Contributor

@netbroom So since next can be any value, before calling a collection.get_objects against it, it should be validated to be sure that is a valid id of a collection, in your TAXII feed how it is handled by the clients?

filters["next"] = response["next"]
response = collection.get_objects(**filters)

@annoyingapt thank you, I'll try in the next days, I would have liked to avoid it since I already have a MISP instance and it'd be cleaner to import it from there instead of OpenCTI

@annoyingapt
Copy link
Contributor

The opencti misp-feed connector pulls misp feeds like circl. No misp needed. That is a different connector.

@netbroom
Copy link
Contributor Author

netbroom commented Jan 12, 2024

So since next can be any value, before calling a collection.get_objects against it, it should be validated to be sure that is a valid id of a collection, in your TAXII feed how it is handled by the clients?

@daniele2010

next is not a collection ID.

Do not validate it.

The connector will break.

@annoyingapt
Copy link
Contributor

annoyingapt commented Jan 12, 2024

I really want to have a deep dive into this feed when I get back. But next should only be used for pagination of a single collection which have more objects than what is available in the request. So the server is basically telling you, use this value as a search and I will bring up the next 100 objects for example. The server should not be trying to daisy chain collections together, there is other code in the python code to cycle between collection ids.

@netbroom
Copy link
Contributor Author

@annoyingapt yes exactly.

For TAXII 2 the collection ID is already in the URL, IE /taxii2/[api-root]/collections/[collection ID]/objects/?[filters, including next]

We have a free TAXII 2.1 test collection available to test with here if anyone would like to use it to test this connector.

@daniele2010
Copy link
Contributor

@annoyingapt you're right, it shouldn't be done that way but it is still valid, since in next parameter you can put anything as @netbroom said.

The thing that I think is not correct, but please explain to me if I'm wrong, is that in case next is not a collection id the method collection.get_objects shouldn't get called because it is not a valid id.

Moreover, in my case if the client asks for those objects it couldn't find them and it would crash.

So, to recap, before my PR there was a bug with the integration that would not let me extract data from digitalside TAXII server, with my PR I have introduced a bug, but the solution I think is not reverse the PR but introduce a fix that would let anyone use any TAXII sever, otherwise we would lose the benefit of using this connector.

@annoyingapt
Copy link
Contributor

The next variable is being sent as a filter argument. Don't worry, I'll have a look at the feed soon and can update you with what I find. There is also a taxii client built into the opencti gui you can use too, which is independent of this connector.

@annoyingapt
Copy link
Contributor

annoyingapt commented Jan 13, 2024

I get the feeling that when you changed the code, the uuid in next might not have been a collection id like you thought, so by not matching, you aren't pulling the second request and creating the error you were seeing. But this is all hypothetical.

@netbroom
Copy link
Contributor Author

The thing that I think is not correct, but please explain to me if I'm wrong, is that in case next is not a collection id the method collection.get_objects shouldn't get called because it is not a valid id.

@daniele2010 collection is the Python object that handles each collection ID and requests to that collection. The TAXII client creates a collection object when you specify which collection you want to get objects from.

So when collection.get_objects() is called, the objects only from that collection are queried. The filters are only applied for that collection.

next is just a filter on the same collection used for keeping track of which objects in that collection were already received. It doesn't tell the server which collection to query; that is a different part of the code that initializes the collection object.

@annoyingapt
Copy link
Contributor

annoyingapt commented Jan 22, 2024

I just spun up the misp feed connector with the digitalside feed and it works out of the box on OpenCTI v5.11.14. I'll test their taxii server now.
image

@annoyingapt
Copy link
Contributor

For visibility.
davidonzo/Threat-Intel#45

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug use for describing something not working as expected solved use to identify issue that has been solved (must be linked to the solving PR)
Projects
None yet
Development

No branches or pull requests

4 participants