Modules do not consider the 1000 objects limit of the API #462

hawa-b · 2023-03-29T13:27:00Z

Describe the bug

We observed this misbehavior with the MP modules, because our running TKGi installation still binds us to the MP API. We did not investigate whether the Policy API modules are also affected. I'm not sure whether it was completely fixed for the Policy API in issue #439.

The background is that the NSX API limits each response to a maximum of 1000 objects. If there are more objects of one type (in our case e.g. logical switches or logical switch ports), the response also contains a "cursor". The cursor is then passed in the next call to get the "next page". You have to loop until no "cursor" comes back; at that point you have reached the last page and collected all objects.
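The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code from vmware_nsxt.py: `collect_all_pages` and `make_fake_api` are hypothetical names, the fake API only simulates paging, and the assumption that each page carries its objects under a "results" key should be checked against the NSX API documentation.

```python
from typing import Callable, Dict, List, Optional


def collect_all_pages(fetch_page: Callable[[Optional[str]], Dict]) -> List[Dict]:
    """Call fetch_page(cursor) until a response carries no cursor."""
    items: List[Dict] = []
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        # Assumption: the objects of one page are listed under "results".
        items.extend(page.get("results", []))
        cursor = page.get("cursor")
        if not cursor:
            # No cursor in the response means this was the last page.
            return items


def make_fake_api(total: int, page_size: int = 1000) -> Callable[[Optional[str]], Dict]:
    """Hypothetical stand-in for the NSX API: serves `total` objects in pages."""
    objects = [{"id": f"uuid-{i}", "display_name": f"ls-{i}"} for i in range(total)]

    def fetch_page(cursor: Optional[str]) -> Dict:
        # In this simulation the cursor is simply the next start index.
        start = int(cursor) if cursor else 0
        end = min(start + page_size, total)
        page: Dict = {"results": objects[start:end]}
        if end < total:
            page["cursor"] = str(end)
        return page

    return fetch_page


if __name__ == "__main__":
    api = make_fake_api(1050)
    print(len(api(None)["results"]), len(collect_all_pages(api)))
```

A single call to the fake API returns only the first page, while the loop keeps requesting pages until the cursor disappears.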

The MP modules do not implement this cursor mechanism. The error is in the request() function in the script https://github.com/vmware/ansible-for-nsxt/blob/master/plugins/module_utils/vmware_nsxt.py

There, after some simple error handling, the result of the API call is returned directly to the calling function. The callers in turn rely on a call to request(), for example for a GET, returning all objects. So the calling functions do not check whether there are more than 1000 objects either.

The usual procedure of MP modules, e.g. when changing the parameters of a logical switch:

1. A GET call via request() determines all existing LS (incorrectly, without the cursor mechanism).
2. The returned list is searched for the corresponding display name, and the UUID of the matching object is stored if it exists.
3. If an ID was found, the object is changed.
4. If the ID was not found, a new object is created.

If there are more than 1000 objects, the search misses the target object unless it is among the first 1000. The Ansible module then assumes that an object of that name does not yet exist and creates a new object on each playbook run. The resulting objects with identical names then lead to further problems...
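The effect of this procedure on a truncated list can be illustrated with a small simulation. It is a sketch under stated assumptions, not the module code: `find_id_by_name` and `ensure_switch` are hypothetical helpers, and a plain Python list stands in for the NSX inventory.

```python
from typing import Dict, List, Optional

PAGE_LIMIT = 1000  # page size described in this issue


def find_id_by_name(objects: List[Dict], display_name: str) -> Optional[str]:
    """Return the id of the first object with a matching display name."""
    for obj in objects:
        if obj.get("display_name") == display_name:
            return obj["id"]
    return None


def ensure_switch(inventory: List[Dict], display_name: str, paginate: bool) -> str:
    """Simulate find-or-create; `inventory` stands in for the NSX state."""
    # Without pagination the caller only ever sees the first page.
    visible = inventory if paginate else inventory[:PAGE_LIMIT]
    if find_id_by_name(visible, display_name) is not None:
        return "updated"
    # Name not found in the visible part: a new object is created.
    inventory.append({"id": f"uuid-{len(inventory)}", "display_name": display_name})
    return "created"


if __name__ == "__main__":
    inventory = [{"id": f"uuid-{i}", "display_name": f"ls-{i}"} for i in range(1050)]
    print(ensure_switch(inventory, "ls-1049", paginate=False))
    print(ensure_switch(inventory, "ls-1049", paginate=False))
    print(len(inventory))
```

In the simulation, every run without pagination appends another object named ls-1049, because the existing one sits beyond the first 1000 entries.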

Reproduction steps

Development environments or labs are usually small installations with few NSX objects, so the error is usually not noticed there.

To reproduce the error, more than 1000 objects (e.g. 1050) of one type (for example logical switches, created with nsxt_logical_switches) must exist.
Then try to change something in one of the most recently created objects (change the description or the VLAN ID with nsxt_logical_switches). It can be a bit difficult to find a suitable object, because it must sit beyond position 1000 in the list. But because the GET calls have "sort_by": "create_time" set, the last created object should be a good candidate.
If you look at the objects in the NSX GUI after one or more playbook runs, you will find several of them with identical display names...
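Instead of checking the GUI by eye, the duplicates could also be counted from a complete object list. The following is only a sketch: `duplicate_display_names` is a hypothetical helper, and it assumes the full list (all pages) has already been collected as dictionaries with a "display_name" key.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicate_display_names(objects: Iterable[Dict]) -> Dict[str, int]:
    """Map each display name that occurs more than once to its count."""
    counts = Counter(obj.get("display_name") for obj in objects)
    return {name: n for name, n in counts.items() if name is not None and n > 1}


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        {"id": "a", "display_name": "ls-prod"},
        {"id": "b", "display_name": "ls-test"},
        {"id": "c", "display_name": "ls-prod"},
    ]
    print(duplicate_display_names(sample))
```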

Expected behavior

It is expected that the MP modules also correctly handle sets of more than 1000 objects.

For this purpose the mentioned function request() has to be modified to handle the "cursor" or pagination mechanism correctly.

Additional context

No response


Priyanka-Middha · 2023-10-10T05:08:15Z

Problem definition: Pagination was not working as expected for more than 1000 objects in the Ansible MP modules.

Symptoms:
If there are more than 1000 objects, the search misses the target object unless it is among the first 1000. The Ansible module then assumes that an object of that name does not yet exist and creates a new object on each playbook run. The resulting objects with identical names then lead to further problems.

Impact to customer:
Duplicate objects were created.

Relevant log location: Ansible command output

Steps to reproduce:
Development environments or labs are usually small installations with few NSX objects, so the error is usually not noticed there.

To reproduce the error, more than 1000 objects (e.g. 1050) of one type (for example logical switches, created with nsxt_logical_switches) must exist.
Then try to change something in one of the most recently created objects (change the description or the VLAN ID with nsxt_logical_switches). It can be a bit difficult to find a suitable object, because it must sit beyond position 1000 in the list. But because the GET calls have "sort_by": "create_time" set, the last created object should be a good candidate.
If you look at the objects in the NSX GUI after one or more playbook runs, you will find several of them with identical display names...

Workaround: None

Resolution: Fixed as part of #469 and #471

Versions where this is a known issue: master branch

Version where this is fixed: https://github.com/vmware/ansible-for-nsxt/tree/master

hawa-b added the "bug" label on Mar 29, 2023
