Modules do not consider the 1000 objects limit of the API #462

Open
hawa-b opened this issue Mar 29, 2023 · 1 comment
Labels
bug Something isn't working

Comments


hawa-b commented Mar 29, 2023

Describe the bug

We observed the misbehavior with the MP modules because our running TKGi installation still binds us to this API. We did not investigate whether the Policy API modules are also affected. I'm not sure whether it was completely fixed for the Policy API in issue #439.

The background is that the NSX API returns at most 1000 objects per call. If more objects of one type exist (in our case, for example, logical switches or logical switch ports), the response includes a "cursor". The cursor is then passed in the next call to get the next page. You have to loop until no "cursor" comes back; at that point you have reached the last page and collected all objects.
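The loop described above can be sketched as follows. This is only an illustration, not the code in vmware_nsxt.py: `collect_all_pages`, `fetch_page`, and `fake_fetch_page` are hypothetical names, and the `results` key is an assumption about the response shape (only the "cursor" field is taken from this report).

```python
from typing import Callable, Dict, List, Optional


def collect_all_pages(fetch_page: Callable[[Optional[str]], Dict]) -> List[Dict]:
    """Follow the cursor until the API stops returning one."""
    objects: List[Dict] = []
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        # Assumption: each page carries its objects under "results".
        objects.extend(page.get("results", []))
        cursor = page.get("cursor")
        if not cursor:
            # No cursor means this was the last page.
            return objects


def fake_fetch_page(cursor: Optional[str], total: int = 1050, page_size: int = 1000) -> Dict:
    """Hypothetical stand-in for one GET call, capped at page_size objects."""
    start = int(cursor) if cursor else 0
    end = min(start + page_size, total)
    page: Dict = {
        "results": [
            {"id": f"ls-{i}", "display_name": f"switch-{i}"}
            for i in range(start, end)
        ]
    }
    if end < total:
        page["cursor"] = str(end)
    return page


all_objects = collect_all_pages(fake_fetch_page)
print(len(all_objects))  # 1050
```

A single call to `fake_fetch_page(None)` returns only the first 1000 objects, which mirrors what the callers currently receive.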

The MP modules do not implement this cursor mechanism. The error is in the request() function in https://github.com/vmware/ansible-for-nsxt/blob/master/plugins/module_utils/vmware_nsxt.py

There, after some simple error handling, the result of the single API call is returned directly to the calling function. The callers, in turn, assume that a call to request(), for example for a GET, returns all objects, so they do not check whether there are more than 1000 objects either.

The usual procedure of MP modules, e.g. when changing the parameters of a logical switch:

  1. A GET call via request() retrieves all existing logical switches (incorrectly, without the cursor mechanism).
  2. The returned list is searched for the given display name; if a match exists, the UUID of that object is stored.
  3. If an ID was found, the object is changed.
  4. If no ID was found, a new object is created.

If there are more than 1000 objects, the search will not find the target object unless it is among the first 1000. The Ansible module then assumes that no object with that name exists and creates a new one on every playbook run. The resulting objects with identical names then lead to further problems...
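A small simulation of this failure mode, assuming objects are plain dicts with `id` and `display_name` keys. The helper names `find_id_by_display_name` and `planned_action` are hypothetical and only mirror steps 2 to 4 above; they are not the module's actual functions.

```python
def find_id_by_display_name(objects, display_name):
    """Step 2: return the ID of the first object with a matching display name."""
    for obj in objects:
        if obj.get("display_name") == display_name:
            return obj.get("id")
    return None


def planned_action(objects, display_name):
    """Steps 3 and 4: update if an ID was found, otherwise create."""
    found = find_id_by_display_name(objects, display_name)
    return "update" if found is not None else "create"


# 1050 logical switches exist, but one unpaginated GET yields only 1000.
all_switches = [
    {"id": f"ls-{i}", "display_name": f"switch-{i}"} for i in range(1050)
]
first_page_only = all_switches[:1000]

print(planned_action(first_page_only, "switch-1049"))  # create (duplicate)
print(planned_action(all_switches, "switch-1049"))     # update (intended)
```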

Reproduction steps

Development environments or labs are usually small installations with few NSX objects, so the error is usually not noticed there.

  1. To reproduce the error, more than 1000 objects (e.g. 1050) of one type (for example logical switches, created with nsxt_logical_switches) must exist.
  2. Then try to change something in one of the most recently created objects (for example the description or the VLAN ID, using nsxt_logical_switches). Finding a suitable object can be a bit difficult, because it must sit beyond position 1000 in the list. Since the GET calls set "sort_by": "create_time", the most recently created object should be a good candidate.
  3. If you look at the objects in the NSX GUI after one or more playbook runs, you will find several of them with identical display names...
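Instead of checking the GUI by eye, the duplicates from step 3 could also be detected from a full object listing. The sketch below assumes the complete list has already been collected (with pagination) into dicts with a `display_name` key; `duplicated_display_names` is a hypothetical helper.

```python
from collections import Counter


def duplicated_display_names(objects):
    """Return {display_name: count} for every name used more than once."""
    counts = Counter(obj.get("display_name") for obj in objects)
    return {name: n for name, n in counts.items() if n > 1}


# Example: "switch-1049" was re-created on two later playbook runs.
objects = [
    {"id": "ls-1", "display_name": "switch-1"},
    {"id": "ls-1049", "display_name": "switch-1049"},
    {"id": "ls-1050", "display_name": "switch-1049"},
    {"id": "ls-1051", "display_name": "switch-1049"},
]
print(duplicated_display_names(objects))  # {'switch-1049': 3}
```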

Expected behavior

It is expected that the MP modules can also correctly handle object sets >1000 items.

For this purpose the mentioned function request() has to be modified to handle the "cursor" or pagination mechanism correctly.

Additional context

No response

@hawa-b hawa-b added the bug Something isn't working label Mar 29, 2023
@Priyanka-Middha (Contributor)

Problem definition: Pagination was not working as expected for more than 1000 objects in the Ansible MP modules.

Symptoms: If there are more than 1000 objects, the search will not find the target object unless it is among the first 1000. The Ansible module then assumes that no object with that name exists and creates a new one on every playbook run. The resulting objects with identical names then lead to further problems.

Impact to customer: Duplicate objects were created.

Relevant log location: Ansible command output

Steps to reproduce:
Development environments or labs are usually small installations with few NSX objects, so the error is usually not noticed there.

  1. To reproduce the error, more than 1000 objects (e.g. 1050) of one type (for example logical switches, created with nsxt_logical_switches) must exist.
  2. Then try to change something in one of the most recently created objects (for example the description or the VLAN ID, using nsxt_logical_switches). Finding a suitable object can be a bit difficult, because it must sit beyond position 1000 in the list. Since the GET calls set "sort_by": "create_time", the most recently created object should be a good candidate.
  3. If you look at the objects in the NSX GUI after one or more playbook runs, you will find several of them with identical display names...

Workaround: None

Resolution: Fixed as part of #469 and #471

Versions where this is a known issue: master branch

Version where this is fixed: https://github.com/vmware/ansible-for-nsxt/tree/master


2 participants