Update get_all Method in db.py to Retrieve All Items from DynamoDB table #642

Merged (1 commit) on Oct 1, 2024

@MustaphaU (Contributor) commented on Sep 27, 2024

This update modifies the get_all method to ensure that all items from the DynamoDB table are retrieved during a scan operation.

Issue #, if available:
#639

Description of changes:
Update get_all Method in db.py to Retrieve All Items from DynamoDB Table

Reason for Change:

  • This update modifies the get_all method in the db.py file to ensure all items from the DynamoDB table are retrieved during a scan operation.
  • It addresses an issue in Lab 4 of the Personalization workshop, where discrepancies between reranked and unranked product lists occur due to incomplete data retrieval.

Updated get_all method:

```python
def get_all(self):
    items = []
    response = self.table.scan()
    while 'LastEvaluatedKey' in response:
        items.extend(response['Items'])
        response = self.table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    items.extend(response['Items'])
    return items
```
  • Context:
    In Lab 4, unranked_product_ids are pulled from the featured items table and then used to obtain a reranked list of products from "all" products via AWS Personalize. The existing get_all method issues a single DynamoDB table scan. One Scan request returns at most 1 MB of data and signals any remaining data with LastEvaluatedKey, so only the first page of items comes back, and the reranked list may contain fewer items than the unranked list.
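To make the paging behaviour concrete, here is a minimal offline sketch. `FakeTable` and `ProductsDB` are hypothetical stand-ins: `FakeTable` imitates how a boto3 `Table.scan()` returns one page plus a `LastEvaluatedKey` when more data remains, but it pages by a fixed item count (an assumption; `page_size=1000` is arbitrary) rather than by DynamoDB's 1 MB limit. `get_all_single_scan` is an assumed approximation of the old single-scan behaviour; `get_all` is the loop from this PR.

```python
from typing import Any, Dict, List, Optional


class FakeTable:
    """Hypothetical in-memory stand-in for a boto3 DynamoDB Table resource.

    Pages by a fixed item count, not DynamoDB's 1 MB limit (assumption).
    """

    def __init__(self, items: List[Dict[str, Any]], page_size: int) -> None:
        self._items = items
        self._page_size = page_size
        # Map each primary key to its position so a scan can resume after it.
        self._position = {item["id"]: i for i, item in enumerate(items)}

    def scan(self, ExclusiveStartKey: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        start = 0
        if ExclusiveStartKey is not None:
            # Resume right after the item named by the previous LastEvaluatedKey.
            start = self._position[ExclusiveStartKey["id"]] + 1
        page = self._items[start:start + self._page_size]
        response: Dict[str, Any] = {"Items": page, "Count": len(page)}
        if start + len(page) < len(self._items):
            # More data remains: report the key of the last item on this page.
            response["LastEvaluatedKey"] = {"id": page[-1]["id"]}
        return response


class ProductsDB:
    """Hypothetical wrapper; get_all below is the loop from this PR."""

    def __init__(self, table: FakeTable) -> None:
        self.table = table

    def get_all_single_scan(self) -> List[Dict[str, Any]]:
        # Assumed shape of the old behaviour: one scan, first page only.
        return self.table.scan()["Items"]

    def get_all(self) -> List[Dict[str, Any]]:
        # Follow LastEvaluatedKey until a page arrives without one.
        items: List[Dict[str, Any]] = []
        response = self.table.scan()
        while "LastEvaluatedKey" in response:
            items.extend(response["Items"])
            response = self.table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items.extend(response["Items"])
        return items


products = [{"id": str(i), "name": f"product-{i}"} for i in range(2465)]
db = ProductsDB(FakeTable(products, page_size=1000))
print(len(db.get_all_single_scan()))  # 1000
print(len(db.get_all()))  # 2465
```

With three pages in the fake table, a single scan sees only the first 1000 products, while the paginated loop collects all 2465 in their original order.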

Description of testing performed to validate your changes (required if pull request includes CloudFormation or source code changes):

Test:

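```python
import requests

# products_service_instance: host of the Products service (assumed to be defined earlier in the Lab notebook)
all_products_resp = requests.get('http://{}/products/all'.format(products_service_instance))

all_products = all_products_resp.json()
print(len(all_products))
```

The output is 2465, representing all products.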

  • Further tests were performed to ensure that both reranked and unranked lists now contain the same number of items when fetched for side-by-side comparison in Lab 4.
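The check above needs a running Products service. As a hedged complement, the sketch below exercises the same pagination loop offline. `ProductsDB` is a hypothetical stand-in for the class in `db.py` that owns `get_all`, and `unittest.mock.MagicMock` plays the role of the boto3 `Table`, returning three canned scan pages; the page contents are illustrative only.

```python
import unittest
from unittest import mock


class ProductsDB:
    """Hypothetical class holding the get_all method from this PR."""

    def __init__(self, table):
        self.table = table

    def get_all(self):
        items = []
        response = self.table.scan()
        while "LastEvaluatedKey" in response:
            items.extend(response["Items"])
            response = self.table.scan(ExclusiveStartKey=response["LastEvaluatedKey"])
        items.extend(response["Items"])
        return items


class GetAllTest(unittest.TestCase):
    def test_collects_every_page(self):
        table = mock.MagicMock()
        # Three canned scan pages; only the last one omits LastEvaluatedKey.
        table.scan.side_effect = [
            {"Items": [{"id": "1"}, {"id": "2"}], "LastEvaluatedKey": {"id": "2"}},
            {"Items": [{"id": "3"}], "LastEvaluatedKey": {"id": "3"}},
            {"Items": [{"id": "4"}]},
        ]
        items = ProductsDB(table).get_all()
        self.assertEqual([i["id"] for i in items], ["1", "2", "3", "4"])
        # Each follow-up scan must resume from the previous page's key.
        self.assertEqual(
            table.scan.call_args_list,
            [
                mock.call(),
                mock.call(ExclusiveStartKey={"id": "2"}),
                mock.call(ExclusiveStartKey={"id": "3"}),
            ],
        )

    def test_single_page(self):
        table = mock.MagicMock()
        table.scan.return_value = {"Items": [{"id": "1"}]}
        self.assertEqual(ProductsDB(table).get_all(), [{"id": "1"}])
        # No LastEvaluatedKey means exactly one scan call.
        table.scan.assert_called_once_with()
```

Saved as, for example, `test_get_all.py` (an illustrative file name), it runs with `python -m unittest test_get_all`.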

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

…Table

This update modifies the `get_all` method to ensure that all items from the DynamoDB table are retrieved during a scan operation.


**Reason for Change:** In Lab 4, `unranked_product_ids` are pulled from the `featured` items table and then used to obtain a ranked list of products from "all" products via AWS Personalize. The existing `get_all` method makes a single call to DynamoDB's scan operation, which returns at most 1 MB of data per request, so only a subset of the products table may come back. The ranked list can therefore contain fewer items than the unranked list, depending on which products happen to be in that first page of scan results. This makes side-by-side comparisons of these lists difficult.
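A minimal worked example of the mismatch described above. `details_for` is a hypothetical helper, not code from the workshop: it assumes that ranked product IDs are mapped to product records through a catalog built from `get_all`, and that IDs missing from that catalog are skipped. The Personalize reranking is faked with a fixed reordering of the same IDs.

```python
from typing import Dict, List


def details_for(ranked_ids: List[str], catalog: Dict[str, dict]) -> List[dict]:
    """Hypothetical helper: map product IDs to records from a catalog.

    IDs missing from the catalog are skipped, which is one way a partial
    scan can shrink the displayed list (assumed mechanism).
    """
    return [catalog[pid] for pid in ranked_ids if pid in catalog]


all_products = [{"id": str(i)} for i in range(10)]
unranked_ids = ["2", "7", "4", "9"]
# Pretend Personalize returned the same IDs in a new order (illustrative only).
reranked_ids = ["9", "2", "7", "4"]

partial_catalog = {p["id"]: p for p in all_products[:5]}  # first scan page only
full_catalog = {p["id"]: p for p in all_products}  # every page

print(len(details_for(unranked_ids, full_catalog)))  # 4
print(len(details_for(reranked_ids, partial_catalog)))  # 2
print(len(details_for(reranked_ids, full_catalog)))  # 4
```

Under these assumptions, the reranked list loses products "9" and "7" when only the first page of the table is loaded, and the two lists line up again once every page is fetched.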

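This modification ensures that all items are fetched by paginating through the scan results: each response's `LastEvaluatedKey` is passed back as `ExclusiveStartKey` until a response arrives without one.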
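A hedged alternative, not what this PR merged: the same pagination can be written as a generator, so callers can stream items instead of holding the whole table in memory, and extra `scan()` keyword arguments such as `ProjectionExpression` are passed through on every page. `iter_scan` and `PagedStub` are hypothetical names; `PagedStub` only imitates the shape of a boto3 `scan()` response so the example runs offline.

```python
from typing import Any, Dict, Iterator, List


def iter_scan(table: Any, **scan_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item from a DynamoDB-style table, one page at a time.

    `table` is anything with a boto3-like scan(**kwargs) method; extra
    keyword arguments are forwarded unchanged on every request.
    """
    kwargs = dict(scan_kwargs)
    while True:
        response = table.scan(**kwargs)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break
        # Resume the next scan right after the last evaluated item.
        kwargs["ExclusiveStartKey"] = last_key


class PagedStub:
    """Hypothetical stub returning canned pages in order."""

    def __init__(self, pages: List[List[Dict[str, Any]]]) -> None:
        self._pages = pages
        self.calls: List[Dict[str, Any]] = []  # kwargs of every scan() call

    def scan(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        index = kwargs.get("ExclusiveStartKey", {}).get("page", -1) + 1
        response: Dict[str, Any] = {"Items": self._pages[index]}
        if index + 1 < len(self._pages):
            response["LastEvaluatedKey"] = {"page": index}
        return response


stub = PagedStub([[{"id": "a"}, {"id": "b"}], [{"id": "c"}]])
print([item["id"] for item in iter_scan(stub)])  # ['a', 'b', 'c']
```

With this shape, `list(iter_scan(self.table))` would return the same items as the PR's `get_all`, while a caller that only needs to process items one by one never materializes the full list.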

**Updated `get_all` function:**
```python
def get_all(self):
    items = []
    response = self.table.scan()
    while 'LastEvaluatedKey' in response:
        items.extend(response['Items'])
        response = self.table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    items.extend(response['Items'])
    return items
```

Thus every item in the DynamoDB table is fetched before any operations that require a complete dataset, such as generating personalized rankings.

@BastLeblanc (Contributor) left a comment:

Thanks for the contribution !

@BastLeblanc merged commit 1e4febc into aws-samples:master on Oct 1, 2024 (2 checks passed)
@MustaphaU (Contributor, Author) replied:

Thank you @BastLeblanc !

Labels: None yet
Project status: Shipped
Participants: 2