Suggestion from AWS Veteran and ORM Framework Author #1238

Open
MacHu-GWU opened this issue May 21, 2024 · 2 comments

MacHu-GWU commented May 21, 2024

I have noticed a significant design flaw in PynamoDB. In most ORM frameworks, such as SQLAlchemy and MongoEngine, users are allowed to define a connection object and use a context manager to choose which connection object to use in a code block.
For example, in SQLAlchemy, it looks like:

import sqlalchemy as sa
engine1 = sa.create_engine(...)
engine2 = sa.create_engine(...)
with engine1.connect() as conn:
    conn.execute(...)

Additionally, these frameworks usually provide a feature to set a global connection object, eliminating the need to explicitly specify the connection.

In PynamoDB, there are three API levels: Connection, TableConnection, and Model. The Model class (not the instance) has a private attribute called _connection. The connection object is created the first time an operation needs to call AWS, and it then stays cached on the class indefinitely.
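
To make the caching behavior concrete, here is a toy model in plain Python. It is not PynamoDB's source code: FakeConnection and ToyModel are hypothetical stand-ins, and the sketch only assumes that the connection is built lazily and stored on the class.

```python
class FakeConnection:
    """Hypothetical stand-in for a connection bound to one region."""

    def __init__(self, region):
        self.region = region


class ToyModel:
    """Toy model class; only illustrates class-level connection caching."""

    region = "us-east-1"
    _connection = None  # class-level cache, shared by every use of the class

    @classmethod
    def _get_connection(cls):
        # Build the connection lazily on first use, then keep reusing it.
        if cls._connection is None:
            cls._connection = FakeConnection(cls.region)
        return cls._connection


first = ToyModel._get_connection()
ToyModel.region = "us-east-2"        # change the setting after first use
second = ToyModel._get_connection()  # the cached object is returned unchanged
print(second.region)                 # still the region captured on first use

ToyModel._connection = None          # clearing the private cache forces a rebuild
third = ToyModel._get_connection()
print(third.region)
```

In this toy, changing the region after the first call has no effect until the private cache is cleared, which is the same pattern as the workaround shown further down.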

This design causes a problem when users switch the default AWS profile at runtime (for example, when testing against different AWS accounts, or running against a mock before switching to a real AWS account). Users might think they have switched the AWS profile, but PynamoDB keeps using the cached connection, so the switch has no effect. Below is an example:

from pynamodb.models import Model
from pynamodb.attributes import UnicodeAttribute, NumberAttribute
from pynamodb.connection import Connection
from pynamodb.constants import PAY_PER_REQUEST_BILLING_MODE
from boto_session_manager import BotoSesManager
from rich import print as rprint


class Item(Model):
    class Meta:
        table_name = "pynamodb_connection_example_key_value_items"
        region = "us-east-1"
        billing_mode = PAY_PER_REQUEST_BILLING_MODE

    key = UnicodeAttribute(hash_key=True)
    value = NumberAttribute()


bsm1 = BotoSesManager(profile_name="my_profile", region_name="us-east-1")
bsm2 = BotoSesManager(profile_name="my_profile", region_name="us-east-2")

# ------------------------------------------------------------------------------
# This won't work
# ------------------------------------------------------------------------------
with bsm1.awscli():
    conn = Connection()
    Item.Meta.region = bsm1.aws_region
    Item.create_table(wait=True) # expect to create table in us-east-1

with bsm2.awscli():
    conn = Connection()
    Item.Meta.region = bsm2.aws_region
    Item.create_table(wait=True)  # expect to create table in us-east-2

# ------------------------------------------------------------------------------
# This works, but relies on the private attribute _connection
# ------------------------------------------------------------------------------
with bsm1.awscli():
    Item._connection = None
    Item.Meta.region = bsm1.aws_region
    conn = Connection()
    Item.create_table(wait=True)  # expect to create table in us-east-1

print("--- bsm2 ---")
with bsm2.awscli():
    Item._connection = None
    Item.Meta.region = bsm2.aws_region
    conn = Connection()
    Item.create_table(wait=True)  # expect to create table in us-east-2

Since a DynamoDB connection is a virtual concept, unlike an RDBMS connection, and is essentially a REST API rather than a long-living connection, I understand why PynamoDB is designed this way. However, for each API that requires an AWS API call, it should provide an optional parameter, such as a pynamodb.connection.Connection or boto3.session.Session object. If it is not provided, the existing logic should be used to determine the connection; otherwise, the specified connection should be used explicitly.
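
A rough sketch of what the optional parameter could look like, again using a toy model rather than PynamoDB itself. The `connection` keyword argument, the `save` classmethod, and FakeConnection are all assumptions made for illustration; none of them is an existing PynamoDB signature.

```python
from typing import Optional


class FakeConnection:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self, region: str):
        self.region = region
        self.calls = []

    def put_item(self, table_name: str, item: dict) -> str:
        self.calls.append((table_name, item))
        return self.region  # report which region handled the call


class ToyModel:
    table_name = "items"
    region = "us-east-1"
    _connection: Optional[FakeConnection] = None

    @classmethod
    def _get_connection(cls) -> FakeConnection:
        # Existing behavior: lazily create and cache the implicit connection.
        if cls._connection is None:
            cls._connection = FakeConnection(cls.region)
        return cls._connection

    @classmethod
    def save(cls, item: dict, connection: Optional[FakeConnection] = None) -> str:
        # Proposed behavior: an explicit connection wins; otherwise fall back.
        conn = connection if connection is not None else cls._get_connection()
        return conn.put_item(cls.table_name, item)


east2 = FakeConnection("us-east-2")
default_region = ToyModel.save({"key": "a"})                     # implicit connection
explicit_region = ToyModel.save({"key": "b"}, connection=east2)  # explicit connection
print(default_region, explicit_region)
```

The explicit call does not touch the cached connection, so existing code that never passes the argument would keep its current behavior.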

Furthermore, PynamoDB should provide a context manager that allows users to use pynamodb.connection.Connection or boto3.session.Session to override the current implicit connection. This aligns with the Python philosophy of "explicit is better than implicit." These two features are available in most ORM frameworks but are missing in PynamoDB.

Another issue I have found is that the connection object is a wrapper of the botocore client. I understand that the connection object aims to provide additional functionality, such as sending telemetry. However, this design limits the fine-grained control of the underlying boto3.session.Session or botocore.Client. For example, users cannot use auto-refreshable sessions. In my opinion, while the connection API is nice, users should not lose the capability to use the exact boto3.session.Session or botocore.Client. I acknowledge that this may require a more complex design to implement correctly, so I present this as a personal suggestion open for discussion.
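
As a minimal illustration of keeping the wrapper while still letting users bring their own client, the toy below accepts an injected client object. FakeClient, ToyConnection, and the `client` argument are hypothetical names, and the request counter only stands in for whatever extra functionality the real wrapper provides.

```python
class FakeClient:
    """Hypothetical stand-in for a low-level client built by the caller."""

    def __init__(self, label):
        self.label = label

    def describe_table(self, TableName):
        return {"Table": {"TableName": TableName, "Client": self.label}}


class ToyConnection:
    """Wrapper that adds behavior but lets the caller supply the client."""

    def __init__(self, client=None):
        # Use the injected client if given; otherwise build a default one.
        self.client = client if client is not None else FakeClient("default")
        self.request_count = 0  # example of functionality added by the wrapper

    def describe_table(self, table_name):
        self.request_count += 1
        return self.client.describe_table(TableName=table_name)


custom = FakeClient("auto-refreshing")  # e.g. built from a session the user controls
conn = ToyConnection(client=custom)
result = conn.describe_table("items")
print(result["Table"]["Client"], conn.request_count)
```

With this shape, how credentials are obtained and refreshed stays entirely in the caller's hands, and the wrapper only decorates the calls.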

Thank you for the great library.

@MrZoidberg

Exactly what I'm looking for. Working with LocalStack and AWS within the same code is a nightmare in PynamoDB.

@ikonst
Contributor

ikonst commented Nov 12, 2024

This all makes sense to me. I'll happily review any PRs for the above.
