fix: Data too long for column 'resource_id' #523

Open
wants to merge 2 commits into
base: master

Faraz32123

When the course ID is long, the LTI XBlock ID becomes too long to fit in the resource_id column.

To reproduce:

  • Create a course with a long ID, for example: course-v1:Public+TEST_IT_01+TEST_IT_01_Dec_2023
  • Add lti_consumer in the Advanced settings for the course
  • Try to add an LTI component with version 1.3 to the course
  • The unit crashes
    I have tested this locally.
Screenshot 2024-12-27 at 11 56 37 AM
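
A rough way to see the overflow in the steps above. This is only a sketch: it assumes (the thread does not confirm it) that resource_id is a 100-character column and that the block's usage key has the usual `block-v1:...+type@...+block@...` shape. `usage_key_string`, `ASSUMED_RESOURCE_ID_MAX`, and the placeholder block ID are made up for illustration.

```python
# Assumption: the current column limit. The thread does not state the real value.
ASSUMED_RESOURCE_ID_MAX = 100


def usage_key_string(course_id: str, block_type: str, block_id: str) -> str:
    """Build a usage-key-like string from a course-v1 course ID (illustrative only)."""
    prefix = "course-v1:"
    if not course_id.startswith(prefix):
        raise ValueError("expected a course-v1 course ID")
    course_part = course_id[len(prefix):]
    return f"block-v1:{course_part}+type@{block_type}+block@{block_id}"


course_id = "course-v1:Public+TEST_IT_01+TEST_IT_01_Dec_2023"
block_id = "0" * 32  # placeholder standing in for a 32-character block ID
key = usage_key_string(course_id, "lti_consumer", block_id)

# Compare the key length with the assumed column limit.
print(len(key), "characters; too long:", len(key) > ASSUMED_RESOURCE_ID_MAX)
```

Under these assumptions, even a moderately long course ID pushes the key past the limit, because the block type and block ID add a fixed overhead on top of the course ID.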

Muhammad Faraz Maqsood and others added 2 commits December 27, 2024 12:59
when the course id is a little bit long, the lti xblock id becomes too long for resource_id to handle
@Faraz32123
Author

@feanil could you look into this PR and merge it? Thanks.

@Faraz32123 Faraz32123 requested a review from feanil December 27, 2024 08:12
@Faraz32123 Faraz32123 changed the title Fix/short resource fix: Data too long for column 'resource_id' Dec 27, 2024
Contributor

@feanil feanil left a comment

This makes sense to me, but I've asked @ormsbee to take a look to see if we can better couple this to the course ID (and if that makes sense) so that we don't have future issues here. @Faraz32123 do you know if there is a max limit on the LTI side for this field?

@ormsbee
Contributor

ormsbee commented Jan 6, 2025

@feanil: By the LTI 1.3 spec, I think this field should be limited to 255 chars anyhow, so I think we're good there.

It's also supposed to be case-sensitive, which the current implementation is not–but I don't think that's worth the potential headaches involved with fixing it at the moment.

@Faraz32123: There are a number of things that I don't understand about the LtiAgsLineItem model, and I would appreciate your thoughts:

  1. What is the cardinality of LtiAgsLineItem? Is it one row per LtiConsumerBlock in a course? Something else?
  2. If it is intended to map 1:1 to an LtiConsumerBlock instance, why is there no uniqueness constraint on the model?
  3. What is resource_id actually used for, and how is it different from resource_link_id, since they both seem to be set to the XBlock's location UsageKey? I had thought that maybe resource_id would be the identifier on the tool provider side, and the resource_link_id would be the id on the consumer side, but that doesn't make sense if they're the same.
  4. Why does the comment above resource_id say "...not used by the LMS."?

Thank you.

@Faraz32123
Author

@ormsbee I hope you're doing well. Based on my reading of the code, LtiConfiguration has a one-to-many relationship with LtiAgsLineItem, so a single LtiConfiguration can have multiple LtiAgsLineItem instances. The uniqueness constraint is enforced at the level of the LtiAgsScore model, which links scores to specific line items and users, as indicated by the unique_together constraint in that model.

You are correct that resource_id serves as the identifier on the tool provider side, while resource_link_id is the identifier on the consumer (LMS) side. Although the code currently defaults both to the same identifier (the location), the initial design may have intended to allow different identifiers for the LTI tool and the LMS, which would explain the two separate fields, since the two can hold different values.

We can certainly dig deeper to explore this further.

@ormsbee
Contributor

ormsbee commented Jan 9, 2025

@Faraz32123: So I have one short term concern, and a couple of longer-term ones.

The short-term one is just how expensive this migration is going to be, and if there are going to be any negative consequences for running it. LtiConsumerBlock is one of the more popular block types, meaning that a large site may have something on the order of 10-100K of these. I don't know how many LtiAgsLineItem rows this maps out to, because while I get that LtiAgsLineItem is M:1 to LtiConfiguration, I don't really understand what it represents that would be M:1–it looks almost like a 1:1 model attached via M:1 relation.

As an anecdotal reference point, we've had column changes in the past that averaged around 1 million rows/minute to run the migration, during which that table was locked entirely. So a migration against LtiAgsLineItem may be edging into potentially disruptive territory. @MichaelRoytman: I wanted to make sure you saw this PR, in case that's a concern for you folks. I have no idea how big these tables are for you these days.
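
As a back-of-the-envelope sketch of that concern, the anecdotal rate of about 1 million rows/minute can be turned into an estimated lock time. The row counts below are hypothetical, since the thread does not say how large the LtiAgsLineItem table actually is, and `estimated_lock_seconds` is an illustrative helper rather than anything in the codebase.

```python
def estimated_lock_seconds(row_count: int, rows_per_minute: float = 1_000_000) -> float:
    """Estimate how long a table stays locked, assuming a constant migration rate."""
    if rows_per_minute <= 0:
        raise ValueError("rows_per_minute must be positive")
    return row_count * 60 / rows_per_minute


# Hypothetical table sizes, chosen only to show the scale of the estimate.
for rows in (10_000, 100_000, 5_000_000):
    print(f"{rows:>9,} rows -> about {estimated_lock_seconds(rows):.1f} s locked")
```

The estimate is linear in the row count, so it only becomes worrying if line items greatly outnumber the 10-100K blocks mentioned above.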

My longer term concern is that this seems to just be broken in that it's not actually storing the right data at all. So while it might be useful for short-term compatibility to widen the field so we can continue to store the broken things consistently, we should at some point fix this and store the right thing.

In addition, it looks like these models are missing important constraints. Whether it's LtiConfiguration or LtiAgsLineItem, something is logically 1:1 with LtiConsumerBlock and should therefore have a unique constraint on the UsageKey for that item. That doesn't seem to exist anywhere.

Addressing either of the longer term concerns will involve a potentially disruptive data migration: dealing with historical duplicates from race conditions, fixing code that assumes resource_id == resource_link_id, etc.
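
A minimal sketch of why historical duplicates matter, using an in-memory SQLite table rather than the real schema. The table name, column names, and the "keep the oldest row" cleanup rule are all assumptions made for illustration; a real data migration would need a deliberate decision about which duplicate to keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lti_configuration (id INTEGER PRIMARY KEY, location TEXT)")
conn.executemany(
    "INSERT INTO lti_configuration (location) VALUES (?)",
    [("block-A",), ("block-B",), ("block-A",)],  # block-A was stored twice
)


def duplicate_locations(db):
    """Return (location, count) pairs for locations stored more than once."""
    return db.execute(
        "SELECT location, COUNT(*) FROM lti_configuration "
        "GROUP BY location HAVING COUNT(*) > 1 ORDER BY location"
    ).fetchall()


def add_unique_index(db):
    """Enforce one row per location; fails while duplicates still exist."""
    db.execute("CREATE UNIQUE INDEX uniq_location ON lti_configuration (location)")


dupes = duplicate_locations(conn)
print("duplicates:", dupes)

try:
    add_unique_index(conn)
except sqlite3.IntegrityError as exc:
    print("cannot add the constraint yet:", exc)

# Illustrative cleanup only: keep the lowest id for each location.
conn.execute(
    "DELETE FROM lti_configuration WHERE id NOT IN "
    "(SELECT MIN(id) FROM lti_configuration GROUP BY location)"
)
add_unique_index(conn)
```

The constraint can only be added once the duplicates are gone, which is what makes this kind of fix a data migration and not just a schema change.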

@MichaelRoytman
Contributor

Thanks for the ping, @ormsbee! I'll review and follow up by EOD this Wednesday 01/15/2025.

@MichaelRoytman
Contributor

Thanks again for the ping. Here are my thoughts. I apologize if any of this you already know. I'm also writing this out as review for myself.

TL;DR

  1. I have no concerns about the migration.
  2. I don't think we should establish a ForeignKey from LtiAgsLineItem to LtiConfiguration. However, we could establish a uniqueness constraint on LtiConfiguration.location.
  3. We should not set LtiAgsLineItem.resource_id on the linked line. It should remain blank.

Migration Concerns

I'm not concerned about the migration, because the LtiAgsLineItem table is not large enough to be a concern. I don't think we should expect any database-level operations on other tables in this application, right? I'm having trouble generating the associated SQL.

Relationship between LtiConfiguration and LtiAgsLineItem

In terms of the cardinality of the relationship between LtiConfiguration and LtiAgsLineItem, I don't agree that it's 1:M.

According to the LTI 1.3 AGS specification, the following is true.

A line item is usually a column in the tool platform’s gradebook; it is able to hold the results associated with a specific activity for a set of users. The activity is expected to be associated with a single LTI context within the platform, so there is a one-to-many relationship between a context and its line items.

However, "activity" is undefined, and it seems to be a loose term. An LtiAgsLineItem doesn't even necessarily have to be associated with any resource link. See Overview. In our implementation, a resource link corresponds to an LTI component which is represented by an instance of the LtiConfiguration model. For that reason, I don't think we actually want to enforce any invariant on the relationship between LtiAgsLineItem and LtiConfiguration. I know this contradicts the earlier quoted paragraph, but that's how I understand it, because lineItems can be created without an association to any resource link (i.e. LtiConfiguration) in the "programmatic" AGS mode.

I'm going to ignore the mention of gradebook in the quote above, because the edX platform does not properly implement LTI 1.3 AGS in terms of the "gradebook", and that's a separate issue.

That being said, what is 1:1 to LtiConsumerXBlock is LtiConfiguration. In our implementation of LTI, each placement of an LTI link corresponds to a single component, which corresponds to a single instance of the LtiConfiguration model, so we could implement a uniqueness constraint on the location field. But because we don't have such constraints, as you pointed out, I do see cases of this constraint being violated. There must be/have been some edge cases that allow/allowed this; it's hard to tell if this was already patched because there are no created or modified fields on the model.

resource_link_id versus resource_id

I agree that resource_link_id and resource_id refer to different things and that they should not be the same value.

resource_link_id

In LTI, an LTI link launches a learner to an LTI tool. A given LTI link may link to a resource in the tool. Multiple LTI links to the same resource can be placed throughout the LTI context, where "context" generally refers to a course (or any other collection of resources that is associated with a common set of users and roles). When more than one LTI link is placed in a context, the platform must differentiate between them with a platform-unique resource_link_id. This is a value the platform creates and provides to the tool. The location field (i.e. the UsageKey) is a suitable option for this because it's unique across the platform. resource_link_id is a part of the core LTI 1.3 specification.

See LTI Links.

resource_id

resource_id is introduced by the LTI 1.3 AGS specification. A resource_id refers to the tool resource, and it's a value the tool creates and provides to the platform in certain cases. It's different from a resource_link_id in that resource_link_ids refer to an LTI link to a tool resource, as defined by the platform, whereas resource_id refers to the resource itself, as defined by the tool.

A resource_id isn't something that the platform generates. A resource_id is optionally provided to the platform by the tool when creating a lineItem via the line item service. See here. The resource_id isn't used by the LMS because it's a value only the tool cares about.

The line item service is only used when the AGS mode is set to "programmatic", which gives tools the responsibility to create lineItems. In the "declarative" AGS mode, which is the default, the platform is responsible for creating lineItems, which is what's happening here.

In the latter case, resource_id should be None, because the resource_id is a tool-specific value, and the tool is not involved in the creation of lineItems under the "declarative" AGS mode.

Theoretically, the resource_link_id and resource_id could be the same, I suppose, but that would be extremely unusual. The bigger issue is that resource_id should not have any value at all in the "declarative" AGS mode. I believe this line should not be setting any resource_id, and it should be blank.
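
To make the distinction concrete, here is a small sketch of how the two fields might be populated in each AGS mode. It is not the code in this repository: `LineItemFields`, `line_item_fields`, and the example values are hypothetical, and a blank string is assumed as the "no value" representation for resource_id.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LineItemFields:
    label: str
    resource_link_id: Optional[str]  # platform-defined ID of the LTI link
    resource_id: str                 # tool-defined ID of the resource, or blank


def line_item_fields(
    ags_mode: str,
    label: str,
    resource_link_id: Optional[str] = None,
    tool_resource_id: Optional[str] = None,
) -> LineItemFields:
    """Choose field values for a new line item based on the AGS mode."""
    if ags_mode == "declarative":
        # The platform creates the line item itself, so it knows the link
        # (for example the block's usage key) but has no tool-defined resource ID.
        if resource_link_id is None:
            raise ValueError("declarative line items need a resource_link_id")
        return LineItemFields(label, resource_link_id, resource_id="")
    if ags_mode == "programmatic":
        # The tool creates the line item and may supply either, both, or neither.
        return LineItemFields(label, resource_link_id, resource_id=tool_resource_id or "")
    raise ValueError(f"unknown AGS mode: {ags_mode!r}")


declarative = line_item_fields("declarative", "Quiz 1", resource_link_id="example-usage-key")
programmatic = line_item_fields("programmatic", "Quiz 1", tool_resource_id="tool-quiz-42")
print(declarative)
print(programmatic)
```

With this split, the length of the usage key never affects resource_id in the declarative mode, because nothing is written to that field.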

5 participants