-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new check for indexing with atomic=True #251
Add new check for indexing with atomic=True #251
Conversation
@David-Wobrock - is this repository still maintained and/or open to contributions? Do you have thoughts about this proposed check? |
Thanks for the review, @fevral13 ! I tried to leave some comments explaining the counterintuitive nature of why even a fast |
Thanks for the PR! I'll try to have a look in the next days 😇 I need to find some time to give some love to this repo again! Many requests and improvements that can be made 💪 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great contribution @DavidCain Thanks a lot! 💯
I have two little comments to address: handle concurrently
index creation, and change the check to a warning. With these changes, we should be good to merge it IMO 👍
@@ -23,7 +23,7 @@ def assertValidSql(self, sql, allow_warnings=False): | |||
errors, _, warnings = self.analyse_sql(sql) | |||
self.assertEqual(0, len(errors), f"Found errors in sql: {errors}") | |||
if not allow_warnings: | |||
self.assertEqual(0, len(warnings), f"Found warnings in sql: {errors}") | |||
self.assertEqual(0, len(warnings), f"Found warnings in sql: {warnings}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch 👍 Thanks!
# https://www.postgresql.org/docs/current/sql-altertable.html | ||
# (Most common example is `ALTER TABLE... ADD COLUMN`, then later `CREATE INDEX`) | ||
if sql.startswith("ALTER TABLE"): | ||
return has_create_index(sql_statements[i + 1 :]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue re-using this function here, is that we are not detecting some cases you are describing. For instance if we had this unit test:
diff --git a/tests/unit/test_sql_analyser.py b/tests/unit/test_sql_analyser.py
index 01a2bb3..7553796 100644
--- a/tests/unit/test_sql_analyser.py
+++ b/tests/unit/test_sql_analyser.py
@@ -320,6 +320,15 @@ class PostgresqlAnalyserTestCase(SqlAnalyserTestCase):
]
self.assertBackwardIncompatibleSql(sql, code="CREATE_INDEX_EXCLUSIVE")
+ def test_create_concurrently_index_exclusive(self):
+ sql = [
+ "BEGIN;",
+ 'ALTER TABLE "users" ADD COLUMN "email" varchar(254) NULL;',
+ 'CREATE CONCURRENTLY INDEX "user_email" ON "users" ("email");',
+ "COMMIT;",
+ ]
+ self.assertBackwardIncompatibleSql(sql, code="CREATE_INDEX_EXCLUSIVE")
+
def test_create_index_exclusive_no_lock(self):
sql = [
'ALTER TABLE "users" ADD COLUMN "email" varchar(254) NULL;',
It does not trigger an error:
Found 19 test(s).
System check identified no issues (0 silenced).
..F................
======================================================================
FAIL: test_create_concurrently_index_exclusive (tests.unit.test_sql_analyser.PostgresqlAnalyserTestCase.test_create_concurrently_index_exclusive)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./django-migration-linter/tests/unit/test_sql_analyser.py", line 330, in test_create_concurrently_index_exclusive
self.assertBackwardIncompatibleSql(sql, code="CREATE_INDEX_EXCLUSIVE")
File "./django-migration-linter/tests/unit/test_sql_analyser.py", line 30, in assertBackwardIncompatibleSql
self.assertNotEqual(0, len(errors), "Found no errors in sql")
AssertionError: 0 == 0 : Found no errors in sql
----------------------------------------------------------------------
Ran 19 tests in 0.003s
FAILED (failures=1)
I guess we can modify a bit the check to also take that into account :) Since, as you said, creating an index concurrently can be a long running operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great callout and this is just oversight on my part. I'll make sure that CREATE INDEX CONCURRENTLY
is flagged as well.
fn=has_create_index_in_transaction, | ||
message="CREATE INDEX prolongs transaction, delaying lock release", | ||
mode=CheckMode.TRANSACTION, | ||
type=CheckType.ERROR, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the main point I see with this contribution.
The check makes total sense, and thanks for adding it 👍 We definitely want to merge it :)
1/ It's not a silver bullet. I don't think by default we should raise an error during migration linting when doing this. To me, we should warn the user instead, that it could create a long-running locking transaction. Basically, stay in the same vein as the other indexing operations which also lock the table. It's the same risk => we know it will lock the table, but we can only warn that it is potentially a long-running lock, since we don't know about the real size and load of the table.
In short:
type=CheckType.ERROR, | |
type=CheckType.WARNING, |
😁
2/ It feels like a subset of CREATE_INDEX
, since it checks CREATE_INDEX
+ being in a transaction + having an ALTER TABLE
operation. I feel like there is room for something more generic, and not only when linked to index creation.
Basically, any combination of "transaction + exclusive lock operation + potential long-running operation" should trigger such a warning.
I'm completely fine starting with this specific check, we don't have to change the approach, but it's more to keep in mind that we could enhance this check to something more generic 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
You're absolutely right to point out that this could have some false positives (i.e. indexing a small table will generally be fast & thus there's little cause for concern). I opted for an
ERROR
because I can't really think of many situations in which you would ever need to be creating an index within a transaction. I understand your rationale, though -- we should probably only call it an error if it will always be the source of problems.I'll happily make this a
WARNING
and reach for--warnings-as-errors
to elevate the warning for our use cases. -
Also totally agreed that this could be made generic for detecting any long-running queries that prolong lock release. I think the reason I targeted
CREATE INDEX
is that:- index creation is a very common migration operation
- you can pretty much always move index creation out of the transaction. Other schema-modifying queries may not share that easy resolution.
I would also be very happy to start with this type of check, and replace it entirely with something more generic at a later time!
Thanks for the review & the thorough explanations! I'll be applying both your suggested changes, I think they'll help a ton.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll happily make this a WARNING and reach for --warnings-as-errors to elevate the warning for our use cases.
That would be great! Thanks a lot 🙇
You can always use --warnings-as-errors CREATE_INDEX_EXCLUSIVE
to mark only this one warning as an error. I think that should work 🤞
I would also be very happy to start with this type of check, and replace it entirely with something more generic at a later time!
Entirely agreed 👍 It was just an opening for the future, I'm completely fine with the current version, handling CREATE INDEX [CONCURRENTLY]
.
An additional nice touch would be to add a mention of this change in the Once you did all the changes, we'll be able to merge this PR - and we'll release 5.0.0 of the linter 😄 It's been a long time since the last release. |
Creating indices concurrently (provided that index creation *completes*) is generally a much safer way to build an index (since it allows reads *and* writes to the table during index creation). However, due to the inherent tradeoffs of concurrent index creation, the operation will almost always take *longer* than a traditional (blocking) index creation. It's actually much *more* important to lint against concurrent index creation when an exclusive lock is open & held. Test is from a comment by @David-Wobrock (I corrected syntax, though): 3YOURMIND#251 (comment)
Thanks again for the feedback! I believe the outstanding requests have been addressed (it's a Thanks again for a really pleasant first PR experience. 😄 I'd be delighted to make any other changes you as see fit. |
We should log the *warnings* found, not errors!
Briefly, this check helps prevent a long-running index creation from preventing all reads and writes to a table. This supplements an existing check ================================== There is an existing check, `CREATE_INDEX` which warns about creating indices without the `CONCURRENTLY` argument. It's a helpful check! However, there are some valid reasons to prefer non-concurrent index creation. For example, concurrent index creation may struggle to complete when applied to write-heavy tables (since Postgres will have to make repeated passes on the table to accommodate the new reads since the last round). The danger of `EXCLUSIVE` locks =============================== If you *do* want to create indices nonconcurrently, there is still a **huge footgun** to be aware of -- locks which are held in a transaction are not released until the transaction closes. Dangerous Django defaults ========================= By default, any time you add a new Django field with `db_index=True` or `unique=True`, the automatically generated `Migration` has: - implicit setting of `atomic=True` (transactions are on) - an `ALTER TABLE...ADD COLUMN` call (exclusively locks the table) - a `CREATE INDEX` invocation (*hopefully* quick, due to nullable columns) It's even more dangerous if you combine an `ALTER TABLE` call with `CREATE INDEX` of an existing, non-nullable column. Favor `atomic=False` if you must index ====================================== Index-creating migrations should have few `operations`. Since index creation does not modify row-level data (and only requires locking against writes), there's no reason to use transactions. It's prudent to bypass the default `atomic=True` for index creation.
While it's always a dangerous *pattern* to create an index while a transaction is open & holds an exclusive lock, in practice it's not always a big deal (for example, indexing a table with very few rows can be fast). Accordingly, downgrade to a warning.
5.0.0 will be released soon, be sure that we document this new check!
Creating indices concurrently (provided that index creation *completes*) is generally a much safer way to build an index (since it allows reads *and* writes to the table during index creation). However, due to the inherent tradeoffs of concurrent index creation, the operation will almost always take *longer* than a traditional (blocking) index creation. It's actually much *more* important to lint against concurrent index creation when an exclusive lock is open & held. Test is from a comment by @David-Wobrock (I corrected syntax, though): 3YOURMIND#251 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @DavidCain 💯
Thanks a lot for the contribution 🙇
114bcf7
to
1f339d1
Compare
Codecov ReportPatch coverage:
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. Additional details and impacted files@@ Coverage Diff @@
## main #251 +/- ##
==========================================
+ Coverage 93.97% 94.10% +0.12%
==========================================
Files 78 81 +3
Lines 1992 2034 +42
==========================================
+ Hits 1872 1914 +42
Misses 120 120
☔ View full report in Codecov by Sentry. |
Creating indices concurrently (provided that index creation *completes*) is generally a much safer way to build an index (since it allows reads *and* writes to the table during index creation). However, due to the inherent tradeoffs of concurrent index creation, the operation will almost always take *longer* than a traditional (blocking) index creation. It's actually much *more* important to lint against concurrent index creation when an exclusive lock is open & held. Test is from a comment by @David-Wobrock (I corrected syntax, though): #251 (comment)
Briefly, this check helps prevent a long-running index creation from
preventing all reads and writes to a table.
This supplements an existing check
There is an existing check,
CREATE_INDEX
which warns about creatingindices without the
CONCURRENTLY
argument. It's a helpful check!However, there are some valid reasons to prefer non-concurrent index
creation. For example, concurrent index creation may struggle to
complete when applied to write-heavy tables (since Postgres will have to
make repeated passes on the table to accommodate the new reads since the
last round).
The danger of
EXCLUSIVE
locksIf you do want to create indices nonconcurrently, there is still a
huge footgun to be aware of -- locks which are held in a transaction
are not released until the transaction closes.
Dangerous Django defaults
By default, any time you add a new Django field with
db_index=True
orunique=True
, the automatically generatedMigration
has:atomic=True
(transactions are on)ALTER TABLE...ADD COLUMN
call (exclusively locks the table)CREATE INDEX
invocation (hopefully quick, due to nullablecolumns)
It's even more dangerous if you combine an
ALTER TABLE
call withCREATE INDEX
of an existing, non-nullable column.Favor
atomic=False
if you must indexIndex-creating migrations should have few
operations
. Since indexcreation does not modify row-level data (and only requires locking
against writes), there's no reason to use transactions.
It's prudent to bypass the default
atomic=True
for index creation.