Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix mango tests using custom db name #5341

Merged
merged 3 commits into from
Nov 26, 2024
Merged

Conversation

jiahuili430
Copy link
Contributor

@jiahuili430 jiahuili430 commented Nov 22, 2024

Overview

When running mango tests, sometimes get org.apache.lucene.store.NoSuchDirectoryException error. This is because each test in the same class used the same database name, and in the setUp function always delete and recreate it, which caused a race condition. Using custom database names should solve it.

  • Makefile: Add -F, --failfast to stop mango tests early on failures
  • Add setUp() and teardown() to create/delete db for each test to
    avoid org.apache.lucene.store.NoSuchDirectory Exception errors.
  • Replace self.db.recreate() with super().setUp(db_per_test=True)

Testing recommendations

make mango-test

Related Issues or Pull Requests

Checklist

  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • Documentation changes were made in the src/docs folder
  • Documentation changes were backported (separated PR) to affected branches

@@ -189,6 +189,7 @@ def test_uses_partial_index_with_non_indexable_selector(self):
@unittest.skipUnless(mango.has_text_service(), "requires text service")
class IndexSelectorText(mango.DbPerClass):
def setUp(self):
self.db = mango.Database(f"{mango.random_db_name()}_{str(self).split()[0]}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many tests files which use self.db.recreate() did you consider modifying that function?

Copy link
Contributor Author

@jiahuili430 jiahuili430 Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If using the same db name, 0.13 is the shortest time to avoid 500 locally.
This approach also increased the overall Mango test time to 60.730s, compared to 49.936s using custom database name.

def recreate(self):
-    NUM_TRIES = 10
+    NUM_TRIES = 3

-    for k in range(NUM_TRIES):
+    for k in range(1, NUM_TRIES):
         r = self.sess.get(self.url)
         if r.status_code == 200:
             db_info = r.json()
             docs = db_info["doc_count"] + db_info["doc_del_count"]
                 if docs == 0:
                     # db exists and it is empty -- exit condition is met
                     return
                 self.delete()
+        time.sleep(k * 0.13)
         self.create()
-        time.sleep(k * 0.1)
     raise Exception(
         "Failed to recreate the database after {} tries".format(NUM_TRIES)
     )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All time based approaches are flowed. They don't solve the problem. They introduce flakiness. The options I see:

  1. restructure all tests so they are not using recreate. They would always start from scratch and create fresh databaase with uniq name.
  2. update dreyfus_index_manager.erl to handle cleanup differently
  3. modify clouseau so it handle rapid re-creation better
  4. change recreate function so it uses new name on each retry

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vote for option 4.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option 4 seems like the simplest approach, although maybe consider a name change since it wouldn't be "recreating" the db if it has a different name. Maybe just create_random or something like that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Virtually, that is still about recreating the database, in my opinion. But I agree that create_random would capture better the adjusted semantics for option 4.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you all for the suggestions and comments.

  • recreate() is used in classmethod setUpClass(), such as UserDocsTests, NumStringTests, etc., which is difficult to modify.
  • recreat_random(), simple solution, but deleting the old database becomes a problem.
  • So finally decided to use setUp() and tearDown(). Use a boolean variable db_per_test to control whether to use class db or create/delete a custom db for each test.

@jiahuili430 jiahuili430 force-pushed the mango-tests-use-custom-db branch 2 times, most recently from 77f5cc0 to 976b206 Compare November 26, 2024 15:22

def recreate(self):
NUM_TRIES = 10

for k in range(NUM_TRIES):
Copy link
Contributor Author

@jiahuili430 jiahuili430 Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried many times, k is only equal to 0 and 1, so I guess maybe we can revert them back.

When running mango tests, sometimes get
`org.apache.lucene.store.NoSuchDirectoryException` error. This is because
each test in the same class used the same database name, and in the setUp
function always delete and recreate it, which caused a race condition.

Adding `setUp()` and `teadown()` to create/delete the db for each test
should solve the issue.
Copy link
Contributor

@iilyak iilyak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

I run the mango test many times and it is definitely makes mango tests less flaky.

❯ make mango-test
==> snappy (compile)
==> ibrowse (compile)
==> jiffy (compile)
==> mochiweb (compile)
==> meck (compile)
==> recon (compile)
==> proper (compile)
==> couch_epi (compile)
==> config (compile)
==> couch_log (compile)
WARN:  Missing plugins: [pc]
==> b64url (compile)
WARN:  Missing plugins: [pc]
==> exxhash (compile)
==> ets_lru (compile)
==> couch_quickjs (compile)
make[1]: `libquickjs.lto.a' is up to date.
make[1]: `qjsc' is up to date.
==> chttpd (compile)
==> couch (compile)
==> couch_event (compile)
==> mem3 (compile)
==> couch_index (compile)
==> couch_mrview (compile)
==> couch_replicator (compile)
==> couch_pse_tests (compile)
==> couch_stats (compile)
==> couch_peruser (compile)
==> couch_tests (compile)
==> couch_dist (compile)
==> custodian (compile)
==> ddoc_cache (compile)
==> dreyfus (compile)
==> nouveau (compile)
==> fabric (compile)
==> global_changes (compile)
==> ioq (compile)
==> jwtf (compile)
==> ken (compile)
==> mango (compile)
==> rexi (compile)
==> setup (compile)
==> smoosh (compile)
==> weatherreport (compile)
==> couch_prometheus (compile)
==> couch_scanner (compile)
==> rel (compile)
==> couchdb (compile)
==> weatherreport (escriptize)
[ * ] Setup log dir: /Users/iilyak/Code/couchdb/dev/logs ... ok
[ * ] Ensure CouchDB is built ... ok
[ * ] Ensure Erlang boot script exists ... ok
[ * ] Prepare configuration files ... ok
[ * ] Start node node1 ... ok
[ * ] Check node at http://127.0.0.1:15984/ ... ok
[ * ] Running cluster setup ... ok
[ * ] Exec command  COUCH_USER=adm COUCH_PASS=pass src/mango/.venv/bin/nose2 -F -s src/mango/test -c src/mango/unittest.cfg  ... ......................................................................................................................................................................................................................................s..............................................................................................................................................................
----------------------------------------------------------------------
Ran 389 tests in 41.966s

OK (skipped=1)

@jiahuili430 jiahuili430 merged commit 7fe38c0 into main Nov 26, 2024
23 checks passed
@jiahuili430 jiahuili430 deleted the mango-tests-use-custom-db branch November 26, 2024 20:18
@@ -339,7 +339,7 @@ mango-test: devclean all
--admin=adm:pass \
--no-eval "\
COUCH_USER=adm COUCH_PASS=pass \
src/mango/.venv/bin/nose2 -s src/mango/test -c src/mango/unittest.cfg $(MANGO_TEST_OPTS)"
src/mango/.venv/bin/nose2 -F -s src/mango/test -c src/mango/unittest.cfg $(MANGO_TEST_OPTS)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does -F buy for us? The Mango tests are quick to run, and sometimes perhaps it is good to see if there are other errors besides the one that failed the whole run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While working on this, I even changed unittest.cfg to run 16-index-selectors-test.py only. Because I keep getting 500 errors due to this test file. Different combinations of test cases will result in the same failure message but fail on different test cases. 0.85 sec vs 50 sec to run the entire test suite.

-F, --failfast allows us to identify the issue quickly, speed up the feedback loop, and make debugging easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants