Importing books from another Bookwyrm instance fails #3004

Closed
eldang opened this issue Sep 21, 2023 · 6 comments · Fixed by #3135
Labels
bug Something isn't working

Comments

@eldang

eldang commented Sep 21, 2023

Describe the bug
I'm running into a pair of issues importing a CSV exported from Bookwyrm: rows with a review or rating fail, and the books that get successfully imported don't seem to get added to my books.

To Reproduce
Steps to reproduce the behavior:

  1. Export from bookwyrm.social
  2. Try importing to outside.ofa.dog
  3. Any entry with a review or star rating will fail, and the import will hang at a percentage that corresponds to the share of rows in the export file that had neither a review nor a rating
  4. Remove all of those entries from the bookwyrm export CSV and retry
  5. The import succeeds, but none of the books actually get added to my shelves or profile.

Other things I've tried:

  • Breaking the export down into small chunks: even down to 25 rows in a file, I get the same behaviour pattern.
  • Unchecking the "import reviews" box in the import: it gets through the file, but books still don't get added to my shelves or profile.
  • Each of the different CSV format options in the dropdown: seems to get the same behaviour, though I haven't tested all permutations thoroughly.
  • Retrying failed books: none of them work on a retry.

Expected behavior
The books import with reviews, and get shelved the same as in the source instance.

Instance
outside.ofa.dog

Additional context
I am aware that #2980 may make this issue redundant by providing an alternative way to migrate books between instances. I can't tell how close that work is to completion, which should probably determine whether this bug is worth working on in parallel or not.


Desktop (please complete the following information):
- OS: Mac OS 13.5.2
- Browser: Firefox
- Version: 117.0.1

eldang added the bug label Sep 21, 2023
@futzle

futzle commented Sep 21, 2023

Outside of a Dog is on 0.6.3 if that is significant.

@futzle

futzle commented Sep 21, 2023

In the Outside of a Dog Imports page, the import is listed as:

  • Items = 4 (say)
  • Pending = 4
  • Successful items = 0
  • Failed items = 0

Yet on the Celery Status page, there are no pending tasks of any kind.

In the bookwyrm-worker logs, this error message around the time seems relevant:

django.db.utils.IntegrityError: null value in column "published_date" of relation "bookwyrm_status" violates not-null constraint

There's no mention of reviews or ratings. Maybe the presence of a review/rating messes with the column sequence and leads the Published Date to appear null?

@mouse-reeve
Member

From the error, it sounds like the issue is that it isn't correctly assigning a date to the status that is created when it finds a review or rating

@futzle

futzle commented Sep 22, 2023

Here's the whole log for that import:

Sep 21 21:29:43 outside celery[543]: [2023-09-21 21:29:43,034: INFO/MainProcess] Task bookwyrm.models.import_job.import_item_task[008eaac6-6ff8-4eeb-ba8d-7fb63b12eb1b] received
[...]
Sep 21 21:29:43 outside celery[825]: [2023-09-21 21:29:43,160: ERROR/ForkPoolWorker-1] Task bookwyrm.models.import_job.import_item_task[008eaac6-6ff8-4eeb-ba8d-7fb63b12eb1b] raised unexpected: IntegrityError('null value in column "published_date" of relation "bookwyrm_status" violates not-null constraint\nDETAIL:  Failing row contains (7252, null, 2023-09-21 21:29:43.157715+00, t, public, f, null, 1041, 2023-09-21 21:29:43.15774+00, null, null, f, null, null, null, null, null, t).\n')
Sep 21 21:29:43 outside celery[825]: Traceback (most recent call last):
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
Sep 21 21:29:43 outside celery[825]:     return self.cursor.execute(sql, params)
Sep 21 21:29:43 outside celery[825]: psycopg2.errors.NotNullViolation: null value in column "published_date" of relation "bookwyrm_status" violates not-null constraint
Sep 21 21:29:43 outside celery[825]: DETAIL:  Failing row contains (7252, null, 2023-09-21 21:29:43.157715+00, t, public, f, null, 1041, 2023-09-21 21:29:43.15774+00, null, null, f, null, null, null, null, null, t).
Sep 21 21:29:43 outside celery[825]: The above exception was the direct cause of the following exception:
Sep 21 21:29:43 outside celery[825]: Traceback (most recent call last):
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/celery/app/trace.py", line 451, in trace_task
Sep 21 21:29:43 outside celery[825]:     R = retval = fun(*args, **kwargs)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/celery/app/trace.py", line 734, in __protected_call__
Sep 21 21:29:43 outside celery[825]:     return self.run(*args, **kwargs)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/bookwyrm/models/import_job.py", line 370, in import_item_task
Sep 21 21:29:43 outside celery[825]:     handle_imported_book(item)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/bookwyrm/models/import_job.py", line 461, in handle_imported_book
Sep 21 21:29:43 outside celery[825]:     review.save(software="bookwyrm", priority=LOW)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/bookwyrm/models/status.py", line 421, in save
Sep 21 21:29:43 outside celery[825]:     super().save(*args, **kwargs)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/bookwyrm/models/status.py", line 412, in save
Sep 21 21:29:43 outside celery[825]:     super().save(*args, **kwargs)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/bookwyrm/models/status.py", line 86, in save
Sep 21 21:29:43 outside celery[825]:     super().save(*args, **kwargs)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/bookwyrm/models/activitypub_mixin.py", line 210, in save
Sep 21 21:29:43 outside celery[825]:     super().save(*args, **kwargs)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/django/db/models/base.py", line 739, in save
Sep 21 21:29:43 outside celery[825]:     self.save_base(using=using, force_insert=force_insert,
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/model_utils/tracker.py", line 375, in inner
Sep 21 21:29:43 outside celery[825]:     return original(instance, *args, **kwargs)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/django/db/models/base.py", line 775, in save_base
Sep 21 21:29:43 outside celery[825]:     parent_inserted = self._save_parents(cls, using, update_fields)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/django/db/models/base.py", line 803, in _save_parents
Sep 21 21:29:43 outside celery[825]:     parent_inserted = self._save_parents(cls=parent, using=using, update_fields=update_fields)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/django/db/models/base.py", line 804, in _save_parents
Sep 21 21:29:43 outside celery[825]:     updated = self._save_table(
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/django/db/models/base.py", line 881, in _save_table
Sep 21 21:29:43 outside celery[825]:     results = self._do_insert(cls._base_manager, using, fields, returning_fields, raw)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/django/db/models/base.py", line 919, in _do_insert
Sep 21 21:29:43 outside celery[825]:     return manager._insert(
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/django/db/models/manager.py", line 85, in manager_method
Sep 21 21:29:43 outside celery[825]:     return getattr(self.get_queryset(), name)(*args, **kwargs)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/django/db/models/query.py", line 1270, in _insert
Sep 21 21:29:43 outside celery[825]:     return query.get_compiler(using=using).execute_sql(returning_fields)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/django/db/models/sql/compiler.py", line 1416, in execute_sql
Sep 21 21:29:43 outside celery[825]:     cursor.execute(sql, params)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 66, in execute
Sep 21 21:29:43 outside celery[825]:     return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 75, in _execute_with_wrappers
Sep 21 21:29:43 outside celery[825]:     return executor(sql, params, many, context)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
Sep 21 21:29:43 outside celery[825]:     return self.cursor.execute(sql, params)
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/django/db/utils.py", line 90, in __exit__
Sep 21 21:29:43 outside celery[825]:     raise dj_exc_value.with_traceback(traceback) from exc_value
Sep 21 21:29:43 outside celery[825]:   File "/opt/bookwyrm/venv/lib/python3.9/site-packages/django/db/backends/utils.py", line 84, in _execute
Sep 21 21:29:43 outside celery[825]:     return self.cursor.execute(sql, params)
Sep 21 21:29:43 outside celery[825]: django.db.utils.IntegrityError: null value in column "published_date" of relation "bookwyrm_status" violates not-null constraint
Sep 21 21:29:43 outside celery[825]: DETAIL:  Failing row contains (7252, null, 2023-09-21 21:29:43.157715+00, t, public, f, null, 1041, 2023-09-21 21:29:43.15774+00, null, null, f, null, null, null, null, null, t).

After this error, the task disappears from celery but is still listed as Pending in the Imports screen.
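
One plausible reading of the symptom above, sketched in plain Python: if the exception escapes before the item is marked as either successful or failed, the worker drops the task while the item still counts as pending. All names here (Item, import_item, run_job, counts) are hypothetical stand-ins, not BookWyrm's actual code.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for one row of an import job."""
    has_review: bool
    success: bool = False
    failed: bool = False


def import_item(item: Item) -> None:
    # Stand-in for saving a review status that has no published date.
    if item.has_review:
        raise ValueError("published_date is null")
    item.success = True


def run_job(items: list) -> None:
    for item in items:
        try:
            import_item(item)
        except ValueError:
            # The worker logs the error and the task is gone,
            # but nothing marks the item as failed.
            pass


def counts(items: list) -> dict:
    successful = sum(1 for i in items if i.success)
    failed = sum(1 for i in items if i.failed)
    return {
        "pending": len(items) - successful - failed,
        "successful": successful,
        "failed": failed,
    }
```

Under this assumption, a job whose rows all carry a review would sit at "Pending = all items" indefinitely, which matches what the Imports page shows.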

@hughrun
Contributor

hughrun commented Nov 25, 2023

I can replicate this.

Problem number 1 is because the BookWyrm CSV export doesn't include any dates at all - which is not optimal to say the least. In import_item_task we have this:

published_date_guess = item.date_read or item.date_added

This eventually causes an exception because both values are None. Further up the function, when adding the book to a shelf, we deal with missing dates by falling back with "or timezone.now()", so I think we need to do the same here, but also fix the core problem of not having any dates in the export file to begin with.
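
A minimal sketch of the fallback described above, using only the standard library so it runs outside Django. The function name guess_published_date and its "now" parameter are hypothetical; the real code would presumably use django.utils.timezone.now() inline rather than a helper like this.

```python
from datetime import datetime, timezone
from typing import Optional


def guess_published_date(
    date_read: Optional[datetime],
    date_added: Optional[datetime],
    now: Optional[datetime] = None,
) -> datetime:
    """Pick a non-null date for the status created from an imported review.

    Prefers date_read, then date_added, and finally the current time,
    so the result is never None even when the CSV carries no dates.
    """
    fallback = now if now is not None else datetime.now(timezone.utc)
    return date_read or date_added or fallback
```

With both dates missing, as in a BookWyrm export, the status would be dated at import time instead of violating the not-null constraint.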

Problem number 2 happens because we also don't include any shelf or readthrough information in the export. This means there is nothing to connect these books to your shelves.

In fairness to BookWyrm, "BookWyrm" is not actually a listed import source in the imports page. But in fairness to @eldang, it seems reasonable that one might be able to successfully import a CSV exported from the same application!

Regardless of the work we've nearly completed with #3054 (previously #2980) I think we still should offer a simple CSV import/export for books and reviews.

I'll take a look at this today.

@1110sillabo

I tried to move my BookWyrm account across instances. I resorted to a CSV export of the books in the old instance, as @eldang did. Then I imported the CSV into the new instance using the Goodreads option.
It did not crash because of the dates, thanks to the fix in #3135 by @hughrun. But still, I could not import the start and finish dates of the imported books. :(

So I cheated. I took a look at the row_mappings_guesses of the goodreads_import and then edited the headers of the exported CSV I got from the old BookWyrm instance. I reimported the bookwyrm-produced-csv-I-mocked-as-goodreads and I had my dates back. :)
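
The header edit described above could be scripted roughly like this. The mapping is purely illustrative: the left-hand names are placeholders rather than the real BookWyrm export headers, and the right-hand names are Goodreads-style column names that should be checked against row_mappings_guesses before use. rename_headers is a hypothetical helper.

```python
import csv
import io

# Illustrative only: placeholder source headers mapped to
# Goodreads-style column names. Verify both sides against the
# actual export file and the importer's row_mappings_guesses.
HEADER_RENAMES = {
    "start_date": "Date Added",
    "finish_date": "Date Read",
}


def rename_headers(csv_text: str, renames: dict) -> str:
    """Return csv_text with its header row renamed; data rows are untouched."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    # Only the first row (the header) is rewritten.
    rows[0] = [renames.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Headers that are not in the mapping pass through unchanged, so the same function works on exports with extra columns.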

I guess I could have done something similar by adding a bookwyrm_import.py that overrides the row_mappings_guesses to match the headers BookWyrm itself produces, i.e. those already present in a file written by the BookWyrm exporter.

Is this enough to have a "simple CSV import/export for books"? I am afraid there is more involved in importing a book than what I could grasp.

Nonetheless, I feel such a feature is useful. Despite the huge work in #3054, a user does not always have the chance to use the User Account import/export flow.
What if exporting users is not an available option? After #3226, the site setting is user_exports_enabled = models.BooleanField(default=False) ... How many instances will actually have that on? What if the instance a user wants to migrate to has no user-import route?


5 participants