d-portal database dump #393
-
Update:
-
@wakibi @akmiller01 what do you think?
-
Unpacking the file right now. It's about 38 GB fully extracted, so we might not want to run a sync daily. Coincidentally we're both on Postgres 12.5, so there should be no compatibility issues. We would probably need to edit the SQL before importing, as ownership is given to a user.
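A minimal sketch of the kind of SQL editing meant here, assuming the extracted dump is plain SQL named dstore.sql and that ownership is assigned via standard ALTER ... OWNER TO statements; the target database name analyst_ui is a placeholder:

```bash
# Strip ownership statements from the plain-SQL dump before importing,
# so restored objects end up owned by whoever runs the import.
# Assumes standard pg_dump output with "ALTER ... OWNER TO ...;" lines;
# "analyst_ui" is a placeholder database name.
sed -E '/^ALTER .+ OWNER TO .+;$/d' dstore.sql > dstore_noowner.sql
psql -d analyst_ui -f dstore_noowner.sql
```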
-
There are options to import the schema and then import the data table by table, @wakibi: https://thequantitative.medium.com/restoring-individual-tables-from-postgresql-pg-dump-using-pg-restore-options-ef3ce2b41ab6 Any other thoughts on how we could selectively import things into the proper schema/table names? Maybe we could even do string replacement in Bash, e.g. the sketch below.
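A minimal sketch of that Bash string-replacement idea, assuming (hypothetically) the dump qualifies its objects with public. and we want them to land in a dedicated d_portal schema; the schema name, database name, and file names are all placeholders:

```bash
# Crude Bash string replacement: point schema-qualified references in the
# dump at a dedicated d_portal schema instead of public. This will also
# touch any matching text inside data rows, so spot-check the output.
sed 's/public\./d_portal./g' dstore.sql > dstore_renamed.sql
psql -d analyst_ui -c 'CREATE SCHEMA IF NOT EXISTS d_portal;'
psql -d analyst_ui -f dstore_renamed.sql
```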
-
Tested an import over the weekend. It took 1750 minutes to fully load the .sql file. Unfortunately it's not a binary pg_dump, the file is just raw SQL, so we cannot use pg_restore to selectively import bits and pieces. With a full import taking about 30 hours (which would also significantly slow down all queries against the database during those 30 hours), I don't think this particular file is feasible. If we want a d-portal mirror, I would suggest we reach back out to Kriss and Shi and ask for a streamlined export of just what we need. Otherwise it would be much more efficient to extract budget data ourselves.
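For reference, a quick way to tell the two dump formats apart: custom-format pg_dump archives begin with the magic bytes PGDMP, while plain-SQL dumps are just text, and only the former works with pg_restore. The file name below is a placeholder.

```bash
# Custom-format pg_dump archives start with the magic bytes "PGDMP";
# plain-SQL dumps are just text. Only the custom format works with
# pg_restore's selective options.
if head -c 5 dump_file | grep -q 'PGDMP'; then
  pg_restore -l dump_file | head   # list the archive's table of contents
else
  echo "Plain SQL dump: import with psql -f instead of pg_restore"
fi
```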
-
With that performance it can’t be of much use to anyone. Worth a chat with Kriss?
-
If the main reason we want this is for budget data, I think it might be better to work with an IATI tool that we know has long-term support, e.g. the registry.
-
Hello, thanks for having a go at this and sharing your insights into the process. As well as the file we've mentioned, there are a couple more options, all updated nightly:
- a pg_dump custom-format archive, so pg_restore can be used with its various options
- a zip of all the raw cached XML
For us, a full restore of the database takes about 7 hours, or 2 hours if you use multithreading with pg_restore. Almost all of the time is spent rebuilding indexes, so it's not the data itself that takes the time. We create a lot of indexes because, as a public-facing site, we need the database queries to run as fast as possible. @akmiller01 I'm wondering if most of the 30-hour import you experienced was from slow disk access.
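A minimal sketch of the multithreaded restore described above, assuming a custom-format archive; the archive name, database name, and job count are placeholders:

```bash
# Parallel restore of a custom-format archive; -j runs several worker
# jobs at once, which mainly speeds up the index-rebuild phase that
# dominates the restore time.
createdb dportal_mirror
pg_restore -j 4 --no-owner -d dportal_mirror dstore.custom
```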
-
Is this worth importing so that we have access to all non-transaction data?
From Shi on Discord:
Would a database dump like this be useful to anyone? It's about 3GB in size and is the entire d-portal database which you have live access to via http://d-portal.org/dquery/. It can be imported locally and queried using the same SQL code that is used on the web interface, so you can run large queries on it without clogging up d-portal.
http://d-portal.org/db/dstore.sql.gz
This is currently a test file but could be updated nightly after each import. We've been getting lots of specific (and expensive) queries from people about getting data out of d-portal in many unique ways (like IATI/D-Portal#589) and thought this might be useful.
Let us know if it is and we can make it better o/
This is a PostgreSQL dump, though, and uses the json data type in PostgreSQL. Technically, you could spin up a PostgreSQL server and import it just as is and start querying.
@akmiller01 @dean-breed @wakibi @k8hughes
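A minimal sketch of importing the linked dump locally, assuming the file is plain SQL as established above; the local database name is arbitrary:

```bash
# Fetch the nightly dump and load it into a fresh local database.
# "dportal" is a placeholder database name; since the linked file is
# plain SQL, psql (not pg_restore) does the import.
curl -O http://d-portal.org/db/dstore.sql.gz
createdb dportal
gunzip -c dstore.sql.gz | psql -d dportal
```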