Script Descriptions
This Wiki page is used to describe all the scripts being used in the caas-aspace-repository. All descriptions should include the following info:
- A short description of the script's purpose
- A link to any testing script(s) or automated testing for this specific script
- A list of requirements to get the script to run. This can include:
- Packages/Libraries (ex. ArchivesSnake, loguru), including links to their documentation
- Folders/Directories where files will be written to/read from (ex. logs, test_data)
- Any special requirements for credentials (ex. secrets.py file, environment variables)
Optional information can include, but is not limited to:
- Arguments to run the script (ex. files being passed to the script, --help, -dR)
- Additional context for the script's purpose or design
- Screenshots of the script
A compilation of useful functions shared across Python scripts. Included are:
- ASpaceAPI(aspace_api, aspace_un, aspace_pw) class - handles common functions when working with the API. Connects to the ASnake client upon instantiation.
- get_repo_info(self) - Gets all the repository information for an ArchivesSpace instance in a list and assigns it to self.repo_info
- get_objects(self, repository_uri, record_type, parameters=('all_ids', True)) - Intakes a repository URI and returns all the digital object IDs as a list for that repository
- get_object(self, record_type, object_id, repo_uri='') - Get and return a digital object JSON metadata from its URI
- update_object(self, object_uri, updated_json) - Posts the updated JSON metadata for the given object_uri to ArchivesSpace
- ASpaceDatabase(as_db_un, as_db_pw, as_db_host, as_db_name, as_db_port) class - Handles the connection to and data retrieval from the ArchivesSpace database
- connect_db(self) - Connects to the ArchivesSpace test database with credentials provided in local secrets.py file
- query_database(self, statement) - Runs a query on the database
- close_connection(self) - Closes the cursor and connection to the ArchivesSpace database
- client_login(as_api, as_un, as_pw) function - Login to the ArchivesSnake client and return client
- read_csv(csv_file) function - reads a csv file and returns csv_dict, a list of values in rows in the CSV file
- check_url(url) function - inputs a URL to check and if it returns 200 status code, returns True, otherwise will log and print the error status code
- record_error(message, status_input) function - Prints and logs an error message and the code/parameters causing the error
- Packages:
- ArchivesSpace username, password, API URL in a secrets.py file
- logs directory for storing local log files
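As a rough illustration of the read_csv(csv_file) helper described above, here is a minimal sketch. It is not the repository's actual code: it assumes csv_dict is a list with one dictionary per row, keyed by the column headers, and it leaves out the logging and error handling the real function may have.

```python
import csv


def read_csv(csv_file):
    """Read a CSV file and return its rows as a list of dictionaries.

    Assumption: the first line of the file holds the column headers.
    """
    with open(csv_file, newline='', encoding='utf-8') as csvfile:
        return list(csv.DictReader(csvfile))
```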
This script iterates through all the digital objects in every repository in SI's ArchivesSpace instance (except Test, Training, and NMAH-AF) and parses them for any data in the following fields: agents, dates, extents, languages, notes, and subjects. It then deletes any data within those fields, except the digitized date, and uploads the updated digital object back to ArchivesSpace.
- Packages:
- ArchivesSpace username, password, API URL in a secrets.py file
- logs directory for storing local log files
- test_data/dometadata_testdata.py file, with the following variables:
  - test_record_type = string - the object endpoint ArchivesSpace uses; ex. 'digital_objects'
  - test_object_id = int - the number of the digital object you want to use for testing (must have metadata in above-mentioned fields)
  - test_object_repo_uri = string - the repository URI where the test digital object is; ex. '/repositories/12'
  - test_object_user_identifier = string - the identifier that users input in the digital_object_id field for testing; ex. 'NMAI.AC.066.ref21.1'
  - test_digital_object_dates = dict - JSON data from a digital object that contains multiple date subrecords
  - test_digital_object_dates_deleted = dict - JSON data from the same digital object as above but without any data in the dates field (i.e. dates = [])
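The field clean-up this script performs could look roughly like the sketch below. The function name strip_metadata, the JSON keys in FIELDS_TO_CLEAR, and the 'digitized' date label are assumptions made for illustration; the actual script may use different keys and logic.

```python
import copy

# Assumed ArchivesSpace JSON keys for the fields named above; the real
# script may use different keys.
FIELDS_TO_CLEAR = ('linked_agents', 'extents', 'lang_materials', 'notes', 'subjects')


def strip_metadata(digital_object):
    """Return a copy of the digital object with the listed fields emptied.

    Dates are filtered rather than cleared: only subrecords whose label is
    'digitized' (an assumed label value) are kept.
    """
    updated = copy.deepcopy(digital_object)
    for field in FIELDS_TO_CLEAR:
        if updated.get(field):
            updated[field] = []
    updated['dates'] = [date for date in updated.get('dates', [])
                        if date.get('label') == 'digitized']
    return updated
```

Working on a deep copy leaves the original JSON untouched, which makes it easy to compare before and after in a test.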
This script takes CSV files listing specific collections from the EEPA repository, extracts the resource URIs listed in each CSV, uses the ArchivesSpace API to grab the Abstract or Scope and Contents note from the JSON data, and writes the note to the provided CSV in a new column.
- CSV input(s) containing the following columns: ead_id,title,dates,publish,level,extents,uri
- Note: This script originally had 3 CSVs to iterate through, but any number of CSVs should work
- ArchivesSnake
- ArchivesSpace username, password, API URL in a secrets.py file
- logs directory for storing local log files
- test_data/eepacameroon_testdata.py file, with 3 variables:
  - test_abstract_only_json = dict - JSON data from a resource that contains only an abstract note
  - test_scope_only_json = dict - JSON data from a resource that contains only a scope note
  - test_no_abstract_scope_json = dict - JSON data from a resource that contains no abstract or scope note
  - Note for the above variables and values: these are for testing. You can get these from your API by running a client.get request for resources using their URI and the .json() function to return data in JSON format.
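Pulling the note out of a resource's JSON might be sketched as follows. The function name get_abstract_or_scope is hypothetical, and the sketch assumes that abstract notes keep their text in a 'content' list while scope notes keep it in 'subnotes'. It also assumes the abstract is preferred when both exist, which the description above does not specify.

```python
def get_abstract_or_scope(resource_json):
    """Return the abstract text if present, else the scope note text, else None."""
    notes = resource_json.get('notes', [])
    for note in notes:
        if note.get('type') == 'abstract':
            # Assumed single-part note layout: text lives in a 'content' list
            return ' '.join(note.get('content', []))
    for note in notes:
        if note.get('type') == 'scopecontent':
            # Assumed multipart note layout: text lives in each subnote's 'content'
            return ' '.join(sub.get('content', '') for sub in note.get('subnotes', []))
    return None
```

The three test variables listed above map onto the three branches: abstract only, scope only, and neither.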
This script reads a CSV containing all the resource and accession identifiers in ArchivesSpace and prints a dictionary containing all the unique, non-alphanumeric characters in the identifiers and their counts
- CSV input containing the following columns: id, repo_id, identifier, title, ead_id, recordType
- identifier should be structured like so: "['id_0','id_1','id_2','id_3']"
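The counting step can be sketched like this, assuming each identifier cell is a stringified Python list in the shape shown above. The function name count_special_characters is hypothetical, and the sample identifiers in the usage note are made up.

```python
import ast
from collections import Counter


def count_special_characters(identifiers):
    """Count every non-alphanumeric character across identifier strings.

    Each identifier is a stringified list such as "['id_0','id_1','id_2','id_3']".
    Spaces count as non-alphanumeric characters here.
    """
    counts = Counter()
    for identifier in identifiers:
        for part in ast.literal_eval(identifier):
            if part:  # skip None or empty identifier parts
                counts.update(char for char in part if not char.isalnum())
    return dict(counts)
```

For example, passing `["['NMAI.AC','066',None,None]", "['A-1','B.2']"]` counts two periods and one hyphen.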
This script takes a CSV of resources and archival objects from every repository with "Missing Title" titles in note lists and removes the title from the metadata, then posts the update to ArchivesSpace
- Packages:
- ArchivesSpace username, password, API URL in a secrets.py file
- logs directory for storing local log files
- test_data/missingtitles_testdata.py file, with the following:
  - test_object_metadata = {ArchivesSpace resource or archival object metadata} for testing. You can get this from your API by using a client.get request for a resource or archival object that has a "Missing Title" in one of its notes with a list.
  - test_notes = [ArchivesSpace resource or archival object notes list] for testing. You can get this from your API by using a client.get request for a resource or archival object that has a "Missing Title" in one of its notes with a list and taking all the data found in "notes" = [list of notes].
- test_data/MissingTitles_BeGone.csv - a csv file containing the URIs of the objects that have "Missing Title" in their notes. URIs should be in the 4th spot (row[3])
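A minimal sketch of the title removal, under the assumption that lists inside notes are subnotes whose jsonmodel_type contains 'list' and that the unwanted title sits in a 'title' key. The function name remove_missing_titles is hypothetical and may not match the script.

```python
def remove_missing_titles(notes, missing_title='Missing Title'):
    """Delete the 'title' key from list subnotes whose title is 'Missing Title'.

    Returns the number of titles removed; the notes list is modified in place.
    """
    removed = 0
    for note in notes:
        for subnote in note.get('subnotes', []):
            # Assumption: lists inside notes are subnotes whose jsonmodel_type
            # contains 'list' (e.g. 'note_orderedlist', 'note_definedlist')
            is_list = 'list' in subnote.get('jsonmodel_type', '')
            if is_list and subnote.get('title') == missing_title:
                del subnote['title']
                removed += 1
    return removed
```

Returning a count lets the caller skip the post to ArchivesSpace when nothing changed.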
This script collects all users from ArchivesSpace, finds any usernames that start with 'z-' and end with '-expired-', reduces each one to just the text in between, then updates the user in ArchivesSpace with the new username
- Packages:
- ArchivesSpace username, password, API URL in a secrets.py file
- logs directory for storing local log files
- test_data/znames_testdata.py file, with viewer_user = {ArchivesSpace viewer user metadata} for testing. You can get this from your API by using a client.get request for the 'viewer' user in your ArchivesSpace instance.
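The username parsing can be illustrated with a regular expression. This is a sketch that follows the description literally (a 'z-' prefix and a '-expired-' suffix); the function name parse_username and the sample username 'z-jdoe-expired-' are made up for illustration.

```python
import re

# Matches usernames such as 'z-jdoe-expired-' and captures the text in between
EXPIRED_USERNAME = re.compile(r'^z-(.+)-expired-$')


def parse_username(username):
    """Return the text between 'z-' and '-expired-', or None if no match."""
    match = EXPIRED_USERNAME.match(username)
    return match.group(1) if match else None
```

Usernames that do not match the pattern, such as 'viewer', return None and can be left unchanged.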
This script creates new subjects from a provided CSV. It is currently customized to support the needs of NMAI, but this hardcoded NMAI metadata can be changed/updated in the future.
- ArchivesSnake
- Environment-based ArchivesSpace username, password, API URL in a .env.{environment} file:
  - On your local:
    - Create a new .env.dev file containing local credentials
    - export ENV=dev
    - Run script
  - On test:
    - Create a new .env.test file containing test credentials
    - export ENV=test
    - Run script
  - On prod:
    - Create a new .env.prod file containing prod credentials
    - export ENV=prod
    - CAREFULLY run script
- logs directory for storing local log files
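One way the environment-based credentials could be read is sketched below using only the standard library. The repository may load these files differently. The function name load_env_file, the default of 'dev' when ENV is unset, and the assumption that the file holds plain KEY=VALUE lines are all hypothetical.

```python
import os


def load_env_file(environment=None, directory='.'):
    """Parse KEY=VALUE lines from .env.<environment> into a dictionary.

    The environment defaults to the ENV variable (e.g. after `export ENV=dev`),
    falling back to 'dev' as an assumed default.
    """
    environment = environment or os.environ.get('ENV', 'dev')
    path = os.path.join(directory, f'.env.{environment}')
    settings = {}
    with open(path, encoding='utf-8') as env_file:
        for line in env_file:
            line = line.strip()
            if not line or line.startswith('#') or '=' not in line:
                continue  # skip blanks, comments, and malformed lines
            key, value = line.split('=', 1)
            settings[key.strip()] = value.strip().strip('"\'')
    return settings
```

Because the file name is derived from ENV, switching between dev, test, and prod only requires changing the exported variable, not the script.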
Unittests for mergesubjects_tests.py
- test_data/subjects_testdata.py file, containing the following:
  - test_merge_subject_destination = {JSON representation of an existing subject that will survive the merge} for testing. If newsubjects_tests.py has been run previously, you can use one of the subjects created by that test.
  - test_merge_subject_candidate = {JSON representation of an existing subject that will be removed during the merge} for testing. If newsubjects_tests.py has been run previously, you can use one of the subjects created by that test.
- test_data/mergesubjects_testdata.csv - a csv file of subjects to be merged, containing:
- aspace_subject_id - id of the merge destination/subject to be retained. If newsubjects_tests.py has been run previously, this can be one of the subjects created by those tests.
- title - title of the merge destination/subject to be retained. The title must match the existing subject with the above id.
- aspace_subject_id2 - id of the merge candidate/subject to be removed. If newsubjects_tests.py has been run previously, this can be one of the subjects created by those tests.
- Merge into - title of the merge candidate/subject to be removed. The title must match the existing subject with the above id.
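Reading the merge CSV into destination/candidate pairs might look like the sketch below. The function name read_merge_pairs is hypothetical, and the sketch assumes the id columns hold bare numeric ids that map to '/subjects/<id>' URIs.

```python
import csv


def read_merge_pairs(csv_path):
    """Return one dict per CSV row describing a merge destination and candidate."""
    pairs = []
    with open(csv_path, newline='', encoding='utf-8') as csvfile:
        for row in csv.DictReader(csvfile):
            pairs.append({
                # Assumption: ids are bare numbers that map to /subjects/<id>
                'destination_uri': f"/subjects/{row['aspace_subject_id']}",
                'destination_title': row['title'],
                'candidate_uri': f"/subjects/{row['aspace_subject_id2']}",
                'candidate_title': row['Merge into'],
            })
    return pairs
```

Keeping the titles next to the URIs makes it possible to check each title against the existing subject before merging, as the column descriptions require.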
This script creates new subjects from a provided CSV. It is currently customized to support the needs of NMAI, but this hardcoded NMAI metadata can be changed/updated in the future.
- ArchivesSnake
- Environment-based ArchivesSpace username, password, API URL in a .env.{environment} file:
  - On your local:
    - Create a new .env.dev file containing local credentials
    - export ENV=dev
    - Run script
  - On test:
    - Create a new .env.test file containing test credentials
    - export ENV=test
    - Run script
  - On prod:
    - Create a new .env.prod file containing prod credentials
    - export ENV=prod
    - CAREFULLY run script
- logs directory for storing local log files
Unittests for newsubjects_tests.py
- test_data/subjects_testdata.py file, containing the following:
  - test_new_subject_metadata = {JSON representation of a new subject} for testing.
  - duplicate_new_subject = test_new_subject_metadata ensures we can count on duplicate_new_subject to produce a not unique error during testing.
- test_data/newsubjects_testdata.csv - a csv file of new subjects to be created, containing:
- new_title
- new_scope_note
- new_EMu_ID
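To show how a CSV row with the columns above might become subject JSON, here is a heavily hedged sketch. The function name build_subject, the default vocabulary, source, and term_type values, and the choice to store new_EMu_ID under 'external_ids' are all assumptions for illustration. The hardcoded NMAI metadata in the actual script is not documented here and may differ.

```python
def build_subject(row, vocabulary='/vocabularies/1', source='local',
                  term_type='cultural_context'):
    """Build a subject dictionary from one CSV row.

    All default values are placeholders, not the script's hardcoded metadata.
    """
    subject = {
        'jsonmodel_type': 'subject',
        'vocabulary': vocabulary,
        'source': source,
        'terms': [{
            'jsonmodel_type': 'term',
            'term': row['new_title'],
            'term_type': term_type,
            'vocabulary': vocabulary,
        }],
    }
    if row.get('new_scope_note'):
        subject['scope_note'] = row['new_scope_note']
    if row.get('new_EMu_ID'):
        # Assumption: the EMu ID is stored as an external id
        subject['external_ids'] = [{'external_id': row['new_EMu_ID'], 'source': 'EMu'}]
    return subject
```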
This script takes a CSV file containing the URIs of locations to be updated (at least one of the headers must be labeled URI), adds a 'owner repo' = {'ref': 'repository/<repo_number>'} key-value pair to the location JSON retrieved from the API, then posts the updated JSON to ArchivesSpace
- utilities.py
- Packages:
- ArchivesSpace username, password, API URL in .env.dev, .env.test, and .env.prod files
- logs directory for storing local log files
- test_data directory for accessing the CSV file and storing jsonlines output file
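The update applied to each location could be sketched as below. The function name add_owner_repo is hypothetical. The key spelling 'owner_repo' and the '/repositories/<number>' reference form follow the URI style used elsewhere on this page (ex. '/repositories/12') and are assumptions; check them against the JSON your API returns.

```python
import copy


def add_owner_repo(location_json, repo_number):
    """Return a copy of the location JSON with an owner repository reference added."""
    updated = copy.deepcopy(location_json)
    # Assumed key name and reference format; verify against your instance
    updated['owner_repo'] = {'ref': f'/repositories/{repo_number}'}
    return updated
```

Returning a copy keeps the originally retrieved JSON available, for instance if it needs to be written out before the update is posted.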
This script updates existing ArchivesSpace subjects from a provided CSV. It is currently customized to support the needs of NMAI, but can be changed/updated in the future.
- ArchivesSnake
- Environment-based ArchivesSpace username, password, API URL in a .env.{environment} file:
  - On your local:
    - Create a new .env.dev file containing local credentials
    - export ENV=dev
    - Run script
  - On test:
    - Create a new .env.test file containing test credentials
    - export ENV=test
    - Run script
  - On prod:
    - Create a new .env.prod file containing prod credentials
    - export ENV=prod
    - CAREFULLY run script
- logs directory for storing local log files
Unittests for updatesubjects_tests.py
- test_data/subjects_testdata.py file, containing the following:
  - test_update_subject_metadata = {JSON representation of an existing subject} for testing. If newsubjects_tests.py has been run previously, you can use one of the subjects created by that test.
- test_data/newsubjects_testdata.csv - a csv file of changes to be made to an existing subject, containing:
- aspace_subject_id - id of the subject to update; this can match the one in test_data/subjects_testdata.py
- new_title
- new_scope_note
- new_EMu_ID
Retrieves all agent_persons that are linked to records in the CFCH repository. To run this against another repository, change the ao.repo_id value to the desired repository.
There are no tests with this script. Need to research how to test SQL queries or if it's necessary. This query does not modify any data, so testing may not be necessary. I did run it against test before running against prod.
- Credentials to the ASpace Test and Prod database
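As one possible approach to the open testing question, a read-only query of this kind can be exercised against a throwaway in-memory SQLite database. The sketch below is not the actual query: the table and column names (agent_person, linked_agents_rlshp, archival_object) and the function name linked_agent_persons are assumptions. The `?` placeholder is SQLite's style, and a MySQL connector would typically use `%s` instead.

```python
import sqlite3

# Hypothetical query: table and column names are assumptions, not the
# repository's actual SQL.
AGENT_PERSONS_QUERY = """
SELECT DISTINCT ap.id
FROM agent_person AS ap
JOIN linked_agents_rlshp AS lar ON lar.agent_person_id = ap.id
JOIN archival_object AS ao ON ao.id = lar.archival_object_id
WHERE ao.repo_id = ?
ORDER BY ap.id
"""


def linked_agent_persons(connection, repo_id):
    """Return the ids of agent_persons linked to archival objects in one repository."""
    return [row[0] for row in connection.execute(AGENT_PERSONS_QUERY, (repo_id,))]
```

A test can create the three tables in `sqlite3.connect(':memory:')`, insert a few rows, and assert on the returned ids without touching the test or prod databases.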