Tech Documentation

Centralized storage of the CWRC technical documentation covering all projects under the CWRC umbrella.

Projects

CWRC Repository

The CWRC Repository circa 2010-2024 was based on the Islandora Foundation software stack. In 2024, the CWRC Repository migrated to the LEAF software stack based on Islandora 2.0.

Technical information on the CWRC infrastructure can be found in the following Git repository:

Public references to CWRC, LEAF, and the broader Islandora community:

CWRC-Writer

There are two main parts to a full CWRC-Writer installation, and each part runs more or less independently:

CWRC-Writer - the editor itself, which runs as JavaScript in the web browser.

CWRC-Server - the complementary backend services that run on a server and provide document storage, XML validation, and entity lookup. These can be implemented however one would like, in any language or on any platform.

You might then ask: "If the server can be implemented willy-nilly, how does the CWRC-Writer know how to make calls to the server to get documents, etc.?" Good question. The answer is that a bit more JavaScript has to be written that knows how to interact with the given server. The CWRC-Writer expects this JavaScript to be written as Node.js modules (CommonJS), each exporting a specific API that the CWRC-Writer knows how to call (e.g., loadDocument()). The server-specific modules take care of the plumbing: making calls to the server and setting returned data in the CWRC-Writer. The modules also provide their own dialogs, since interaction with each server is likely to differ. We bundle these supporting modules together with the CWRC-Writer (which is itself set up as a Node.js module) using browserify, which creates a single JavaScript file that is then included in the index.html file.

Two CWRC-Writer installations that can be used as examples of how to put together a full CWRC-Writer installation are the Islandora-CWRC-Writer and the CWRC-GitWriter.
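
As a rough illustration of the server-specific modules described above, here is a minimal sketch of a CommonJS module that exposes loadDocument(). Only the loadDocument name comes from the text; the createStorageModule factory, the /documents/ URL layout, the injected fetchImpl function, and the writer.setDocument() method are hypothetical and are not taken from the actual CWRC-Writer API.

```javascript
// Hypothetical server-specific module (CommonJS).
// Assumptions: the URL layout, fetchImpl, and writer.setDocument() are
// invented for this sketch; only the loadDocument() name is from the README.
function createStorageModule(baseUrl, fetchImpl) {
  return {
    // Fetch a document from the server and hand the XML to the editor.
    loadDocument: async function (writer, docId) {
      const url = baseUrl + '/documents/' + encodeURIComponent(docId);
      const response = await fetchImpl(url);
      if (!response.ok) {
        throw new Error('Load failed with status ' + response.status);
      }
      const xml = await response.text();
      writer.setDocument(xml);
      return xml;
    }
  };
}

module.exports = { createStorageModule };
```

Passing the fetch implementation in as an argument keeps the module independent of any one server and makes it easy to exercise with a stub before bundling.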

An overview of development practices for CWRC-Writer packages:

CWRC-Writer-Dev-Docs


Misc

REST APIs

General: Islandora REST

CWRC offers the Islandora REST module as a means to interact programmatically with the repository. This section summarizes the more detailed documentation available here: https://github.com/discoverygarden/islandora_rest/blob/7.x/README.md

Definitions:

  • PID: persistent identifier - the FedoraCommons identifier for an object and part of its URI (commons.cwrc.ca/{PID}, where {PID} is replaced with the object's PID)

  • DSID: DataStream ID or FedoraCommons datastream identifier - ID of the location where content is stored

General strategy (used outside of Drupal, e.g., on another server)
  • Step 1. Authenticate against cwrc.ca (Islandora REST Drupal module). A command-line curl example that can be translated into the programming language of choice:
curl -i -X POST -H "Content-type: application/json" -c token.txt -b token.txt "https://${SERVER_NAME}/rest/user/login" -d "{\"username\":\"${USERNAME}\",\"password\":\"${PASSWORD}\"}"
  • Step 2. Look up an object by PID (persistent identifier):
curl -b token.txt -X GET "https://${SERVER_NAME}/islandora/rest/v1/object/${PID}"

Notes:

  • More details on authentication and on setting up a session via cookies are in the "Authentication against APIs" section below. A session is only required if the code runs outside of Drupal (e.g., a microservice or batch job) and the items are not publicly visible.
  • Server-side setup (not applicable for client access): an internal Google Doc covers the above details and some repository-side setup, but it shouldn't be needed in this context.
General usage of the REST API - common calls

Given a ${PID}:

// lookup properties of the object via the REST endpoint
`https://${SERVER_NAME}/islandora/rest/v1/object/${PID}`

The JSON response contains the object's properties.

// lookup content of a specified datastream via the REST endpoint
`https://${SERVER_NAME}/islandora/rest/v1/object/${PID}/datastream/${DSID}/?content=true`


Example REST calls:

// lookup properties of the object via the REST endpoint
`https://${SERVER_NAME}/islandora/rest/v1/object/orlando%3Ab4859cdd-8c58-46e9-bf2a-28bf8090fcbc`

// lookup content of a specified datastream via the REST endpoint
`https://${SERVER_NAME}/islandora/rest/v1/object/orlando%3Ab4859cdd-8c58-46e9-bf2a-28bf8090fcbc/datastream/CWRC/?content=true`
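
The example URLs above contain the PID with its colon percent-encoded (orlando%3A...). The sketch below shows one way to build such URLs in JavaScript. The function names and the example.org host are placeholders chosen for illustration; the path layout is copied from the calls above.

```javascript
// Build Islandora REST URLs for an object and for one of its datastreams.
// Function names are illustrative; the path layout follows the README examples.
function objectUrl(serverName, pid) {
  // encodeURIComponent turns the ":" in a PID into "%3A"
  return 'https://' + serverName + '/islandora/rest/v1/object/' + encodeURIComponent(pid);
}

function datastreamContentUrl(serverName, pid, dsid) {
  return objectUrl(serverName, pid) + '/datastream/' + encodeURIComponent(dsid) + '?content=true';
}

console.log(objectUrl('example.org', 'orlando:b4859cdd-8c58-46e9-bf2a-28bf8090fcbc'));
// prints https://example.org/islandora/rest/v1/object/orlando%3Ab4859cdd-8c58-46e9-bf2a-28bf8090fcbc
```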

Downloading Content Pseudocode: given a collection PID, authenticate and download a specified datastream from all items in the collection
  1. Authenticate: this creates a session token (stored in token.txt) that is passed with subsequent API requests. The user must have view access to the collection and all objects within it, plus Drupal permissions to use the Islandora REST API (https://${SERVER_NAME}/admin/people/permissions): View Objects & View Datastreams
curl -i -X POST -H "Content-type: application/json" -c token.txt -b token.txt "https://${SERVER_NAME}/rest/user/login" -d "{\"username\":\"${USERNAME}\",\"password\":\"${PASSWORD}\"}"
  2. Define a set of objects (i.e., a list of PIDs) to process, for example by looking up objects by ${COLLECTION_PID}. Note: a Solr query is one option for defining the set of objects; rows & start can be used to paginate the results, and the JSON response contains numFound. RELS_EXT_isMemberOfCollection_uri_mt allows defining the set by a CWRC collection. The fl parameter limits the fields returned; remove it to see the entire set of Solr fields. The parameter sort=fgs_label_s+asc sorts the results. fgs_label_s is a single-valued Solr field (multivalued Solr fields cannot be used for sorting). There is no Solr field for surname, nor a name field that starts with surname; one could be added.
curl -b token.txt -X GET "https://${SERVER_NAME}/islandora/rest/v1/solr/RELS_EXT_isMemberOfCollection_uri_mt:\"${COLLECTION_PID}\"?fl=PID&rows=999999&start=0&wt=json&sort=fgs_label_s+asc"
  3. For each object (i.e., PID) from step 2, retrieve the associated metadata
curl -b token.txt -X GET "https://${SERVER_NAME}/islandora/rest/v1/object/${PID}"
  4. In the JSON metadata acquired during the previous step, use the models property to look up the datastream ID (DSID) containing the XML. The mapping is:
  • if cwrc:documentCModel then DSID = CWRC // event/entry
  • if cwrc:citationCModel then DSID = MODS // bibliography
  • if cwrc:person-entityCModel then DSID = PERSON // person entity
  • if cwrc:organization-entityCModel then DSID = ORGANIZATION // organization entity
  5. Request the contents (XML) of the specified datastream ${DSID} attached to the object ${PID}
curl -b token.txt -X GET "https://${SERVER_NAME}/islandora/rest/v1/object/${PID}/datastream/${DSID}?content=true"

Note: a download can occur whether or not you or someone else holds a lock on the object (see the Updating Content Pseudocode for info on the locking mechanism).

More information on the REST API used above can be found here: https://github.com/discoverygarden/islandora_rest/blob/7.x/README.md
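
Two of the download steps above lend themselves to small helper functions: building the paginated Solr query and mapping a content model to its DSID. The following is a minimal sketch under stated assumptions: the function names are invented, `models` is assumed to be an array of content model PIDs, and percent-encoding the query in the URL path is assumed to be equivalent to the raw form used in the curl example.

```javascript
// Content model -> datastream ID mapping, copied from the list above.
const DSID_BY_MODEL = {
  'cwrc:documentCModel': 'CWRC',                    // event/entry
  'cwrc:citationCModel': 'MODS',                    // bibliography
  'cwrc:person-entityCModel': 'PERSON',             // person entity
  'cwrc:organization-entityCModel': 'ORGANIZATION'  // organization entity
};

// Return the DSID for the first recognized model, or null if none matches.
// Assumption: `models` is an array of content model PIDs.
function dsidForModels(models) {
  for (const model of models) {
    if (Object.prototype.hasOwnProperty.call(DSID_BY_MODEL, model)) {
      return DSID_BY_MODEL[model];
    }
  }
  return null;
}

// Build the Solr query URL for one page of collection members.
// `start` and `rows` control pagination; compare `start` with numFound
// in the response to decide whether to request another page.
function collectionQueryUrl(serverName, collectionPid, start, rows) {
  const query = 'RELS_EXT_isMemberOfCollection_uri_mt:"' + collectionPid + '"';
  return 'https://' + serverName + '/islandora/rest/v1/solr/' + encodeURIComponent(query) +
    '?fl=PID&rows=' + rows + '&start=' + start + '&wt=json&sort=fgs_label_s+asc';
}
```

Requesting a moderate page size and advancing `start` until it reaches numFound avoids relying on a single very large `rows` value.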

Updating Content Pseudocode: given an object PID, lock the object, download the specified datastream, process, and then upload the content back to CWRC
  1. Determine how long you will need to process the objects and ask a CWRC admin to set the collection object locking time: /islandora/object/${COLLECTION_ID}/manage/collection ==> Manage lock objects.

  2. Follow the steps in the Downloading Content Pseudocode above to gather the object contents you wish to process

  3. Add to the above steps an API call to lock the object, so that other users cannot change the item while you are changing it (otherwise your upload could overwrite work done by others since your original download)

    • Check if a lock exists
      curl -b token.txt -X GET "https://${SERVER_NAME}/islandora/rest/v1/object/${PID}/lock"
      
    • Acquire the lock
      curl -b token.txt -X POST "https://${SERVER_NAME}/islandora/rest/v1/object/${PID}/lock"
      
  4. Process the downloaded items

  5. Once ready to place items back into the repository, update the object datastream in the repository (see notes below)

curl -b token.txt -X POST -F "method=PUT" -F "file=@${SOURCE_FILE}" "https://${SERVER_NAME}/islandora/rest/v1/object/${PID}/datastream/${DSID}"
  6. Add workflow information describing the change; ask a CWRC admin for the workflow parameters to add via the activity parameter
curl -b token.txt -G "https://${SERVER_NAME}/islandora_workflow_rest/v1/add_workflow" --data-urlencode "PID=${PID}" --data-urlencode 'activity={"category":"metadata_contribution","stamp":"orlando:ENH","status":"c","note":"entity"}'
  7. Remove the lock (if a lock exists): unlike other datastreams, the CWRC datastream does not release the lock on an update event (see the exclusion list and the implementation of the CWRC-Writer "save" versus "save and exit" functionality).
curl -b token.txt -X DELETE "https://${SERVER_NAME}/islandora/rest/v1/object/${PID}/lock"

Note: documentation regarding the REST API update: "... mock PUT / DELETE requests as POST requests by adding an additional form-data field method to inform the server which method was actually intended. At the moment multi-part PUT requests such as the one required to modify an existing datastream's content and properties are not implemented you can mock these PUT requests using aforementioned mechanism. ... POST and include an additional form-data field method with the value PUT...."

Note: other approaches to prevent write collisions:

  • The metadata for a datastream (HTTP GET on the object DSID) contains a checksum. Saving the checksum at download time and then comparing it to the checksum on the server at upload time could act as a mechanism to verify that the repository content has not been modified. Note: not all objects have a checksum.
curl -b token.txt -X GET "https://${SERVER_NAME}/islandora/rest/v1/object/${PID}/datastream/${DSID}?content=false"
  • Use the created date of the datastream version
curl -b token.txt -X GET "https://${SERVER_NAME}/islandora/rest/v1/object/${PID}/datastream/${DSID}?content=false"
More information on the REST API used above can be found here: https://github.com/discoverygarden/islandora_rest/blob/7.x/README.md
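
The lock, upload, and unlock calls from the update steps can be sketched as a single function that always releases the lock, even when the upload fails. This is only an outline under assumptions: `request` is a hypothetical caller-supplied function taking (method, url, body) and returning a Promise, it is expected to send the session cookie from the login step, and the workflow call is left out for brevity.

```javascript
// Sketch of the lock -> upload -> unlock sequence.
// Assumption: `request(method, url, body)` is supplied by the caller,
// returns a Promise, and sends the session cookie obtained at login.
async function updateWithLock(request, serverName, pid, dsid, fileContent) {
  const objectUrl = 'https://' + serverName + '/islandora/rest/v1/object/' +
    encodeURIComponent(pid);
  const lockUrl = objectUrl + '/lock';

  await request('POST', lockUrl); // acquire the lock
  try {
    // The REST module expects a POST carrying a form field method=PUT.
    await request('POST', objectUrl + '/datastream/' + encodeURIComponent(dsid),
      { method: 'PUT', file: fileContent });
  } finally {
    // Release the lock explicitly; the CWRC datastream does not release it on update.
    await request('DELETE', lockUrl);
  }
}
```

The try/finally block is the point of the sketch: a failed upload should not leave the object locked until the lock times out.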

Specific REST APIs: CWRC Workflow

How to access CWRC workflow information - https://github.com/cwrc/cwrc_workflow

Specific REST APIs: CWRC Entities

How to access CWRC entities - https://github.com/cwrc/cwrc_entities

Specific REST APIs: Object locking

How to lock and unlock an existing object - https://github.com/echidnacorp/islandora_object_lock, https://github.com/echidnacorp/islandora_object_lock/blob/7.x/islandora_object_lock_rest/README.md

Specific REST APIs: Credit Visualization

Lookup credit visualization details: https://github.com/cwrc/islandora_cwrc_credit_visualization

BagIT Extension

https://github.com/cwrc/islandora_bagit_extension

Preservation

To allow a preservation system to pull content from the CWRC repository, the following extension provides the required REST APIs: BagIt Extension.

The user needs to have view access to all Fedora objects. If the anonymous user does not have access, then the preservation client must use the REST login API.

Schema Validation

The CWRC validator allows the CWRC-Writer to validate a document against a schema via a web API (https://github.com/cwrc/cwrc-validator).

Authentication against APIs

CWRC uses the Drupal "Services" and "REST Server" modules for authentication. Log in once and store the session cookie:

curl -i -X POST -H "Content-type: application/json" -c cookies.txt -b cookies.txt http://dev.local/rest/user/login -d '{ "username":"zz","password":"zz"}'

After that, you need only include "-b cookies.txt" in all subsequent requests for them to be authenticated as the zz user.

Like so:

curl -b cookies.txt -X GET http://dev.local/islandora/rest/v1/object/islandora:root
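
For clients written in JavaScript rather than curl, the same login request could be assembled for the standard fetch() API. This is a hedged sketch: buildLoginRequest is an invented helper, and the endpoint and JSON body are taken from the curl example above. The credentials: 'include' option plays the role of the cookie jar by asking the browser to store and resend the session cookie.

```javascript
// Build the arguments for a fetch() call that logs in through the
// /rest/user/login endpoint shown in the curl example.
// buildLoginRequest is a hypothetical helper, not part of any CWRC library.
function buildLoginRequest(baseUrl, username, password) {
  return {
    url: baseUrl + '/rest/user/login',
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // JSON.stringify escapes quotes and backslashes in the credentials
      body: JSON.stringify({ username: username, password: password }),
      // ask the browser to store and send the session cookie
      credentials: 'include'
    }
  };
}

// Usage (not executed here):
// const login = buildLoginRequest('http://dev.local', 'zz', 'zz');
// fetch(login.url, login.options).then(function (response) { /* ... */ });
```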

Accessing via jQuery (JavaScript):

  • Services -> Edit Resources -> select tab "Server" -> enable "application/x-www-form-urlencoded" to prevent the error "Unsupported request content type application/x-www-form-urlencoded"
    • http://stackoverflow.com/questions/8535820/drupal-login-via-rest-server
    • https://www.drupal.org/node/2279819
    • https://www.drupal.org/node/1334758
  • JavaScript login - based on 2015-08-18 e-mail troubleshooting with Ed Armstrong
    • http://stackoverflow.com/questions/8863571/cors-request-why-are-the-cookies-not-sent
    • need to add xhrFields so that cookies are sent
    • may need the following Apache headers, but unsure as of 2015-08-18:
Header add Access-Control-Allow-Credentials "true"
Header add Access-Control-Allow-Methods "GET, POST, PUT, DELETE"
Header add Access-Control-Allow-Headers "Authorization"

    $.ajax({
        url: cwrcurl,
        type: 'GET',
        dataType: 'json',
        // send cookies with the cross-origin request
        xhrFields: {
            withCredentials: true
        },
        success: function() { alert("Success"); },
        error: function() { alert('Failed!'); }
    });
CANARIE

ToDo: elaborate

Circa 2018-2021, CANARIE checks the health of the cwrc.ca site, gathers usage stats, and houses links to information about the platform in the CANARIE platform registry. The information supplied to the CANARIE registry comes from endpoints defined in cwrc_core_tweaks. URLs provided to the CANARIE registry point either to Drupal pages or to redirects such as the Fact Sheet redirect.


CWRC Repository Drupal modules (all in Git format) as of 2017-07-18

Shared

CWRC modules

digitalpage.ca

modernistcommons.ca

spanishcivilwar.ca

Odds and ends

Handy commands

To find README files:
find . -name README.md > /tmp/z
sed 's%/[^/]*$%%' /tmp/z
sed 's%/\([^/]*\)$%/\1](\1)%' /tmp/z > /tmp/zzzzz

To list the Git remote URL of each checkout:
grep -R 'url = ' */.git/config