This project is a prototype implementation of an RO-Crate-based collections archive and data portal. It is based on the Portland Common Data Model (PCDM) of collections and is suitable for use in long-term archival repository systems that aim to follow the Protocols for Long Term Archival Repository Services (PILARS).
The library can be used to mount the API endpoints into your own Fastify or Express app.
The aim is to provide an API-centred reference implementation of a repository with an access-controlled web portal for data dissemination and deposit:
- Backend-agnostic RO-Crate storage using pluggable storage layers:
  - OCFL: the Oxford Common File Layout specification, on disk or on cloud-based object storage
  - Object storage as used in PARADISEC
  - A simple RO-Crate file layout (TODO: likely a very simple protocol for a directory hierarchy, where the presence of an RO-Crate indicates that everything under that directory is a single RO-Crate)
- Authorisation based on the concept of Access (and Deposit, TODO) Licenses, using an arm's-length process with an external authority such as a REMS instance or another license/group management system
  - Based on the principle that all data MUST have an implementation-neutral, natural-language description of its access conditions stored with it
  - Licenses may be based on 'traditional' access control lists maintained in software such as PARADISEC, or on a license manager such as REMS (TODO: an example implementation using GitHub groups to manage license access, or a simple ACL manager?)
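As a minimal sketch of the arm's-length license model, the code below hides license grants behind an interface, so the repository only asks "does this user hold license X?". LicenseAuthority, InMemoryLicenseAuthority and checkLicense are hypothetical names, not part of arocapi or REMS; a real deployment would query REMS or another external authority rather than an in-memory map.

```typescript
// Hypothetical interface for an external license authority (e.g. a REMS instance).
// arocapi does not define this; it is an illustration only.
interface LicenseAuthority {
  hasLicense(userId: string, licenseId: string): Promise<boolean>;
}

// In-memory stand-in: maps each user ID to the set of license IDs granted to them.
class InMemoryLicenseAuthority implements LicenseAuthority {
  private grants = new Map<string, Set<string>>();

  grant(userId: string, licenseId: string): void {
    const licenses = this.grants.get(userId) ?? new Set<string>();
    licenses.add(licenseId);
    this.grants.set(userId, licenses);
  }

  async hasLicense(userId: string, licenseId: string): Promise<boolean> {
    return this.grants.get(userId)?.has(licenseId) ?? false;
  }
}

// Hypothetical helper: anonymous users (undefined) never hold a license.
async function checkLicense(
  authority: LicenseAuthority,
  licenseId: string,
  userId: string | undefined,
): Promise<boolean> {
  if (userId === undefined) return false;
  return authority.hasLicense(userId, licenseId);
}
```

Keeping the decision behind an interface means the repository never stores group membership itself; swapping the in-memory map for a REMS client would not change the calling code.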
Prerequisites:
- Node.js (version 22 or higher)
- pnpm package manager
- Database: MySQL or PostgreSQL
- OpenSearch for search functionality
- Docker and Docker Compose (for development)
# Install the package
pnpm add arocapi
# Install required peer dependencies
pnpm add @prisma/client @opensearch-project/opensearch
pnpm add -D prisma

Create a .env file in your project root:
# Database configuration (any database that Prisma supports can be used)
DATABASE_URL="mysql://root:password@localhost:3306/catalog"
# OpenSearch Configuration
OPENSEARCH_URL="http://localhost:9200"

Tip: We recommend using Fastify, as arocapi integrates with it natively.
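Both variables need to be set before the app starts. A minimal fail-fast sketch, assuming a hypothetical loadConfig helper (not part of arocapi) that also checks that each value parses with the WHATWG URL constructor:

```typescript
// Hypothetical configuration shape for the two variables in .env.
interface AppConfig {
  databaseUrl: string;
  opensearchUrl: string;
}

// Read a required variable and fail fast if it is missing or not a valid URL.
function requireUrl(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (!value) {
    throw new Error(`${name} environment variable is not set`);
  }
  try {
    new URL(value); // throws a TypeError for malformed URLs
  } catch {
    throw new Error(`${name} is not a valid URL: ${value}`);
  }
  return value;
}

// Defaults to process.env; pass an explicit object in tests.
function loadConfig(env: Record<string, string | undefined> = process.env): AppConfig {
  return {
    databaseUrl: requireUrl(env, 'DATABASE_URL'),
    opensearchUrl: requireUrl(env, 'OPENSEARCH_URL'),
  };
}
```

Validating at startup surfaces a typo in .env immediately, instead of at the first database or search request.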
Create your Fastify application with TypeScript support:
// src/index.ts
import { Client } from '@opensearch-project/opensearch';
import arocapi, { AllPublicAccessTransformer, AllPublicFileAccessTransformer } from 'arocapi';
import Fastify from 'fastify';
import { PrismaClient } from './generated/prisma/client.js';
import { Readable } from 'stream';
// NOTE: Only needed if you are going to use these yourself
declare module 'fastify' {
interface FastifyInstance {
prisma: PrismaClient;
opensearch: Client;
}
}
const prisma = new PrismaClient();
if (!process.env.OPENSEARCH_URL) {
throw new Error('OPENSEARCH_URL environment variable is not set');
}
const opensearchUrl = process.env.OPENSEARCH_URL;
const opensearch = new Client({ node: opensearchUrl });
const fastify = Fastify({
logger: true,
});
// For fully public datasets
fastify.register(arocapi, {
prisma,
opensearch,
accessTransformer: AllPublicAccessTransformer,
fileAccessTransformer: AllPublicFileAccessTransformer,
// Required: File handler for serving File entity content
fileHandler: {
get: async (file) => {
const fileUrl = `https://storage.example.com/${file.meta.storagePath}`;
return { type: 'redirect', url: fileUrl };
},
head: async (file) => ({
contentType: file.mediaType,
contentLength: file.size,
}),
},
// Required: RO-Crate handler for serving RO-Crate metadata
roCrateHandler: {
get: async (entity) => {
const jsonString = JSON.stringify(entity.rocrate, null, 2);
return {
type: 'stream',
stream: Readable.from([jsonString]),
metadata: {
contentType: 'application/ld+json',
contentLength: Buffer.byteLength(jsonString),
},
};
},
head: async (entity) => ({
contentType: 'application/ld+json',
contentLength: Buffer.byteLength(JSON.stringify(entity.rocrate, null, 2)), // must match the GET serialisation
}),
},
});
try {
await fastify.listen({ port: 3000 });
} catch (err) {
fastify.log.error(err);
process.exit(1);
}

Create your Express application:
// src/app.ts
import { Client } from '@opensearch-project/opensearch';
import Arocapi from 'arocapi/express';
import express from 'express';
import expressListRoutes from 'express-list-routes';
import { PrismaClient } from './generated/prisma/client.js';
const prisma = new PrismaClient();
if (!process.env.OPENSEARCH_URL) {
throw new Error('OPENSEARCH_URL environment variable is not set');
}
const opensearchUrl = process.env.OPENSEARCH_URL;
const opensearch = new Client({ node: opensearchUrl });
const app = express();
const arocapi = await Arocapi({ opensearch, prisma });
app.use('/api', arocapi);
const port = process.env.PORT || 3000;
app.listen(port, () => {
console.log(`API server listening on port ${port}`);
});

Set up Prisma in your project:

# Initialize Prisma with your preferred database provider
npx prisma init --datasource-provider mysql
# or for PostgreSQL: npx prisma init --datasource-provider postgresql

Modify the prisma.config.ts file in your project root to support folder-based schemas:
// prisma.config.ts
import 'dotenv/config';
import { defineConfig } from 'prisma/config';
export default defineConfig({
// Adds support for folder based schemas
schema: 'prisma',
});

Create the directory structure and link to the arocapi models:
# Create prisma models directory
mkdir -p prisma/models
# Link to the upstream arocapi models
# (the link target is resolved relative to the prisma/ directory that contains the link)
ln -s ../node_modules/arocapi/prisma/models prisma/arocapi

# Create database migrations for this version of arocapi
npx prisma migrate dev --name="arocapi-$(pnpm list | grep arocapi | awk '{ print $2 }')"
# Generate the Prisma client
npx prisma generate
arocapi provides the following endpoints:
- GET /entities - List all entities with pagination and filtering
- GET /entity/:id - Get a specific entity by ID
- GET /entity/:id/rocrate - Download RO-Crate metadata
- GET /files - List files with pagination and filtering
- GET /file/:id - Download or access file content
- POST /search - Search entities using OpenSearch
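Entity and file IDs are typically RO-Crate identifiers such as http://example.com/file/123 (see the rocrateId in the meta example further down), so a client should percent-encode them before placing them in a path segment. A client-side sketch, assuming the API is mounted at a base URL such as http://localhost:3000; entityUrl, roCrateUrl, fileUrl and withSlash are hypothetical helpers, and whether arocapi expects exactly this encoding is an assumption.

```typescript
// Ensure relative resolution appends to the base path instead of replacing
// its last segment (e.g. a base of http://host/api).
function withSlash(base: string): string {
  return base.endsWith('/') ? base : `${base}/`;
}

// Hypothetical client helpers for the endpoints listed above.
// IDs are full URIs, so each is encoded as a single path segment.
function entityUrl(base: string, id: string): string {
  return new URL(`entity/${encodeURIComponent(id)}`, withSlash(base)).toString();
}

function roCrateUrl(base: string, id: string): string {
  return new URL(`entity/${encodeURIComponent(id)}/rocrate`, withSlash(base)).toString();
}

function fileUrl(base: string, id: string, disposition: 'inline' | 'attachment' = 'inline'): string {
  const url = new URL(`file/${encodeURIComponent(id)}`, withSlash(base));
  url.searchParams.set('disposition', disposition);
  return url.toString();
}
```

Encoding the whole identifier keeps its own slashes from being read as extra path segments by the router.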
The API provides a flexible transformer system for customising entity responses through two types of transformers: a required access transformer and optional entity transformers.
The accessTransformer parameter is required for security. You must explicitly
choose how access control is handled for your repository.
For fully public datasets, use AllPublicAccessTransformer:
import arocapi, { AllPublicAccessTransformer } from 'arocapi';
await server.register(arocapi, {
prisma,
opensearch,
accessTransformer: AllPublicAccessTransformer,
});

For restricted content, implement a custom access transformer:
const accessTransformer = async (entity, { request, fastify }) => {
// Custom logic to determine access
const user = await authenticateUser(request);
const canAccessContent = await checkLicense(entity.contentLicenseId, user);
return {
...entity,
access: {
metadata: true, // Metadata always visible
content: canAccessContent,
contentAuthorisationUrl: canAccessContent
? undefined
: 'https://rems.example.com/request-access',
},
};
};

Warning: The accessTransformer is required to prevent accidental exposure of restricted content. You must make an explicit choice about access control for your repository.
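The custom transformer above relies on authenticateUser and checkLicense, which are left for you to implement. Below is a self-contained, deny-by-default sketch under stated assumptions: the Entity and Access shapes are simplified, the user ID is read from a hypothetical x-user-id header set by a trusted authenticating proxy, and createAccessTransformer is an illustrative factory, not an arocapi export.

```typescript
// Hypothetical minimal shapes; arocapi's real types may differ.
interface Entity {
  id: string;
  name: string;
  contentLicenseId?: string;
}

interface Access {
  metadata: boolean;
  content: boolean;
  contentAuthorisationUrl?: string;
}

interface RequestLike {
  headers: Record<string, string | undefined>;
}

type LicenseCheck = (licenseId: string, userId: string | undefined) => Promise<boolean>;

// Build an access transformer that withholds content unless a license check passes.
function createAccessTransformer(checkLicense: LicenseCheck, requestAccessUrl: string) {
  return async (
    entity: Entity,
    { request }: { request: RequestLike },
  ): Promise<Entity & { access: Access }> => {
    // Hypothetical: a trusted upstream proxy has already authenticated the user.
    // Never trust this header if clients can set it directly.
    const userId = request.headers['x-user-id'];
    // Deny by default: without a license ID, content is not released.
    const content = entity.contentLicenseId
      ? await checkLicense(entity.contentLicenseId, userId)
      : false;
    return {
      ...entity,
      access: {
        metadata: true, // metadata always visible
        content,
        contentAuthorisationUrl: content ? undefined : requestAccessUrl,
      },
    };
  };
}
```

Withholding content when no license is attached is a design choice; a repository that treats unlicensed items as open would invert that branch.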
Entity transformers are optional transformations for enriching or modifying response data. Multiple transformers can be chained together.
await server.register(arocapi, {
prisma,
opensearch,
accessTransformer: AllPublicAccessTransformer,
entityTransformers: [
// Add computed fields
(entity) => ({
...entity,
displayName: `${entity.name} [${entity.entityType.split('/').pop()}]`,
}),
// Add counts
async (entity, { fastify }) => {
const objectCount = entity.memberOf
? await fastify.prisma.entity.count({
where: { memberOf: entity.rocrateId },
})
: 0;
return {
...entity,
counts: {
objects: objectCount,
}
};
},
],
});

Every entity response flows through this three-stage pipeline:
- Base transformer - Converts database entities to standard format
- Access transformer - Adds access control information
- Entity transformers - Optional additional transformations
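The three stages can be pictured as sequential async function composition. A minimal sketch with hypothetical names (runPipeline, Stage, simplified record types); it illustrates the ordering described above, not arocapi's internal code.

```typescript
// Each stage receives the previous stage's output plus a shared context.
type PipelineContext = { requestId: string };
type Stage<In, Out> = (input: In, ctx: PipelineContext) => Out | Promise<Out>;

type DbRow = { rocrateId: string; name: string };
type StandardEntity = { id: string; name: string };
type WithAccess = StandardEntity & { access: { metadata: boolean; content: boolean } };

async function runPipeline(
  row: DbRow,
  ctx: PipelineContext,
  base: Stage<DbRow, StandardEntity>,
  access: Stage<StandardEntity, WithAccess>,
  extras: Stage<WithAccess, WithAccess>[],
): Promise<WithAccess> {
  // 1. Base transformer: database row -> standard format
  const standard = await base(row, ctx);
  // 2. Access transformer (required): adds access information
  let result = await access(standard, ctx);
  // 3. Optional entity transformers, applied one after another in array order
  for (const transform of extras) {
    result = await transform(result, ctx);
  }
  return result;
}
```

Because each optional transformer sees the previous one's output, their order in the array matters.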
Access control for restricted content:
accessTransformer: async (entity, { request, fastify }) => {
const hasAccess = await checkUserPermissions(request, entity.contentLicenseId);
return {
...entity,
access: {
metadata: true,
content: hasAccess,
},
};
}

Adding computed or derived fields:
entityTransformers: [
(entity) => ({
...entity,
shortId: entity.id.split('/').pop(),
year: extractYear(entity.description),
}),
]

Fetching related data asynchronously:
entityTransformers: [
async (entity, { fastify }) => ({
...entity,
stats: await fetchEntityStats(entity.id, fastify.prisma),
}),
]

The API provides a separate transformer system for file responses, similar to entity transformers but specifically designed for files.
The fileAccessTransformer parameter is required for security. You must explicitly choose how access control is handled for file content.
Key Difference from Entity Transformers: File metadata (filename, size, mediaType, etc.) is always accessible. Only content access is controlled via the access.content field.
For fully public datasets, use AllPublicFileAccessTransformer:
import arocapi, { AllPublicAccessTransformer, AllPublicFileAccessTransformer } from 'arocapi';
await server.register(arocapi, {
prisma,
opensearch,
accessTransformer: AllPublicAccessTransformer,
fileAccessTransformer: AllPublicFileAccessTransformer,
});

For restricted content, implement a custom file access transformer:
const fileAccessTransformer = async (file, { request, fastify }) => {
// Custom logic to determine file content access
const user = await authenticateUser(request);
const canAccessContent = await checkLicense(file.contentLicenseId, user);
return {
...file,
access: {
content: canAccessContent,
contentAuthorizationUrl: canAccessContent
? undefined
: 'https://rems.example.com/request-access',
},
};
};
await server.register(arocapi, {
prisma,
opensearch,
accessTransformer: AllPublicAccessTransformer,
fileAccessTransformer,
});

Warning: The fileAccessTransformer is required to prevent accidental exposure of restricted file content. You must make an explicit choice about access control for your files.
File transformers are optional transformations for enriching or modifying file response data. Multiple file transformers can be chained together.
await server.register(arocapi, {
prisma,
opensearch,
accessTransformer: AllPublicAccessTransformer,
fileAccessTransformer: AllPublicFileAccessTransformer,
fileTransformers: [
// Add computed fields
(file) => ({
...file,
displayFilename: file.filename.toUpperCase(),
sizeInKB: Math.round(file.size / 1024),
extension: file.filename.split('.').pop(),
}),
// Fetch parent entity information
async (file, { fastify }) => {
const parent = await fastify.prisma.entity.findFirst({
where: { rocrateId: file.memberOf },
});
return {
...file,
parentEntity: parent ? {
id: parent.rocrateId,
name: parent.name,
type: parent.entityType,
} : null,
};
},
],
});

Every file response flows through this three-stage pipeline:
- Base file transformer - Converts database File records to standard format
- File access transformer - Adds access control information (content only)
- File transformers - Optional additional transformations
File transformers are applied to the files listing route:
- GET /files - File list (each file transformed)
Note: The /file/:id endpoint (for downloading file content) uses the fileHandler system, not file transformers. See the File Handler System section below for details.
The API provides two separate handler systems for serving different types of content. These are distinct from the file transformer system described above.
Important distinction:
- File Transformers (above) - Transform file metadata in the /files listing endpoint response
- File Handler (this section) - Serves actual file content via the /file/:id download endpoint
- RO-Crate Handler (this section) - Serves RO-Crate metadata via the /entity/:id/rocrate endpoint
The fileHandler parameter is required. It serves actual file content for the /file/:id endpoint.
This handler is responsible for:
- Streaming file content from your storage backend
- Generating redirect URLs to external storage (S3, CDN, etc.)
- Implementing access control for file downloads
- Supporting HTTP range requests for media streaming
The roCrateHandler parameter is required. It serves RO-Crate metadata as JSON-LD
for any entity type (Collection, Object, or File) via the /entity/:id/rocrate endpoint.
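A HEAD response should report the same Content-Length as the matching GET. One way to guarantee that is to serialise the crate in a single place. serializeRoCrate is a hypothetical helper, the { rocrate } entity shape mirrors the examples in this README, and the two-space pretty-printing simply matches their GET handlers.

```typescript
import { Readable } from 'node:stream';

// Serialise once so GET and HEAD agree on the byte length.
function serializeRoCrate(rocrate: unknown): { body: string; contentType: string; contentLength: number } {
  const body = JSON.stringify(rocrate, null, 2);
  return {
    body,
    contentType: 'application/ld+json',
    // Byte length, not string length: non-ASCII characters take several UTF-8 bytes
    contentLength: Buffer.byteLength(body, 'utf8'),
  };
}

// Hypothetical handler pair built on the helper.
const roCrateHandler = {
  get: async (entity: { rocrate: unknown }) => {
    const { body, contentType, contentLength } = serializeRoCrate(entity.rocrate);
    return {
      type: 'stream' as const,
      stream: Readable.from([body]),
      metadata: { contentType, contentLength },
    };
  },
  head: async (entity: { rocrate: unknown }) => {
    const { contentType, contentLength } = serializeRoCrate(entity.rocrate);
    return { contentType, contentLength };
  },
};
```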
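File Handler Example (S3 with redirect):
// Assumes the AWS SDK for JavaScript v3 packages @aws-sdk/client-s3 and @aws-sdk/s3-request-presigner
import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
const s3 = new S3Client({});
// In the options passed to arocapi:
fileHandler: {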
get: async (file) => {
const command = new GetObjectCommand({
Bucket: file.meta.bucket,
Key: file.meta.s3Key,
});
const url = await getSignedUrl(s3, command, { expiresIn: 3600 });
return { type: 'redirect', url };
},
head: async (file) => {
return {
contentType: file.mediaType,
contentLength: file.size,
};
},
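}

File Handler Example (local filesystem):
import { createReadStream } from 'node:fs';
import { stat } from 'node:fs/promises';
// In the options passed to arocapi:
fileHandler: {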
get: async (file) => {
const filePath = `/data/files/${file.meta.storagePath}`;
const stats = await stat(filePath);
return {
type: 'stream',
stream: createReadStream(filePath),
metadata: {
contentType: file.mediaType,
contentLength: stats.size,
lastModified: stats.mtime,
},
};
},
head: async (file) => {
const filePath = `/data/files/${file.meta.storagePath}`;
const stats = await stat(filePath);
return {
contentType: file.mediaType,
contentLength: stats.size,
lastModified: stats.mtime,
};
},
}

RO-Crate Handler Example (stream from database):
import { Readable } from 'node:stream';
// In the options passed to arocapi:
roCrateHandler: {
get: async (entity) => {
const jsonString = JSON.stringify(entity.rocrate, null, 2);
return {
type: 'stream',
stream: Readable.from([jsonString]),
metadata: {
contentType: 'application/ld+json',
contentLength: Buffer.byteLength(jsonString),
},
};
},
head: async (entity) => {
const jsonString = JSON.stringify(entity.rocrate, null, 2); // must match the GET serialisation
return {
contentType: 'application/ld+json',
contentLength: Buffer.byteLength(jsonString),
};
},
}

The file handler must return one of two response types:
Redirect Response - Redirect to external file location:
{ type: 'redirect', url: 'https://storage.example.com/file.wav' }

Stream Response - Serve file content directly:
{
type: 'stream',
stream: Readable, // a Node.js Readable stream instance
metadata: {
contentType: 'audio/wav',
contentLength: 1024,
etag: '"abc123"', // Optional
lastModified: new Date(), // Optional
},
}

The /file/:id endpoint supports these query parameters:
- disposition - 'inline' (default) or 'attachment' to prompt a download
- filename - Custom filename for the Content-Disposition header (defaults to entity.name)
- noRedirect - If true and the handler returns a redirect response, returns JSON {location: url} instead of an HTTP 302
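To show how the two handler response types might combine with these query parameters, here is a framework-free sketch. toHttpResponse, HandlerResult, FileQuery and the returned { status, headers, body } shape are hypothetical simplifications (stream bodies are omitted, and the 200 status for noRedirect is an assumption); this is not arocapi's implementation.

```typescript
// Simplified, hypothetical mirror of the two handler response types.
type HandlerResult =
  | { type: 'redirect'; url: string }
  | { type: 'stream'; metadata: { contentType: string; contentLength: number } };

interface FileQuery {
  disposition?: 'inline' | 'attachment';
  filename?: string;
  noRedirect?: boolean;
}

interface HttpResponse {
  status: number;
  headers: Record<string, string>;
  body?: unknown; // JSON body only; stream bodies are not modelled here
}

function toHttpResponse(result: HandlerResult, query: FileQuery, defaultName: string): HttpResponse {
  if (result.type === 'redirect') {
    // noRedirect=true: hand the target back as JSON instead of an HTTP 302
    if (query.noRedirect) {
      return { status: 200, headers: { 'Content-Type': 'application/json' }, body: { location: result.url } };
    }
    return { status: 302, headers: { Location: result.url } };
  }
  const disposition = query.disposition ?? 'inline';
  // Escape backslashes and quotes for the quoted-string form; non-ASCII names would need filename*
  const filename = (query.filename ?? defaultName).replace(/["\\]/g, '\\$&');
  return {
    status: 200,
    headers: {
      'Content-Type': result.metadata.contentType,
      'Content-Length': String(result.metadata.contentLength),
      'Content-Disposition': `${disposition}; filename="${filename}"`,
    },
  };
}
```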
The endpoint automatically handles HTTP range requests for partial content, useful for media streaming:
- Returns 206 Partial Content for valid range requests
- Returns 416 Range Not Satisfiable for invalid ranges
- Sets the appropriate Content-Range and Accept-Ranges headers
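arocapi handles range requests for you; the sketch below only illustrates the status-code logic described above for a single bytes= range, loosely following HTTP range semantics (RFC 9110). parseRange, statusFor and contentRangeHeader are hypothetical helpers, and multi-range requests are deliberately treated as "serve the full body".

```typescript
type RangeResult =
  | { kind: 'full' } // no usable Range header: 200 with the whole body
  | { kind: 'partial'; start: number; end: number } // 206 Partial Content
  | { kind: 'unsatisfiable' }; // 416 Range Not Satisfiable

function parseRange(header: string | undefined, size: number): RangeResult {
  if (!header) return { kind: 'full' };
  const match = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  if (!match) return { kind: 'full' }; // malformed or multi-range: ignore the header
  const startText = match[1] ?? '';
  const endText = match[2] ?? '';
  if (startText === '' && endText === '') return { kind: 'full' };
  if (startText === '') {
    // Suffix range: the last N bytes
    const suffix = Number(endText);
    if (suffix === 0 || size === 0) return { kind: 'unsatisfiable' };
    return { kind: 'partial', start: Math.max(0, size - suffix), end: size - 1 };
  }
  const start = Number(startText);
  const end = endText === '' ? size - 1 : Math.min(Number(endText), size - 1);
  if (start >= size || start > end) return { kind: 'unsatisfiable' };
  return { kind: 'partial', start, end };
}

function statusFor(range: RangeResult): number {
  return range.kind === 'partial' ? 206 : range.kind === 'unsatisfiable' ? 416 : 200;
}

function contentRangeHeader(range: RangeResult, size: number): string | undefined {
  if (range.kind === 'partial') return `bytes ${range.start}-${range.end}/${size}`;
  if (range.kind === 'unsatisfiable') return `bytes */${size}`;
  return undefined;
}
```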
The meta JSON field in the Entity model stores implementation-specific
metadata for your file handler:
await prisma.entity.create({
data: {
rocrateId: 'http://example.com/file/123',
name: 'audio.wav',
entityType: 'http://schema.org/MediaObject',
// ... other required fields
meta: {
bucket: 'my-bucket',
storagePath: 'collections/col-01',
checksum: 'sha256:abc123...',
},
},
});

Your file handler can use this metadata to locate files in your storage system.
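Because meta values end up in filesystem paths (as in the local filesystem handler above), it is worth resolving them defensively. A sketch assuming a hypothetical resolveStoragePath helper and a storage root such as /data/files; it uses Node's path.resolve and path.relative to reject values that would escape the root.

```typescript
import * as path from 'node:path';

// Hypothetical helper: map meta.storagePath onto a storage root, refusing
// absolute paths or `..` segments that would escape the root directory.
function resolveStoragePath(root: string, storagePath: unknown): string {
  if (typeof storagePath !== 'string' || storagePath === '') {
    throw new Error('meta.storagePath must be a non-empty string');
  }
  const resolvedRoot = path.resolve(root);
  const resolved = path.resolve(resolvedRoot, storagePath);
  const relative = path.relative(resolvedRoot, resolved);
  const escapes =
    relative === '' || // the root itself is not a file
    relative === '..' ||
    relative.startsWith(`..${path.sep}`) ||
    path.isAbsolute(relative); // e.g. a different drive on Windows
  if (escapes) {
    throw new Error(`storagePath escapes the storage root: ${storagePath}`);
  }
  return resolved;
}
```

Validating here keeps a malformed or malicious meta value from turning a file download into arbitrary file access.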
1. Start the services:
# Start MySQL and OpenSearch using Docker Compose
docker compose up -d
2. Set up the database:
# Generate Prisma client
npx prisma generate
# Apply database migrations
npx prisma migrate deploy
3. Run your application:
# For development mode
pnpm run dev
# Or start directly
node dist/app.js
4. Access your API:
   - API: http://localhost:3000
   - Database: localhost:3306 (MySQL)
   - OpenSearch: http://localhost:9200
When using arocapi in your project, you can use these commands (the pnpm run entries assume matching dev, start and test scripts in your package.json):
# Development
pnpm run dev # Start in development mode with hot reload
pnpm run start # Start in production mode
# Database Management
npx prisma generate # Generate Prisma client
npx prisma migrate dev # Create and apply new migration
npx prisma migrate deploy # Apply existing migrations
npx prisma studio # Open Prisma Studio (database GUI)
# Database Console
docker compose exec db mysql -u root -p catalog # Connect to MySQL console
# Testing
pnpm run test # Run test cases (if configured)

When implementing arocapi in your project, follow this structure:
your-project/
├── src/
│ ├── app.ts # Your main application file
│ └── generated/
│ └── prisma/ # Generated Prisma client
├── prisma/
│ ├── schema.prisma # Prisma schema
│ ├── models/ # Your custom models
│   └── arocapi/           # Symlink to arocapi models
├── prisma.config.ts # Prisma configuration
├── .env # Environment variables
└── package.json