Authentication and authorisation allow the Grid to identify who is using it and what they are allowed to do.
We distinguish between three types of identity:
- internal machine users represented by an `InnerServicePrincipal` (calls originating from one internal Grid service to another, e.g. `thrall` calling `media-api`)
- external machine users represented by a `MachinePrincipal` (automated ingest, batch processing etc. done by another application)
- human users represented by a `UserPrincipal` (people directly using the Grid via the UI).
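To make the distinction concrete, here is a minimal sketch of the three identity types as a sealed hierarchy. The class names come from the list above, but the single `identity` field and the `describe` helper are simplifications for illustration; the real Grid classes carry more data.

```scala
object PrincipalSketch {
  // Simplified stand-ins for the Grid's principal types (assumption: one field only).
  sealed trait Principal { def identity: String }
  final case class InnerServicePrincipal(identity: String) extends Principal
  final case class MachinePrincipal(identity: String) extends Principal
  final case class UserPrincipal(identity: String) extends Principal

  // Hypothetical helper showing that code can branch on the kind of caller.
  def describe(principal: Principal): String = principal match {
    case InnerServicePrincipal(id) => s"internal service call from $id"
    case MachinePrincipal(id)      => s"external application $id"
    case UserPrincipal(id)         => s"human user $id"
  }
}
```

Because the trait is sealed, the compiler can warn when a match forgets one of the three cases.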
These three types of user are usually each identified using a different strategy:
- the internal machine users are identified by the presence of a signature header, which carries the identity of the originating service and any intermediate services (the signature is generated from a UUID and timestamp and signed using the shared Play secret).
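The general idea of a signed service header can be sketched as follows. This is an illustration only: the HMAC-SHA256 algorithm, the `|`-separated layout and the function names are all assumptions, not the Grid's actual scheme, and the sketch carries a single service name rather than a chain of intermediate services. It also omits the timestamp freshness check a real implementation would want.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.security.MessageDigest
import java.util.{Base64, UUID}
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

object ServiceSignatureSketch {
  private val Algorithm = "HmacSHA256" // assumption: the real scheme may differ

  private def hmac(secret: String, payload: String): String = {
    val mac = Mac.getInstance(Algorithm)
    mac.init(new SecretKeySpec(secret.getBytes(UTF_8), Algorithm))
    Base64.getEncoder.encodeToString(mac.doFinal(payload.getBytes(UTF_8)))
  }

  // Hypothetical header layout: "<service>|<uuid>|<timestamp>|<signature>"
  def sign(secret: String, service: String, uuid: UUID, timestampMillis: Long): String = {
    val payload = s"$service|$uuid|$timestampMillis"
    s"$payload|${hmac(secret, payload)}"
  }

  // Returns the calling service's name if the signature matches the shared secret.
  def verify(secret: String, headerValue: String): Option[String] =
    headerValue.split('|') match {
      case Array(service, uuid, timestamp, signature) =>
        val expected = hmac(secret, s"$service|$uuid|$timestamp")
        // Constant-time comparison of the expected and supplied signatures.
        if (MessageDigest.isEqual(expected.getBytes(UTF_8), signature.getBytes(UTF_8))) Some(service)
        else None
      case _ => None
    }
}
```

Since every service shares the same secret, any of them can verify a header produced by any other without a central lookup.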
The mechanism for handling the other two types of user is pluggable. For example, at the Guardian:
- we identify external machine users by an API key in the header of an API request
- human users are typically identified by looking for a cookie that a user has in their browser. If they don't have the cookie, or the cookie is out of date, then we require them to authenticate in order to obtain a valid cookie before they can continue using the Grid.
In all cases, the Grid can make further inter-microservice calls. To support this, a mechanism is provided to call other services on behalf of a principal.
Note: Authorisation is currently a work in progress so this sketches out the current thinking only.
When the Grid receives certain API requests it decides whether the principal making the request has permission to do so. The data which is used to make this decision can come from the principal itself, from an external source of data or a combination of the two.
This is essentially a function of `(Principal, Action) => Boolean`. `Action` can be a simple permission or it can have a parameter (an image attribute such as `uploadedBy` or `organisation`), allowing images to be visible to only subsets of users.
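As a rough sketch of that shape, the following shows one simple permission and one parameterised action. The action names, the attribute keys and the plain `Map[String, String]` used for attributes are hypothetical simplifications, since authorisation is still being designed.

```scala
object AuthorisationSketch {
  // Simplified principal: attributes are a plain string map here (assumption).
  final case class Principal(identity: String, attributes: Map[String, String])

  sealed trait Action
  case object UploadImages extends Action                        // a simple permission
  final case class ViewImage(uploadedBy: String) extends Action  // a parameterised action

  // The (Principal, Action) => Boolean decision, using hypothetical rules.
  def isAuthorised(principal: Principal, action: Action): Boolean = action match {
    case UploadImages =>
      principal.attributes.get("canUpload").contains("true")
    case ViewImage(uploader) =>
      principal.identity == uploader || principal.attributes.get("role").contains("admin")
  }
}
```

Here a user may always view their own uploads, while viewing someone else's requires data obtained at authentication time.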
Any `Principal` (human or machine) has an identity (such as an email address) and an `attributes` field. The latter is a `TypedMap` which can be used to encapsulate any permission data obtained during the authentication process. This permission data can then be used by the authorisation function.
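The following sketch shows how typed attributes might carry permission data from authentication through to a later check. The `TypedKey` and `TypedMap` classes here are tiny dependency-free stand-ins written for this example, and the `Organisation` key and `sameOrganisation` check are hypothetical.

```scala
object AttributesSketch {
  // A key that remembers the type of the value stored under it.
  final class TypedKey[A](val name: String)

  // Minimal immutable typed map; a stand-in for illustration only.
  final class TypedMap private (underlying: Map[TypedKey[_], Any]) {
    def get[A](key: TypedKey[A]): Option[A] =
      underlying.get(key).map(_.asInstanceOf[A])
    def updated[A](key: TypedKey[A], value: A): TypedMap =
      new TypedMap(underlying.updated(key, value))
  }
  object TypedMap { val empty: TypedMap = new TypedMap(Map.empty) }

  final case class Principal(identity: String, attributes: TypedMap)

  // Hypothetical key that an authentication provider might populate.
  val Organisation = new TypedKey[String]("organisation")

  // Hypothetical check that reads the data back out when authorising.
  def sameOrganisation(principal: Principal, imageOrganisation: String): Boolean =
    principal.attributes.get(Organisation).contains(imageOrganisation)
}
```

The benefit of typed keys is that the code reading an attribute gets back the right type without casting at the call site.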
There are separate providers for user and external machine authentication, which are configured using `authentication.providers.user` and `authentication.providers.machine` respectively. The provider configured at `authentication.providers.user` must implement `UserAuthenticationProvider` and that configured for `authentication.providers.machine` must implement `MachineAuthenticationProvider`.
Both providers follow a similar shape, although the user authentication is more complicated due to the additional support for logging a user in if they are not currently authenticated.
Both traits can be found in AuthenticationProvider.scala, which will have the most up-to-date documentation. You should read the following documentation as a companion to the Scaladoc.
There are a small number of anticipated user providers (in production we'd expect installations to use one of the last two options):
- No-auth - we'll likely implement a no-op auth provider for the purpose of demonstrating the Grid via docker
- Basic authentication - we might also implement a very simple basic auth provider for the purpose of evaluating the Grid
- Federated auth - e.g. OIDC or SAML; this is similar to the original hardcoded authentication system in that a user is sent to a third party to authenticate and then returns with a token which can be validated by the authentication provider
- Proxy auth - in this case an HTTP proxy (for example oauth2-proxy) sits in front of the application and the authentication provider parses a header forwarded by the proxy service
A federated authentication provider is likely to need to implement all provider methods.
The existing `PanDomainAuthenticationProvider` uses OIDC federated authentication with a cookie that sits on the "domain root" (note that each microservice currently sits on a separate subdomain, although it wouldn't take much effort to change this to have a single domain and route to individual microservices using different paths on that domain). Unfortunately the `PanDomainAuthenticationProvider` is tightly integrated into the Guardian's ecosystem so is unlikely to be useful as anything more than a starting point.
If an unauthenticated user visits the Grid then they will be redirected to the OIDC service. They will return to a callback endpoint which validates the token from the OIDC service and sets a cryptographically signed cookie. Subsequent visits and API calls use the cookie to identify the user (until the cookie expires).
In general a provider for a federated system will implement `authenticateRequest` to check for a value in the Play session [1], which avoids the need to deal with cookie signing concerns. This description assumes that this approach is being used.
The `AuthenticationStatus` is used to signal to the Grid whether a user is authenticated (and if so, who they are) or not. A user can fail authentication for a number of reasons but in most cases the Grid will then send the user for authentication using the `sendForAuthentication` function. This will typically redirect a user to the federated authentication service with appropriate parameters (including the return URL). The user's browser will then take the user through authentication and eventually land back at the return URL on the Grid. That return URL will call the `sendForAuthenticationCallback` function which must validate the token returned by the federated authentication service prior to setting appropriate values in the Play session.
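The flow described above can be sketched without any framework dependencies. Everything here other than the three method names is an assumption made for illustration: the `Request` and `Response` types stand in for Play's request and result types, the session is a plain map, and the session key, login URL, query parameter names and `validateToken` function are hypothetical. The real trait signatures in AuthenticationProvider.scala will differ.

```scala
import java.net.URLEncoder

object FederatedFlowSketch {
  // Stand-ins for Play's request, session and result types (assumptions).
  final case class Request(uri: String, session: Map[String, String], query: Map[String, String] = Map.empty)
  sealed trait Response
  final case class Redirect(url: String, session: Map[String, String] = Map.empty) extends Response
  case object Forbidden extends Response

  sealed trait AuthenticationStatus
  final case class Authenticated(email: String) extends AuthenticationStatus
  case object NotAuthenticated extends AuthenticationStatus

  val SessionKey = "user-email"                       // hypothetical session key
  val LoginUrl   = "https://idp.example.com/authorize" // hypothetical identity provider

  // Step 1: look for the value a previous successful callback stored in the session.
  def authenticateRequest(request: Request): AuthenticationStatus =
    request.session.get(SessionKey) match {
      case Some(email) => Authenticated(email)
      case None        => NotAuthenticated
    }

  // Step 2: send the browser to the federated service, remembering where to come back to.
  def sendForAuthentication(request: Request): Response =
    Redirect(s"$LoginUrl?return_url=${URLEncoder.encode(request.uri, "UTF-8")}")

  // Step 3: validate the returned token before writing anything to the session.
  // A real implementation should also check that return_url points back at the Grid.
  def sendForAuthenticationCallback(request: Request, validateToken: String => Option[String]): Response =
    request.query.get("token").flatMap(validateToken) match {
      case Some(email) => Redirect(request.query.getOrElse("return_url", "/"), Map(SessionKey -> email))
      case None        => Forbidden
    }
}
```

The important ordering is in step 3: the session is only populated after the token has been validated, so step 1 can trust whatever it finds there.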
There are two other methods that must be implemented: the `flushToken` endpoint should remove the authentication data from the Play session and `onBehalfOf` must pass the whole cookie (with the name from the Play config key `session.cookieName`) on to the downstream requests. To achieve this you will likely want to push the cookie value into the `attributes` map and then pull it out in much the same way as is implemented for the `PanDomainAuthenticationProvider` described above.
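A minimal sketch of these two methods follows, assuming simplified stand-in types. The `OutgoingRequest` type, the attribute key and the method signatures are hypothetical; `PLAY_SESSION` is Play's default session cookie name and is used here in place of whatever `session.cookieName` is configured to.

```scala
object SessionForwardingSketch {
  // Stand-ins for the Grid principal and an outgoing HTTP request (assumptions).
  final case class Principal(identity: String, attributes: Map[String, String])
  final case class OutgoingRequest(url: String, cookies: Map[String, String] = Map.empty)

  val CookieName      = "PLAY_SESSION"   // assumption: the configured session cookie name
  val CookieAttribute = "session-cookie" // hypothetical attribute key set during authentication

  // Remove the authentication data from the session, leaving other values intact.
  def flushToken(session: Map[String, String], sessionKey: String): Map[String, String] =
    session - sessionKey

  // Build a function that copies the caller's cookie onto a downstream request,
  // or explain why that is not possible.
  def onBehalfOf(principal: Principal): Either[String, OutgoingRequest => OutgoingRequest] =
    principal.attributes.get(CookieAttribute) match {
      case Some(cookie) => Right(req => req.copy(cookies = req.cookies + (CookieName -> cookie)))
      case None         => Left(s"No session cookie stored for ${principal.identity}")
    }
}
```

Returning a request-transforming function keeps the provider unaware of which downstream service is being called.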
[1] Notes: for this to work you'll also need to ensure that `play.http.secret.key` is configured to be the same across all services and `session.domain` is set to a shared domain root. Whilst the session is tamper-proof, be aware that data stored in the session is visible to the user.
If the Grid is behind a proxy that is handling authentication then it is likely that the provider only needs to implement `authenticateRequest` and `onBehalfOf`. The former will extract and validate (if necessary) the HTTP header containing the authentication token. This header will need to be stored in the `attributes` field of the user. The latter method will simply need to add the header to outgoing requests. The remaining methods can simply be implemented with `None`.
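A proxy-backed provider might look roughly like this. The types are simplified stand-ins, the header name is hypothetical (it depends entirely on how the proxy is configured), and the "contains an @" check is only a placeholder for whatever validation is appropriate. Header lookup is case-sensitive here, unlike real HTTP headers.

```scala
object ProxyAuthSketch {
  // Stand-ins for a request and the Grid's user principal (assumptions).
  final case class Request(headers: Map[String, String])
  final case class UserPrincipal(identity: String, attributes: Map[String, String])

  val ProxyHeader = "X-Forwarded-Email" // hypothetical: set by the authenticating proxy

  // Extract the header, apply a placeholder validity check, and keep the raw
  // value in the attributes so it can be forwarded later.
  def authenticateRequest(request: Request): Option[UserPrincipal] =
    request.headers.get(ProxyHeader)
      .filter(_.contains("@"))
      .map(email => UserPrincipal(email, Map(ProxyHeader -> email)))

  // Add the same header to outgoing requests made on the user's behalf.
  def onBehalfOf(principal: UserPrincipal): Either[String, Request => Request] =
    principal.attributes.get(ProxyHeader) match {
      case Some(value) => Right(req => req.copy(headers = req.headers + (ProxyHeader -> value)))
      case None        => Left(s"No $ProxyHeader value stored for ${principal.identity}")
    }

  // The login redirect, callback and token flush would all be None in this setup.
}
```

Note that this only makes sense if downstream services are reachable solely through trusted paths, since anyone who can set the header directly could impersonate a user.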
In the case of using proxy authentication, there is no need to run the `auth` microservice.
Warning: One remaining issue is how the authentication proxy deals with users who are not logged in or whose authentication has expired. When using a federated authentication service, the Grid signals to the kahuna single page application that the user session has expired by returning a `419` status code for any API calls. Kahuna might need to be modified to recognise other status codes and headers as a requirement for re-authenticating the user.
There are also a small number of anticipated `MachineAuthenticationProvider`s:
- A no-op provider to allow easy use via the docker demo
- An API key provider (the current default with keys in an S3 bucket)
- Alternative API key providers (possibly backed by a database or using a signing mechanism rather than a plain text key)
In each case there are only two methods that need to be implemented. The first is `authenticateRequest`, which should validate the appropriate HTTP header and create the `MachinePrincipal` as appropriate (storing the auth header in the `attributes` map for downstream requests). The second is `onBehalfOf`, which allows downstream calls by appending the auth header to requests.
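As an illustration of those two methods, here is a sketch of an API key provider backed by an in-memory map. The map stands in for whatever key store is used (an S3 bucket, a database and so on), and the header name, types and signatures are all hypothetical.

```scala
object ApiKeySketch {
  // Stand-ins for a request and the Grid's machine principal (assumptions).
  final case class Request(headers: Map[String, String])
  final case class MachinePrincipal(identity: String, attributes: Map[String, String])

  val ApiKeyHeader = "X-Example-Api-Key" // hypothetical header name

  // knownKeys maps each API key to the name of the application it was issued to.
  final class ApiKeyProvider(knownKeys: Map[String, String]) {

    // Validate the header against the key store and remember the key for later.
    def authenticateRequest(request: Request): Option[MachinePrincipal] =
      for {
        key  <- request.headers.get(ApiKeyHeader)
        name <- knownKeys.get(key)
      } yield MachinePrincipal(name, Map(ApiKeyHeader -> key))

    // Append the stored key to requests made to downstream services.
    def onBehalfOf(principal: MachinePrincipal): Either[String, Request => Request] =
      principal.attributes.get(ApiKeyHeader)
        .map(key => (req: Request) => req.copy(headers = req.headers + (ApiKeyHeader -> key)))
        .toRight(s"No API key stored for ${principal.identity}")
  }
}
```

Swapping the map for a different lookup is enough to get one of the alternative providers listed above, without changing either method's shape.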
The authentication provider was not merged at the time of writing these docs so the documentation doesn't yet exist.