jianhuashao edited this page Jun 24, 2012 · 212 revisions

Dataware Catalog Protocol v0.2.1

# Overview

The Dataware Catalog Protocol manages how third-party applications obtain access to a cluster of personal data resources that remain under their owner's control. It is developed for HTTP within the [Horizon Dataware project](http://perscon.net/) and extends the [OAuth 2.0 Authorization protocol v26](http://tools.ietf.org/html/draft-ietf-oauth-v2-26). For compatibility with readers who know OAuth, some terms are inherited from existing OAuth terminology.

Content:

  1. Roles
  2. Service Registration
  3. Mutual Registration
  4. One-way Registration
  5. Access Enquiry
  6. Service Management
  7. Access Permission
  8. Sandbox Proof

# Roles

The protocol defines five roles:

resource server: Formerly called DataSource in the Horizon Dataware project. The server hosting the protected resources (usually personal data, or aggregations of personal data), capable of accepting and responding to protected resource requests using access tokens.

resource owner: An entity capable of granting access to a protected resource. When the resource owner is a person, it is referred to as an end-user.

client: Formerly called Processor in the Horizon Dataware project. An application making protected resource requests on behalf of the resource owner and with its authorization. The term client does not imply any particular implementation characteristics (e.g. whether the application executes on a server, a desktop, or other devices).

authorization server: Formerly called Catalog in the Horizon Dataware project. The server issuing access tokens to the client after successfully authenticating the resource owner and obtaining authorization. It manages and controls the access interactions between the client and the resource server.

catalog owner: An entity capable of managing and controlling the authorization between client and resource. When the catalog owner is a person, it is referred to as an end-user. Commonly the catalog owner and the resource owner are the same person or entity, since a person can have one catalog server to manage several resource servers. In some cases, the catalog owner acts as an agent distinct from the resource owner.

The authorization server may be the same server as the resource server or a separate entity. A single authorization server may issue access tokens accepted by multiple resource servers. In addition to OAuth2, this protocol designs the interaction between the authorization server and the resource server for the case where they are separate and need additional trust management.

Back to the Top of Page


Protocol Interaction Flow

[Figure: Dataware Catalog Protocol Interaction Flow]

All interactions are based on HTTP, as in OAuth. Details are discussed and illustrated in later sections.

Resource server types:

Resource servers come in different kinds:

  1. data resource: a resource storing raw user data, for example Facebook, Twitter, or Gmail. It stores the original data captured from users' digital activities. It is a single data source.

  2. function resource: a resource storing user data processed from other data resources, for example billing, logging, or shopping history. It derives from existing data resources and acts as a gateway to the public. It is an aggregated data source. Function resources act as mashups and connected resources in a distributed model.

Data can be obtained in different ways:

  1. existing data service API: Many companies provide an open API and an OAuth interface for owners to access their data.

  2. data shim from existing data service: If a service does not provide an API for access, a third-party shim can run as an agent to host the data resource.

  3. dynamic data collector: If no service currently hosts the data, a third-party application can start collecting this new data source. It can run in the cloud, on mobile, on a desktop, etc.

Forks (differences) from OAuth2:

  1. The authorization server (catalog) is designed from the start to be separate from the resource server. This design also introduces a new role, the catalog owner. Most current industry implementations run the authorization server and the resource server as the same server.

  2. In registration, x does not have to be pre-registered on y before authorization. x is recorded dynamically in y with its public routing information, such as the host information in the HTTP header. For each specific authorization, a dynamically generated id from each of x and y is passed to the other side to identify that authorization. Most current industry implementations require an app to be created on the authorisation server first. That helps with computing resource management but loses flexibility; this design uses dynamically generated ids and can still achieve the same management goal.

  3. Either the resource or the catalog can start the registration. Most current industry implementations force x to start the registration on y. This design allows both x and y to start the process, so the resource owner can either actively give permission to a third party or passively accept an access request from a third party.

  4. The design adds a sandbox to approve enquiries from the client, which is new relative to OAuth2. The catalog owner can set up the sandbox security, and developers have to implement the sandbox themselves. A simple example of a sandbox would directly approve every enquiry within the access scope.

  5. The catalog only does authorization. Other functions can be plugged in as modules running as separate resource servers. Examples of such management functions are billing, logging, and user-agent handling. This design keeps the catalog simple and lightweight.



# Service registration

[Figure: dataware.catalog service registration]

Each service registration involves two identities. The protocol lets either identity start the registration that connects the two. There are two kinds of registration in the protocol: one between catalog and resource server, the other between catalog and client. The catalog acts as the bridge and gate between client and resource servers, so it takes part in both registrations, standing as a wall between them.

The next sections demonstrate a concrete implementation of both registrations.

The following information must be exchanged between both sides for registration:

  • xxx_callback: it helps the other side to reach back over HTTP.
  • xxx_access_token: it is the token one side uses to authenticate on the other side. It also helps with identification.
  • xxx_validation: it defines the permission. More details on permissions are discussed in a later section.

Each registration has two entities: the registrant registers on the register.

  • registrant: the entity that wants to register on the register. In the catalog and resource registration shown in the diagram above, the registrant is the catalog, because the catalog starts the registration on the resource.
  • register: the entity that the registrant registers on. In the catalog and resource registration shown in the diagram above, the register is the resource, because the catalog starts the registration on the resource.

**regist_status** specifies which handler on the receiving service a message targets, and identifies which step the message came from. It is no longer used after the final status. There are eight possible statuses in a registration:

  1. registrant_request
  2. register_owner_redirect
  3. register_owner_grant
  4. register_grant
  5. registrant_owner_redirect: the register side may request different permissions, because its function differs from the registrant's.
  6. registrant_owner_grant
  7. registrant_confirm
  8. register_activate

Each communication between different servers carries params that let the other side come back with identification. If the communication happens within the same server, REGIST_STATUS is not required. There are three common cross-communication params:

  • regist_status: one of the 8 REGIST_STATUS values discussed above.

  • regist_type: either catalog_resource or client_catalog.

  • regist_callback: the path to reach back.
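
The three common params can be assembled into a request URL as in the following minimal sketch. The helper name `build_regist_url` and the constant names are hypothetical and not part of the protocol; only the param names, status values, and type values come from the lists above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The eight statuses and two registration types listed in this section.
REGIST_STATUSES = (
    "registrant_request",
    "register_owner_redirect",
    "register_owner_grant",
    "register_grant",
    "registrant_owner_redirect",
    "registrant_owner_grant",
    "registrant_confirm",
    "register_activate",
)
REGIST_TYPES = ("catalog_resource", "client_catalog")


def build_regist_url(endpoint, regist_status, regist_type, regist_callback, **extra):
    """Hypothetical helper: attach the three common params to an endpoint."""
    if regist_status not in REGIST_STATUSES:
        raise ValueError(f"unknown regist_status: {regist_status}")
    if regist_type not in REGIST_TYPES:
        raise ValueError(f"unknown regist_type: {regist_type}")
    params = {
        "regist_status": regist_status,
        "regist_type": regist_type,
        "regist_callback": regist_callback,
    }
    params.update(extra)  # step-specific tokens, scopes, validation codes
    return f"{endpoint}?{urlencode(params)}"


url = build_regist_url(
    "https://resource.com/regist",
    "registrant_request",
    "catalog_resource",
    "https://catalog.com/regist",
    registrant_request_token="TOKEN",
    registrant_request_scope="CODE",
)
# Decode the query string again to see what the receiver would read.
query = parse_qs(urlsplit(url).query)
```

Validating the status and type before sending keeps malformed messages from leaving the sender; a receiver would likely want the same check on arrival.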

When making a request, should it target a specific user id, or be a general request that waits for a user to approve it?

In practice, a request normally asks for access to a specific user's resources, which means the service registration should carry a specific user id. On some services, like Twitter, the user id is public; on others, like Facebook, it is partly public. In both cases it is easy to obtain the user id and pass it to the registrant, which then directs the request to that user. However, some services do not make user ids public and offer no way to check whether a user id exists. In that case the registrant should generate a request for a general user; any user on the register can take this request token and approve the registration, at which point the register links the request to that user.

In either case, the registrant should put reminder information in the request so that the user on the register can identify who is asking. In the registration between client and catalog, this reminder is also helpful because it tells the user who the client is.

request to a specific user id: the user id comes with the request from the registrant. Because the user id is specified, the register can refer to that user without requiring a login first, so it can identify the user and contact them by email or another asynchronous channel. The register will also only process the request for that specific user.

request to general user: no user id comes with the request from the registrant, so the request can be distributed to any user. Once a user receives it and logs in to the register, that user is linked to the registration request. In this case, the register must have the user log in before the user can grant permission.

This may cause a problem: one user may obtain the token while a different user logs in and registers it with the register.
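
The two request modes can be sketched as below, assuming the register keeps pending requests in memory. The class `PendingRequests` and its method names are hypothetical. The sketch also makes the problem above visible: a general request is claimed by whichever user approves it first.

```python
class PendingRequests:
    """Hypothetical register-side store of registration requests."""

    def __init__(self):
        # request token -> {"user_id": str or None, "reminder": str}
        self._pending = {}

    def add(self, token, reminder, user_id=None):
        # user_id=None marks a general request open to any user.
        self._pending[token] = {"user_id": user_id, "reminder": reminder}

    def approve(self, token, logged_in_user):
        entry = self._pending.get(token)
        if entry is None:
            raise KeyError("unknown or already used request token")
        if entry["user_id"] is not None and entry["user_id"] != logged_in_user:
            raise PermissionError("request was addressed to a different user")
        # The token is consumed; a general request is now bound to this user.
        del self._pending[token]
        return {"user_id": logged_in_user, "reminder": entry["reminder"]}


pending = PendingRequests()
pending.add("T1", "catalog of alice", user_id="bob")  # specific user
pending.add("T2", "catalog of alice")                 # general user

specific = pending.approve("T1", "bob")
general = pending.approve("T2", "carol")  # first user to log in claims it
```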



## Mutual Registration: Registration of catalog on resource server

In this example, the catalog initiates the registration process. The catalog is the REGISTRANT and the resource server is the REGISTER.

1. REGISTRANT REQUEST

The catalog sends a REQUEST to the resource server for registration. It generates a temporary identifier, registrant_request_token, which the resource server echoes back so that the catalog can identify its session. It also declares the scope of access it requests from the resource server in registrant_request_scope, and tells the resource server how to get back to the catalog with regist_callback. Over HTTP, the catalog also exposes itself automatically through the host name in the HTTP header. This design mainly addresses the many cases in which the registrant must not be required to be known to the register in advance.

  • regist_callback: URL that the resource server can use to reach back to the catalog. It must be reachable. Because HTTP is stateless, the resource server needs it in step 4.

  • registrant_request_token: identifier that the catalog provides to the resource server, so that the catalog can identify which session it is dealing with when the resource server responds. The resource server may keep it in the next step for further identification. Internally it links to a catalog owner behind the scenes, which is not discussed in this document.

  • registrant_request_scope: permission that the registrant wants to request. It could be a permission on access actions or on access content. It may differ as defined by each individual resource server. More details are discussed in the Permission section.

For example:

HTTP https://resource.com/regist?
    regist_status=registrant_request&
    regist_type=catalog_resource&
    regist_callback=_URL_&
    registrant_request_token=_TOKEN_&
    registrant_request_scope=_CODE_

2. REGISTER OWNER REDIRECT

The resource server REDIRECTs the REQUEST to the resource owner to GRANT permission. The resource server first processes the REQUEST from the catalog, then generates a temporary register_redirect_token to identify the request session coming from its users.

  • register_redirect_token: temporary token in the resource server that identifies the request session after the resource owner's action.

For example:

HTTP https://resource.com/regist?
    regist_status=register_owner_redirect&
    regist_type=catalog_resource&
    register_redirect_token=_TOKEN_

3. REGISTER OWNER GRANT

The resource owner GRANTs permission to the registration request from the catalog on the resource server. The resource owner needs to log in before the grant action. After login, the resource owner has a unique user identifier, register_user_token. The redirected request page displays a list of the requested permissions for the resource owner to decide on. If the resource owner decides to GRANT, register_redirect_token and register_user_token are sent back to the resource server.

  • register_redirect_token: it is a token sent from the resource server to the resource owner.

  • register_user_token: it is a unique value in the resource server that identifies which user is being referred to. For security, it could be a temporary token generated for each authentication. It could also be a permanent id mapped to a user on the server.

For example:

HTTP https://resource.com/regist?
    regist_status=register_owner_grant&
    regist_type=catalog_resource&
    register_redirect_token=_TOKEN_&
    register_user_token=_TOKEN_

4. REGISTER GRANT

The resource server feeds the GRANT back to the catalog after the resource owner GRANTs it. Using register_redirect_token and register_user_token, the resource server identifies which user granted permission to which catalog. It generates a formal register_access_token for the catalog to use when accessing the resource server, but this token must be confirmed by the catalog before actual use. The resource server can also declare the validation code for the access permission.

  • regist_callback: URL that the catalog can use to reach back to the resource server. It must be reachable. Because HTTP is stateless, the catalog needs it in step 7.

  • register_access_token: it is the actual access token that the catalog needs on every later access to restricted content in the resource server on behalf of the resource owner. It is the most important token for the catalog. This token is only activated after REGISTRANT CONFIRM in a later step.

  • register_validation: defines how register_access_token is limited in actual access. It normally limits the maximum access frequency, how long one connection can last, etc. Each individual resource server may implement it differently. This information is also important for the catalog to make public, so that clients get a general idea of how the resource can be reached. The default implementation simply returns the same scope that the catalog requested. Details about this permission are discussed in the later Permission section.

  • register_request_scope: The resource server also needs to request access permission from the catalog. This request scope is different from registrant_request_scope, which the catalog requests from the resource server. There are many cases in which register_request_scope and registrant_request_scope differ; they are discussed in the Permission section.

  • registrant_request_token: it is a temporary identifier generated by the catalog to identify which session it is dealing with. It expires once this step finishes.

For example:

HTTP https://catalog.com/regist?
    regist_status=register_grant&
    regist_type=catalog_resource&
    regist_callback=_URL_&
    register_access_token=_TOKEN_&
    register_validation=_CODE_&
    register_request_scope=_CODE_&
    registrant_request_token=_TOKEN_
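
A catalog could handle this step roughly as in the following sketch. It assumes in-memory bookkeeping; the class `CatalogRegistrations`, its methods, and the placeholder values are hypothetical. It shows the two behaviours described above: the temporary registrant_request_token expires after this step, and the received register_access_token stays inactive until REGISTRANT CONFIRM.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit


class CatalogRegistrations:
    """Hypothetical catalog-side bookkeeping for REGISTER GRANT (step 4)."""

    def __init__(self):
        self.sessions = {}  # registrant_request_token -> catalog owner
        self.grants = {}    # register_access_token -> grant record

    def start(self, catalog_owner):
        # Step 1: issue a temporary token that identifies this session.
        token = secrets.token_urlsafe(16)
        self.sessions[token] = catalog_owner
        return token

    def on_register_grant(self, url):
        # Flatten the query string; each param is expected exactly once.
        q = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}
        if q.get("regist_status") != "register_grant":
            raise ValueError("not a register_grant message")
        # pop() expires the temporary token once this step finishes.
        owner = self.sessions.pop(q.get("registrant_request_token"), None)
        if owner is None:
            raise KeyError("unknown or expired registrant_request_token")
        grant = {
            "catalog_owner": owner,
            "regist_callback": q["regist_callback"],
            "register_validation": q["register_validation"],
            "register_request_scope": q["register_request_scope"],
            "active": False,  # activated only after REGISTRANT CONFIRM
        }
        self.grants[q["register_access_token"]] = grant
        return grant


catalog = CatalogRegistrations()
request_token = catalog.start("alice")
grant_url = "https://catalog.com/regist?" + urlencode({
    "regist_status": "register_grant",
    "regist_type": "catalog_resource",
    "regist_callback": "https://resource.com/regist",
    "register_access_token": "ACCESS",
    "register_validation": "CODE",
    "register_request_scope": "CODE",
    "registrant_request_token": request_token,
})
grant = catalog.on_register_grant(grant_url)
```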

5. REGISTRANT OWNER REDIRECT

The catalog REDIRECTs the grant and the request from the resource server to the catalog owner to CONFIRM permission. This resembles a price negotiation: the catalog is now deciding whether to accept the other side's response. The catalog first processes the ACCESS PERMISSION from the resource server. It then processes the REQUEST from the resource server for mutual authorization, and generates a temporary registrant_redirect_token to identify the request session coming from its users. Because the exchange is tied to this token, it helps stop man-in-the-middle attacks.

  • registrant_redirect_token: temporary token in the catalog that identifies the request session after the catalog owner's action.

For example:

HTTP https://catalog.com/regist?
    regist_status=registrant_owner_redirect&
    regist_type=catalog_resource&
    registrant_redirect_token=_TOKEN_

6. REGISTRANT OWNER GRANT

The catalog owner accepts the GRANT from the resource server and also GRANTs permission to the registration request from the resource server on the catalog. The catalog owner needs to log in before the grant action. After login, the catalog owner has a unique user identifier, registrant_user_token. The redirected request page displays a list of the requested permissions for the catalog owner to decide on. If the catalog owner decides to GRANT, registrant_redirect_token and registrant_user_token are sent back to the catalog.

  • registrant_redirect_token: it is a token sent from the catalog to the catalog owner.

  • registrant_user_token: it is a unique value in the catalog that identifies which user is being referred to. For security, it could be a temporary token generated for each authentication. It could also be a permanent id mapped to a user on the server.

For example:

HTTP https://catalog.com/regist?
    regist_status=registrant_owner_grant&
    regist_type=catalog_resource&
    registrant_redirect_token=_TOKEN_&
    registrant_user_token=_TOKEN_

7. REGISTRANT CONFIRM

The catalog CONFIRMs that it has received the access token from the resource server and asks for it to be activated. It uses the register_access_token passed back from the resource server to identify which registration is meant. It keeps the register_access_token and register_validation from the resource server for later, and grants its own registrant_access_token and registrant_validation to the resource server, so that both sides can access each other (mutual access). It also passes register_access_token back so the resource server can identify which catalog session it is dealing with.

  • regist_callback: it helps the resource server to get back to the catalog.

  • registrant_access_token: it is a token generated by the catalog, so that the resource server can also access the catalog for mutual access.

  • registrant_validation: it defines the validation under which the catalog permits the resource server to access it.

  • register_access_token: it is the token generated by the resource server, used here to identify which session is being dealt with.

For example:

HTTP https://resource.com/regist?
    regist_status=registrant_confirm&
    regist_type=catalog_resource&
    regist_callback=_URL_&
    registrant_access_token=_TOKEN_&
    registrant_validation=_CODE_&
    register_access_token=_TOKEN_

8. REGISTER ACTIVATE

The resource server ACTIVATEs the registration once mutual registration is complete. It activates the access permission of the register_access_token granted to the catalog, then tells the catalog to activate on its side. The catalog can now use register_access_token to access the resource server. The resource server assumes that the catalog also activates its own access token after this step, so that both sides have active access tokens on their servers.

  • registrant_access_token: it is the access token granted to the resource server by the catalog.

For example:

HTTP https://catalog.com/regist?
    regist_status=register_activate&
    regist_type=catalog_resource&
    registrant_access_token=_TOKEN_
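
The eight steps above form a fixed sequence, so either side could reject messages that arrive out of order. The following is a minimal sketch of such a check; the class `RegistrationFlow` and the constant `MUTUAL_FLOW` are hypothetical names, and strict ordering is an assumption rather than something the protocol text mandates.

```python
# The eight regist_status values of mutual registration, in step order.
MUTUAL_FLOW = (
    "registrant_request",
    "register_owner_redirect",
    "register_owner_grant",
    "register_grant",
    "registrant_owner_redirect",
    "registrant_owner_grant",
    "registrant_confirm",
    "register_activate",
)


class RegistrationFlow:
    """Hypothetical tracker that accepts statuses only in step order."""

    def __init__(self, steps=MUTUAL_FLOW):
        self._steps = tuple(steps)
        self._position = 0

    @property
    def done(self):
        return self._position == len(self._steps)

    def advance(self, regist_status):
        if self.done:
            raise ValueError("registration already finished")
        expected = self._steps[self._position]
        if regist_status != expected:
            raise ValueError(f"expected {expected}, got {regist_status}")
        self._position += 1
        return self.done  # True once register_activate has been accepted


flow = RegistrationFlow()
for status in MUTUAL_FLOW:
    finished = flow.advance(status)
```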



## One-way Registration: Registration of client on catalog

In this example, the client initiates the registration process in order to register on the catalog and access it in the future. The client is the REGISTRANT and the catalog is the REGISTER.

1. REGISTRANT REQUEST

The client sends a REQUEST to the catalog for registration. It generates a temporary identifier, registrant_request_token, which the catalog echoes back so that the client can identify its session. It also declares the scope of access it requests from the catalog in registrant_request_scope, and tells the catalog how to get back to the client with regist_callback. Over HTTP, the client also exposes itself automatically through the host name in the HTTP header. This design mainly addresses the many cases in which the registrant must not be required to be known to the register in advance.

  • regist_callback: URL that the catalog can use to reach back to the client. It must be reachable. Because HTTP is stateless, the catalog needs it in step 4.

  • registrant_request_token: identifier that the client provides to the catalog, so that the client can identify which session it is dealing with when the catalog responds. The catalog may keep it in the next step for further identification. Internally it links to a client owner behind the scenes, which is not discussed in this document.

  • registrant_request_scope: permission that the registrant wants to request. It could be a permission on access actions or on access content. It may differ as defined by each individual catalog. More details are discussed in the Permission section.

For example:

HTTP https://catalog.com/regist?
    regist_status=registrant_request&
    regist_type=client_catalog&
    regist_callback=_URL_&
    registrant_request_token=_TOKEN_&
    registrant_request_scope=_CODE_

2. REGISTER OWNER REDIRECT

The catalog REDIRECTs the REQUEST to the catalog owner to GRANT permission. The catalog first processes the REQUEST from the client, then generates a temporary register_redirect_token to identify the request session coming from catalog users.

  • register_redirect_token: temporary token in the catalog that identifies the request session after the catalog owner's action.

For example:

HTTP https://catalog.com/regist?
    regist_status=register_owner_redirect&
    regist_type=client_catalog&
    register_redirect_token=_TOKEN_

3. REGISTER OWNER GRANT

The catalog owner GRANTs permission to the registration request from the client on the catalog. The catalog owner needs to log in before the grant action. After login, the catalog owner has a unique user identifier, register_user_token. The redirected request page displays a list of the requested permissions for the catalog owner to decide on. If the catalog owner decides to GRANT, register_redirect_token and register_user_token are sent back to the catalog.

  • register_redirect_token: it is a token sent from the catalog to the catalog owner.

  • register_user_token: it is a unique value in the catalog that identifies which user is being referred to. For security, it could be a temporary token generated for each authentication. It could also be a permanent id mapped to a user on the server.

For example:

HTTP https://catalog.com/regist?
    regist_status=register_owner_grant&
    regist_type=client_catalog&
    register_redirect_token=_TOKEN_&
    register_user_token=_TOKEN_

4. REGISTER GRANT

The catalog feeds the GRANT back to the client after the catalog owner GRANTs it. Using register_redirect_token and register_user_token, the catalog identifies which user granted permission to which client. It generates a formal register_access_token for the client to use when accessing the catalog, but this token must be confirmed by the client before actual use.

  • regist_callback: URL that the client can use to reach back to the catalog. It must be reachable. Because HTTP is stateless, the client needs it in step 5.

  • register_access_token: it is the actual access token that the client needs on every later access to restricted content in the catalog on behalf of the catalog owner. It is the most important token for the client. This token is only activated after REGISTRANT CONFIRM in a later step.

  • register_validation: defines how register_access_token is limited in actual access. It normally limits the maximum access frequency, how long one connection can last, etc. Each individual catalog may implement it differently. This information is also important for the client, as it gives a general idea of how the resource can be reached. The default implementation simply returns the same scope that the client requested. Details about this permission are discussed in the later Permission section.

  • registrant_request_token: it is a temporary identifier generated by the client to identify which session it is dealing with. It expires once this step finishes.

For example:

HTTP https://client.com/regist?
    regist_status=register_grant&
    regist_type=client_catalog&
    regist_callback=_URL_&
    register_access_token=_TOKEN_&
    register_validation=_CODE_&
    registrant_request_token=_TOKEN_

5. REGISTRANT CONFIRM

The client CONFIRMs that it has received the access token from the catalog and asks for it to be activated. It uses the register_access_token passed back from the catalog to identify which registration is meant. It keeps the register_access_token and register_validation from the catalog for later, and grants its own registrant_access_token to the catalog, so that the catalog can also reach the client. It also passes register_access_token back so the catalog can identify which client session it is dealing with.

  • regist_callback: it helps the catalog to get back to the client.

  • registrant_access_token: it is a token generated by the client, so that the catalog can also access the client.

  • register_access_token: it is the token generated by the catalog, used here to identify which session is being dealt with.

For example:

HTTP https://catalog.com/regist?
    regist_status=registrant_confirm&
    regist_type=client_catalog&
    regist_callback=_URL_&
    registrant_access_token=_TOKEN_&
    register_access_token=_TOKEN_

6. REGISTER ACTIVATE

The catalog ACTIVATEs the registration once one-way registration is complete. It activates the access permission of the register_access_token granted to the client, then tells the client to activate on its side. The client can now use register_access_token to access the catalog.

  • registrant_access_token: it is the access token granted to the catalog by the client.

For example:

HTTP https://client.com/regist?
    regist_status=register_activate&
    regist_type=client_catalog&
    registrant_access_token=_TOKEN_
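
The one-way flow uses six of the eight statuses: it skips the two registrant-owner steps, because the register does not request access in return. A minimal sketch of selecting the step sequence follows. The function `steps_for` is hypothetical, and tying mutual registration to catalog_resource and one-way registration to client_catalog is an assumption drawn from the two examples in this document.

```python
# The eight statuses of mutual registration, in step order.
MUTUAL_STEPS = (
    "registrant_request",
    "register_owner_redirect",
    "register_owner_grant",
    "register_grant",
    "registrant_owner_redirect",
    "registrant_owner_grant",
    "registrant_confirm",
    "register_activate",
)
# Steps that exist only when the register also asks the registrant for access.
MUTUAL_ONLY = {"registrant_owner_redirect", "registrant_owner_grant"}


def steps_for(regist_type):
    """Hypothetical helper: the ordered statuses for a registration type."""
    if regist_type == "catalog_resource":
        return MUTUAL_STEPS
    if regist_type == "client_catalog":
        return tuple(s for s in MUTUAL_STEPS if s not in MUTUAL_ONLY)
    raise ValueError(f"unknown regist_type: {regist_type}")


one_way = steps_for("client_catalog")
```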



# Access enquiry

After the client registers on the catalog, it gets an access_token to authenticate on the catalog. Each query that the client wants to run on a resource server must be authorised by the catalog owner. The client therefore submits its enquiry for approval, together with its access_token, before first use. The catalog performs a sandbox check on the enquiry; if the enquiry passes the catalog sandbox, the catalog sends a token telling the resource server that the catalog is willing to run this query. The resource server then checks the query against its own sandbox and reports back to the catalog whether it is also willing to run the query. If so, the resource server saves the query on its server and returns an enquiry_token to the catalog. The catalog passes the enquiry_token back to the client, with additional limits on validation and on the URL path. The client can use this enquiry_token to invoke the query multiple times with different params, subject to the policy passed from the catalog to the client.

More details about the sandbox in the catalog and the resource server are discussed in the later Sandbox Proof section.

The following demonstrates a concrete implementation of access enquiry.

**enquiry_status** specifies which handler on the receiving service a message targets, and identifies which step the message came from. It is used throughout the lifetime of the connection. There are six statuses in total:

  1. client_enquiry
  2. catalog_assign
  3. resource_ready
  4. catalog_proof
  5. client_invoke
  6. resource_feedback

Each communication between different servers carries params that let the other side come back with identification.

  • enquiry_status: one of the six ENQUIRY_STATUS values discussed above.
  • enquiry_callback: the path to reach back. It may reuse the callback kept from registration. Some implementations may want a different callback, so it is provided for flexibility.

1. CLIENT ENQUIRY

The client sends an ENQUIRY for a specific query to the catalog. Registration on the catalog authenticates the client; for each specific query the client then enquires for authorization. The client tells the catalog the details of the query for the catalog to approve. It also uses the token granted by the catalog, catalog_access_token, to authenticate on the catalog for the query. An enquiry_callback is passed so the catalog can reach back; it may be unnecessary in a real implementation where a callback has already been agreed and is used permanently.

The client only sends requests to one individual resource server at a time. Two types of resource server are defined: the single data resource server and the aggregated function resource server. If the client wants to query multiple data resources, it either enquires for each individual data resource server or enquires to an aggregated function resource server. This design keeps the catalog protocol simple and straightforward.

  • enquiry_callback: it is a URL that the catalog uses to reach back to the client. It may be the same as the callback in registration. It is mainly provided for flexibility.

  • client_enquiry_token: it is a token generated by the client to identify which query it refers to.

  • catalog_access_token: it is the token granted by the catalog in the registration between client and catalog.

  • query: code that will be invoked to execute on the resource server.

For example:

HTTP https://catalog.com/enquiry?
    enquiry_status=client_enquiry&
    enquiry_callback=_URL_&
    client_enquiry_token=_TOKEN_&
    catalog_access_token=_TOKEN_&
    query=_CODE_

2. CATALOG SANDBOX GENERAL CHECKING

The catalog runs a SANDBOX to check the enquiry from the client. The SANDBOX checks the validity of the ENQUIRY and differs for each individual implementation. It could be an action taken by a person or a decision made by a computing agent. The default implementation directly approves every ENQUIRY. The protocol provides this as an interface, so real deployments must implement it as a module and plug it into the catalog. The catalog performs a general approval check before the query runs on the resource server, and each individual resource server may also apply a more detailed execution check.

More details about the SANDBOX are discussed in the Sandbox Proof section.
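
The sandbox interface with its approve-all default might look like the following sketch. The class names `Sandbox`, `ApproveAllSandbox`, and `DenyListSandbox`, the method `approve`, and the helper `check_enquiry` are all hypothetical; the protocol only states that an interface exists and that the default approves everything. The deny-list variant is an illustrative plug-in, not something the protocol defines.

```python
from abc import ABC, abstractmethod


class Sandbox(ABC):
    """Hypothetical plug-in interface for the catalog sandbox."""

    @abstractmethod
    def approve(self, query: str) -> bool:
        """Return True if the enquiry may proceed to the resource server."""


class ApproveAllSandbox(Sandbox):
    """Default behaviour described above: approve every enquiry."""

    def approve(self, query: str) -> bool:
        return True


class DenyListSandbox(Sandbox):
    """Illustrative plug-in: reject queries containing banned words."""

    def __init__(self, banned_words):
        self._banned = tuple(word.lower() for word in banned_words)

    def approve(self, query: str) -> bool:
        lowered = query.lower()
        return not any(word in lowered for word in self._banned)


def check_enquiry(query, sandbox=None):
    # Fall back to the approve-all default when no module is plugged in.
    sandbox = sandbox or ApproveAllSandbox()
    return sandbox.approve(query)


default_result = check_enquiry("DELETE everything")
strict_result = check_enquiry("DELETE everything", DenyListSandbox(["delete"]))
```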

3. CATALOG ASSIGN

The catalog ASSIGNs the authorization of the ENQUIRY to the resource server after the ENQUIRY passes the SANDBOX check. Passing the catalog SANDBOX means the ENQUIRY has passed the general check, so the catalog tells the resource server that it passed and that the resource server should proceed with its execution sandbox check. The catalog gives the resource server an ASSIGN token, which the resource server checks later during real execution.

  • enquiry_callback: it is a URL that the resource server uses to reach back to the catalog. It may be the same as the callback in registration. It is mainly provided for flexibility.

  • resource_access_token: the catalog needs it to authenticate on the resource server for this assign action.

  • catalog_enquiry_token: it is assigned to the resource server as evidence that the client's enquiry has been approved by the catalog. It is also passed back to the client after approval by both catalog and resource server. The catalog uses it to refer to the session with both the client and the resource server.

  • catalog_enquiry_validation: it is a validation decided by the catalog. The resource server may modify it after its sandbox check. It also defines params such as when the token expires.

  • query: it is the query code from the client, passed from the catalog to the resource server for checking.

For example:

HTTP https://resource.com/enquiry?
    enquiry_status=catalog_assign&
    enquiry_callback=_URL_&
    catalog_enquiry_token=_TOKEN_&
    catalog_enquiry_validation=_CODE_&
    resource_access_token=_TOKEN_&
    query=_CODE_
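As a rough illustration of the ASSIGN request above, the sketch below builds the request URL with Python's standard library. The function name `build_assign_url` and all sample values are hypothetical; only the parameter names come from the protocol text, and the encoding of `catalog_enquiry_validation` and `query` is left open by the protocol.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_assign_url(resource_enquiry_endpoint, enquiry_callback,
                     catalog_enquiry_token, catalog_enquiry_validation,
                     resource_access_token, query):
    """Return the URL a catalog could call to ASSIGN an enquiry (illustrative)."""
    params = {
        "enquiry_status": "catalog_assign",
        "enquiry_callback": enquiry_callback,
        "catalog_enquiry_token": catalog_enquiry_token,
        "catalog_enquiry_validation": catalog_enquiry_validation,
        "resource_access_token": resource_access_token,
        "query": query,
    }
    # urlencode percent-encodes every value, so a callback URL or query text
    # containing '&' or '=' cannot break the parameter list.
    return resource_enquiry_endpoint + "?" + urlencode(params)


# Sample values are made up for the example.
url = build_assign_url(
    "https://resource.com/enquiry",
    "https://client.com/enquiry",
    "catalog-token-1",
    "expire=1000s",
    "resource-token-1",
    "SELECT title FROM blog WHERE year = ?",
)
print(url)
```

Encoding the values matters here because both `enquiry_callback` and `query` can themselves contain URL metacharacters.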

4. RESOURCE SANDBOX EXECUTION CHECKING

After receiving the ASSIGN order from the catalog, the resource server runs its own SANDBOX. It performs a test execution of the query on the resource server to test the load it places on the server. This sandbox will differ between implementations: it could be an action taken by a person or a decision made by a computing agent. The default implementation approves every ENQUIRY directly. The protocol provides the SANDBOX only as an interface, so a real deployment must implement it as a module and plug it into the resource server.

More details about the SANDBOX are discussed in the Sandbox proof section.

5. RESOURCE READY

The resource server is now READY for the catalog's assignment of the client's enquiry. After RESOURCE SANDBOX EXECUTION CHECKING, the resource server keeps the query received from the catalog, so that the same query runs every time (the client may supply different params in some cases). It also keeps the catalog_enquiry_token, both to apply the policy during real execution of the query and for later management work between the catalog and itself. The resource server then generates a resource_enquiry_token and passes it to the catalog, which passes it on to the client. The client uses this resource_enquiry_token to access the resource server. The resource server may also add further limitations to the validation code generated by the catalog.

  • enquiry_callback: the URL the catalog uses to reach back to the client. It may be the same as the callback used in registration; it is provided mainly for flexibility.

  • catalog_access_token: the token the resource server uses to authenticate to the catalog for this action.

  • catalog_enquiry_token: the token that tells the catalog which enquiry is being dealt with.

  • resource_enquiry_token: the token the catalog passes to the client, so that the client can execute the query on the resource server.

  • resource_invoke_url: the actual access path for invoking this enquiry. It is designed to be dynamic, because resources stored on mobile devices may have changing links.

  • resource_enquiry_validation: additional limitations the resource server may apply on top of the catalog's validation policy.

For example:

HTTP https://catalog.com/enquiry?
    enquiry_status=resource_ready&
    enquiry_callback=_URL_&
    resource_enquiry_token=_TOKEN_&
    resource_invoke_url=_URL_&
    resource_enquiry_validation=_CODE_&
    catalog_access_token=_TOKEN_&
    catalog_enquiry_token=_TOKEN_
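The bookkeeping a resource server does in the READY step could look roughly like the sketch below. The class `EnquiryStore`, its method names and the in-memory dictionary are assumptions made for illustration; a real resource server would persist this in its own database.

```python
import secrets


class EnquiryStore:
    """In-memory stand-in for a resource server's enquiry table (hypothetical)."""

    def __init__(self):
        self._by_resource_token = {}

    def mark_ready(self, catalog_enquiry_token, query, validation):
        # A fresh, unguessable token that the client will later present.
        resource_enquiry_token = secrets.token_urlsafe(32)
        self._by_resource_token[resource_enquiry_token] = {
            # Kept for later management work with the catalog.
            "catalog_enquiry_token": catalog_enquiry_token,
            # Stored once here; the client never sends the query text again.
            "query": query,
            # Catalog policy, possibly tightened by the resource server.
            "validation": validation,
        }
        return resource_enquiry_token

    def lookup(self, resource_enquiry_token):
        """Return the stored enquiry, or None for an unknown token."""
        return self._by_resource_token.get(resource_enquiry_token)
```

Generating the token with `secrets` rather than `random` is a design choice of this sketch, made because the token acts as a credential.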

6. CATALOG PROOF

The catalog sends PROOF of the enquiry to the client once the resource server is ready for query execution. The catalog combines its own validation policy with the validation policy from the resource server into a single validation for the client. The catalog also passes the catalog_enquiry_validation so that the client can manage the communication for this specific enquiry, and the resource_enquiry_token so that the client can ask the resource server to run the query already kept there.

  • enquiry_callback: the URL the client uses to reach back to the catalog. It may be the same as the callback used in registration; it is provided mainly for flexibility.

  • client_access_token: the token the catalog uses to authenticate to the client.

  • client_enquiry_token: the token that tells the client which query is being dealt with.

  • catalog_enquiry_token: the token that tells the catalog which enquiry is being dealt with. It may differ from the catalog_enquiry_token passed from the catalog to the resource server.

  • resource_enquiry_token: the token the client uses to access the resource.

  • resource_invoke_url: the URL that tells the client where the resource server is and how to reach it.

  • validation: the limitations that apply when the enquiry's query is executed on the resource server, for example how often it may run. More details about permissions are discussed in the Permission section.

For example:

HTTP https://client.com/enquiry?
    enquiry_status=catalog_proof&
    enquiry_callback=_URL_&
    resource_enquiry_token=_TOKEN_&
    resource_invoke_url=_URL_&
    validation=_CODE_&
    catalog_enquiry_token=_TOKEN_&
    client_access_token=_TOKEN_&
    client_enquiry_token=_TOKEN_
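The protocol does not fix a format for the validation code. Assuming, purely for illustration, that both policies are dictionaries of numeric upper limits, combining them into a single validation could be sketched as taking the stricter limit for each key. The function name and the key names are hypothetical.

```python
def combine_validation(catalog_validation, resource_validation):
    """Merge two limit dictionaries, keeping the stricter (smaller) limit."""
    combined = dict(catalog_validation)
    for key, limit in resource_validation.items():
        if key in combined:
            # Both sides set this limit: the smaller value wins.
            combined[key] = min(combined[key], limit)
        else:
            # Only the resource server set this limit: carry it over.
            combined[key] = limit
    return combined


catalog_policy = {"expire_seconds": 1000, "access_count_max": 1000}
resource_policy = {"access_count_max": 100, "access_frequency_max": 10}
print(combine_validation(catalog_policy, resource_policy))
```

With this rule the resource server can only tighten the catalog's policy, never loosen it.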

7. CLIENT INVOKE

The client INVOKEs the resource server to run the QUERY. The query has been kept on the resource server since it was approved by both the catalog and the resource server, so the client cannot change it using the resource_enquiry_token obtained in the last step. The client may pass in different variables to make the query more flexible. The client can use the resource_enquiry_token multiple times, within the limits set by the validation.

  • enquiry_callback: the URL the resource server uses to reach back to the client with the result.

  • resource_enquiry_token: the token that authorises the client for this enquiry on the resource server.

  • query_params: additional params passed in dynamically when the query runs.

  • client_enquiry_params: the resource server may need time to run the query, and the HTTP connection may time out, so the resource server sends the results back to the client through the callback. The client should make these params unique per invocation so that it can identify which enquiry session a result belongs to, since HTTP is stateless.

For example:

HTTP https://resource.com/enquiry?
    enquiry_status=client_invoke&
    enquiry_callback=_URL_&
    resource_enquiry_token=_TOKEN_&
    client_enquiry_params=_CODE_&
    query_params=_CODE_
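One way to picture the INVOKE step is the sketch below, which assumes the stored query is SQL with `?` placeholders and that the usage limit is a simple counter. Both are assumptions: the protocol does not say what language the query code is in. The names `STORED_ENQUIRIES` and `invoke` are hypothetical.

```python
import sqlite3

# What the resource server kept after the READY step (illustrative data).
STORED_ENQUIRIES = {
    "resource-token-1": {
        "query": "SELECT title FROM blog WHERE year = ?",
        "uses_left": 2,
    }
}


def invoke(db, resource_enquiry_token, query_params):
    """Run the stored query for this token with the client's params."""
    enquiry = STORED_ENQUIRIES.get(resource_enquiry_token)
    if enquiry is None:
        raise PermissionError("unknown resource_enquiry_token")
    if enquiry["uses_left"] <= 0:
        raise PermissionError("usage limit reached")
    enquiry["uses_left"] -= 1
    # The client supplies only values; they are bound by the driver and
    # never spliced into the stored query text.
    return db.execute(enquiry["query"], query_params).fetchall()


# A toy protected resource.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blog (title TEXT, year INTEGER)")
db.executemany("INSERT INTO blog VALUES (?, ?)",
               [("first post", 2011), ("second post", 2012)])
```

Calling `invoke(db, "resource-token-1", (2012,))` would run the approved query with a different year each time, up to the stored limit.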

8. RESOURCE FEEDBACK

The resource server sends the RESULT back to the client after executing the query.

  • enquiry_callback: tells the client how to reach back to the resource server.

  • client_enquiry_params: tells the client which invocation the result belongs to.

  • query_result: the execution results.

For example:

POST https://client.com/enquiry?
    enquiry_status=resource_feedback&
    enquiry_callback=_URL_&
    client_enquiry_params=_CODE_&
    query_result=_CODE_
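Because the result arrives in a separate HTTP request, the client needs some way to match it to the invocation that caused it. A minimal sketch, assuming the client uses a random value as `client_enquiry_params` and keeps pending invocations in memory (the class and method names are hypothetical):

```python
import secrets


class PendingInvocations:
    """Tracks invocations that are waiting for RESOURCE FEEDBACK."""

    def __init__(self):
        self._pending = {}

    def start(self, description):
        # A unique value per invocation, sent as client_enquiry_params.
        client_enquiry_params = secrets.token_urlsafe(16)
        self._pending[client_enquiry_params] = description
        return client_enquiry_params

    def on_feedback(self, client_enquiry_params, query_result):
        # pop() removes the entry, so a repeated callback for the same
        # invocation is rejected instead of being processed twice.
        description = self._pending.pop(client_enquiry_params, None)
        if description is None:
            raise KeyError("no pending invocation for this client_enquiry_params")
        return description, query_result
```

The client would call `start()` before sending CLIENT INVOKE and `on_feedback()` when the callback request arrives.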



# Service Management

Management happens between the client and the catalog, or between the catalog and the resource server. For example, the client can manage a specific enquiry, the catalog can manage the bridge between the client and the resource server, and the resource server can manage the access granted from the catalog. Management lets either side profile the other or update conditions.

The example here shows the catalog profiling access statistics on the resource server.

manage_status: identifies the current stage of the exchange.

  1. question
  2. answer

manage_action*: defines which action is performed for management. Any party can define an action if it offers something that other parties want to use. Some common actions are listed here but, as noted, the actual list of actions is determined by each implementation and its needs.

  • registration_action:

  • enquiry_action:

  • profile_action:

......

1. QUESTION

The catalog asks a question, such as how often a client actually accesses the resource server and how much data has been returned to the client.

  • resource_access_token: the token the catalog uses to authenticate to the resource server.

  • manage_token: the token that tells the resource server which client enquiry is being dealt with.

  • manage_action: a code that tells the resource server what the catalog would like to know.

  • manage_action_params: additional params to pass. For example, if the catalog wants to manage a specific enquiry, it could pass enquiry_token as a param.

For example:

HTTP https://resource.com/manage?
    manage_status=question&
    manage_callback=_URL_&
    manage_action=_ACTION_&
    manage_action_params=_CODE_&
    manage_token=_TOKEN_&
    access_token=_TOKEN_

2. ANSWER

The resource server ANSWERs the QUESTION from the catalog.

  • manage_token: the token that tells the catalog which manage action is being dealt with.

  • catalog_access_token: the token the resource server uses to authenticate to the catalog.

  • manage_result: the answer.

For example:

HTTP https://catalog.com/action?
    manage_status=answer&
    manage_callback=_URL_&
    manage_token=_TOKEN_&
    manage_result=_CODE_&
    access_token=_TOKEN_
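The QUESTION and ANSWER exchange can be sketched as a small dispatch table on the resource server. Everything below is illustrative: the access log, the shape of `manage_action_params` and the result fields are assumptions, since the protocol leaves the action list to each implementation.

```python
# Hypothetical record of what each enquiry has returned so far.
ACCESS_LOG = [
    {"enquiry_token": "e1", "rows": 10},
    {"enquiry_token": "e1", "rows": 5},
    {"enquiry_token": "e2", "rows": 7},
]


def profile_action(params):
    """Report how often an enquiry ran and how much data it returned."""
    entries = [e for e in ACCESS_LOG
               if e["enquiry_token"] == params["enquiry_token"]]
    return {
        "access_count": len(entries),
        "rows_returned": sum(e["rows"] for e in entries),
    }


# Each implementation registers the actions it is willing to answer.
MANAGE_ACTIONS = {"profile_action": profile_action}


def answer(question):
    """Build the ANSWER message for a QUESTION message."""
    handler = MANAGE_ACTIONS.get(question["manage_action"])
    if handler is None:
        result = {"error": "unsupported manage_action"}
    else:
        result = handler(question["manage_action_params"])
    return {
        "manage_status": "answer",
        # Echoed back so the catalog can match the answer to its question.
        "manage_token": question["manage_token"],
        "manage_result": result,
    }
```

A dispatch table keeps the set of actions open-ended, which matches the protocol's statement that actions are decided per implementation.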



# Permission

The Dataware Catalog Protocol would ideally define the permissions that can be authorized. However, each role may define permissions differently, and each implementation depends heavily on its real use case. This section therefore only introduces some basic, common ways to define permissions.

Permissions on a protected resource can be classified into action permissions and content permissions. An action permission refers to what the requester can perform: read, write, delete, etc. A content permission refers to which protected content is involved, for example friend list, user blog, social status update, tweet, etc. Each resource may hold different types of content, so each resource server should define what content it holds and publish that to requesters.

Facebook and Twitter illustrate how permissions on user resources can be defined.

  1. Resources on Twitter are quite straightforward: they are tweets, so Twitter mainly focuses on action permissions (see The Application Permission Model on the official Twitter developer page). Here is a screenshot of what you can edit when registering an app on Twitter: Twitter permission model

  2. Facebook holds more types of user content. Therefore, Facebook defines fine-grained content types and requires developers to request them specifically. Each permission should be defined before it is presented to the end user for OAuth authentication (see the Permission reference on the official Facebook developer page). Here is also a screenshot: Facebook permission model

Besides content, permissions can also apply to the HTTP connection itself. For example:

  1. How long is a token valid for?

  2. How often can a token be used to make an HTTP request?

  3. How many times can a token be used? Both Twitter and Facebook limit the use of a valid auth token, to about 300 times per hour across all tokens of a registered application. The Dataware Catalog Protocol leaves this to the implementation, which can apply the limit per app or per app and user.

  4. Is a valid token accepted only when it comes from a certain specified domain?

  5. Can a token be refreshed before or after expiration?

Therefore, each service provider that grants permissions to others should publish a complete permission list. If a request does not specify an option, the default for that option applies.

The permission requested may differ from the permission granted. The permission requested is what an entity asks another entity for, that is, the permission it desires. The granting entity decides the permission to grant based on the permission requested, and returns it to the requester to accept.

Permission example:

{"action":"read, write", "content":"blog, friend_list", "expire_date":"1000s", "access_frequency_max":"10cps", "access_count_max":1000, "can_refresh":"no_allow"}
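The difference between a requested and a granted permission can be sketched as follows. The sketch assumes that `action` and `content` are comma-separated lists as in the example above, that the grantor publishes what it offers in the same shape, and that `DEFAULT_PERMISSION` holds the defaults for unspecified options. The default values and function names are hypothetical.

```python
# Hypothetical defaults applied when a request leaves an option out.
DEFAULT_PERMISSION = {"action": "read", "content": "", "can_refresh": "no_allow"}


def split_list(value):
    """Turn 'read, write' into ['read', 'write']."""
    return [item.strip() for item in value.split(",") if item.strip()]


def grant(requested, offered):
    """Return the permission actually granted for a request."""
    # Unspecified options fall back to the defaults.
    granted = {**DEFAULT_PERMISSION, **requested}
    for field in ("action", "content"):
        allowed = set(split_list(offered[field]))
        # Keep only the requested items that the grantor offers.
        kept = [item for item in split_list(granted[field]) if item in allowed]
        granted[field] = ", ".join(kept)
    return granted


requested = {"action": "read, write", "content": "blog, friend_list"}
offered = {"action": "read", "content": "blog, tweet"}
print(grant(requested, offered))
```

The granted permission would then be returned to the requester, who decides whether to accept it.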



# Sandbox proof

The sandbox is designed to approve (proof) an enquiry from a client before its query is executed and the result returned to the client. The catalog acts as an agent that performs the general check by parsing the query, and the resource server then applies an additional check on execution. The protocol only specifies that this function exists and leaves the actual implementation to the developer, so it is basically an interface. The default behaviour is to pass every enquiry directly. A simple implementation presents every enquiry to the owner behind the server and lets the owner click to approve or reject it. The Horizon team has developed a more advanced and complex virtual machine to perform sandbox checking.
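Since the sandbox is described only as an interface, one possible shape for it is an abstract base class with pluggable implementations. This is a sketch under that assumption; the class names, the `approve` method and the dictionary form of an enquiry are not part of the protocol.

```python
from abc import ABC, abstractmethod


class Sandbox(ABC):
    """Plug-in point for checking an enquiry (hypothetical interface)."""

    @abstractmethod
    def approve(self, enquiry):
        """Return True if the enquiry may proceed."""


class ApproveAll(Sandbox):
    """The default behaviour: every enquiry passes directly."""

    def approve(self, enquiry):
        return True


class OwnerDecision(Sandbox):
    """Presents the enquiry to the owner and uses their decision."""

    def __init__(self, ask_owner):
        # ask_owner is any callable that shows the enquiry to the owner
        # and returns their yes/no answer.
        self._ask_owner = ask_owner

    def approve(self, enquiry):
        return bool(self._ask_owner(enquiry))


def run_sandboxes(enquiry, catalog_sandbox, resource_sandbox):
    # The catalog's general check runs first; the resource server's
    # execution check is only consulted if the first one passes.
    return catalog_sandbox.approve(enquiry) and resource_sandbox.approve(enquiry)
```

In a deployment the two checks run on different servers; `run_sandboxes` only shows the order in which they apply.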

Virtual machine sandbox description: TODO



Token clean

The protocol generates a large number of tokens, so an implementation should set up a mechanism, similar to garbage collection in Java, to clean up unused and expired tokens. The actual implementation should set up an automatic cron job to clean up the tokens, so that database performance is maintained.
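A cleanup job of this kind could be as small as the sketch below, which assumes tokens live in a SQL table with an `expires_at` column holding a Unix timestamp. The table layout and the function name are assumptions for illustration.

```python
import sqlite3


def purge_expired_tokens(db, now):
    """Delete every token whose expiry time has passed; return how many."""
    cursor = db.execute("DELETE FROM tokens WHERE expires_at <= ?", (now,))
    db.commit()
    return cursor.rowcount


# Toy token table for the example.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tokens (token TEXT PRIMARY KEY, expires_at INTEGER)")
db.executemany("INSERT INTO tokens VALUES (?, ?)",
               [("a", 100), ("b", 200), ("c", 300)])
```

A scheduled job would call `purge_expired_tokens(db, int(time.time()))` at a fixed interval.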



Can the catalog owner be the same as the resource owner?

This is discussed for the case where both the catalog owner and the resource owner are people. The protocol can equally be applied to implementations in which the catalog owner and the resource owner are computing agents; the principle stays the same.

In the most common case the catalog owner and the resource owner are the same person: a person wants to register their resources on their own catalog. A person may have several different resource servers but only one catalog, and wants to link all the resources and manage access control in one central place, the catalog.

In other cases the catalog owner and the resource owner are different people. For example:

  1. I work at a university, and some of my resources, such as my publications, are linked to my university work. An administrator at my university can help manage my publication information, so I would like to share the read management of my personal publication information with the university administrator. My personal publication information is the resource server and I am the resource owner; the university publication portal is the catalog that shares management of the access control, and the university administrator is the catalog owner. We are different people.

  2. I work at a university and would like to integrate the university calendar with my personal calendar. My overall calendar is the catalog that manages all my calendars, and I am the catalog owner. One of my calendars is my university calendar, which is shared by my university administrator because she is responsible for arranging my work affairs; she is the owner of my work calendar resource server. Again, she and I are different people.

The protocol leaves it open whether the two entities are the same or different, so it covers most application cases.



Access media

The people behind the entities take part in the protocol, so the medium affects how an implementation directs the owner. The owner may be redirected through a web browser, e-mail, a mobile app, etc. The protocol is designed so that access works the same way in every medium.

When the owner is redirected, whichever medium is used, a URL of the entity carrying a unique session token is presented. This unique redirect session token maps to hashed information held by the entity. To grant the permission, the user is redirected to log in over HTTP, so that a user token is linked to the redirect session token. The entity's server then processes the redirect session token and the user token to finish the remaining steps. Because all tokens are rehashed against the server's internal database, the redirect token can be visible to others. This avoids exposing sensitive information and prevents mistaken behaviour.

If the catalog owner and the resource owner are the same person, a web browser is the best choice, because the same person can log in to both the catalog and the resource server to finish the registration redirect, which saves time. If they are different people, e-mail or a mobile app, media that need not be in the same physical location, can be suggested, so that each owner can be redirected in a different place to finish the step. This delays the user redirect and user grant, but preserves security.
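The redirect session token described in this section can be pictured as an opaque key into server-side state. The sketch below is one possible reading, with a hypothetical class, a made-up `session` URL parameter and an in-memory store; the protocol itself does not name these.

```python
import secrets


class RedirectSessions:
    """Server-side store behind the redirect session token (illustrative)."""

    def __init__(self):
        self._sessions = {}

    def create(self, pending_request):
        # The token is random and carries no information by itself; the
        # details of the pending request never leave the server.
        token = secrets.token_urlsafe(32)
        self._sessions[token] = {"request": pending_request, "user_token": None}
        return token

    def redirect_url(self, base_url, token):
        # The same URL can be opened in a browser, sent by e-mail or
        # handed to a mobile app.
        return f"{base_url}?session={token}"

    def bind_user(self, token, user_token):
        """Link the logged-in user's token to the redirect session."""
        session = self._sessions.get(token)
        if session is None:
            raise KeyError("unknown redirect session token")
        session["user_token"] = user_token
        return session
```

Because the URL contains only the random token, it reveals nothing about the request even if someone else sees it.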



Why MUTUAL registration between catalog and resource server?

We demonstrate two cases, in which either the catalog or the resource server should start the registration process:

  1. The resource server should go ahead: a typical person has multiple resources and one catalog. It is easier for each resource server to start the registration, because the catalog is the same for every registration and the user simply repeats the process.

  2. The catalog should go ahead: a person may have one resource server managed by multiple entities. My work-related resource belongs to me but can also be managed by my employer. Multiple catalogs can therefore refer to the same resource, and it is easier to repeat the registration of each catalog on the resource, because the resource is the same for every registration.

In other cases the resource server is not permanently available, for example a mobile device or a hidden resource, and can only be reached for registration during certain periods. In this case it is easier for the resource to start the registration, because the catalog cannot always know when the resource is available.



Man-in-the-middle attack

Each server keeps a token it generated itself and a token generated by the other server, and uses both for communication. For decisions, a user is involved: after logging in to grant permission, a user token is generated. Meanwhile, the callback URL helps identify where a request was sent from, so bad requests can be filtered out. A rating system may also use reputation to help detect attacks.
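A rough sketch of the filtering idea: a request is accepted only if it presents the token issued to that sender and its callback URL points at the host recorded during registration. The `REGISTERED` table, the `sender_id` key and the HTTPS requirement are assumptions of this sketch, not rules stated by the protocol.

```python
import hmac
from urllib.parse import urlsplit

# Hypothetical record created during registration.
REGISTERED = {
    "client-1": {"token": "s3cret-token", "callback_host": "client.com"},
}


def is_trusted(sender_id, presented_token, callback_url):
    """Check the presented token and the callback host against registration."""
    record = REGISTERED.get(sender_id)
    if record is None:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    token_ok = hmac.compare_digest(presented_token.encode(),
                                   record["token"].encode())
    parts = urlsplit(callback_url)
    callback_ok = (parts.scheme == "https"
                   and parts.hostname == record["callback_host"])
    return token_ok and callback_ok
```

A request whose callback points somewhere other than the registered host is dropped even if its token is correct.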



Error list

  1. 400: %s(URL param) is not found in http request
  2. 400: %s(URL param) is found but incorrect in http request
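The two errors listed so far could be produced by a check like the one sketched below. The function name, the `(status, message)` return shape and the `200, "ok"` success value are assumptions; only the two 400 messages come from the list above.

```python
def check_params(params, required, allowed_values=None):
    """Validate request params; return (status, message)."""
    allowed_values = allowed_values or {}
    # Error 1: a required URL param is missing.
    for name in required:
        if name not in params:
            return 400, "%s is not found in http request" % name
    # Error 2: a param is present but its value is not acceptable.
    for name, allowed in allowed_values.items():
        if name in params and params[name] not in allowed:
            return 400, "%s is found but incorrect in http request" % name
    return 200, "ok"


required = ["enquiry_status", "resource_enquiry_token"]
allowed = {"enquiry_status": {"client_invoke"}}
print(check_params({"enquiry_status": "client_invoke"}, required, allowed))
```

Checking for missing params before checking values keeps the two error messages unambiguous.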

TODO


