
Strategies

Ralph Soika edited this page Jul 30, 2017 · 5 revisions

In this section I want to discuss archive strategies from the point of view of this project. There are three general archive strategies, which I will briefly explain. Not all of them are implemented by the Imixs-Archive system, but each of them can be implemented where it fits a specific enterprise BPM scenario better.

Asynchronous Mode Pull

This approach follows the idea of an external archive client. The client can be a Java application running on the Hadoop cluster or on a separate host. It pulls all available workitems from the Imixs-Workflow system via the Imixs REST API and stores them in the Hadoop cluster. The advantage of this strategy is that the archive process is independent of the workflow system and can run on isolated hardware. In this strategy a workitem is never updated or modified. The disadvantage is that the Imixs-Workflow engine is not aware if and when a workitem is pulled for archiving. Although the client could also update a workitem via the REST API to record information about the archive process, this would not be an ideal solution, because the business process and the archive process remain decoupled and are not synchronized. The 'Client Pull' approach can be extended to a 'Client Pull/Push' approach to support restoring archived workflow data.
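A minimal sketch of the pull loop, assuming the client receives workitems as plain maps and stores each one under its `$uniqueid` item. `WorkitemSource`, `ArchiveStore`, `PullArchiver` and the paging scheme are hypothetical: a real client would implement `WorkitemSource` with HTTP calls against the Imixs REST API and `ArchiveStore` with writes to the Hadoop cluster, neither of which is shown here. The loop only reads workitems and never writes anything back to the workflow system.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical abstraction over the Imixs REST API: returns one page of
// workitems as maps of item name -> value.
interface WorkitemSource {
    List<Map<String, Object>> fetchPage(int pageIndex, int pageSize);
}

// Hypothetical abstraction over the Hadoop cluster.
interface ArchiveStore {
    void store(String uniqueId, Map<String, Object> workitem);
}

class PullArchiver {
    private final WorkitemSource source;
    private final ArchiveStore store;
    private final int pageSize;

    PullArchiver(WorkitemSource source, ArchiveStore store, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.source = source;
        this.store = store;
        this.pageSize = pageSize;
    }

    // Pulls all workitems page by page and copies them into the archive.
    // Returns the number of archived workitems.
    int archiveAll() {
        int archived = 0;
        for (int page = 0; ; page++) {
            List<Map<String, Object>> items = source.fetchPage(page, pageSize);
            for (Map<String, Object> item : items) {
                Object id = item.get("$uniqueid");
                if (id == null) {
                    continue; // skip entries without an identifier
                }
                store.store(id.toString(), item);
                archived++;
            }
            if (items.size() < pageSize) {
                return archived; // a partial or empty page is the last one
            }
        }
    }
}

class PullArchiveDemo {
    public static void main(String[] args) {
        List<Map<String, Object>> all = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            Map<String, Object> item = new HashMap<>();
            item.put("$uniqueid", "wi-" + i);
            all.add(item);
        }
        // In-memory stand-ins for the REST API and the Hadoop cluster.
        WorkitemSource source = (page, size) -> all.subList(
                Math.min(page * size, all.size()), Math.min((page + 1) * size, all.size()));
        Map<String, Map<String, Object>> archive = new HashMap<>();
        int count = new PullArchiver(source, archive::put, 2).archiveAll();
        System.out.println("archived " + count + " workitems"); // prints "archived 5 workitems"
    }
}
```

Because the workflow engine takes no part in this loop, a workitem that changes after it was pulled is only archived again if the client runs again; this is the synchronization gap noted under Cons.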

Pros

  • The process can be embedded into the Hadoop system
  • The process can use MapReduce concepts
  • The process scales with the Hadoop cluster
  • The workflow system is not affected by the archive process (performance, memory)
  • Hadoop can access Imixs-Workflow through the existing Imixs-Workflow security, so no additional security has to be implemented

Cons

  • The data in Hadoop and Imixs-Workflow is not synchronized at all times

Asynchronous Mode Push

The 'Client Push' approach is similar to the 'Client Pull' approach, but the implementation runs inside the workflow server. The workflow system can control the process in general, but as with the 'Client Pull' approach, the business process and the archive process are decoupled. In contrast to the 'Client Pull' approach, the client can access the Imixs-Workflow API directly, so no REST API is needed on the workflow side. To access the external Hadoop cluster, the Hadoop REST API can be used. The 'Client Push' strategy can easily be extended to a 'Client Push/Pull' implementation, which can be used to restore archived workitems. For the restore process, too, control remains on the side of the workflow server.
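A minimal sketch of the Hadoop side of such a push component, assuming the Hadoop REST API in question is WebHDFS (Hadoop's HTTP interface to HDFS) and using `java.net.http` from Java 11. A WebHDFS `CREATE` is a two-step operation: the NameNode answers the first `PUT` with a redirect whose `Location` header names a DataNode, and the file content is then sent there. The class name, the NameNode address and the HDFS path are hypothetical placeholders; the sketch assumes a cluster without Kerberos and paths containing only URL-safe characters, and it does not show how the server selects, schedules or serializes the workitems.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class WebHdfsWriter {
    // Placeholder NameNode HTTP address, e.g. "http://namenode:50070" (hypothetical).
    private final String nameNodeBase;
    // newHttpClient() never follows redirects, so the NameNode's redirect is returned to us.
    private final HttpClient client = HttpClient.newHttpClient();

    WebHdfsWriter(String nameNodeBase) {
        this.nameNodeBase = nameNodeBase;
    }

    // Builds the first-step CREATE URL; assumes an absolute, URL-safe HDFS path.
    static URI createUri(String nameNodeBase, String hdfsPath, boolean overwrite) {
        if (!hdfsPath.startsWith("/")) {
            throw new IllegalArgumentException("HDFS path must be absolute: " + hdfsPath);
        }
        return URI.create(nameNodeBase + "/webhdfs/v1" + hdfsPath
                + "?op=CREATE&overwrite=" + overwrite);
    }

    // Writes data to hdfsPath using the two-step WebHDFS CREATE protocol.
    void write(String hdfsPath, byte[] data) throws IOException, InterruptedException {
        // Step 1: PUT without a body; the NameNode replies with a redirect to a DataNode.
        HttpRequest askNameNode = HttpRequest.newBuilder(createUri(nameNodeBase, hdfsPath, true))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> redirect = client.send(askNameNode, HttpResponse.BodyHandlers.discarding());
        String location = redirect.headers().firstValue("Location")
                .orElseThrow(() -> new IOException("expected redirect, got HTTP " + redirect.statusCode()));

        // Step 2: PUT the content to the DataNode; WebHDFS answers 201 Created on success.
        HttpRequest sendData = HttpRequest.newBuilder(URI.create(location))
                .PUT(HttpRequest.BodyPublishers.ofByteArray(data))
                .build();
        HttpResponse<Void> created = client.send(sendData, HttpResponse.BodyHandlers.discarding());
        if (created.statusCode() != 201) {
            throw new IOException("CREATE failed with HTTP " + created.statusCode());
        }
    }
}
```

A push component could call `new WebHdfsWriter("http://namenode:50070").write("/archive/wi-1.xml", bytes)` for each workitem it decides to archive. Handling the redirect explicitly keeps the two WebHDFS steps visible and lets the code report which step failed.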

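Pros

  • The archive process, including the restore process, is controlled by the workflow system
  • The Imixs-Workflow API can be accessed directly, so no REST API is needed on the workflow side
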
Cons

  • The data in Hadoop and Imixs-Workflow is not synchronized at all times

Synchronous Mode Push

In the 'Workflow Push' strategy, the archive process is directly coupled to the workflow process. This means that the archive process can be controlled by the workflow model. It is implemented as an Imixs plug-in that is called directly by the engine. The plug-in accesses the Hadoop cluster via the Hadoop REST API. In this scenario the plug-in can store archive data, such as a checksum, in the workitem immediately. This is a tightly coupled archive strategy.
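A minimal sketch of the synchronous step such a plug-in might perform: write the serialized workitem to the archive and, in the same call, record a SHA-256 checksum of the archived bytes in the workitem. It deliberately does not use the Imixs plug-in interface; the workitem is modeled as a plain map, and `ArchiveWriter`, `archiveSynchronously` and the item name `archive.checksum` are hypothetical. Because the checksum is only set after the write returns, a failed write leaves the workitem unchanged and the exception propagates to the caller, which is why this strategy has to be embedded in the running transaction.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical writer for the Hadoop cluster (e.g. backed by WebHDFS).
// Failures are expected to surface as unchecked exceptions.
interface ArchiveWriter {
    void write(String uniqueId, byte[] data);
}

class ArchiveChecksum {
    // Hypothetical item name used to store the checksum in the workitem.
    static final String CHECKSUM_ITEM = "archive.checksum";

    // Returns the SHA-256 digest of the data as a lowercase hex string.
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    // Writes the archive copy first; only if that succeeds is the checksum
    // stored in the workitem. Any exception from the writer propagates.
    static void archiveSynchronously(Map<String, Object> workitem, byte[] serialized, ArchiveWriter writer) {
        Object id = workitem.get("$uniqueid");
        if (id == null) {
            throw new IllegalArgumentException("workitem has no $uniqueid");
        }
        writer.write(id.toString(), serialized);
        workitem.put(CHECKSUM_ITEM, sha256Hex(serialized));
    }
}

class SyncPushDemo {
    public static void main(String[] args) {
        Map<String, Object> workitem = new HashMap<>();
        workitem.put("$uniqueid", "wi-1");
        Map<String, byte[]> hdfs = new HashMap<>(); // in-memory stand-in for Hadoop
        byte[] serialized = "abc".getBytes(StandardCharsets.UTF_8);
        ArchiveChecksum.archiveSynchronously(workitem, serialized, hdfs::put);
        // prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
        System.out.println(workitem.get(ArchiveChecksum.CHECKSUM_ITEM));
    }
}
```

Storing the checksum right after the write is what gives this strategy its synchronous guarantee: whenever the checksum item is present, the matching archive copy was written.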

Pros

  • The archive process can be controlled directly by the workflow engine (via a plug-in)
  • The data in Hadoop and Imixs-Workflow is synchronized at all times
  • A workitem can store archive information synchronously (e.g. a checksum)

Cons

  • The process is time-consuming and slows down the overall performance of the workflow engine
  • The process is memory-consuming
  • The process has to be embedded in the running transaction, which increases complexity
  • Hadoop must be accessible via the internet, and additional security must be implemented on both sides