Strategies

This section describes archive strategies from the point of view of this project. There are three general archive strategies, each briefly explained below. Not all of them are implemented by the Imixs-Archive system, but any of them can be implemented where it fits a specific enterprise BPM scenario better.

Asynchronous Mode Pull

This approach follows the idea of an external archive client. The client can be a Java application running on the Hadoop cluster or on a separate host. It pulls all available workitems from the Imixs-Workflow system via the Imixs REST API and stores them in the Hadoop cluster. The advantage of this strategy is that the archive process is independent of the workflow system and can run on isolated hardware. A workitem is not updated or manipulated in this strategy. The disadvantage is that the Imixs-Workflow engine does not know if and when a workitem is pulled for archiving. A client could update a workitem via the REST API to record information about the archive process, but this is not ideal, because the business process and the archive process remain decoupled. The 'Client Pull' approach can be extended to a 'Client Pull/Push' approach to restore archived workflow data.
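The pull loop described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the types `WorkitemSource`, `ArchiveStore` and `InMemoryStore` are hypothetical stand-ins for the Imixs REST API and the Hadoop cluster, not part of Imixs-Archive. Skipping workitems that are already stored is also an assumption of this sketch; it reflects that the client has to track its own progress because the workitem itself is never updated.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of an external pull client. All type names are hypothetical. */
final class PullArchiveClient {

    /** Read-only view of the workflow system (e.g. backed by the Imixs REST API). */
    interface WorkitemSource {
        /** Returns workitems as id -> serialized content; never modifies them. */
        Map<String, byte[]> fetchWorkitems();
    }

    /** Write side of the archive (e.g. backed by the Hadoop cluster). */
    interface ArchiveStore {
        boolean contains(String id);
        void put(String id, byte[] content);
    }

    /** In-memory store, used here only to keep the sketch runnable. */
    static final class InMemoryStore implements ArchiveStore {
        private final Map<String, byte[]> data = new HashMap<>();
        public boolean contains(String id) { return data.containsKey(id); }
        public void put(String id, byte[] content) { data.put(id, content.clone()); }
    }

    /** Pulls all workitems and stores those not yet archived; returns the new ids. */
    static List<String> pullOnce(WorkitemSource source, ArchiveStore store) {
        List<String> archived = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : source.fetchWorkitems().entrySet()) {
            if (!store.contains(e.getKey())) {
                store.put(e.getKey(), e.getValue());
                archived.add(e.getKey());
            }
        }
        return archived;
    }

    public static void main(String[] args) {
        Map<String, byte[]> workitems = new LinkedHashMap<>();
        workitems.put("wi-1", "<workitem/>".getBytes(StandardCharsets.UTF_8));
        InMemoryStore store = new InMemoryStore();
        System.out.println(pullOnce(() -> workitems, store)); // first run archives wi-1
        System.out.println(pullOnce(() -> workitems, store)); // second run finds nothing new
    }
}
```

Because the source interface is read-only, the workflow system never learns that a workitem was archived, which is exactly the decoupling this strategy trades for its independence.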

Pros

The process can be embedded into the Hadoop system
The process can use the MapReduce concepts
The process scales
The Workflow System is not influenced by the archive process (performance, memory)
The existing security of Imixs-Workflow can be used by Hadoop to access Imixs-Workflow, so no additional security needs to be implemented

Cons

The data in Hadoop and Imixs-Workflow is not in sync at every point in time

Asynchronous Mode Push

The 'Client Push' approach is similar to the 'Client Pull' approach, but the implementation runs inside the workflow server. The process can be controlled by the workflow system in general but, as with 'Client Pull', the business process and the archive process are decoupled. Unlike 'Client Pull', the client can access the Imixs-Workflow API directly, so no REST API is needed on the workflow side. The external Hadoop cluster can be accessed via the Hadoop REST API. The 'Client Push' strategy can easily be extended to a 'Client Push/Pull' implementation, which can be used to restore archived workitems. For the restore process, too, control stays on the side of the workflow server.
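As a rough illustration of the push side, the sketch below builds the first request of a file upload, assuming that the "Hadoop REST API" mentioned above is WebHDFS. The class name, host name, user name and path are hypothetical. WebHDFS creates a file in two steps: a PUT without a body to the NameNode, which answers with a redirect to a DataNode, followed by a second PUT carrying the file data to the redirect location. Only the first step is shown, and the path is assumed to need no URL escaping.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Sketch of a push to HDFS via WebHDFS. Class and parameter names are hypothetical. */
final class WebHdfsPushSketch {

    /** Builds the WebHDFS CREATE URI for an absolute HDFS path such as /archive/wi-1.xml. */
    static URI createUri(String nameNode, String hdfsPath, String user) {
        String query = "op=CREATE&overwrite=false&user.name="
                + URLEncoder.encode(user, StandardCharsets.UTF_8);
        return URI.create(nameNode + "/webhdfs/v1" + hdfsPath + "?" + query);
    }

    /** Step one of CREATE: a PUT without a body, sent to the NameNode. */
    static HttpRequest createRequest(URI uri) {
        return HttpRequest.newBuilder(uri)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        // Host, port and user are placeholders for a real cluster configuration.
        URI uri = createUri("http://namenode.example:9870", "/archive/wi-1.xml", "imixs");
        HttpRequest request = createRequest(uri);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Since this code would run inside the workflow server, it could read workitems through the Imixs-Workflow API directly; only the call to Hadoop crosses the network.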

Pros

Cons

The data in Hadoop and Imixs-Workflow is not in sync at every point in time

Synchronous Mode Push

In the 'Workflow Push' strategy the archive process is directly coupled to the workflow process, so it can be controlled by the workflow model. The implementation is an Imixs plug-in that is controlled directly by the engine. The plug-in accesses the Hadoop cluster via the Hadoop REST API. In this scenario the plug-in can store archive data, such as a checksum, immediately in the workitem. This is a tightly coupled archive strategy.
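The idea of writing the checksum back into the workitem can be sketched as follows. This is not the Imixs plug-in API: the workitem is modelled as a plain `Map`, the archive write as a `Consumer`, and the item name `archive.checksum` as well as the choice of SHA-256 are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Sketch of a synchronous archive step. All names are hypothetical. */
final class SyncArchiveSketch {

    /** Hypothetical item name under which the checksum is kept in the workitem. */
    static final String CHECKSUM_ITEM = "archive.checksum";

    /** Returns the SHA-256 digest of the data as a lowercase hex string. */
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Writes the serialized workitem to the archive and, only if that
     * succeeds, records the checksum in the workitem itself.
     */
    static void archiveAndRecord(Map<String, Object> workitem, byte[] serialized,
                                 Consumer<byte[]> archiveWriter) {
        archiveWriter.accept(serialized); // an exception here leaves the workitem unchanged
        workitem.put(CHECKSUM_ITEM, sha256Hex(serialized));
    }

    public static void main(String[] args) {
        Map<String, Object> workitem = new HashMap<>();
        byte[] serialized = "<workitem/>".getBytes(StandardCharsets.UTF_8);
        archiveAndRecord(workitem, serialized, bytes -> { /* write to Hadoop here */ });
        System.out.println(workitem.get(CHECKSUM_ITEM));
    }
}
```

Because the archive write happens inside the same call that updates the workitem, a slow or failing archive directly affects the workflow step, which is the cost of the tight coupling.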

Pros

The archive process can be directly controlled by the workflow engine (via a plug-in)
The data in Hadoop and Imixs-Workflow is in sync at all times
A workitem can store archive information synchronously (e.g. a checksum)

Cons

The process is time-consuming and slows down the overall performance of the workflow engine
The process is memory-intensive
The process has to be embedded into the running transaction, which increases the complexity
Hadoop must be accessible via the internet, and additional security must be implemented on both sides
