
Commit

documentation
rsoika committed Oct 26, 2022
1 parent 427f018 commit 957bd00
Showing 5 changed files with 179 additions and 126 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -54,7 +54,7 @@ The *Imixs-Archive-Service* provides an independent microservice archiving all d

*Imixs-Archive-Importer* provides a generic import service to import documents from various external sources like an FTP server or an IMAP account.

[Imixs-Archive-OCR](https://github.com/imixs/imixs-archive/tree/master/imixs-archive-importer)
[Imixs-Archive-Importer](https://github.com/imixs/imixs-archive/tree/master/imixs-archive-importer)

### Imixs Signature

22 changes: 22 additions & 0 deletions imixs-archive-importer/CSV_IMPORTER.md
@@ -0,0 +1,22 @@
# The CSV Importer

The CDI bean *CSVImporterService* is a generic implementation to import documents from a CSV file located on an FTP server. The implementation accesses the FTP server via FTPS on the default port 990.

The selector attribute defines the path to the target file with the file extension '.csv'. For each entry in the CSV file, a new document instance is created. Workflow processing is currently not supported.

The *CSVImporterService* verifies the content of a CSV file by an MDA checksum. The import process is only started if the checksum has changed. The checksum is printed to the log so that an administrator can verify it.
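
The change detection can be sketched as follows. This is a minimal example, not the *CSVImporterService* code: it assumes MD5 as the checksum algorithm (the text above names an 'MDA' checksum) and uses a hypothetical helper class `CsvChecksum`; where the checksum of the last import is stored is left open.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Sketch: decide whether a CSV file must be imported again by comparing a
 * checksum of its content with the checksum of the last import.
 * Assumption: MD5; the algorithm used by CSVImporterService may differ.
 */
final class CsvChecksum {

    private CsvChecksum() {
    }

    /** Returns the MD5 checksum of the given bytes as a 32-character hex string. */
    static String checksum(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content);
            // BigInteger(1, ...) treats the digest as positive; %032x keeps leading zeros
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // every Java platform is required to support MD5
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the content was never imported or has changed since the last import. */
    static boolean needsImport(byte[] content, String lastChecksum) {
        return lastChecksum == null || !lastChecksum.equals(checksum(content));
    }
}
```

Only the checksum string has to be kept between two runs, which also makes it easy to print it to the log for an administrator.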

## The CSV Options

The following option entries are mandatory:

- **type** - defines the document type of the stored entries
- **key** - defines the key item used to select an entity by its unique key

**Note:** The *CSVImporterService* does not currently support combined keys. Make sure that the CSV file contains a column with unique keys.
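
A minimal sketch of checking these requirements before an import, using `java.util.Properties` for the option entries. The class `CsvImportCheck` is hypothetical, and it assumes that the 'key' option names a column in the CSV header, that fields are separated by ';' and that values are not quoted.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

final class CsvImportCheck {

    private CsvImportCheck() {
    }

    /** Parses option entries (one key=value pair per line) into Properties. */
    static Properties parseOptions(String options) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(options));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the mandatory options ("type", "key") that are missing or empty. */
    static List<String> missingOptions(Properties options) {
        List<String> missing = new ArrayList<>();
        for (String name : List.of("type", "key")) {
            String value = options.getProperty(name);
            if (value == null || value.isBlank()) {
                missing.add(name);
            }
        }
        return missing;
    }

    /**
     * Returns the key values that occur more than once. Assumptions: the first
     * line is a header, fields are separated by ';' and not quoted, and the
     * 'key' option names a header column.
     */
    static Set<String> duplicateKeys(List<String> lines, String keyColumn) {
        int index = Arrays.asList(lines.get(0).split(";", -1)).indexOf(keyColumn);
        if (index < 0) {
            throw new IllegalArgumentException("key column not found: " + keyColumn);
        }
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new HashSet<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(";", -1);
            // Set.add returns false if the key was already seen
            if (index < fields.length && !seen.add(fields[index])) {
                duplicates.add(fields[index]);
            }
        }
        return duplicates;
    }
}
```

A non-empty result of `duplicateKeys` means that some entries could not be selected unambiguously by their key.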




8 changes: 8 additions & 0 deletions imixs-archive-importer/FTP_IMPORTER.md
@@ -0,0 +1,8 @@
# The FTP Importer

The CDI bean *FTPImporterService* is a concrete implementation to import documents from an FTP server. The implementation accesses the FTP server via FTPS on the default port 990.

The selector attribute defines an optional directory. If no selector is specified, the documents are read from the root directory `/`.

**Note:** The *FTPImporterService* expects a secure FTPS connection.
FTPS is FTP secured with the [SSL/TLS protocol](https://en.wikipedia.org/wiki/Secure_Sockets_Layer): it adds support for the Transport Layer Security (TLS) and Secure Sockets Layer (SSL) cryptographic protocols. This is different from SCP/SFTP, which is not supported. If your FTP data source does not support FTPS, the *FTPImporterService* cannot be used.
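
The selector and port rules above can be sketched as a small hypothetical helper (`FtpSelector`, not part of the *FTPImporterService*). It only resolves the connection parameters; the FTPS connection itself would be opened by an FTPS client library, for example `FTPSClient` from Apache Commons Net in implicit mode.

```java
/**
 * Sketch: resolves the remote directory and port for an FTPS import.
 * Hypothetical helper, not the FTPImporterService implementation.
 */
final class FtpSelector {

    /** Default port for implicit FTPS (FTP over SSL/TLS). */
    static final int IMPLICIT_FTPS_PORT = 990;

    private FtpSelector() {
    }

    /** Returns the directory to read from: the selector, or "/" if no selector is set. */
    static String directory(String selector) {
        if (selector == null || selector.isBlank()) {
            return "/";
        }
        return selector.trim();
    }

    /** Returns the configured port, falling back to the implicit FTPS port 990. */
    static int port(String configuredPort) {
        if (configuredPort == null || configuredPort.isBlank()) {
            return IMPLICIT_FTPS_PORT;
        }
        return Integer.parseInt(configuredPort.trim());
    }
}
```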
95 changes: 95 additions & 0 deletions imixs-archive-importer/MAIL_IMPORTER.md
@@ -0,0 +1,95 @@
# The Mail IMAP Importer

The CDI bean *IMAPIImporterService* is a concrete implementation to import documents from an IMAP server. This module also provides services to convert Java Mail `Message` objects into HTML or PDF.

## The Import Folder

With the 'Selector' attribute you can define the IMAP folder from which the Mail Importer imports mails. If no selector is set, the IMAP folder 'INBOX' is scanned, which is the default folder on most IMAP servers.

## The Archive Folder

After the *IMAPIImporterService* has successfully imported a message, the message is moved into an IMAP archive folder named 'imixs-archive'. This folder is created automatically if it does not yet exist on the IMAP server.
You can change the name of the archive folder by setting the option 'archive.folder':

    archive.folder=Invoice-Archive
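
The folder defaults described above can be sketched with `java.util.Properties`. `ImapFolders` is a hypothetical helper, and treating an empty option like a missing one is an assumption. With JavaMail, the archive folder could be created with `Folder.exists()` and `Folder.create(Folder.HOLDS_MESSAGES)` if it is missing.

```java
import java.util.Properties;

/** Sketch of the import and archive folder defaults (hypothetical helper). */
final class ImapFolders {

    static final String DEFAULT_IMPORT_FOLDER = "INBOX";
    static final String DEFAULT_ARCHIVE_FOLDER = "imixs-archive";

    private ImapFolders() {
    }

    /** Folder to scan: the selector, or INBOX if no selector is set. */
    static String importFolder(String selector) {
        return (selector == null || selector.isBlank()) ? DEFAULT_IMPORT_FOLDER : selector.trim();
    }

    /** Folder that imported messages are moved to: option archive.folder, or imixs-archive. */
    static String archiveFolder(Properties options) {
        String value = options.getProperty("archive.folder");
        return (value == null || value.isBlank()) ? DEFAULT_ARCHIVE_FOLDER : value.trim();
    }
}
```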


## Filtering Mails by subject using regular expressions

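It is possible to filter emails by their subject. To do so, a regular expression can be added by the option property named "subejct.regex", e.g.:

    filter.subject=(order|offer|invoice*)

In this example, only messages whose subject matches 'order', 'offer' or 'invoice' are imported. Note that in a regular expression `invoice*` means 'invoic' followed by zero or more 'e' characters; it is not a wildcard for "starting with invoice".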
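
How the pattern behaves depends on whether the subject is tested with `Matcher.find()` (the pattern occurs anywhere in the subject) or `Matcher.matches()` (the whole subject must match); which one the *IMAPIImporterService* uses is not stated here. The hypothetical helper `SubjectFilter` shows both with `java.util.regex`:

```java
import java.util.regex.Pattern;

/** Sketch comparing the two ways a subject filter could be applied. */
final class SubjectFilter {

    private SubjectFilter() {
    }

    /** True if the pattern occurs anywhere in the subject (Matcher.find). */
    static boolean containsMatch(String regex, String subject) {
        return Pattern.compile(regex).matcher(subject).find();
    }

    /** True only if the whole subject matches the pattern (Matcher.matches). */
    static boolean matchesWhole(String regex, String subject) {
        return Pattern.compile(regex).matcher(subject).matches();
    }
}
```

With `find()`, a subject like "New order 4711" passes the filter; with `matches()`, only a subject consisting exactly of 'order', 'offer' or 'invoic' plus any number of 'e' characters passes. Java regular expressions are case-sensitive, so "Invoice 12" only matches if the pattern starts with the inline flag `(?i)`.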

## Detachment Mode

The *IMAPIImporterService* can detach files from a mail message object. The behaviour can be configured by the option property named "detach.mode". The following detach modes are supported:

- PDF - only attached PDF files are detached and added to the workitem together with the original .eml file. This is the default mode.
- ALL - all files are detached and the email content is attached to the workitem as an HTML file.
- NONE - no files are detached and the original email object is attached to the workitem as an .eml file.

To set the detachment mode, add the option to the IMAP source:

    detach.mode=ALL
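
The detach modes can be sketched as a decision per attached file. `DetachPolicy` is hypothetical; in particular, recognizing PDF files by their '.pdf' file name extension is an assumption (an implementation might check the MIME type `application/pdf` instead), and an unknown mode raises an `IllegalArgumentException` here.

```java
import java.util.Locale;

/** Sketch of the detach.mode decision for a single attachment. */
final class DetachPolicy {

    enum DetachMode { PDF, ALL, NONE }

    private DetachPolicy() {
    }

    /** Parses the detach.mode option; PDF is the default if the option is not set. */
    static DetachMode parse(String value) {
        if (value == null || value.isBlank()) {
            return DetachMode.PDF;
        }
        // Enum.valueOf throws IllegalArgumentException for unknown modes
        return DetachMode.valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    /** Returns true if the given attachment is detached and added to the workitem. */
    static boolean detach(DetachMode mode, String fileName) {
        switch (mode) {
            case ALL:
                return true;
            case PDF:
                // assumption: PDF files are recognized by their file name extension
                return fileName.toLowerCase(Locale.ROOT).endsWith(".pdf");
            default:
                return false; // NONE
        }
    }
}
```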

## Preserve Origin Message

With `detach.mode=ALL`, the option 'preserve.origin' defines whether the original email is attached as well:

    preserve.origin=false

If the option is set to false, the .eml file is not attached. The default value is 'true'.

**Note:** With detach.mode 'PDF' or 'NONE', the original mail file is always attached.
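
The rule for attaching the original .eml file can be sketched like this. `OriginPolicy` is a hypothetical helper; that only an explicit 'false' disables 'preserve.origin' is an assumption.

```java
import java.util.Properties;

/** Sketch of the preserve.origin rule (hypothetical helper). */
final class OriginPolicy {

    private OriginPolicy() {
    }

    /**
     * Returns true if the original .eml file is attached to the workitem.
     * PDF (the default) and NONE always attach it; ALL attaches it unless
     * preserve.origin is set to false.
     */
    static boolean attachOriginalEml(String detachMode, Properties options) {
        boolean all = detachMode != null && detachMode.trim().equalsIgnoreCase("ALL");
        if (!all) {
            return true;
        }
        // assumption: only an explicit "false" disables preserve.origin (default: true)
        String preserve = options.getProperty("preserve.origin", "true");
        return !preserve.trim().equalsIgnoreCase("false");
    }
}
```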




## Custom Mail Options

The *IMAPIImporterService* connects to a mail server via IMAPS by default. IMAPS (IMAP over SSL/TLS) is assigned port number 993. You can override the default protocol with the property "mail.store.protocol" in the custom mail options of the document source. In the same way you can set additional Java Mail options for your mail/IMAP server. The following example sets some extra Java Mail options:

    mail.store.protocol=imap
    mail.imap.ssl.enable=true
    mail.imap.port=993
    mail.imap.ssl.trust=*

A list of all IMAP settings can be found [here](https://www.tutorialspoint.com/javamail_api/javamail_api_imap_servers.htm).
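
A sketch of how the custom mail options could be merged over the IMAPS default before a Java Mail session is created. `MailSessionProperties` is hypothetical, and forwarding only the source options whose names start with `mail.` is an assumption. The resulting properties could then be passed to `Session.getInstance(Properties)`, whose `getStore()` method uses the protocol set in `mail.store.protocol`.

```java
import java.util.Properties;

/** Sketch: builds Java Mail session properties from the source options. */
final class MailSessionProperties {

    private MailSessionProperties() {
    }

    static Properties build(Properties sourceOptions) {
        Properties props = new Properties();
        // default: IMAP over SSL/TLS
        props.setProperty("mail.store.protocol", "imaps");
        // custom mail options override the default (assumption: only mail.* keys are forwarded)
        for (String name : sourceOptions.stringPropertyNames()) {
            if (name.startsWith("mail.")) {
                props.setProperty(name, sourceOptions.getProperty(name));
            }
        }
        return props;
    }
}
```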

## Authenticators

To connect to an IMAP message store, the IMAP Importer uses IMAPAuthenticator classes that access a mailbox with a specific authentication method.
The IMAPAuthenticator can be set with the optional option `imap.authenticator`. If no authenticator is defined, the service defaults to the `org.imixs.archive.importer.mail.IMAPBasicAuthenticator` implementation, which uses a BASIC authentication method.
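
Selecting an authenticator by class name is a common plug-in pattern. The sketch below assumes that the service instantiates the configured class by reflection through a public no-argument constructor; `AuthenticatorLoader` is hypothetical, and the interface an authenticator must implement is not documented here, so the target type is a parameter.

```java
import java.util.Properties;

/** Sketch: resolves and instantiates an authenticator by class name. */
final class AuthenticatorLoader {

    /** Default named in the documentation when no imap.authenticator is set. */
    static final String DEFAULT_AUTHENTICATOR = "org.imixs.archive.importer.mail.IMAPBasicAuthenticator";

    private AuthenticatorLoader() {
    }

    /** Returns the configured authenticator class name or the BASIC default. */
    static String authenticatorClassName(Properties options) {
        String name = options.getProperty("imap.authenticator");
        return (name == null || name.isBlank()) ? DEFAULT_AUTHENTICATOR : name.trim();
    }

    /** Creates an instance of the named class via its public no-argument constructor. */
    static <T> T instantiate(String className, Class<T> type) {
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            // fails with ClassCastException if the class does not implement the expected type
            return type.cast(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }
}
```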

### IMAP Outlook Authenticator

For Outlook 365, the IMAP authenticator `org.imixs.archive.importer.mail.IMAPOutlookAuthenticator` can be used to authenticate with the OAuth2 method. The authenticator expects the following additional mail options:

* microsoft.tenantid - specifies the Microsoft Azure Tenant ID
* microsoft.clientid - specifies the Microsoft Client ID (the client principal accessing the mailbox)

**Note:** When the IMAPOutlookAuthenticator is used, the server name is ignored. The server names used to generate an OAuth token and to open the message store are resolved internally by the authenticator implementation.
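
For orientation, this is what an OAuth2 token request with the client credentials flow of the Microsoft identity platform looks like, built with `java.net.http`. It is a sketch under assumptions, not the *IMAPOutlookAuthenticator* code: the client secret and the scope `https://outlook.office365.com/.default` are not among the options listed above, and `OutlookTokenRequest` is hypothetical. The request is only built, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Sketch: builds an OAuth2 client credentials token request for a tenant. */
final class OutlookTokenRequest {

    /** Assumed scope for IMAP access to Exchange Online. */
    static final String SCOPE = "https://outlook.office365.com/.default";

    private OutlookTokenRequest() {
    }

    /** Token endpoint of the Microsoft identity platform for the given tenant. */
    static URI tokenEndpoint(String tenantId) {
        return URI.create("https://login.microsoftonline.com/" + enc(tenantId) + "/oauth2/v2.0/token");
    }

    /** Form-encoded body of the client credentials grant. */
    static String formBody(String clientId, String clientSecret) {
        return "grant_type=client_credentials"
                + "&client_id=" + enc(clientId)
                + "&client_secret=" + enc(clientSecret)
                + "&scope=" + enc(SCOPE);
    }

    static HttpRequest request(String tenantId, String clientId, String clientSecret) {
        return HttpRequest.newBuilder(tokenEndpoint(tenantId))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(clientId, clientSecret)))
                .build();
    }

    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

The `access_token` from the JSON response is typically used as the password of an IMAP connection with the Java Mail property `mail.imap.auth.mechanisms=XOAUTH2`.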



## Gotenberg HTML PDF Converter

[Gotenberg](https://thecodingmachine.github.io/gotenberg/) is a Docker-powered stateless API for converting HTML, Markdown and Office documents to PDF. This service can be used to convert a mail into a PDF document. This option only applies with 'detach.mode=ALL'.

To activate this feature, add the following options to the import source:

    detach.mode=ALL
    gotenberg.service=http://gotenberg:3000

A Gotenberg service can be started as a Docker container, e.g. with the following service definition in a docker-compose file:

    gotenberg:
      image: thecodingmachine/gotenberg:6
      ports:
        - "3000:3000"
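
A sketch of a conversion request built with `java.net.http`, assuming the Gotenberg 6 HTML route `/convert/html`, which expects the HTML as a multipart file named `index.html`; newer Gotenberg versions use different routes. `GotenbergRequest` is hypothetical and not the importer's converter.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Sketch: builds a Gotenberg 6 HTML-to-PDF request (assumed route /convert/html). */
final class GotenbergRequest {

    private GotenbergRequest() {
    }

    static HttpRequest htmlToPdf(String serviceUrl, String html) {
        // random boundary that separates the multipart sections
        String boundary = "imixs-" + UUID.randomUUID().toString().replace("-", "");
        String body = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"files\"; filename=\"index.html\"\r\n"
                + "Content-Type: text/html; charset=UTF-8\r\n"
                + "\r\n"
                + html + "\r\n"
                + "--" + boundary + "--\r\n";
        // strip a trailing slash from the configured gotenberg.service URL
        String base = serviceUrl.endsWith("/") ? serviceUrl.substring(0, serviceUrl.length() - 1) : serviceUrl;
        return HttpRequest.newBuilder(URI.create(base + "/convert/html"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }
}
```

Sending it with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofByteArray())` would return the PDF bytes in the response body.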




178 changes: 53 additions & 125 deletions imixs-archive-importer/README.md
@@ -55,132 +55,60 @@ The CDI Bean *DocumentImportController* is used to display and select data sourc



# Importer Services

The Imixs-Archive-Importer can be extended by various import services. This is a flexible mechanism based on CDI events to adapt the import of workitems from different sources.

The following standard importer services are available in the current implementation:

- [Mail IMAP Importer](MAIL_IMPORTER.md) - imports documents from a mailbox using IMAP
- [FTP Importer](FTP_IMPORTER.md) - imports documents from an FTP server
- [CSV Importer](CSV_IMPORTER.md) - imports documents from a CSV file located on an FTP server





## Implementation of a Custom Import Service

An importer service implementation is a CDI bean that at least implements an observer method for the CDI event `DocumentImportEvent`. This event provides the configuration details of the data source. The method should set the result `DocumentImportEvent.PROCESSING_COMPLETED` if the import completed successfully, or `DocumentImportEvent.PROCESSING_ERROR` if it failed.

```java
import java.io.IOException;
import java.util.Properties;
import java.util.logging.Logger;
// CDI annotations (@Named, @Inject, @Observes) and the Imixs types
// (UpdateService, DocumentImportService, DocumentImportEvent) are imported
// from the packages of your Jakarta EE and Imixs-Workflow version.

@Named
public class CSVImportService {

    private static final Logger logger = Logger.getLogger(CSVImportService.class.getName());

    @Inject
    UpdateService indexUpdateService;

    @Inject
    DocumentImportService documentImportService;

    /**
     * This method reacts on a CDI DocumentImportEvent and imports the data from a
     * specific source.
     */
    public void onEvent(@Observes DocumentImportEvent event) {
        // ...
        try {
            // read the connection details of the data source
            String ftpServer = event.getSource().getItemValueString(DocumentImportService.SOURCE_ITEM_SERVER);
            String ftpPort = event.getSource().getItemValueString(DocumentImportService.SOURCE_ITEM_PORT);
            String ftpUser = event.getSource().getItemValueString(DocumentImportService.SOURCE_ITEM_USER);
            String ftpPassword = event.getSource().getItemValueString(DocumentImportService.SOURCE_ITEM_PASSWORD);
            String csvSelector = event.getSource().getItemValueString(DocumentImportService.SOURCE_ITEM_SELECTOR);
            // the option entries of the source as Properties
            Properties sourceOptions = documentImportService.getOptionsProperties(event.getSource());
            // ...
            // ...
            // ...

        } catch (IOException e) {
            // r: assumed to be the FTP reply code, declared in the elided code before the try block
            logger.severe("I/O Error: " + e.getMessage());
            documentImportService.logMessage("...import failed (replyCode=" + r + ") : " + e.getMessage(),
                    event);
            event.setResult(DocumentImportEvent.PROCESSING_ERROR);
            return;
        }

        // flush the index
        indexUpdateService.updateIndex();
        // signal that the import was completed successfully
        event.setResult(DocumentImportEvent.PROCESSING_COMPLETED);
    }

}
```
