websocket-tech-brief.txt

============================================================
First created on: February/22/2020
Last modified on: October/20/2023

Purpose of this toolkit
-----------------------
The streamsx.websocket toolkit provides the following C++ and Java
operators that can help you to receive text or binary data
from the remote client and server-based applications via 
WebSocket and HTTP (OR) send text or binary data from your 
IBM Streams applications to external client and server-based 
applications via WebSocket and HTTP.

1) WebSocketSource (server-based)
2) WebSocketSendReceive (client-based)
3) WebSocketSink (server-based)
4) HttpPost (client-based) [To send/post text and binary data via HTTP(S)]

WebSocketSource by default is a source operator that can be used 
to receive text or binary data from multiple client applications.
This operator supports message reception via both WebSocket and HTTP 
on plain as well as secure TLS endpoints. This source operator can 
optionally be turned into an analytic operator to process/analyze
the text or binary data received from the remote clients and then
send back a text or binary response back to that same WebSocket or
HTTP client. The WebSocket server running in this operator 
supports having multiple URL context paths for a given endpoint 
listening on a particular port. By using this feature, remote clients 
can connect to different URL context paths thereby letting the 
IBM Streams applications to tailor the data processing logic
based on which group(s) of remote clients sent the data. It supports 
both persistent (Keep-Alive) and non-persistent HTTP client connections. 
Persistent HTTP connection feature is beneficial for the non-browser based 
client applications that need to send hundreds of data items continuously 
in a tight loop or to send periodic heavy bursts of data items. 
This feature will keep the client's HTTP connection alive as long as 
there is data arriving from the client continuously. When HTTP support is 
enabled, it can serve the web browser based client applications for 
fetching html, css, js, image files etc. as well as REST-based client 
applications. Thus, users will get a five-in-one benefit (WebSocket, HTTP, 
plain, secure and response-ready) from this operator. It can be configured 
to start a plain or secure WebSocket or HTTP endpoint for the remote 
clients to connect and start sending and (optionally) receiving data. 
This is Receive-only with an option to make it a Receive-and-Send operator. 
This operator promotes the Many-To-One data access pattern.

WebSocketSendReceive is an analytic operator that can be used to
initiate a connection to an external WebSocket server-based
application in order to send and receive text or binary data via that 
connection. This is Send-and-Receive to/from a single server-based
remote WebSocket application. This operator promotes the One-To-One
data access pattern.

WebSocketSink is a sink operator that can be used to send (broadcast)
text or binary data to multiple clients. It can be configured to start
a plain or secure WebSocket endpoint for the remote clients to connect 
in order to start receiving data from this operator. The WebSocket
server running in this operator supports having multiple URL context 
paths for a given endpoint listening on a particular port. By using 
this feature, remote clients can connect to different URL context 
paths thereby letting the IBM Streams application logic to tailor 
which data items get sent to which group(s) of the remote clients.
This is Send-only to multiple clients. This operator promotes the 
One-To-Many data access pattern.

HttpPost is a utility operator provided by this toolkit to test the 
optional HTTP(S) text or binary data reception feature available in 
the WebSocketSource operator. This utility operator can send text or 
binary data and receive text or binary data in response from the remote server.
This operator allows clients to send data via HTTP GET, PUT and POST.
If other application scenarios see a fit for this utility operator, 
they can also use it as needed. If you clone this toolkit from the 
IBMStreams GitHub, then you must build this utility operator by running 
"ant clean" and "ant all" from the com.ibm.streamsx.websocket directory. 

In a Streams application, these operators can either be
used together or independent of each other. 

Technical positioning of this toolkit
-------------------------------------
WebSocket, a computer communication protocol has been in commercial use since
2012 after it became an official IETF standard. It enables two-way (full duplex)
communication between a client and a remote server over TCP with low overhead when 
compared to the other Web protocols such as HTTP or HTTPS. Another superb
benefit of WebSocket is that it can be overlaid on top of HTTP or HTTPS by making
the initial connection using HTTP or HTTPS and then upgrading that connection
to a full duplex TCP connection to the standard port 80 or port 443 thereby
being able to flow through the firewall.

At a very high level, this toolkit shares the same design goal as two other 
operators available in a different IBM Streams toolkit named 
com.ibm.streamsx.inetserver which provides two similar operators written 
in Java using a built-in Jetty web server. The com.ibm.streamsx.websocket 
provided operators are written using C++ by employing the most highly 
regarded C++ Boost library, which is expected to use less CPU, memory and
provide a better overall throughput with a good cost performance advantage.

The three data access patterns highlighted above as promoted by the operators
in this toolkit can be explained via the following real-life analogies.

Many to One: On a happy occasion like birthday, many family members and friends
send greeting messages to that one person who is enjoying the special day.

One to One: In a performance evaluation meeting, a manager and an employee
exchange information back and forth about the work accomplished. 

One to Many: In an annual meeting, a CEO gives data points about the company's
business performance to the curiously listening shareholders.
 
Requirements for this toolkit
-----------------------------
There are certain important requirements that need to be satisfied in order to 
use the IBM Streams websocket toolkit in Streams applications. 
Such requirements are explained below.

1. This toolkit uses WebSocket, HTTP to communicate with the remote client and/or
server applications. 

2. On the IBM Streams application development machine(s) (where the application code is 
compiled to create the application bundle), it is necessary to download and install 
the boost_1_73_0 as well as the websocketpp version 0.8.2. 
Please note that this is not needed on the Streams application execution machines.
For the essential steps to meet this requirement, please refer to the above-mentioned 
documentation URL or a section below in this file.

3. It is necessary to create a self-signed or Root CA signed TLS/SSL certificate 
in PEM format and point to that certificate file at the time of starting the 
IBM Streams application that invokes the WebSocketSource and WebSocketSink operators 
present in this toolkit. If you don't want to keep pointing to your TLS/SSL certificate 
file every time you start the IBM Streams application, you can also copy the full 
certificate file to your Streams application's etc directory as ws-server.pem 
and compile your application which will then be used by default. The other two
client-based operators WebSocketSendReceive and HttpPost can optionally be pointed to
their own TLS/SSL certificate at the time of getting invoked mainly for the purpose of
performing client (mutual) authentication. All the four operators in this toolkit can also
be pointed with the public certificate of their remote party in order to do trusted
server or client TLS certificate verification/authentication.

If you are comfortable with using a self-signed TLS/SSL certificate file in 
your environment, you can follow the instructions given in the following file 
that is shipped with this toolkit to create your own self-signed TLS/SSL certificate.

<YOUR_WEBSOCKET_TOOLKIT_HOME>/samples/WebSocketSourceTester/etc/creating-a-self-signed-certificate.txt

A copy of the text file mentioned above is available in the etc directory of every
example shipped with this toolkit. This file explains different procedures to create 
client/server-side private certificate as well as the public certificate to share
with the remote party that is getting connected to or connected from.

Major external dependency for this toolkit
------------------------------------------
Bulk of the WebSocket logic in this toolkit's operators relies on the
following open source C++ WebSocket header only library.
https://github.com/zaphoyd/websocketpp

Much of the WebSocket logic here also follows the
WebSocket client and server usage techniques explained in the
sample code snippets from the above mentioned websocketpp URL.

A great source of learning for WebSocket++ is here:
https://docs.websocketpp.org/index.html

This toolkit requires the following two open source packages that are
not shipped with this toolkit due to the open source code distribution
policies. Users of this toolkit must first understand the usage clauses
stipulated by these two packages and then bring these open source packages 
on their own inside of this toolkit as explained below. This needs to be
done only on the Linux machine(s) where the Streams application development
i.e. coding, compiling and SAB packaging is done. Only after doing that,
users can use this toolkit in their Streams applications.

1) boost_1_73_0
   --> Obtain the official boost version boost_1_73_0 from here:
           https://www.boost.org/users/history/version_1_73_0.html
   
   --> A few .so files from the boost_1_73_0/lib directory are
       copied into the lib directory of this toolkit.
       (It is needed for the dynamic loading of these .so files
        when the Streams application using this toolkit is launched.)
       
   --> The entire boost_1_73_0/include directory is copied into
       the include directory of this toolkit. [Around 200 MB in size]
       (It is needed for a successful compilation of the 
       Streams application that uses this toolkit. Please note that
       these include files will not bloat the size of that
       application's SAB file since the include directory will not be
       part of the SAB file.)
       
2) websocketpp v0.8.2
   --> The entire websocketpp directory is copied into
       the include directory of this toolkit. [Around 1.5 MB in size]
       (It is needed for a successful compilation of the 
       Streams application that uses this toolkit. Please note that
       these include files will not bloat the size of that
       application's SAB file  since the include directory will not be
       part of the SAB file.)

3) Open SSL libraries in Linux
On all your IBM Streams application machines, you have to ensure that the
openssl-devel-1.0.2k-12 and openssl-libs-1.0.2k-12 (or a higher version) are
installed. This can be verified via this command: rpm -qa | grep -i openssl

Downloading the dependencies and building the toolkit
-----------------------------------------------------
This toolkit is packaged with a comprehensive build.xml automation file 
that will help the users in downloading and building the toolkit in order to 
make it ready for use. Users will need network connectivity to the Internet 
from their Linux Streams application development machine(s) along with the 
open source ant tool. All that a user needs to do is to download and extract 
an official release version of this toolkit from the [IBMStreams GitHub]
(https://github.com/IBMStreams/streamsx.websocket/releases) and then run the 
following commands in sequence from the top-level directory 
(e-g: streamsx.websocket) of this toolkit.

ant clean-total           [Approximately 2 minutes]
ant all                   [Approximately 8 minutes]
ant download-clean        [Approximately 2 minutes]

If all those commands ran successfully, this toolkit is ready for use.

If there is no direct Internet access from the IBM Streams machine and if there is a need to go through a proxy server, then the `ant all` command may not work. In that case, please follow the specific instructions outlined in the following URL.

https://ibmstreams.github.io/streamsx.websocket/docs/user/overview/

A must do in the Streams applications that will use this toolkit
----------------------------------------------------------------
a) You must add this toolkit as a dependency in your application.
   --> In Streams Studio, you can add this toolkit location in the
       Streams Explorer view and then add this toolkit as a
       dependency inside your application project's Dependencies section.
       
   --> In a command line compile mode, simply add the -t option to
       point to this toolkit's top-level or its parent directory.

b) --> In Streams studio, you must double click on the BuildConfig of
       your application's main composite and then select "Other" in the
       dialog box that is opened. In the "Additional SPL compiler options", 
       you must add the following.
 
       --c++std=c++11
       
       Please note that there is nothing that needs to be entered in the other
       two C++ compiler and linker options in that same dialog box.

c) --> If you are building your application(s) from the command line, please refer to the
       Makefile provided in every example shipped with this toolkit. Before using 
       that Makefile, you must set the STREAMS_WEBSOCKET_TOOLKIT environment variable to 
       point to the full path of your streamsx.websocket/com.ibm.streamsx.websocket directory.
       To build your own applications, you can do the same as done in that Makefile.
       
Examples that showcase this toolkit's features
----------------------------------------------
This toolkit ships with the following examples that can be used as reference applications.
These examples showcase the full feature set of the WebSocketSource, WebSocketSendReceive,
WebSocketSink and the HttpPost operators that are available within this toolkit. 
Every example below will have the client and server-side application needed to test it.
All the examples went through extensive testing in the IBM Streams lab in New York and
they include excellent code documentation in the source files to guide the
application/solution development engineers. More details about these examples can be
obtained from the official documentation for this toolkit.

1) <YOUR_WEBSOCKET_TOOLKIT_HOME>/samples/WebSocketSourceTester
2) <YOUR_WEBSOCKET_TOOLKIT_HOME>/samples/WebSocketSourceWithResponseTester
3) <YOUR_WEBSOCKET_TOOLKIT_HOME>/samples/WebSocketSendReceiveTester
4) <YOUR_WEBSOCKET_TOOLKIT_HOME>/samples/WebSocketSinkTester
5) <YOUR_WEBSOCKET_TOOLKIT_HOME>/samples/HttpPostTester
6) <YOUR_WEBSOCKET_TOOLKIT_HOME>/samples/CustomVisualization

In the etc sub-directory of every example shown above, there is a shell script that 
can be used to run a given example with synthetic data and then verify the 
application behavior and the result. That shell script was originally written for 
running a given example in the IBM Streams lab in New York. It is easy to make 
minor changes in that shell script and use it in any other IBM Streams environment 
for running a given example.

There is also an example WebSocket C++ client application that can be run from a 
RHEL7 or CentOS7 machine to simulate high volume data traffic to be sent to the
WebSocketSourceTester application. That example client application is available in the 
streamsx.websocket/samples/WebSocketSourceTester/WSClientDataSimulator directory.

Importing this toolkit and its built-in examples into IBM Streams Studio
------------------------------------------------------------------------
1) Build the `streamsx.websocket` toolkit using the three ant commands mentioned in the previous section.

2) In Streams Studio, select `File->Import->IBM Streams Studio->SPL Project` and click `Next`.

3) In the resulting dialog box, click `Browse` and then select the `streamsx.websocket` directory from where you ran those ant commands.

4) Now, it should list the `com.ibm.streamsx.websocket` project which you must select and click `Finish`. It will take about 8 minutes to take a copy of that entire project into your Streams Studio workspace.

5) From your Streams Studio's Project Explorer view, right click on the `com.ibm.streamsx.websocket` project and select `Properties`. In the resulting dialog box, select `Java Build Path` from its left pane. In its right pane, select the `Libraries` tab and click `Add JARs` button. In the resulting `JAR Selection` dialog box, navigate to your `com.ibm.streamsx.websocket/opt/HTTPClient-x.y.z/lib` and select all the jar files present there and click `OK`. Now, click `Apply` and click `OK`.

6) At this time, the `com.ibm.streamsx.websocket` project in your Streams Studio workspace is ready to be added as a dependency in any of your own applications that want to use the WebSocket operators.


We will also show here the steps needed to import one of the built-in examples. You can use similar steps for other examples as well.

1) In Streams Studio, select `File->Import->IBM Streams Studio->SPL Project` and click `Next`.

2) In the resulting dialog box, click `Browse` and then select the `streamsx.websocket->samples` directory. (This is the location where you ran those ant commands earlier).

3) Now, it should list all the available examples from which you can select an example project that you want and click `Finish`.

4) You have to change the SPL build of the imported project from an External Builder to an Internal Builder. You can do that by right clicking on that imported project and then selecting `Configure SPL Build`. In the resulting dialog box, you can change the `Builder type` from an `External builder` to `Streams Studio internal builder`. Click `OK`.

5) Now, expand your imported top-level project, expand the namespace below it and then right-click the main composite name below it and select `New->Build Configuration`. (You have to do it for every main composite present in a given example.)

6) In the resulting dialog box, select Other and enter `--c++std=c++11` in the `Additional SPL compiler options` field. Click `OK`.

7) Your imported example project should build correctly now provided you meet all the other toolkit dependencies.

Official documentation for this toolkit
---------------------------------------
https://ibmstreams.github.io/streamsx.websocket/
============================================================