Machine Learning on BigData -- distributed ML-related computation based on real-time micro-batch streams as well as big data.
It is a layer on top of Martin Karlsons framework that adds machine learning, data mining, AI and mathematical statistics capabilities to it. The instance runs upgraded versions of Spark and PySpark (3.3.1) and Python (3.10.6). It runs on Ubuntu 22.04 instead of Alpine, which consumes somewhat more resources but makes integrating new software a great deal less painful. It is currently capable of running most Python machine learning packages, including the hyperparameter tuning utility GridSearchCV with the help of joblibspark, which enables scikit-learn classes such as GridSearchCV to run on executors on Spark worker nodes.
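As a rough illustration of the joblibspark route described above, the sketch below runs a small GridSearchCV inside a joblib `spark` backend. The dataset (iris), the estimator (SVC), the parameter grid and the helper name `run_grid_search` are all assumptions chosen for the example, not part of this project. If joblibspark is not installed, the sketch falls back to joblib's local `loky` backend so that it still runs on a laptop.

```python
from joblib import parallel_backend
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

try:
    # joblibspark registers a joblib backend named "spark".
    from joblibspark import register_spark

    register_spark()
    BACKEND = "spark"
except ImportError:
    # Assumption for this sketch: fall back to local processes without Spark.
    BACKEND = "loky"


def run_grid_search(n_jobs=3):
    """Fit a small, illustrative grid search using the selected joblib backend."""
    X, y = load_iris(return_X_y=True)
    # Hypothetical grid, chosen only to keep the example fast.
    param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}
    search = GridSearchCV(SVC(), param_grid, cv=5)
    # Inside this context the cross-validation fits are dispatched through
    # BACKEND; with "spark" they run as tasks on the Spark executors.
    with parallel_backend(BACKEND, n_jobs=n_jobs):
        search.fit(X, y)
    return search


if __name__ == "__main__":
    result = run_grid_search()
    print(BACKEND, result.best_params_, round(result.best_score_, 3))
```

With the `spark` backend, `n_jobs` caps how many fits run concurrently as Spark tasks, so it is worth matching it to the number of executor cores available.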
Email : [email protected]
Post-graduate student, Artificial Intelligence Applied to Medicine (AIM)
Department of Physiology and Pharmacology (FYFA), Karolinska Institutet and
Department of Computer and Systems Sciences, Stockholm University
To provide the next generation of intensive care and anaesthetic methods, applying machine learning and artificial intelligence to swiftly provide relief and optimal treatment in intensive care, which mostly includes care for those suffering from multi-disease conditions, sepsis and multiple organ failure, based on ASAP AI analytics provided by a framework such as this one. The figure on the left shows what we have today, and the one on the right shows where we are heading.
To emulate a Hadoop/Spark/Kafka cluster by having the services run in Docker on a virtual machine (VM) with its own IP address. This makes it easier to add datanodes and worker nodes, locally or on a physical machine, by cloning a virtual machine node and deploying it there.
- This has only been tested on Debian-based Linux distributions. It should work on all Linux distributions, and I see no reason for it not to work on Windows.
- latest Docker and VirtualBox
- 32 GB RAM
- 8 core CPU
Execute bash master-build.sh
to start the build and start the containers.
Access Hadoop UI on ' http://localhost:9870 '
Access Spark Master UI on ' http://localhost:8080 '
Access Jupyter UI on ' http://localhost:8888 '
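Once the containers are running, a quick way to confirm that the three UIs listed above respond is a small standard-library probe such as the sketch below. The helper names `is_up` and `check_services` and the 3-second timeout are assumptions made for this example. Any HTTP response, including an error status such as a login redirect or 403, is treated as "the service is listening".

```python
import http.client
import urllib.error
import urllib.request

# URLs taken from the list above.
SERVICE_URLS = {
    "Hadoop": "http://localhost:9870",
    "Spark Master": "http://localhost:8080",
    "Jupyter": "http://localhost:8888",
}


def is_up(url, timeout=3.0):
    """Return True if an HTTP server answers at url, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied, just not with a 2xx status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure or a malformed reply.
        return False


def check_services(urls=SERVICE_URLS):
    """Probe every service and return a name -> reachable mapping."""
    return {name: is_up(url) for name, url in urls.items()}


if __name__ == "__main__":
    for name, ok in check_services().items():
        print(f"{name}: {'up' if ok else 'not reachable'}")
```

The services can take a little while to come up after the build finishes, so a "not reachable" result right after starting the containers may simply mean it is worth retrying.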