From ccb88c6830acfd812cacfaae0d9f2fef5ed66540 Mon Sep 17 00:00:00 2001 From: Hossein Moein Date: Mon, 1 Jan 2024 12:04:29 -0500 Subject: [PATCH 1/5] Streamlined paralle computing logic in code --- docs/HTML/DataFrame.html | 5 +- examples/hello_world.cc | 2 +- include/DataFrame/Internals/DataFrame.tcc | 124 +++++++++++------- include/DataFrame/Internals/DataFrame_get.tcc | 85 +++++++++--- 4 files changed, 147 insertions(+), 69 deletions(-) diff --git a/docs/HTML/DataFrame.html b/docs/HTML/DataFrame.html index f788782f..b0546aef 100644 --- a/docs/HTML/DataFrame.html +++ b/docs/HTML/DataFrame.html @@ -1440,7 +1440,7 @@

API Reference with code samples



Multithreading

- In general, multithreading could be very tricky. A lot of times you think by using multithreading you enhance the performance of your program. But in fact, you are hindering it. It requires measuring and careful adjustments. It is recommended to start with a single-threaded version and when that is working correctly, take measurements and adjust to move to multithreading version.
+ In general, multithreading could be very unintuitive. Often you think by using multithreading you enhance the performance of your program. But in fact, you are hindering it. It requires measuring and careful adjustments. It is recommended to start with a single-threaded version and when that is working correctly, take measurements and adjust to move to multithreading version.
DataFrame uses multithreading extensively and provides granular tools to adjust your environment. Let’s divide the multithreading subject in DataFrame into two categories:

1. User Multithreading

@@ -1450,7 +1450,8 @@

1. User Multithreading

  • In addition, instances of DataFrame are not multithreaded safe either. In other words, a single instance of DataFrame must not be used in multiple threads without protection, unless it is used as read-only.
  • 2. DataFrame Internal Multithreading

    - Whether or not you, as the user, use multithreading, DataFrame utilizes a versatile thread-pool to employ parallel computing extensively in almost all its functionalities. By default, there is no multithreading. All algorithms execute their single-threaded version. To enable multithreading, call either ThreadGranularity::set_optimum_thread_level() (recommended) or ThreadGranularity::set_thread_level(n). When Multithreading is enabled, most parallel algorithms trigger when number of data points exceeds 250k and number of threads exceeds 2.
    + Whether or not you, as the user, use multithreading, DataFrame utilizes a versatile thread-pool to employ parallel computing extensively in almost all its API's. By default, there is no multithreading. All algorithms execute their single-threaded version. To enable multithreading, call either ThreadGranularity::set_optimum_thread_level() (recommended) or ThreadGranularity::set_thread_level(n).
    + When Multithreading is enabled, most parallel algorithms trigger when number of data points exceeds 250k and number of threads exceeds 2. Therefore, if your process deals with datasets smaller than this, it doesn't make sense to populate the thread-pool with threads as they will be waste of resources.
    You do not need to worry about synchronization for DataFrame internal multithreading. It is done behind the scenes and unbeknown to you.