Merge pull request #277 from hosseinmoein/Hossein/ThreadPoolOpt
Streamlined parallel computing logic in code
hosseinmoein authored Jan 3, 2024
2 parents d95f3a8 + 46c4429 commit 68fd03a
Showing 15 changed files with 598 additions and 421 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -40,6 +40,7 @@ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
This is a C++ analytical library designed for data analysis similar to libraries in Python and R. For example, you would compare this to Pandas or R data.frame.<BR>
You can slice the data in many different ways. You can join, merge, group-by the data. You can run various statistical, summarization, financial, and ML algorithms on the data. You can add your custom algorithms easily. You can multi-column sort, custom pick and delete the data. And more …<BR>
DataFrame also includes a large collection of analytical algorithms in the form of visitors. These range from basic stats such as <I>Mean</I>, <I>Std Deviation</I>, <I>Return</I>, … to more involved analysis such as <I>Affinity Propagation</I>, <I>Polynomial Fit</I>, <I>Fast Fourier transform of arbitrary length</I> … including a good collection of trading indicators. You can also easily add your own algorithms.<BR>
+DataFrame also employs extensive multithreading in almost all its APIs when datasets are large. That makes DataFrame especially suitable for analyzing large datasets.<BR>
For basic operations to start you off, see [Hello World](examples/hello_world.cc). For a complete list of features with code samples, see [documentation](https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html).

I have followed a few <B>principles in this library</B>:<BR>
5 changes: 3 additions & 2 deletions docs/HTML/DataFrame.html
@@ -1440,7 +1440,7 @@ <H2><font color="blue">API Reference with code samples</font></H2>
<BR><HR COLOR="Orange" SIZE="5">

<H2><font color="blue">Multithreading</font></H2>
-In general, multithreading could be very tricky. A lot of times you think by using multithreading you enhance the performance of your program. But in fact, you are hindering it. It requires measuring and careful adjustments. It is recommended to start with a single-threaded version and when that is <I>working correctly</I>, take measurements and adjust to move to multithreading version.<BR>
+In general, multithreading can be very unintuitive. Often you think that by using multithreading you enhance the performance of your program, when in fact you are hindering it. To do effective multithreading, you must do two things repeatedly: measure and adjust. It is recommended to start with a single-threaded version and, once that is <I>working correctly</I>, take measurements and adjust as you move to a multithreaded version.<BR>
DataFrame uses multithreading extensively and provides granular tools to adjust your environment. Let’s divide the multithreading subject in DataFrame into two categories:<BR>

<H4>1. User Multithreading</H4>
@@ -1450,7 +1450,8 @@ <H4>1. User Multithreading</H4>
<LI>In addition, instances of DataFrame are not thread-safe either. In other words, a single instance of DataFrame must not be used in multiple threads without protection, unless it is used as read-only.</LI>
</UL>
<H4>2. DataFrame Internal Multithreading</H4>
-Whether or not you, as the user, use multithreading, DataFrame utilizes a versatile thread-pool to employ parallel computing extensively in almost all its functionalities. By default, there is no multithreading. All algorithms execute their single-threaded version. To enable multithreading, call either <I>ThreadGranularity::set_optimum_thread_level()</I> (recommended) or <I>ThreadGranularity::set_thread_level(n)</I>. When Multithreading is enabled, most parallel algorithms trigger when number of data points exceeds 250k and number of threads exceeds 2.<BR>
+Whether or not you, as the user, use multithreading, DataFrame utilizes a versatile thread-pool to employ parallel computing extensively in almost all its APIs. By default, there is no multithreading; all algorithms execute their single-threaded versions. To enable multithreading, call either <I>ThreadGranularity::set_optimum_thread_level()</I> (recommended) or <I>ThreadGranularity::set_thread_level(n)</I>.<BR>
+When multithreading is enabled, most parallel algorithms trigger when the number of data points exceeds 250k and the number of threads exceeds 2. Therefore, if your process deals with datasets smaller than this, it doesn't make sense to populate the thread-pool with threads, as they will be a waste of resources.<BR>
You do not need to worry about synchronization for DataFrame internal multithreading. It is done behind the scenes and unbeknown to you.<BR>
<UL>
<LI> There are asynchronous versions of some methods. For example, you have sort()/sort_async(), visit()/visit_async(), ... The latter versions return a std::future and would execute in parallel.<BR>If you choose to use DataFrame async interfaces, it is highly recommended to call <I>ThreadGranularity::set_optimum_thread_level()</I>, so your thread-pool is populated with an optimal number of threads. Otherwise, if the thread-pool is empty, async interfaces will add one thread to it. Having only one thread in the thread-pool could be suboptimal and hinder performance.</LI>
2 changes: 1 addition & 1 deletion examples/hello_world.cc
@@ -73,7 +73,7 @@ int main(int, char *[]) {
// it is recommended to call the following at the beginning of your program.
//
// NOTE: make sure you read and understand the Multithreading section
-// in the documentations.
+// in the documentation (threads could potentially hinder performance).
//
ThreadGranularity::set_optimum_thread_level();
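
The threshold rule stated in the Multithreading documentation changed by this commit can be sketched in isolation. What follows is a minimal standalone illustration, not DataFrame's implementation: the helper names `parallel_would_trigger` and `threads_worth_requesting` and both constants are hypothetical, and only the figures (more than 250k data points, more than 2 threads) come from the documentation text.

```cpp
#include <cstddef>

// Hypothetical constants mirroring the figures quoted in the docs;
// they are NOT taken from DataFrame's source code.
constexpr std::size_t kMinDataPoints = 250000;
constexpr std::size_t kMinThreads = 2;

// Hypothetical helper: true only when both documented conditions hold.
// Both comparisons are strict, following the "exceeds" wording in the docs.
constexpr bool parallel_would_trigger(std::size_t n_points,
                                      std::size_t n_threads) {
    return n_points > kMinDataPoints && n_threads > kMinThreads;
}

// Hypothetical helper: how many threads are worth putting in the pool.
// Returns 0 when the parallel paths would not trigger anyway, so the
// threads would only be a waste of resources.
constexpr std::size_t threads_worth_requesting(std::size_t n_points,
                                               std::size_t hw_threads) {
    return parallel_would_trigger(n_points, hw_threads) ? hw_threads : 0;
}
```

Under these assumptions, a program that only handles small datasets would get 0 back and could skip populating the thread-pool altogether. A nonzero result could be passed to `ThreadGranularity::set_thread_level(n)`, which the documentation names as the alternative to `set_optimum_thread_level()`.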
