Implemented in_between()
hosseinmoein committed Aug 1, 2024
1 parent 6c9a50e commit 0f4bf5f
Showing 9 changed files with 328 additions and 62 deletions.
42 changes: 23 additions & 19 deletions docs/HTML/DataFrame.html
@@ -83,20 +83,20 @@
<UL>
<LI><font color="blue" size="+1"><B>Page Index</B></font></LI>
<UL>
<LI><a href="https://github.com/hosseinmoein/DataFrame?tab=readme-ov-file"><font size="+2">&#8592;</font> Back to Github</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#1"><font size="+2">&#9730;</font> Summary</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#2"><font size="+2">&#128477;</font> API Reference with code samples</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#3"><font size="+2">&#128640;</font> Multithreading</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#4"><font size="+2">&#127749;</font> Views</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#5"><font size="+2">&#128109;</font> Visitors</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#6"><font size="+2">&#10803;</font> Memory Alignment</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#7"><font size="+2">&Sum;</font> Numeric Generators</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#8"><font size="+2">&#128193;</font> Code Structure</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#9"><font size="+2">&#x1F6E0;</font> Build Instructions</a></LI>
<LI><a href="https://github.com/hosseinmoein/DataFrame?tab=readme-ov-file"><font size="+3">&#8592;</font> Back to Github</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#1"><font size="+3">&#9730;</font> Summary</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#2"><font size="+3">&#128477;</font> API Reference with code samples</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#3"><font size="+3">&#128640;</font> Multithreading</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#4"><font size="+3">&#128269;</font> Views</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#5"><font size="+3">&#128109;</font> Visitors</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#6"><font size="+3">&#10052;</font> Memory Alignment</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#7"><font size="+3">&#127922;</font> Numeric Generators</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#8"><font size="+3">&#128450;</font> Code Structure</a></LI>
<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html#9"><font size="+3">&#x1F6E0;</font> Build Instructions</a></LI>
</UL>
</UL>

<H2 ID="1"><font color="blue">Summary <font size="+3">&#9730;</font></font></H2>
<H2 ID="1"><font color="blue">Summary <font size="+4">&#9730;</font></font></H2>
<P>
<font size="+1">DataFrame</font> is a templatized and heterogeneous C++ container designed for data analysis for statistical, machine-learning, or financial applications. You can think of data-frame as a two-dimensional data structure of columns and rows just like an Excel spreadsheet, or a SQL table. But in case of C++ DataFrame, your data needn't be two-dimensional necessarily. Columns in the C++ DataFrame could be vectors of any type, including DataFrames or other containers. So, a C++ DataFrame can be of any dimension. That's the logical layout of the data. C++ DataFrame also includes an intuitive API for data analysis and analytics. The API is designed to be open-ended meaning you can easily include your own custom algorithms.<BR>
Any data-frame inherently includes a schema. C++ DataFrame schema is either built dynamically at run-time or it comes from a file. Currently C++ DataFrame could be shared between different nodes (e.g. computers) in a couple of ways. It can be written into a file, or it can be serialized into a buffer and sent across and reconstituted on the other side.
@@ -136,7 +136,7 @@ <H2 ID="1"><font color="blue">Summary <font size="+3">&#9730;</font></font></H2>

<BR><HR COLOR="Gray" SIZE="5">

<H2 ID="2"><font color="blue">API Reference with code samples <font size="+3">&#128477;</font></font></H2>
<H2 ID="2"><font color="blue">API Reference with code samples <font size="+4">&#128477;</font></font></H2>
DataFrame library interface is separated into two main categories:
<OL>
<LI>Accessing, adding, slicing &amp; dicing, joining &amp; groupby'ing ... <B>(The first column in the table below)</B></LI>
@@ -296,6 +296,10 @@ <H2 ID="2"><font color="blue">API Reference with code samples <font size="+3">&#
<td title="Returns true if the column exists"><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/has_column.html">has_column</a>( 2 )</td>
</tr>

<tr class="item" onmouseover="this.style.backgroundColor='#ffff66';" onmouseout="this.style.backgroundColor='#d4e3e5';">
<td title="Returns a mask vector of 0s/1s for values between lower and upper bounds"><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/in_between.html">in_between</a>()</td>
</tr>

<tr class="item" onmouseover="this.style.backgroundColor='#ffff66';" onmouseout="this.style.backgroundColor='#d4e3e5';">
<td title="Returns number of inversions in the named column"><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/inversion_count.html">inversion_count</a>()</td>
</tr>
@@ -1539,7 +1543,7 @@ <H2 ID="2"><font color="blue">API Reference with code samples <font size="+3">&#

<BR><HR COLOR="Gray" SIZE="5">

<H2 ID="3"><font color="blue">Multithreading <font size="+3">&#128640;</font></font></H2>
<H2 ID="3"><font color="blue">Multithreading <font size="+4">&#128640;</font></font></H2>
In general, multithreading could be very unintuitive. Often you think by using multithreading you enhance the performance of your program. But in fact, you are hindering it. To do effective multithreading, you must do two things repeatedly; measure and adjust. In general (rule of thumb), you should use multithreading in two contradictory situations. First, when you have intensive CPU-bound operations like mathematical equations that can independently utilize different cores. Second, when you have multiple I/O-bound operations that can go on independently while they wait for each other. The key word here is <I>independently</I>. You must also realize that multithreading has an inherent overhead that not only affects your process but also other processes running on the same node. It is recommended to start with a single-threaded version and when that is <I>working correctly</I>, establish a baseline, take measurements, and implement a multithreaded solution.<BR>
DataFrame uses multithreading extensively and provides granular tools to adjust your environment. Let's divide the multithreading subject in DataFrame into two categories:<BR>

@@ -1560,7 +1564,7 @@ <H4>2. DataFrame Internal Multithreading</H4>

<BR><HR COLOR="Gray" SIZE="5">

<H2 ID="4"><font color="blue">Views <font size="+3">&#127749;</font></font></H2>
<H2 ID="4"><font color="blue">Views <font size="+4">&#128269;</font></font></H2>
<P>
Views have useful and practical use-cases. A view is a slice of a DataFrame that is a reference to the original DataFrame. It appears exactly the same as a DataFrame, but if you modify any data in the view, the corresponding data point(s) in the original DataFrame will also be modified and vice versa. There are certain things you cannot do in views. For example, you cannot add or delete columns, extend the index column, ...<BR><BR>

@@ -1582,7 +1586,7 @@ <H2 ID="4"><font color="blue">Views <font size="+3">&#127749;</font></font></H2>

<BR><HR COLOR="Gray" SIZE="5">

<H2 ID="5"><font color="blue">Visitors <font size="+3">&#128109;</font></font></H2>
<H2 ID="5"><font color="blue">Visitors <font size="+4">&#128109;</font></font></H2>
<P>
Visitors are the main mechanism to implement analytical (i.e. statistical, financial, machine-learning) algorithms. You can easily follow the visitor's interface to add your custom algorithm by which you will extend the DataFrame package. Visitors also play several roles that in other packages may be handled by separate interfaces. Visitors play the role of <I>apply</I>, <I>transformer</I>, and <I>algorithms</I>. For example, a visitor can transform column(s) or it may take the column(s) as read-only and implement an algorithm.<BR>
There are two visitor interfaces:<BR>
@@ -1611,7 +1615,7 @@ <H2 ID="5"><font color="blue">Visitors <font size="+3">&#128109;</font></font></

<BR><HR COLOR="Gray" SIZE="5">

<H2 ID="6"><font color="blue">Memory Alignment <font size="+3">&#10803;</font></font></H2>
<H2 ID="6"><font color="blue">Memory Alignment <font size="+4">&#10052;</font></font></H2>
<P>
DataFrame gives you the ability to allocate memory on custom alignment boundaries.<BR>
You can use this feature to take advantage of <I>SIMD</I> instructions in modern CPUs. Since DataFrame algorithms all operate on vectors of data (columns), this can come in handy in conjunction with compiler optimizations. Also, you can use alignment to prevent false cache-line sharing between multiple columns.<BR>
@@ -1621,7 +1625,7 @@ <H2 ID="6"><font color="blue">Memory Alignment <font size="+3">&#10803;</font></

<BR><HR COLOR="Gray" SIZE="5">

<H2 ID="7"><font color="blue">Numeric Generators <font size="+3">&Sum;</font></font></H2>
<H2 ID="7"><font color="blue">Numeric Generators <font size="+4">&#127922;</font></font></H2>
<P>
Random generators, and a few other numeric generators, were added as a series of convenient stand-alone functions to generate random numbers (they cover all C++ standard distributions). You can seamlessly use these routines to generate random DataFrame columns. The result vectors are space-optimized and you can choose different memory alignments.<BR>
See this document and file <I>RandGen.h</I> and <I>dataframe_tester.cc.</I>
@@ -1630,7 +1634,7 @@ <H2 ID="7"><font color="blue">Numeric Generators <font size="+3">&Sum;</font></f

<BR><HR COLOR="Gray" SIZE="5">

<H2 ID="8"><font color="blue">Code Structure <font size="+3">&#128193;</font></font></H2>
<H2 ID="8"><font color="blue">Code Structure <font size="+4">&#128450;</font></font></H2>
<P>
The DataFrame library is <I>almost</I> a header-only library. Currently the only library source file is <I>DateTime.cc.</I><BR>
<BR>
@@ -1645,7 +1649,7 @@ <H2 ID="8"><font color="blue">Code Structure <font size="+3">&#128193;</font></f

<BR><HR COLOR="Gray" SIZE="5">

<H2 ID="9"><font color="blue">Build Instructions <font size="+3">&#x1F6E0;</font></font></H2>
<H2 ID="9"><font color="blue">Build Instructions <font size="+4">&#x1F6E0;</font></font></H2>
<P>
<font size="+1"><B>Using plain make and make-files:</B></font><BR>
Go to the <I>src</I> subdirectory, and execute build_all.sh. This will build the library and test executables for <I>Linux/Unix flavors only</I><BR><BR>
12 changes: 6 additions & 6 deletions docs/HTML/get_above_quantile_data.html
@@ -43,7 +43,7 @@
<tr bgcolor="Azure">
<td bgcolor="blue"> <font color="white">
<PRE><B>
template&lt;typename T, typename ... Ts&gt;
template&lt;comparable T, typename ... Ts&gt;
DataFrame
get_above_quantile_data(const char *col_name,
double quantile) const;
@@ -66,7 +66,7 @@
<tr bgcolor="Azure">
<td bgcolor="blue"> <font color="white">
<PRE><B>
template&lt;typename T, typename ... Ts&gt;
template&lt;comparable T, typename ... Ts&gt;
PtrView
get_above_quantile_view(const char *col_name,
double quantile);
@@ -92,7 +92,7 @@
<tr bgcolor="Azure">
<td bgcolor="blue"> <font color="white">
<PRE><B>
template&lt;typename T, typename ... Ts&gt;
template&lt;comparable T, typename ... Ts&gt;
ConstPtrView
get_above_quantile_view(const char *col_name,
double quantile) const;
@@ -113,7 +113,7 @@
<tr bgcolor="Azure">
<td bgcolor="blue"> <font color="white">
<PRE><B>
template&lt;typename T, typename ... Ts&gt;
template&lt;comparable T, typename ... Ts&gt;
DataFrame
get_below_quantile_data(const char *col_name,
double quantile) const;
@@ -136,7 +136,7 @@
<tr bgcolor="Azure">
<td bgcolor="blue"> <font color="white">
<PRE><B>
template&lt;typename T, typename ... Ts&gt;
template&lt;comparable T, typename ... Ts&gt;
PtrView
get_below_quantile_view(const char *col_name,
double quantile);
@@ -162,7 +162,7 @@
<tr bgcolor="Azure">
<td bgcolor="blue"> <font color="white">
<PRE><B>
template&lt;typename T, typename ... Ts&gt;
template&lt;comparable T, typename ... Ts&gt;
ConstPtrView
get_below_quantile_view(const char *col_name,
double quantile) const;
12 changes: 6 additions & 6 deletions docs/HTML/get_top_n_data.html
@@ -43,7 +43,7 @@
<tr bgcolor="Azure">
<td bgcolor="blue"> <font color="white">
<PRE><B>
template&lt;typename T, typename ... Ts&gt;
template&lt;comparable T, typename ... Ts&gt;
DataFrame
get_top_n_data(const char *col_name, size_type n) const;
</B></PRE></font>
@@ -64,7 +64,7 @@
<tr bgcolor="Azure">
<td bgcolor="blue"> <font color="white">
<PRE><B>
template&lt;typename T, typename ... Ts&gt;
template&lt;comparable T, typename ... Ts&gt;
PtrView
get_top_n_view(const char *col_name, size_type n);
</B></PRE></font>
@@ -89,7 +89,7 @@
<tr bgcolor="Azure">
<td bgcolor="blue"> <font color="white">
<PRE><B>
template&lt;typename T, typename ... Ts&gt;
template&lt;comparable T, typename ... Ts&gt;
ConstPtrView
get_top_n_view(const char *col_name, size_type n) const;
</B></PRE></font>
@@ -109,7 +109,7 @@
<tr bgcolor="Azure">
<td bgcolor="blue"> <font color="white">
<PRE><B>
template&lt;typename T, typename ... Ts&gt;
template&lt;comparable T, typename ... Ts&gt;
DataFrame
get_bottom_n_data(const char *col_name, size_type n) const;
</B></PRE></font>
@@ -130,7 +130,7 @@
<tr bgcolor="Azure">
<td bgcolor="blue"> <font color="white">
<PRE><B>
template&lt;typename T, typename ... Ts&gt;
template&lt;comparable T, typename ... Ts&gt;
PtrView
get_bottom_n_view(const char *col_name, size_type n);
</B></PRE></font>
@@ -155,7 +155,7 @@
<tr bgcolor="Azure">
<td bgcolor="blue"> <font color="white">
<PRE><B>
template&lt;typename T, typename ... Ts&gt;
template&lt;comparable T, typename ... Ts&gt;
ConstPtrView
get_bottom_n_view(const char *col_name, size_type n) const;
</B></PRE></font>