Merge pull request #321 from hosseinmoein/Hossein/ReadFast
Faster read of csv2 format
hosseinmoein authored Aug 12, 2024
2 parents d0aede8 + 9ad97bf commit c703bbe
Showing 7 changed files with 552 additions and 471 deletions.
5 changes: 5 additions & 0 deletions benchmarks/dataframe_read_large_file.cc
@@ -85,6 +85,11 @@ int main(int, char *[]) {
<< double(duration_cast<microseconds>(end - start).count()) / 1000000.0
<< " seconds\n";

+/*
+df.write<long, unsigned int, int, unsigned long>
+("Large_File.dat", io_format::binary);
+*/

return (0);
}

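The benchmark in this hunk reports elapsed time by casting the difference of two time points to microseconds and dividing by 1,000,000.0. The sketch below isolates that measurement pattern. It assumes `std::chrono::steady_clock` (the clock the benchmark actually uses is declared outside the lines shown), and `time_seconds` is a hypothetical helper, not part of DataFrame.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed time in
// seconds, computed the same way as the benchmark output
// (microsecond count divided by 1,000,000.0).
// steady_clock is an assumption; it is monotonic, so the result is never negative.
template <typename F>
double time_seconds(F &&work) {
    using namespace std::chrono;
    const auto start = steady_clock::now();
    work();
    const auto end = steady_clock::now();
    return double(duration_cast<microseconds>(end - start).count()) / 1000000.0;
}
```

Wrapping the file read in a lambda, e.g. `time_seconds([&] { /* read the file */ });`, yields a value comparable to the "seconds" figure the benchmark prints.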
32 changes: 17 additions & 15 deletions docs/HTML/DateTime.html
@@ -77,34 +77,34 @@
<UL>
<LI><font color="blue" size="+1"><B>Page Index</B></font></LI>
<UL>
-<LI><a href="https://github.com/hosseinmoein/DataFrame?tab=readme-ov-file"><font size="+2">&#8592;</font> Back to Github</a></LI>
-<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DateTime.html#0"><font size="+2">&#9730;</font> Summary</a></LI>
-<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DateTime.html#1"><font size="+2">&#128193;</font> Code structure</a></LI>
-<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DateTime.html#2"><font size="+2">&#x1F6E0;</font> Build Instructions</a></LI>
-<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DateTime.html#3"><font size="+2">&#129513;</font> Example</a></LI>
-<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DateTime.html#4"><font size="+2">&#129419;</font> Types</a></LI>
-<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DateTime.html#5"><font size="+2">&#128477;</font> Member Functions</a></LI>
-<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DateTime.html#6"><font size="+2">&#127760;</font> Global DateTime Operators</a></LI>
+<LI><a href="https://github.com/hosseinmoein/DataFrame?tab=readme-ov-file"><font size="+3">&#8592;</font> Back to Github</a></LI>
+<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DateTime.html#0"><font size="+3">&#9730;</font> Summary</a></LI>
+<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DateTime.html#1"><font size="+3">&#128450;</font> Code structure</a></LI>
+<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DateTime.html#2"><font size="+3">&#x1F6E0;</font> Build Instructions</a></LI>
+<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DateTime.html#3"><font size="+3">&#129513;</font> Example</a></LI>
+<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DateTime.html#4"><font size="+3">&#129419;</font> Types</a></LI>
+<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DateTime.html#5"><font size="+3">&#128477;</font> Member Functions</a></LI>
+<LI><a href="https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DateTime.html#6"><font size="+3">&#127760;</font> Global DateTime Operators</a></LI>
</UL>
</UL>

-<H2 ID="0"><font color="blue">Summary</font></H2>
+<H2 ID="0"><font color="blue">Summary <font size="+4">&#9730;</font></font></font></H2>
<font size="+1">Since DataFrame is a statistical library, it often deals with time-series data. So, it needs to keep track of time.<BR>
The most efficient way of indexing DataFrame by time is to use an index type of time_t for second precision or double or long long integer for more precision. DateTime class provides a more elaborate handling of time. Also, it is a general handy DateTime object. DateTime is a cool and handy object to manipulate date/time with nanosecond precision and multi timezone capability. It has a very simple and intuitive interface that allows you to break date/time to their components, reassemble date/time from their components, advance or pullback date/time with different granularities, and more.</font><BR>

<BR><HR COLOR="Gray" SIZE="5">

-<H2 ID="1"><font color="blue">Code structure</font></H2>
+<H2 ID="1"><font color="blue">Code structure <font size="+4">&#128450;</font></font></H2>

<font size="+1">Both the header (DateTime.h) and source (DateTime.cc) files are part of the DataFrame project. They are in the usual include/Utils and src/Utils directories.</font><BR>

-<H2 ID="2"><font color="blue">Build Instructions</font></H2>
+<H2 ID="2"><font color="blue">Build Instructions <font size="+4">&#x1F6E0;</font></font></H2>

<font size="+1">Follow the DataFrame build instructions.</font><BR><BR>

<HR COLOR="Gray" SIZE="5">

-<H2 ID="3"><font color="blue">Example</font></H2>
+<H2 ID="3"><font color="blue">Example <font size="+4">&#129513;</font></font></H2>

<font size="+1">This library can have up to Nano second precision depending on what systems calls are available. These are some example code:i</font><BR>
<font size="+1">
@@ -136,7 +136,7 @@ <H2 ID="3"><font color="blue">Example</font></H2>

<HR COLOR="Gray" SIZE="5">

-<H2 ID="4"><font color="blue">Types</font></H2>
+<H2 ID="4"><font color="blue">Types <font size="+4">&#129419;</font></font></H2>
<font size="+1">These constants are used for formatting date/time into strings:</font><BR>
<font size="+1">
<pre class="code_syntax" style="color:#000000;background:#ffffff00;"><span class="line_wrapper"> <span style="color:#800000; font-weight:bold; ">enum</span> <span style="color:#800000; font-weight:bold; ">class</span> DT_FORMAT <span style="color:#800080; ">:</span> <span style="color:#800000; font-weight:bold; ">unsigned</span> <span style="color:#800000; font-weight:bold; ">short</span> <span style="color:#800000; font-weight:bold; ">int</span> <span style="color:#800080; ">{</span></span>
@@ -305,7 +305,7 @@ <H2 ID="4"><font color="blue">Types</font></H2>

<BR><HR COLOR="Gray" SIZE="5">

-<H2 ID="5"><font color="blue">Member Functions</font></H2>
+<H2 ID="5"><font color="blue">Member Functions <font size="+4">&#128477;</font></font></H2>
<font size="+1">
<pre class="code_syntax" style="color:#000000;background:#ffffff00;"><span class="line_wrapper"> <span style="color:#696969; ">// A constructor that creates a DateTime initialized to now.</span></span>
<span class="line_wrapper"> <span style="color:#696969; ">// tz: Desired time zone from DT_TIME_ZONE above.</span></span>
@@ -491,7 +491,7 @@ <H2 ID="5"><font color="blue">Member Functions</font></H2>
</font>

<BR><HR COLOR="Gray" SIZE="5">
-<H2 ID="6"><font color="blue">Global DateTime Operators</font></H2>
+<H2 ID="6"><font color="blue">Global DateTime Operators <font size="+4">&#127760;</font></font></H2>
<font size="+1">
<pre class="code_syntax" style="color:#000000;background:#ffffff00;"><span class="line_wrapper"><span style="color:#696969; ">// DateTime output operator to a stream</span></span>
<span class="line_wrapper"><span style="color:#696969; ">//</span></span>
@@ -544,6 +544,8 @@ <H2 ID="6"><font color="blue">Global DateTime Operators</font></H2>
<span class="line_wrapper"><span style="color:#800080; ">}</span><span style="color:#800080; ">;</span></span></pre>
</font>

+<BR><a href="https://github.com/hosseinmoein/DataFrame?tab=readme-ov-file">&#8592; Back to Github</a><BR>

</body></html>

<!--
3 changes: 2 additions & 1 deletion include/DataFrame/Internals/DataFrame_private_decl.h
@@ -29,6 +29,7 @@ SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

#pragma once

+#include <cstdio>
#include <ranges>

// ----------------------------------------------------------------------------
@@ -66,7 +67,7 @@ void read_binary_(std::istream &file,
size_type starting_row,
size_type num_rows);
void read_csv_(std::istream &file, bool columns_only);
-void read_csv2_(std::istream &file,
+void read_csv2_(std::FILE *stream,
bool columns_only,
size_type starting_row,
size_type num_rows);
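In this commit `read_csv2_` switches from a `std::istream &` parameter to a `std::FILE *`, which is why `<cstdio>` is now included. The new function body is not part of this excerpt, so the following is only a minimal sketch, assuming the speedup comes from pulling lines through C stdio (`std::fgets` into one reusable buffer) rather than through iostreams. `read_lines` is a hypothetical helper, not DataFrame's API, and it does no csv2-specific parsing.

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not DataFrame's API): reads every line from an
// already-open C stdio stream. Lines longer than the buffer are assembled
// from several fgets() chunks; trailing "\n" or "\r\n" is stripped.
std::vector<std::string> read_lines(std::FILE *stream) {
    std::vector<std::string> lines;
    std::string current;
    char buf[4096];  // reused for every read, no per-line stream state

    while (std::fgets(buf, sizeof(buf), stream) != nullptr) {
        current += buf;
        if (!current.empty() && current.back() == '\n') {
            current.pop_back();
            if (!current.empty() && current.back() == '\r')
                current.pop_back();
            lines.push_back(std::move(current));
            current.clear();  // moved-from string: reset before reuse
        }
    }
    // Keep a final line that has no trailing newline.
    if (!current.empty())
        lines.push_back(std::move(current));
    return lines;
}
```

The caller opens the file with `std::fopen(path, "r")`, passes the handle, and closes it with `std::fclose`; the helper only reads from the stream.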