Update report.tex
dadit97 committed Sep 11, 2022
1 parent ff78ff5 commit 467e465
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion report/report.tex
@@ -309,7 +309,14 @@ \section{Experiments}

\subsection{Preprocessing Performance}

The table below summarizes the time required by the \textbf{Colab environment} to complete various parts of the preprocessing section.\\

To collect the PySpark timings, the method \texttt{count()} was called at the end of every section, forcing the library to execute the requested operations immediately.\\

Without these calls, the lazy evaluation of PySpark data structures would make it impossible to benchmark the individual sections. Forcing evaluation probably lengthens the measured execution times, both because of
the count operations themselves and because of the \textbf{loss of some optimizations} that the library could apply if it evaluated all the operations in a single run.
This overhead must be taken into consideration, but it should not alter the overall results of the tests.\\
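
The measurement pattern described above can be sketched as follows. This is only a minimal illustration, not the code used for the report: the helper \texttt{timed}, the stand-in generator pipeline and the names in the commented PySpark usage are all assumptions introduced here. A plain Python generator is used in place of a PySpark DataFrame because it is lazy in a similar way and keeps the sketch runnable without a Spark session.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def timed(label: str, action: Callable[[], T], timings: Dict[str, float]) -> T:
    """Run `action` and store its wall-clock duration (seconds) under `label`."""
    start = time.perf_counter()
    result = action()  # with PySpark, pass an action such as df.count
    timings[label] = time.perf_counter() - start
    return result


def lazy_squares(n: int):
    """Stand-in for a lazy pipeline: no work happens until it is consumed."""
    return (i * i for i in range(n))


timings: Dict[str, float] = {}

# Building the lazy pipeline does no real work, so timing it alone says
# nothing about the cost of the section.
pipeline = timed("build_only", lambda: lazy_squares(100_000), timings)

# Consuming the pipeline forces the work; this is the role count() plays
# in the measurements described in the text.
n_rows = timed(
    "build_and_count",
    lambda: sum(1 for _ in lazy_squares(100_000)),
    timings,
)

# Hypothetical PySpark usage (df and the column name are illustrative):
#   cleaned = df.filter(df["text"].isNotNull())
#   timed("cleaning", cleaned.count, timings)
```

Passing the action as a callable keeps the timer around exactly the forced evaluation, so each recorded value includes the cost of the count itself, which is the overhead mentioned above.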

The numbers represent seconds of execution.

\begin{center}
