completed ch 7 highlighting
Tale152 committed Nov 19, 2022
1 parent 0f2a969 commit b76bf58
Showing 2 changed files with 30 additions and 28 deletions.
16 changes: 9 additions & 7 deletions document/chapters/chapter_7/sections/5_proto-mapreduce.tex
@@ -1,16 +1,18 @@
\section{Proto-MapReduce}
\textbf{The MapReduce implementation used in this prototype is a simplified version} of what is discussed in \textit{chapter \ref{the_mapreduce_paradigm}}; said simplifications, while making it \textbf{easier to implement}, have the \textbf{side effect of negatively influencing performance but}, given the available time limitations, they are \textbf{still acceptable in a prototypical context where the focus is to demonstrate the feasibility of mobile devices' Contribution}.

\textbf{The first difference comes from how the data are handled}. As explained before, the MapReduce paradigm handles a number of splits M and progressively applies the Map function to said splits; the intermediate results are grouped by key (using a partitioning function) in order to obtain R (with R<M, typically) partitions which, upon applying the Reduce function, will produce R final results. This \textbf{simplified version}, on the contrary, \textbf{takes the M splits} and, after applying the Map function, \textbf{maintains the original grouping, producing M intermediate results}; said results are then \textbf{computed by applying the Reduce function} to each of them, \textbf{producing the M final results}. As a consequence, \textbf{the number of input data regions is equal to the number of regions in the final results}.
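The simplified data flow described above can be sketched as follows (a minimal illustration with hypothetical names, not the prototype's actual API; it assumes splits are plain arrays and that the user supplies the Map and Reduce functions):

```javascript
// Illustrative sketch of the simplified Proto-MapReduce data flow:
// each of the M splits is mapped and then reduced independently,
// so M input regions yield exactly M final results.
function protoMapReduce(splits, mapFn, reduceFn) {
  // Apply the Map function to every element of each split, keeping the
  // original grouping (no re-partitioning by key into R groups).
  const intermediate = splits.map(split => split.map(mapFn));
  // Apply the Reduce function to each intermediate group separately.
  const results = intermediate.map(group => group.reduce(reduceFn));
  return results; // results.length === splits.length === M
}
```

Note how, unlike canonical MapReduce, no partitioning function appears anywhere: the input grouping is simply preserved end to end.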

\vspace{10mm}

\begin{figure}[!ht]
\centering
\includegraphics[scale=1.18]{document/chapters/chapter_7/images/proto_mapreduce.png}
\caption{Proto-MapReduce Topology}
\label{fig:proto_mapreduce}
\end{figure}

\textbf{The second important distinction comes from the topology of the connections between the entities involved} in the MapReduce operation. Normally, a Map Worker that has completed a run of the Map function would send, under the coordination of the MapReduce Master, its intermediate results directly to the right Reduce Worker; \textbf{in this simplified version a Map Worker that has completed a Map operation sends its results to the MapReduce Master which, after gathering all the Map results for that particular region, will forward the data to an assigned Reduce Worker}. The final topology for the MapReduce Service performed in this prototype is shown in \textit{figure \ref{fig:proto_mapreduce}} and is \textbf{obtained by performing these three steps}:
\begin{enumerate}
\item \textbf{MapReduce Master recruitment}\\
The Invoking Endpoint Prototype performs a recruitment (seen in \textit{figure \ref{fig:recruitment_messages}}), connecting to a Node which receives a MapReduce Master Job (containing all the info about the requested resources as well as the Map and Reduce function definitions). This marks the beginning of the next phase but, in the meantime, the Invoking Endpoint starts sending Tasks to the MapReduce Master (\textit{figure \ref{fig:p2p_messages}}), each containing a data region to compute.
@@ -20,8 +22,8 @@ \section{Proto-MapReduce}
Here the MapReduce Master performs the final recruitment, requesting the Reduce Workers specified in its MapReduce Master Job, sending a Reduce Worker Job (containing the Reduce function) to every new Node recruited. After this recruitment is completed, the MapReduce Master is allowed to send the progressively collected intermediate results to the Reduce Workers.
\end{enumerate}
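The routing through the MapReduce Master described above can be sketched like this (a hypothetical illustration: the class, method, and worker names are invented for the sketch and are not the prototype's actual Job implementation):

```javascript
// Sketch of the simplified routing: Map results for a region flow through
// the MapReduce Master, which gathers them and forwards the complete
// region to an assigned Reduce Worker (no direct Map-to-Reduce link).
class MapReduceMaster {
  constructor(reduceWorkers) {
    this.reduceWorkers = reduceWorkers; // assigned per region, round-robin
    this.pending = new Map();           // regionId -> collected map results
  }

  // Called whenever a Map Worker returns results for (part of) a region.
  onMapResult(regionId, results, regionDone) {
    const acc = this.pending.get(regionId) || [];
    acc.push(...results);
    this.pending.set(regionId, acc);
    if (regionDone) {
      // Forward the gathered intermediate results to a Reduce Worker.
      const worker = this.reduceWorkers[regionId % this.reduceWorkers.length];
      worker.reduce(regionId, acc);
      this.pending.delete(regionId);
    }
  }
}
```

The extra hop through the Master is precisely the simplification that penalizes performance compared to the direct Map-to-Reduce connection of canonical MapReduce.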

\textbf{Once all the data regions are computed and the results are collected by the Invoking Endpoint, the P2P connections are closed}, completing the execution.

As can easily be deduced, the Invoking Endpoint Prototype acts as a Master in its P2P connection to the MapReduce Master, while Map and Reduce Workers only act as Slaves in their connection to the same entity, \textbf{making a Node} (the MapReduce Master in this case) \textbf{able to perform both the Master and the Slave roles at the same time}, with the consequence of showing \textbf{that the Map Worker to Reduce Worker connection is perfectly feasible in a future implementation} of these Jobs.

\textbf{Thanks to this topology, every computationally heavy operation is delegated to the Grid's Nodes and, thus, the Invocation can be performed even from a low-spec device}. On a final note, while performing a recruitment, various parameters can be specified about the resources that a device needs to possess; while the Map or Reduce Worker role can be taken by any device, \textbf{the MapReduce Master role is reserved for Desktop devices}. This choice is made \textbf{to obtain more stability} for the MapReduce process.
42 changes: 21 additions & 21 deletions document/chapters/chapter_7/sections/6_real-world_experiments.tex
@@ -1,8 +1,8 @@
\section{Real-world experiments}\label{real_world_experiments}
This section describes the \textbf{experiments performed using the prototype in a real-world scenario}: a distributed MapReduce computation executed on distributed heterogeneous devices.

\subsection{Computation}
The operation chosen for the experiment is a \textbf{simple classification based on the distance from centroids, each representing its corresponding class}:
\begin{itemize}
\item \textbf{Red centroid}: (200,900)
\item \textbf{Green centroid}: (700,100)
@@ -16,7 +16,7 @@ \subsection{Computation}
\label{fig:computation_start}
\end{figure}

\textbf{Given a Cartesian plane} (width: [0,1500], height: [0,1000]) \textbf{and a set of 2D points} contained in it, \textbf{the map function takes as input one of said points and calculates the Euclidean distance from each one of the centroids}; the centroid with \textbf{minimum distance among the three is then chosen, obtaining a key-value output} composed of the chosen class as the key and an array containing the computed point as the value (the array becomes relevant in the reduce function). It is important to note that, in comparing the distance between two points, the square root characterizing the Euclidean distance is not needed and, therefore, it is not calculated in the map function.

\begin{lstlisting}
const mapFunction = (p) => {
@@ -33,7 +33,7 @@ \subsection{Computation}
}
\end{lstlisting}
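The body of the map function is collapsed in this diff excerpt; a possible completion, following the description above, is sketched below. The red and green centroid coordinates come from the text, while the third centroid (here called \texttt{blue}, at (1200,500)) is an assumption, since its definition is elided in this excerpt:

```javascript
// Sketch of the map function described in the text. The "blue" centroid
// coordinates are assumed for illustration only; red and green come
// from the experiment's definition.
const CENTROIDS = {
  red: { x: 200, y: 900 },
  green: { x: 700, y: 100 },
  blue: { x: 1200, y: 500 } // assumed: elided in this excerpt
};

const mapFunction = (p) => {
  let bestClass = null;
  let bestDist = Infinity;
  for (const [cls, c] of Object.entries(CENTROIDS)) {
    // Squared Euclidean distance: the square root is monotonic, so it is
    // unnecessary for comparing distances and is not computed.
    const d = (p.x - c.x) ** 2 + (p.y - c.y) ** 2;
    if (d < bestDist) { bestDist = d; bestClass = cls; }
  }
  // Key-value output: the chosen class as key, the point wrapped in an
  // array as value (the array becomes relevant in the reduce function).
  return { key: bestClass, value: [p] };
};
```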

Every data region is computed by the Map Workers and, \textbf{after every point in a particular region is classified, the output} (visualized in \textit{figure \ref{fig:computation_region_computation}}) \textbf{can be computed in the reduce function, which simply reunites the intermediate results with the same key} (hence belonging to the same class) in a single array.

\begin{lstlisting}
const reduceFunction = (p1, p2) => {
@@ -49,7 +49,7 @@ \subsection{Computation}
\label{fig:computation_region_computation}
\end{figure}
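The body of the reduce function is likewise collapsed in this excerpt; given the description above (intermediate values sharing a key are reunited in a single array), a plausible completion is simply:

```javascript
// Sketch of the reduce function: two intermediate values with the same
// key (arrays of points of the same class) are concatenated.
const reduceFunction = (p1, p2) => {
  return p1.concat(p2);
};
```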

As can be seen in \textit{figure \ref{fig:computation_final_result}}, after every region is mapped and then reduced, \textbf{each point is assigned to one of the three classes}.

\begin{figure}[!ht]
\centering
@@ -58,13 +58,13 @@ \subsection{Computation}
\label{fig:computation_final_result}
\end{figure}

\textbf{Five experiments} were performed:
\begin{itemize}
\item \textbf{1000 values} (10 regions, 100 points for each region)
\item \textbf{10000 values} (100 regions, 100 points for each region) (shown in \textit{figure \ref{fig:computation_final_result}})
\item \textbf{100000 values} (100 regions, 1000 points for each region)
\item \textbf{1000000 values} (1000 regions, 1000 points for each region)
\item \textbf{5000000 values} (2000 regions, 2500 points for each region)
\end{itemize}

\subsection{Setup}
@@ -84,17 +84,17 @@ \subsection{Setup}
\label{fig:experiment_devices_setup}
\end{figure}

The setup of Contributing Endpoints was thus composed of \textbf{2 Interconnected Desktop Clients} and \textbf{5 Interconnected Mobile Clients}, placed within a \textbf{100 km range} in central Italy, which were deliberately selected by isolating them on a dedicated American server in order to perform multiple experiments with the same setup.

\textbf{The Invoking Endpoint Prototype instance was placed in the E location} (although it was not executed on the same computer that ran the Interconnected Desktop Client); said Invoking Endpoint requested the following resources for executing the MapReduce computation:
\begin{itemize}
\item \textbf{4 Map Workers}
\item \textbf{2 Reduce Workers}
\end{itemize}
\textbf{Including the implicit MapReduce Master} (a role which will be taken by one of the two computers), \textbf{this adds up to the 7 devices specified earlier}, which were used in each of the five experiments.

\subsection{Results}
\textit{Figure \ref{fig:experiment_results}} shows the \textbf{results for the five experiments performed}, focusing on the \textbf{total time} and the \textbf{average time taken by each value} (both \textbf{measured in milliseconds}). Once again, these results were obtained using a very small pool of devices, and the simplified nature of the MapReduce algorithm in this prototype significantly slows down the whole process (primarily because the intermediate results are first sent back to the MapReduce Master, which then forwards them to the Reduce Worker, instead of using a direct connection between Map Worker and Reduce Worker).

\begin{figure}[!ht]
\centering
@@ -103,13 +103,13 @@ \subsection{Results}
\label{fig:experiment_results}
\end{figure}

The \textbf{first significant observation} can be made \textbf{looking at the first two experiments}: the \textbf{average time for each value drastically drops} (\texttildelow6.7 times faster); this can be \textbf{explained by considering that the total time also includes the recruitment phase, where no computation is executed}. In other terms, \textbf{the number of values used in the first experiment is so small that their computation time becomes irrelevant}, meaning that the recruitment phase is basically the only factor that influences the average time. \textbf{The more values are computed (assuming the same number of devices is used), the less impactful the recruitment time becomes}.
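This amortization effect can be sketched with a toy cost model (the constants below are invented for illustration and are not the measured figures; only the trend matters):

```javascript
// Toy model: total time = fixed recruitment overhead + n * per-value cost.
// Both constants are assumed purely for illustration.
const RECRUITMENT_MS = 5000; // assumed fixed setup cost
const PER_VALUE_MS = 0.5;    // assumed marginal cost per value

// Average time per value: the recruitment term is divided by n, so it
// dominates for small n and becomes negligible as n grows, with the
// average approaching the marginal per-value cost.
const avgMsPerValue = (n) => (RECRUITMENT_MS + n * PER_VALUE_MS) / n;
```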

\begin{figure}[!ht]
\centering
\includegraphics[scale=0.55]{document/chapters/chapter_7/images/experiment_results_avg_ms_per_value.png}
\caption{Average time (milliseconds) for each value}
\label{fig:experiment_results_avg_ms_per_value}
\end{figure}

Finally, \textit{figure \ref{fig:experiment_results_avg_ms_per_value}} focuses on \textbf{comparing the average time for each value}; it becomes apparent that, despite the non-optimized algorithms used in this prototype, \textbf{the more values are computed, the greater the advantage becomes, showing that a distributed computation in which mobile devices also participate is not only feasible, but can also provide value to the Customer}.

