The following is a series of programs using iterative and recursive approaches to solve DSA problems, alongside an experimental analysis of each algorithm.
A strobogrammatic number is a number that looks the same when rotated upside down, such as 8, 101, or 96. Here we consider 1 to be strobogrammatic. I developed two programs, one iterative and one recursive, each taking an integer n as input and returning a list of all the strobogrammatic numbers of length n.
In the iterative approach, if the number of digits is 1, the strobogrammatic number can be either 0, 1, or 8.
For numbers with more than one digit, we first define the extreme (outermost) digit pairs as 1-1, 8-8, 6-9, or 9-6 (the pair 0-0 is excluded at the extremes, since a leading zero would give a number with fewer digits than expected). Then, for each remaining position in the first half of the digits, between the left extreme and the median (when the digit count is odd), we take the Cartesian product of the set of valid digit pairs.
If the digit count is odd, a center value is added to each generated combination; this value can be 0, 1, or 8. If the digit count is even, no such center value exists, and the code finishes by outputting all the numbers.
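As an illustration, here is a minimal Python sketch of the iterative approach described above (the function name and the exact pair handling are my own; the report's code may differ):

```python
from itertools import product

# Digit pairs that remain valid when the number is rotated 180 degrees.
PAIRS = [("0", "0"), ("1", "1"), ("8", "8"), ("6", "9"), ("9", "6")]
MIDDLES = ["0", "1", "8"]  # valid center digits for odd lengths

def strobogrammatic_iterative(n):
    """Return all strobogrammatic numbers of length n as strings."""
    if n == 1:
        return MIDDLES[:]
    results = []
    inner_slots = n // 2 - 1  # pair positions between the extremes and the median
    for first in PAIRS[1:]:   # the extremes exclude 0-0 (no leading zero)
        for combo in product(PAIRS, repeat=inner_slots):
            left = first[0] + "".join(p[0] for p in combo)
            right = "".join(p[1] for p in reversed(combo)) + first[1]
            if n % 2:  # odd length: add a center value to each combination
                results.extend(left + mid + right for mid in MIDDLES)
            else:
                results.append(left + right)
    return results

# Example: strobogrammatic_iterative(3) -> ['101', '111', '181', '609', ...]
```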
If the length of a strobogrammatic number is 1, there are only three possibilities: the number can be 0, 1, or 8. However, when the number of digits is greater than 1, the number can also contain 6 and 9 (and vice versa) when they are equidistant from the median digit.
A better way to create such numbers recursively is to define a sub-function that is called whenever the input size is bigger than 1. This function sets the extremities of the number to one of the pairs (1-1, 8-8, 6-9, or 9-6). These extremes are then stripped from the recursion input, and the sub-function is called again to generate the inner values between the new extremities. At this point the values may also include the equidistant pair 0-0, so the inner pairs are drawn from (0-0, 1-1, 8-8, 6-9, 9-6). The recursion ends when the function reaches the base case: the center values, which can be (0, 1, 8) if the remaining input size is odd, or (00, 11, 88, 69, 96) when it is even.
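A sketch of this recursive scheme (identifiers are mine, not necessarily the report's; the even base case arises from the pair loop at size 2):

```python
# Pairs usable in the interior; the outermost pair must not be 0-0.
INNER_PAIRS = [("0", "0"), ("1", "1"), ("8", "8"), ("6", "9"), ("9", "6")]

def strobogrammatic_recursive(n):
    """Return all strobogrammatic numbers of length n as strings."""
    def build(m, total):
        if m == 0:          # even-length center: nothing left to add
            return [""]
        if m == 1:          # odd-length center value
            return ["0", "1", "8"]
        inner = build(m - 2, total)   # recurse on the stripped interior
        results = []
        for left, right in INNER_PAIRS:
            if m == total and left == "0":
                continue    # disregard 0-0 at the extremities (leading zero)
            results.extend(left + s + right for s in inner)
        return results
    return build(n, n)
```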
I designed experiments to determine the average run time and the number of steps that both implementations take for increasing values of n.
Iterative step counter:
N: 5 | steps: 464
N: 10 | steps: 117528
N: 15 | steps: 4812524
Recursive step counter:
N: 5 | steps: 137
N: 10 | steps: 5017
N: 15 | steps: 375017
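Measurements like these can be reproduced with a small harness along the following lines (a sketch: it reuses the illustrative functions above and times wall-clock runtime; the step counts above would instead require instrumenting the functions with counters):

```python
import time

def average_runtime(fn, n, trials=5):
    """Average wall-clock time of fn(n) over a few trials."""
    start = time.perf_counter()
    for _ in range(trials):
        fn(n)
    return (time.perf_counter() - start) / trials

for n in (5, 10, 15):
    print(f"N: {n} | iterative: {average_runtime(strobogrammatic_iterative, n):.3f}s"
          f" | recursive: {average_runtime(strobogrammatic_recursive, n):.3f}s")
```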
From the two plots below, the number of steps and the running time tell diverging stories about the two algorithms' efficiency. The iterative function's step count grows faster than the recursive function's as the input grows. This is surprising, given that the recursive approach was expected to take more steps as the input grows. The likely cause is the itertools.product() call inside the iterative approach: as more positions fall between the left-extreme digit and the median, the number of combinations it enumerates also increases.
The runtime analysis, on the other hand, is different: the recursive version takes more time to run than the iterative one as the input increases. This again makes sense when considering the difference in running time between a built-in function, itertools.product() (used by the iterative version), and a user-defined function called repeatedly (in the recursive version).
As pointed out by Cormen et al. (2009), an algorithm's running time is defined by its number of steps (though the step counts here rest on cruder assumptions than theirs). Therefore, when comparing the two analyses, the inference that aligns best with this definition is that the better implementation (the more efficient algorithm) is the recursive one. The main reason is the number of steps itertools.product() takes: the iterative algorithm's step count becomes increasingly higher than the recursive approach's for the same input.
Since this analysis only considers values below 20 (the code takes too long for values above 15), it might be necessary to examine the functions for larger values. However, given the explanation above, the algorithms should continue to behave as predicted.
Binary search: recursive and iterative Python implementations that return a list with three elements (start index, end index, and the number of matches found in the sorted array).
An algorithm for a post office that sends trucks to different houses in the city. The post office wants to minimize the number of trucks needed, so all the homes sharing the same postal code should require only one truck.
The houses' postal codes are stored in a sorted array. Many places can share the same postal code. To simplify the database, I mapped the postal codes to specific numbers. For example, 94102 has been mapped to 1.
Given a target postal code, the task is to return the beginning and end index of this target value in the array (inclusive), along with the total number of matches. If the postal code is not in the database, return None. For example, given the postal array [1, 2, 2, 3, 4, 4, 4, 5, 5, 5, 5, 7] and the target 4, the answer should be [4, 6, 3], where 4 is the start index, 6 is the end index, and 3 is the number of entries with this value. If the target is 1 for the same postal array, the answer should be [0, 0, 1]. If the target is 6, the answer should be None.
The iterative code compares each number in the postal array with the target number using a for loop. Each matching position is added to a tracker list. The process continues until the current number being checked no longer matches the number of interest. We then take the first position, the last position, and the count from the tracker list.
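A minimal sketch of this iterative scan (the early break relies on the array being sorted, as stated; names are illustrative):

```python
def find_postal_iterative(postal, target):
    """Return [start_index, end_index, match_count] for target, or None."""
    tracker = []
    for i, code in enumerate(postal):
        if code == target:
            tracker.append(i)   # record every matching position
        elif tracker:
            break               # sorted array: the run of matches has ended
    if not tracker:
        return None
    return [tracker[0], tracker[-1], len(tracker)]

# Example from the text:
# find_postal_iterative([1, 2, 2, 3, 4, 4, 4, 5, 5, 5, 5, 7], 4) -> [4, 6, 3]
```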
The recursive function takes a position value and reads the number at that position in the postal array. If it matches the target, the function stores that position and how many times the target has been seen. Each subsequent recursive call advances the position by one, and the same process continues until all positions have been checked. In the end, the first and last matching positions, together with the number of matches, are output.
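A corresponding recursive sketch, advancing one position per call as described (note that Python's default recursion limit bounds the array sizes this can handle):

```python
def find_postal_recursive(postal, target, pos=0, first=None, count=0):
    """Return [start_index, end_index, match_count] for target, or None."""
    if pos == len(postal):                        # all positions checked
        return None if count == 0 else [first, first + count - 1, count]
    if postal[pos] == target:
        if first is None:
            first = pos                           # store the first match
        count += 1
    elif count:
        return [first, first + count - 1, count]  # run of matches has ended
    return find_postal_recursive(postal, target, pos + 1, first, count)
```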
I used the number of steps and the running time as metrics. Several arrays of increasing size were created with the random function, along with several target numbers. The plotted graphs show that both algorithms share a similar pattern for the number of steps, which is predictable since the recursive version follows the same one-position-at-a-time indexing as the iterative one. We should therefore compare running times. The iterative algorithm performs better than the recursive one, as observed from the slopes of the plotted curves. Thus, the best formulation is the iterative algorithm when running time is used as the performance metric.
Insertion sort is a very good algorithm for small arrays, while merge sort is considered one of the fastest algorithms. We would like to investigate whether we can design a better algorithm than merge sort by analyzing a variety of input distributions and sizes.
A Python implementation that runs merge sort until the subarray has fewer than k = 10 elements, at which point it switches to applying insertion sort.
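A sketch of such a hybrid, assuming the helper names (insertion_sort, merge) and threshold test shown here rather than the report's exact code:

```python
def insertion_sort(arr):
    """In-place insertion sort; returns arr for chaining."""
    for i in range(1, len(arr)):
        key, j = arr[i], i - 1
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

def merge_sort_until_k(arr, k=10):
    """Split as in merge sort, but insertion-sort subarrays shorter than k."""
    if len(arr) < k or len(arr) < 2:   # base case: small subarray
        return insertion_sort(list(arr))
    mid = len(arr) // 2
    return merge(merge_sort_until_k(arr[:mid], k),
                 merge_sort_until_k(arr[mid:], k))
```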
A Python implementation that runs insertion sort until fewer than k = 10 elements remain unsorted, after which the remaining elements are sorted using merge sort.
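One plausible reading of this description as code (reusing insertion_sort and merge from the previous sketch). The report notes below that its own implementation additionally re-runs merge sort over the entire array at the end; for simplicity this sketch just merges the two sorted runs:

```python
def merge_sort(arr):
    """Plain merge sort, reusing merge() from the previous sketch."""
    if len(arr) <= 1:
        return list(arr)
    mid = len(arr) // 2
    return merge(merge_sort(arr[:mid]), merge_sort(arr[mid:]))

def insertion_sort_until_k(arr, k=10):
    """Insertion-sort until fewer than k elements remain, then merge sort
    the remaining elements and merge the two sorted runs."""
    n = len(arr)
    if n < k:
        return merge_sort(arr)
    prefix = insertion_sort(list(arr[:n - k]))  # first n-k elements
    suffix = merge_sort(arr[n - k:])            # remaining k elements
    return merge(prefix, suffix)
```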
A theoretical complexity analysis of the two algorithms I implemented, realized by the functions merge_sort_until_k and insertion_sort_until_k, follows.
The following analysis considers the worst case for each algorithm.
The merge_sort_until_k function splits the array in the middle recursively until it reaches the base case, where the split produces a subarray of fewer than $k$ elements. That subarray is sorted with insertion sort, and the merge() function then merges the insertion-sorted subarrays back together. Therefore, the code has merge sort's structure, but instead of reaching the usual base case of a one-element subarray, it bottoms out where there are at most $k$ elements.
The images below show a recursion tree for the time complexity of the function. It follows the behavior of a merge sort algorithm, but it reaches its base case at subarrays of size $k$ instead of size 1. In the image, the array size is $n$. As in merge sort, we repeatedly divide the input array by two until we reach subarrays of size $k$, so at the height $h$ of the tree:

$$\frac{n}{2^h} = k$$

We can isolate $h$ by using the definition of a logarithm of base 2 (lg):

$$2^h = \frac{n}{k} \quad\Longrightarrow\quad h = \lg\!\left(\frac{n}{k}\right)$$

Each of the $\lg(n/k)$ levels of the tree does $\Theta(n)$ work for the merging. We apply insertion sort at the base case, once for each of the $n/k$ leaves; each leaf holds at most $k$ elements and costs $\Theta(k^2)$ in the worst case, for a total of $(n/k)\cdot\Theta(k^2) = \Theta(nk)$. We should, therefore, add the insertion sort analysis to the merge sort analysis, because both parts contribute to the total work regardless of where the recursion stops. Once we combine them, the worst-case time complexity of merge_sort_until_k is given by the function:

$$T(n) = \Theta\!\left(nk + n\lg\!\left(\frac{n}{k}\right)\right)$$

where $n$ is the input size and $k$ is the threshold below which insertion sort is applied.
The complexity for the insertion_sort_until_k function is more intuitive. Considering the worst case for the algorithm when $n > k$, we always apply insertion sort to the first $n - k$ elements, which costs $\Theta((n-k)^2)$, and merge sort to the remaining $k$ elements, which costs $\Theta(k \lg k)$. However, in the code implementation, it was possible to see that after the sorting by insertion and the merge sort of the two parts of the initial array, the entire array still goes through merge sort again. Since the final array still undergoes merge sort (which is counterintuitive, since it works against the optimization of the algorithm by making it do extra work beyond the two terms above), we must also add a third term, $\Theta(n \lg n)$. Therefore, the final time complexity analysis for insertion_sort_until_k is:

$$T(n) = \Theta\!\left((n-k)^2 + k\lg k + n\lg n\right)$$

The first term is the insertion sort applied to the first $n - k$ elements, the second is the merge sort applied to the remaining $k$ elements, and the third is the final merge sort call over the entire array.
Which algorithm should one choose for the same input and the same value of $k$?
From the previous item, it is possible to see that insertion_sort_until_k has worse long-term behavior than any other algorithm seen so far. It is worse than merge_sort_until_k, merge sort, and insertion sort. This is because its time complexity function, as demonstrated, carries a quadratic $(n-k)^2$ term; only when $k$ is close to $n$ can insertion_sort_until_k perform slightly better. When $k$ approaches $n$, insertion_sort_until_k approaches the performance of merge sort, since almost no insertion is applied. Conversely, merge_sort_until_k will perform better, since its leading term is $n\lg(n/k)$ and the $nk$ term stays small when $k$ is small; merge_sort_until_k therefore performs better for small values of $k$.
The plots below analyze the asymptotic behavior of both functions by recording the average running time and the number of steps to evaluate each algorithm's efficiency. Different values of $n$ and $k$ were tested.
The time efficiency of both algorithms was measured by plotting the average time each algorithm took to run for increasing input sizes on random inputs. With random inputs, the measurements reflect the average case: neither the best-case nor the worst-case input is expected to occur. Randomization makes the algorithm perform as in a natural setting, where the input list is unordered but not in any extreme arrangement (fully sorted, or sorted in descending order). This brings the analysis closer to reality and therefore makes it more accurate.
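A sketch of such a randomized timing experiment (trial counts and sizes are illustrative; it reuses the sort sketches above):

```python
import random
import time

def average_sort_time(sort_fn, n, k=10, trials=5):
    """Average time of sort_fn over `trials` random lists of length n."""
    total = 0.0
    for _ in range(trials):
        data = [random.randint(0, n) for _ in range(n)]
        start = time.perf_counter()
        sort_fn(data, k)
        total += time.perf_counter() - start
    return total / trials

for n in (100, 1_000, 10_000):
    print(f"n={n} | merge_sort_until_k: {average_sort_time(merge_sort_until_k, n):.4f}s"
          f" | insertion_sort_until_k: {average_sort_time(insertion_sort_until_k, n):.4f}s")
```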
As inferred from the plot, merge_sort_until_k performs better than insertion_sort_until_k for the values of $k$ tested. When $k$ is close to $n$, insertion_sort_until_k should apply insertion sort to only a few elements. However, running the plots for this case shows that the scaling of its running time does not improve as considerably as expected.

On the other side, the time scaling behavior of merge_sort_until_k is as expected: it splits the list by applying merge sort until reaching the bottom case of subarrays with fewer than $k$ elements. Because the dividing approach breaks the array into small arrays, the application of insertion sort is time-efficient, since insertion sort is expected to take less time than merge sort on a small input. After the insertion step, merge finishes the job by merging the array back together, and merging gets comparatively better for higher values of $n$.

Therefore, from the time analysis, it is possible to see that merge_sort_until_k performs better than insertion_sort_until_k for any given $k$.
The step-count efficiency analysis follows the same behavior as the time analysis for small values of $k$: merge_sort_until_k performs better than insertion_sort_until_k, and as $k$ varies its step count stays closer to constant than insertion_sort_until_k's. However, when $k$ approaches $n$, the two step counts draw closer together.
From both the analytical and experimental analyses, it was possible to conclude that merge_sort_until_k will perform better than insertion_sort_until_k for any given small $k$.
To better understand why merge sort is preferred down to a specific threshold $k$, consider the extreme values of $k$. When $k \geq n$, merge_sort_until_k will only call insertion sort, since the whole array is already below the threshold, while insertion_sort_until_k will behave as plain merge sort, because no insertion will be applied, only merge sort.
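A quick check of this boundary behavior, using the sketches above:

```python
data = [5, 2, 9, 1, 7, 3, 8]          # n = 7 < k = 10
print(merge_sort_until_k(data))       # entire array handled by insertion sort
print(insertion_sort_until_k(data))   # entire array handled by merge sort
print(merge_sort_until_k(data, k=2))  # behaves like plain merge sort
```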
Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to Algorithms (3rd ed.). Cambridge, MA: MIT Press.