Algorithms - Introduction to the Basic Concepts

Chapter 2 - Getting Started

Loop Invariant, Case Analysis, Order of Growth

Checking the loop invariant

When analyzing an algorithm using a loop invariant (which consists of the elements originally in positions that have been sorted already) to see if it is correct, we check 3 properties:

  1. Initialization - Loop invariant must be true prior to the first iteration of the loop.
  2. Maintenance - Must be true before the beginning of a loop, and remain true before the next iteration of the loop.
  3. Termination - When the loop terminates, the loop invariant must yield correct results that helps shows that the algorithm is correct.
The first two properties helps to shows that the loop of the algorithm is correct.  The third properties is a confirmation that the algorithm has worked.

Lets put this into a better understanding now, by using insertion sort.

List given:  4 3 1 2 5 6

Initialization - Since insertion sort starts at the second element, the only part that is in the loop invariant would be the first element '4'.  Although trivial, 4 is indeed sorted (1 of 1).

Maintenance

List of actions:

Start loop: 4 | 3 1 2 5 6

3 is moved to position 0.
4 is shifted to position 1.
End loop.

Current loop: 3 4 | 1 2 5 6

1 is moved to position 0.
4 is shifted to position 2.
3 is shifted to position 1.
end loop.

Current loop: 1 3 4 | 5 6
end loop.

Current loop: 1 3 4 5 | 6
end loop.

Final loop: 1 3 4 5 6

Elements to the left of the line are considered to be a part of the loop invariant.  In each step, the numbers to the left of the line are in order.  Therefore we conclude that insertion sort satisfies the maintenance property. 

Termination - We examine the insertion sort's output, and conclude from our findings that this algorithm does in fact return a sorted list in ascending order.

Algorithm Performance (Case Analysis)

There are three ways to examine the time performance of an algorithm.  Best-case, average-case, and worst-case.  Out of these three cases, most programmers look at the worst-case run-time because:

  1. Worst-case performances gives you the upper bound for which the algorithm will take.  Having this knowledge will tell you at most what time it will take to do its job.
  2. In some algorithms, worst-case scenarios actually appear frequently.  In the case of a database searching for a term, which may not even be in the system.
  3. Average case operation time tends to be closer to worst-case.
Average-case is limited because there is an ambiguity for what exactly is the "average case".

This is why when conversing about algorithm performance, the norm is to talk about worst-case properties.

Lets take insertion sort for example.  If the list is already in order (best-case), then the algorithm would take n-time to check each element of the list.  But if the list were in reverse order, it would take n^2 time because the way insertion sort is implemented, it has a while loop dependent on the input enclosed in a for loop.  So for each element there is another iteration of the input values.  Therefore this double loop costs n^2 operations.  

Best-case for insertion sort: n
Worst-case for insertion sort: n^2

Order of Growth

The "Order of Growth" describes how an algorithm deals with continually larger sets of inputs.  As input sizes increase from orders of tens to billions, algorithm performance greatly depends on how it is implemented. The Order of Growth captures the increasing complexity that must take place for an increasing amount of input.  This subsection does not go into great deal about Big O notation, it gives a little insight into how we can examine growth.  

Analysis using graphs


First, given a formula for the run-time of an algorithm, the most important part of the equation is its leading term.  So in the case of: "an^2 + bn + c", with constants a,b,c.  We take "an^2", drop the constant because constants have little effect on the growth rate of an algorithm, and are left with "n^2".  Back to basic elementary school level math, we know that n-squared is a quadratic function, and this graph would come to mind.  Picture found here.
We see that as the x-input increases, the y-output values begin to dramatically increase.  If we do some simple calculus, we get that the derivative is 2x.  Which means that the rate of change doubles every input, therefore leaving us with a function that is increasing exponentially.  Algorithms with quadratic growth do not have scalability, and should be heavily avoided when dealing with large growing data sets when possible.  

Algorithm growth does not stop with n^2, but can also be expressed as n^3 and beyond, all with increasingly worse than the former.

There is also n-growth, also known as linear equation.  Found here.


Linear growth fairs much better with increases than quadratic equations.  

Now we come to the golden logn, which has an interesting property that many computer science researchers use to deal with huge data sets.  Picture found here.

Interesting enough, as n increases in a Log n function, the larger the n, the smaller the increase in output is. We look to these types of algorithms that contain log n performance to deal with large data sets.  

Here is an interesting graph that I made on a graphic calculator site to show you how growth changes depending of n-function.  Examine how x^2 begins to increase exponentially faster per input, while log x grows at a significantly lower pace compared to the other two functions.



Note that these are the base equations, and actual formulas may differ greatly than simple functions as shown above.  

Order of Growth vs Case Analysis

This is not to be confused with worst/best/average case analysis, such as in the section above.  Although they are similar in that they use functions to describe each case, they are fundamentally dealing with two different aspects of Algorithms.  Order of Growth deals with the increasing complexity depending on the increase in input.  Case analysis deals with deals with the actual input itself and how different sets of input may cause differing examples of performance in the algorithm.

Comments

Popular posts from this blog

Uncle Bob's Clean Architecture

C4 Model: Describing Software Architecture

Running RabbitMQ with Docker