Introduction to Algorithms, Chapter 2: Getting Started
2.1 Insertion sort
Pseudocode specifies an algorithm using whatever expressive method is clearest, which is sometimes
plain English. Issues of data abstraction, modularity, and error handling are often ignored in order
to convey the essence of the algorithm more concisely.
We use loop invariants to help us understand why an algorithm is correct. We must show three things about a loop invariant:
Initialization: It is true prior to the first iteration of the loop.
Maintenance: If it is true before an iteration of the loop, it remains true before the next iteration.
Termination: When the loop terminates, the invariant gives us a useful property that helps show that the algorithm is correct.
The first two properties resemble mathematical induction, in which you prove a base case and an
inductive step. The difference is that induction applies the inductive step infinitely, whereas here
we stop when the loop terminates.
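The three properties can be checked on insertion sort itself. The following is a minimal Python sketch, not the book's pseudocode: it uses 0-based indices, and the assertion tests only the "sorted" half of the invariant (that the prefix is in order), not that it holds the original elements. The function name insertion_sort is chosen here for illustration.

```python
def insertion_sort(a):
    """Sort the list a in place and return it (0-based indices)."""
    for j in range(1, len(a)):
        # Loop invariant, checked for illustration only:
        # the prefix a[0..j-1] is in sorted order.
        assert all(a[i] <= a[i + 1] for i in range(j - 1))
        key = a[j]
        i = j - 1
        # Shift every element greater than key one slot to the right.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    # Termination: the loop ends with j == len(a), so the invariant
    # now says the whole list is sorted.
    return a
```

Initialization corresponds to j = 1, where the prefix has a single element and is trivially sorted. Maintenance corresponds to the body placing key so that the prefix grows by one and stays sorted.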
Pseudocode conventions
Indentation indicates block structure, in place of conventional indicators such as begin and end statements.
The looping constructs while, for, and repeat-until and the conditional construct if-else have interpretations similar to those in C, C++, Java, Python, and Pascal.
The symbol "//" indicates that the remainder of the line is a comment.
We pass parameters to a procedure by value: the called procedure receives its own copy of its parameters, so an assignment to a parameter is not seen by the caller.
The boolean operators "and" and "or" are short circuiting: in "x and y", y is evaluated only if x is TRUE, and in "x or y", y is evaluated only if x is FALSE.
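Two of these conventions, short-circuit evaluation and passing parameters by value, can be illustrated in Python. This is a rough sketch under assumptions: the Node class and the three functions are hypothetical names invented here. Python copies a reference to an object when passing it, so rebinding the parameter is invisible to the caller while a change to an attribute is visible.

```python
class Node:
    """Hypothetical record type with a single attribute, key."""

    def __init__(self, key):
        self.key = key


def has_key(x, y):
    # Short circuit: when x is None, x.key is never evaluated,
    # so no AttributeError is raised.
    return x is not None and x.key == y


def reassign(x):
    # Rebinding the parameter changes only the local copy.
    x = Node(99)


def mutate(x):
    # Changing an attribute through the copied reference is
    # visible to the caller.
    x.key = 99
```

For example, after n = Node(1), calling reassign(n) leaves n.key equal to 1, while calling mutate(n) sets n.key to 99.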
2.2 Analyzing algorithms
We shall assume a generic one-processor, random-access machine (RAM) model of computation as
our implementation technology. The RAM model contains instructions (arithmetic, data movement, control) and data types
(integer, floating point) commonly found in real computers. Each such instruction takes a constant amount of time.
What if a RAM had an instruction that sorts? What if we could store huge amounts of data in one word and
operate on it all in constant time? These are unrealistic scenarios, so we should not abuse the model in this way.
Real computers have some instructions that are not listed above. Is exponentiation a constant-time instruction?
In the general case, no: it takes several instructions to compute x^y when x and y are real numbers. Many computers, however,
have a shift-left instruction, which in constant time shifts the bits of an integer by k positions to the left. Shifting the
integer 1 by k positions computes 2^k, as long as k is no more than the number of bits in a word. We will avoid such gray areas in the RAM model.
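The contrast between general exponentiation and a left shift can be sketched in Python. The function names are illustrative assumptions. Note also that Python integers have arbitrary precision, so the constant-time claim applies to a hardware shift with k below the word size, not to this code for very large k.

```python
def power_of_two(k):
    """Compute 2**k with one left shift (k must be a non-negative int)."""
    return 1 << k


def power(x, y):
    """Compute x**y for a non-negative int y using y multiplications."""
    result = 1
    for _ in range(y):
        result *= x
    return result
```

Here power performs a number of steps that grows with y, which is why it is not treated as a single RAM instruction, while power_of_two corresponds to one shift.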
We also do not attempt to model the memory hierarchy. Models that include the memory hierarchy are considerably more complex
than the RAM model, and RAM-model analyses are usually excellent predictors of performance on actual machines.
The mathematical tools required may include combinatorics, probability theory, algebraic dexterity, and the ability to
identify the most significant terms in a formula.
Analysis of insertion sort
We need to define the terms "running time" and "size of input" more carefully.
The best notion for input size depends on the problem being studied.
For many problems, such as sorting, the most natural measure is the number of items in the input (the array size).
For many other problems, such as multiplying two integers, the best measure is the total number of bits needed to represent the input.
If the input to an algorithm is a graph, it is more appropriate to describe the input size with two integers: the number of vertices and the number of edges.
The running time of an algorithm is the number of primitive operations or "steps" executed.
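Let n = A.length. A loop header is tested one more time than the loop body executes. Let tj denote the number of times the inner while loop test is executed for a given value of j.
In the best case the array is already sorted, so tj = 1 for every j, and the running time can be expressed as an + b, a linear function of n.
In the worst case the array is in reverse sorted order, so tj = j. The running time is then an² + bn + c, a quadratic function of n.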
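The values of tj in the best and worst cases can be observed directly by instrumenting the inner loop. This is a small sketch under assumptions: count_while_tests is a name invented here, indices are 0-based, and the returned list holds one tj per iteration of the outer loop (so its first entry corresponds to j = 2 in the 1-based notation).

```python
def count_while_tests(a):
    """Return the tj values: how often the while test runs for each j."""
    a = list(a)  # work on a copy so the caller's list is untouched
    tests = []
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        t = 1  # counts the final test, the one that fails
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
            t += 1  # one more successful test per shift
        a[i + 1] = key
        tests.append(t)
    return tests
```

On a sorted input of length 5 every tj is 1, so the total is n - 1, linear in n. On a reversed input of length 5 the values are 2, 3, 4, 5, whose sum n(n+1)/2 - 1 is quadratic in n.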
We shall usually concentrate on finding only the worst-case running time. We give three reasons:
1. The worst-case running time of an algorithm gives us an upper bound on the running time for any input.
2. For some algorithms, the worst case occurs fairly often. For example, in searching a database for a
particular piece of information, the worst case will often occur when the information is not in the database.
3. The "average case" is often roughly as bad as the worst case. How long does it take to determine where
in subarray A[1..j-1] to insert element A[j]? On average, half the elements in A[1..j-1] are less than A[j] and half are greater, so
tj is about j/2. The average-case running time turns out to be a quadratic function of n too.
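The "half the elements" claim can be checked exhaustively for small n. The sketch below assumes distinct keys and that every ordering is equally likely; the function names are illustrative. It counts, for each position, how many earlier elements exceed the current one, which is the number of shifts insertion sort performs at that position.

```python
from itertools import permutations


def shifts_needed(p, j):
    """Number of elements before position j (0-based) that exceed p[j]."""
    return sum(1 for i in range(j) if p[i] > p[j])


def average_shifts(n):
    """Average total number of shifts over all n! orderings of n distinct keys."""
    perms = list(permutations(range(n)))
    total = sum(shifts_needed(p, j) for p in perms for j in range(1, n))
    return total / len(perms)
```

For n = 5 the average is n(n-1)/4 = 5 shifts, exactly half of the worst case n(n-1)/2 = 10, so the average case is still quadratic in n.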
Order of growth
One more simplifying abstraction: consider only the leading term of a formula (e.g., an²) and ignore its constant coefficient, so that insertion sort has a worst-case running time of Θ(n²).
Due to constant factors and lower-order terms, an algorithm whose running time has a higher order
of growth might take less time for small inputs than an algorithm whose running time has a lower
order of growth. But for large enough inputs, the algorithm with the lower order of growth will run more quickly.
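The effect of constant factors on small inputs can be sketched with two made-up cost functions. The constants 2 and 50 below are hypothetical, chosen only for illustration, as are the function names.

```python
import math


def quadratic_cost(n):
    """Hypothetical cost of a quadratic algorithm: 2n^2 steps."""
    return 2 * n * n


def linearithmic_cost(n):
    """Hypothetical cost of an n lg n algorithm: 50 n lg n steps."""
    return 50 * n * math.log2(n)


def crossover(start=2, limit=10**6):
    """Smallest n >= start at which the n lg n algorithm becomes cheaper."""
    for n in range(start, limit):
        if linearithmic_cost(n) < quadratic_cost(n):
            return n
    return None
```

With these constants the quadratic algorithm is cheaper at n = 10, and the n lg n algorithm becomes cheaper at the value returned by crossover().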
2.3 Designing algorithms
For insertion sort, we used an incremental approach: having sorted the subarray A[1..j-1],
we inserted the single element A[j] into its proper place, yielding the sorted subarray A[1..j].
Now we will use divide-and-conquer to design a sorting algorithm whose worst-case running
time is much less than that of insertion sort. One advantage of divide-and-conquer algorithms is that
their running times are often easily determined.
Divide the problem into a number of subproblems that are smaller instances of the same problem
(Divide the n-element sequence to be sorted into two subsequences of n/2 elements each).
Conquer the subproblems by solving them recursively. If the subproblem sizes are small enough,
however, just solve the subproblems in a straightforward manner.
(Sort the two subsequences recursively using merge sort).
Combine the solutions to the subproblems into the solution for the original problem.
(Merge the two sorted subsequences to produce the sorted answer).
The recursion "bottoms out" when the sequence to be sorted has length 1, in which case there is no
work to be done. So the key operation is the merging of two sorted sequences in the "combine" step.
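Think of merging as combining two sorted piles of cards, face up, by repeatedly taking the smaller of the two top cards.
To avoid checking at every step whether either pile is empty, we place on the bottom of each pile a sentinel card, which
contains a special value that we use to simplify our code. Here, we use infinity as the sentinel value.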
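The merge step with sentinels, and the merge sort built on it, might look as follows in Python. This is a sketch under assumptions: indices are 0-based and inclusive, keys are finite numbers (so math.inf compares greater than every key), and the parameter names p, q, r are chosen here to mark the subarray bounds.

```python
import math


def merge(a, p, q, r):
    """Merge the sorted runs a[p..q] and a[q+1..r] (inclusive) in place."""
    # Copy each run and put a sentinel at the bottom of each pile.
    left = a[p:q + 1] + [math.inf]
    right = a[q + 1:r + 1] + [math.inf]
    i = j = 0
    # Exactly r - p + 1 elements are placed, so the sentinels are
    # never copied back into a.
    for k in range(p, r + 1):
        if left[i] <= right[j]:
            a[k] = left[i]
            i += 1
        else:
            a[k] = right[j]
            j += 1


def merge_sort(a, p=0, r=None):
    """Sort a[p..r] in place by divide-and-conquer and return a."""
    if r is None:
        r = len(a) - 1
    if p < r:                    # more than one element remains
        q = (p + r) // 2         # divide: find the middle
        merge_sort(a, p, q)      # conquer: sort each half
        merge_sort(a, q + 1, r)
        merge(a, p, q, r)        # combine: merge the sorted halves
    return a
```

Because a sentinel is never smaller than a real key, the loop in merge needs no test for an exhausted pile.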
Let T(n) be the worst-case running time of merge sort on n numbers, D(n) the time to divide, and C(n) the time to combine.
Divide: The divide step just computes the middle of the subarray, which takes constant time. Thus, D(n) = Θ(1).
Conquer: We recursively solve two subproblems, each of size n/2, which contribute 2T(n/2) to the running time.
Combine: We have already noted that merging an n-element subarray takes time Θ(n), so C(n) = Θ(n).
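Let c denote the time required to solve a problem of size 1, as well as the time per array element of the divide and combine steps. Then T(n) = c if n = 1, and T(n) = 2T(n/2) + cn if n > 1.
Assuming n is a power of 2, the recursion tree has lg n + 1 levels, and the subproblems on each level cost cn in total, so the total time is cn lg n + cn, which is Θ(n lg n).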
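The closed form cn lg n + cn can be checked numerically against the recurrence T(n) = 2T(n/2) + cn with T(1) = c. This is a short sketch that assumes n is an exact power of 2; the function names T and closed_form are illustrative.

```python
def T(n, c=1):
    """Evaluate T(n) = 2T(n/2) + cn with T(1) = c, for n a power of 2."""
    if n == 1:
        return c
    return 2 * T(n // 2, c) + c * n


def closed_form(n, c=1):
    """Evaluate cn lg n + cn, with an exact integer lg n for powers of 2."""
    lg_n = n.bit_length() - 1
    return c * n * lg_n + c * n
```

For n = 8 and c = 1 both give 32: the tree has lg 8 + 1 = 4 levels, each costing 8.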
