About this module¶
-
Prerequisites: Introduction to Arrays
-
Objectives: This module improves the linear search algorithm for sorted arrays. It also introduces binary search, a very efficient search algorithm for sorted arrays.
Sorted arrays¶
The linear search algorithm in module Introduction to Arrays is the only kind of search algorithm that works on unsorted arrays. In the worst case, all elements in the array must be examined to confirm that a value is not in the array.
However, we can make some improvements with sorted arrays.
Assume that array a is sorted, and assume n is the number of elements in array a. Furthermore, let us assume that the array is
sorted in a non-decreasing order. This implies that we can have
duplicate values in the array. The mathematical way to say an array is
sorted in a non-decreasing order is as follows:
a[i] <= a[i+1] for i ranging inclusively from 0 to n-2. We can also spell this out as
a[0] <= a[1] <= a[2]... a[n-1].
The following algorithm results in a value of true in variable sorted when it completes if and only if array a is sorted:
i=0; // line 1
sorted=true; // line 2
while (sorted && i<n-1) { // line 3
sorted = sorted && (a[i]<=a[i+1]); // line 4
i=i+1; // line 5
} // line 6
As an exercise, try tracing this algorithm using a sorted array and an unsorted array.
Linear search¶
We can improve our linear search algorithm to work more efficiently. The original algorithm is described as the following algorithm.
// Linear search algorithm that works for sorted and unsorted arrays
// Assume v contains the value to be search in an array a that has n elements
// Assume array a elements have specific values
i=0;
while ((i<n) && (a[i] != v))
i = i + 1;
if (i=n)
// conclude no element in a has a value of v
else
// conclude a[i] is the first element with a value of v
Note that if array a is sorted in a non-decreasing way, and we find that v < a[i], then we already know that
v < a[i] <= a[i+1] <= a[i+2]... a[n-1]. This also means that v != a[i+1]$, v != a[i+2], all the way
up to v != a[n-1].
This property of a sorted array means that we can exit the loop earlier
when we discover that v < a[i]. This is great news! On the other hand,
we need to modify the conditional statement because now we can exit with
i < n and still conclude that value $v$ is not anywhere in the
array. The condition that confirms that value v is not in the array
should be (i == n) || (a[i] != v).
As a result, the algorithm is modified as in the following algorithm.
// Improved linear search algorithm that works only for non-decreasingly sorted arrays.
i=0;
while ((i<n) && (a[i]<v))
i=i+1;
if ((i==n) || (a[i] != v))
// conclude no element in a has a value of v
else
// conclude a[i] is the first element in a that has a value of v
With this modification, the number of elements to examine is variable
when the value $v$ is not in the array. In the worst case,
when v > a[n-1], we still need to examine all n items in the array.
Binary search¶
Concept¶
Binary search works only if an array is sorted. The basic idea is to compare the value to find with an element in the middle of the array. Depending on the outcome, we can rule out one-half of the candidates or confirm that the value does exist.
Let us assume that we compare v with a[m], the first candidate has
an index of b, and the last candidate has an index of e. Then, there
are three possible outcomes:
v==a[m]: the search is over! We just confirmed that value $v$ can be found in the array.v < a[m]: we can rule outa[m],a[m+1],...a[e].v > a[m]: we can rule outa[b],a[b+1],...a[m].
If we pick $m$ to be half way between $b$ and $e$, then we can throw away at least one half of the candidates after one comparison.
Algorithm¶
The binary search algorithm is described in the following algorithm. This algorithm assumes that $a$ is an array with at least one item, and $v$ is the value that we are searching for.
// The binary search algorithm, assume array a has n items non-decreasingly sorted
// k is the value to search for
b=0; // line 1
e=n-1; // line 2
do { // line 3
m = (b+e)/2; // line 4
if (a[m] < v) // line 5
b = m + 1; // line 6
else // line 7
if (v < a[m]) // line 8
e = m - 1; // line 9
} while ((b <= e) && (a[m] != v)); // line 10
if (b>e) // line 11
// line 12: conclude v cannot be found in a
else // line 13
// line 14: conclude v is found in a
Let us dissect this algorithm.
- Lines 1 and
2 initialize variables
bande, respectively.bis the index of the first element that can still contain valuev. It is initialized to 0 because we need to consider all elements initially. For the same reason,eis initialized ton-1because that is the index of the last element in the entire array. - Everything between line 3 and line 10 is the logic of binary search.
- Line 4 computes the value of $m$ so it is
halfway between
bande. As long asbandeare integers, the result of the division is automatically truncated to the integer part. - Line 5 checks to see if we can remove the
first half of the candidates. If
a[m] < v, then we know thata[b] <= a[b+1] <= a[b+2]... a[m] < v, and so we can rule out all those elements as candidates. - Line 6: Once we know that we can eliminate the
first half of the candidates, we can do so by changing $b$. Note
that we do not change to
btombecausea[m] != v. Instead, we movebtom+1. This subtle point makes all the differences in terms of terminating the loop. - Lines 8 and 9 are the counterparts of lines
5 and
- Instead of ruling out the first half, these lines check to see if we can rule out the self half.
- Line 10 specifies when we have another iteration to perform. There are two reasons that are in the conjunction. First, when
b <= e, we have at least one candidate left. Note that whenb == e, it means that there is exactly one more element to considered. Second, whena[m] != v, the logic confirms that the algorithm has not just found an element that has the key (search value). - Line 11 checks to see if value $v$ is found in
the array or not. If
b > e, it means that we exited the loop because we ran out of candidates. Therefore, we can then confirm that valuevdoes not exist ina.
Now, that's efficient!¶
The binary search algorithm is very efficient because each comparison
rules out at least one half of the remaining candidates. This means
that if we start with 511 candidates, we'll end up with 255 after
one comparison, 127 after two, 63 after three, etc. It'll take 9 comparisons that confirm a value v is not in an
array of 511 elements.
Let n be the number of elements in a and q be the number of comparisons needed to confirm
v is not in a. Then $q = \lceil \log{(n)} / \log{(2)} \rceil$. The
symbol $\lceil x \rceil$ is the ceiling of $x$, which is the smallest
integer that is larger than or equal to $x$.
Using this formula, to look up a name in a phonebook with 8 billion entries will only take up to 33 comparisons!