The Self-Taught Computer Scientist


Chapter 3 Search Algorithms 33


Download 1.48 Mb.
Pdf ko'rish
bet38/147
Sana17.06.2023
Hajmi1.48 Mb.
#1540634
1   ...   34   35   36   37   38   39   40   41   ...   147
Bog'liq
books-library.net-11301817Az7X6

Chapter 3 Search Algorithms
33
for the numbers 0 through 9 because 0 through 9 in the ASCII table are not numeric values. They are 
characters for nonmathematical purposes, such as expressing the numerals in the street address 26 
Broadway Street, New York. Because ASCII maps each character to a 7- bit binary number, it can only 
represent a maximum of 128 different characters (2
7
is 128). Most computers, however, extend ASCII 
to 8 bits so that it can represent 256 characters.
While you can represent the 256 characters in the Latin- script alphabet with ASCII, it does not 
support enough characters to deal with the text from other writing systems, such as Japanese or 
Mandarin. To deal with this problem, computer scientists developed the Unicode character set. 
Character encoding means assigning a number to characters for digital representation. UTF- 8 is one 
of the character encoding methods computer scientists use to implement the Unicode character set.
Instead of using 7 or 8 bits like ASCII, UTF- 8 uses up to 32 bits to encode each character, which 
allows you to represent more than a million characters. UTF- 8 is compatible with ASCII because it 
uses the same bit representation for the Latin- script alphabet. For example, both ASCII and UTF- 8 
represent an uppercase 
A with 1000001.
Dec Bin
Char
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
@
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
[
\
]
^
_
`
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
{
I
}
~
[DEL]
space
!
"
#
$
%
&

(
)
*
+
,
-
.
/
0
1
2
3
4
5
6
7
8
9
:
;
<
=
>
?
0000 0000
0000 0001
0000 0010
0000 0011
0000 0100
0000 0101
0000 0110
0000 0111
0000 1000
0000 1001
0000 1010
0000 1011
0000 1100
0000 1101
0000 1110
0000 1111
0001 0000
0001 0001
0001 0010
0001 0011
0001 0100
0001 0101
0001 0110
0001 0111
0001 1000
0001 1001
0001 1010
0001 1011
0001 1100
0001 1101
0001 1110
0001 1111
0010 0000
0010 0001
0010 0010
0010 0011
0010 0100
0010 0101
0010 0110
0010 0111
0010 1000
0010 1001
0010 1010
0010 1011
0010 1100
0010 1101
0010 1110
0010 1111
0011 0000
0011 0001
0011 0010
0011 0011
0011 0100
0011 0101
0011 0110
0011 0111
0011 1000
0011 1001
0011 1010
0011 1011
0011 1100
0011 1101
0011 1110
0011 1111
0100 0000
0100 0001
0100 0010
0100 0011
0100 0100
0100 0101
0100 0110
0100 0111
0100 1000
0100 1001
0100 1010
0100 1011
0100 1100
0100 1101
0100 1110
0100 1111
0101 0000
0101 0001
0101 0010
0101 0011
0101 0100
0101 0101
0101 0110
0101 0111
0101 1000
0101 1001
0101 1010
0101 1011
0101 1100
0101 1101
0101 1110
0101 1111
0110 0000
0110 0001
0110 0010
0110 0011
0110 0100
0110 0101
0110 0110
0110 0111
0110 1000
0110 1001
0110 1010
0110 1011
0110 1100
0110 1101
0110 1110
0110 1111
0111 0000
0111 0001
0111 0010
0111 0011
0111 0100
0111 0101
0111 0110
0111 0111
0111 1000
0111 1001
0111 1010
0111 1011
0111 1100
0111 1101
0111 1110
0111 1111
[NUL]
[SOH]
[STX]
[ETX]
[EOT]
[ENQ]
[ACK]
[BEL]
[BS]
[TAB]
[LF]
[VT]
[FF]
[CR]
[SO]
[SI]
[DLE]
[DC1]
[DC2]
[DC3]
[DC4]
[NAK]
[SYN]
[ETB]
[CAN]
[EM]
[SUB]
[ESC]
[FS]
[GS]
[RS]
[US]
Dec Bin
Char
Dec Bin
Char
Dec Bin
Char
Figure 3.7: ASCII chart


Introduction to Algorithms
34
You can use Python’s built- in 
ord()
function to get a character’s ASCII value:
print(ord('a'))
>> 97 
As you can see, the ASCII value for 
a is 97 (in base 10).
The 
ord()
function is useful when you need to work directly with the underlying ASCII codes of 
different characters. To modify the binary search you wrote earlier to search for characters, you have 
to get and compare the ASCII values of characters. Each time through your loop, you check to see 
whether the ASCII code of each character is higher than, lower than, or equal to the character’s ASCII 
code you are looking for. Instead of showing you the solution here, I challenge you to code it yourself 
at the end of this chapter.
You now know how linear and binary searches work and when to use them when you are searching 
for data. While a binary search is very efficient, it is not the fastest way to search for data. In Part II, 
you will learn how to search for data using a hash table and why it is the most efficient kind of search.
Vocabulary
search algorithm: An algorithm that looks for data in a data set.
data set: A collection of data.
linear search: A search algorithm that iterates through each element in a set of data and compares 
it to the target number.
sorted data: Data arranged in a meaningful way.
binary search: Another algorithm for searching a list for a number, and it is faster than a linear search.
floor division operator: An operator that returns the whole value of division, rounded down.
exponentiation: A mathematical operation you write as 
b
n
(or 
b**n
in Python), where you multiply 
a number 
b times itself n times.
base: The 
b in the equation for exponentiation (b
n
).
exponent: The 
n in the equation for exponentiation (b
n
).
logarithm: The power you must raise a number to in order to produce another number.
character set: A map between characters and binary numbers.


Chapter 3 Search Algorithms
35
ASCII: The American Standard Code for Information Interchange character set.
Unicode: A character set that can store more characters than ASCII.
character encoding: Assigning a number to characters for digital representation.
UTF- 8: One of the methods of character encoding computer scientists use to implement the Unicode 
character set.
Challenge
1. 
Given a list of words in alphabetical order, write a function that performs a binary search for 
a word and returns whether it is in the list.



4
Sorting Algorithms
I think the bubble sort would be the wrong way to go.
Barack Obama
As a computer scientist, in addition to searching for data, you will often have to sort data. Sorting 
data means arranging it in a meaningful way. For example, if you have a list of numbers, you could 
sort them from the smallest to the largest (ascending). Or, imagine you are building an app that keeps 
track of the books each user has read. In an app like this, you might want to allow the user to see their 
books sorted in different ways. For example, you could give them the option to see their books sorted 
from the shortest book to the longest, oldest to newest, or newest to oldest.
There are many different sorting algorithms to help you sort data, each with strengths and weaknesses. 
For example, some sort algorithms work best in specific situations, like if an iterable is nearly sorted. In 
this chapter, you will learn about bubble sort, insertion sort, and merge sort. Other popular sorts include 
quicksort, shell sort, and heap sort. There are many sorting algorithms you will have to use in only rare 
circumstances, so after teaching you a few sorts, I spend the remainder of the chapter showing you how 
to use Python’s built- in functions to sort data, something you will often use in real- world programming.
When you are building your programs in the real world, you should almost always use your 
programming language’s built- in sorting function. You should not implement the classic sorting algo-
rithms discussed here (except in rare circumstances) because modern programming languages like 
Python have built- in sorting functions that are faster. However, learning a few of the classic sorting 
algorithms will help you better understand time complexity and teach you concepts you can use in 
situations other than sorting, such as the merge step in a merge sort.
Bubble Sort
Bubble sort is a sorting algorithm where you iterate through a list of numbers, compare each number 
to the next number, and swap them if they are out of order. Computer scientists call it a bubble sort 
because the numbers with the highest values “bubble up” to the end of the list, and the numbers with 
the smallest values move to the beginning of the list as the algorithm progresses.



Download 1.48 Mb.

Do'stlaringiz bilan baham:
1   ...   34   35   36   37   38   39   40   41   ...   147




Ma'lumotlar bazasi mualliflik huquqi bilan himoyalangan ©fayllar.org 2024
ma'muriyatiga murojaat qiling