The previous section refers to "data", but what is "data"? Data is
information. It can be in the form of numbers, text, images, sound or a
host of other forms, and somehow, all of these types of information can
be manipulated by a computer. This is possible because ALL types of information
can ultimately be represented as collections of the smallest unit of information,
the "bit". Some items of data require billions of bits to represent them,
while others need just a single bit. The point is that any item of information
can be converted into a set of bits, and when in this form, the information
can be manipulated by a digital computer in ways limited only by our imaginations.
The conversion of all these forms of information into bits and back again
is done by the computer's I/O devices. Such conversion mechanisms cover
a large and complex field of study, well beyond the scope of this page.
Here we are concerned with the theory and use of this information after
conversion.
A bit is the smallest possible unit of information; if you have less than one bit, you have no information at all. An English language letter can have one of 26 values from "A" to "Z"; an Arabic digit can have fewer, one of 10 values from "0" to "9". At the extreme is the bit, which can have only two values, usually referred to as "0" and "1". If we go one step further, we'd have a thing that is always "0", which coincidentally is how much information it would contain: none. A unit of information that is always the same tells us nothing. We need at least two possible states, which gives us the "bit".
The two values of the bit can be interpreted as anything we wish: 0 or 1, Yes or No, True or False, Black or White, or whatever. Sometimes this is all we need, but often we want to represent more information than a single bit can hold, so we use groups of bits. A group of bits can give many combinations of 0's and 1's; we just use as many bits as are required to give the number of combinations needed. One bit gives us two combinations, two bits give four, three bits give eight, and so on. If we need a million combinations to represent our information, we will need at least 20 bits. To represent a single decimal digit, for example, eight combinations (3 bits) are not enough; we want ten, so 4 bits are required, which gives sixteen combinations. This is more than we need, so we just don't use the extra 6 combinations of these four bits, and consider them invalid.
Just as decimal digits can be grouped together to form numbers, so too can groups of related bits; in this case each bit is called a "binary digit". "Binary" means "pertaining to two", because each digit can have only two values. In fact the word "bit" is a contraction of the words "Binary digIT". (The reader may care to work out what an information unit with 3 possible values might be called.)
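
The rule that each extra bit doubles the number of combinations can be sketched in a few lines of Python. This is only an illustration; the function name `bits_needed` is made up for this example and does not come from any library.

```python
def bits_needed(combinations: int) -> int:
    """Return the smallest number of bits giving at least `combinations` patterns."""
    if combinations < 1:
        raise ValueError("need at least one combination")
    bits = 0
    # Each extra bit doubles the number of available patterns.
    while 2 ** bits < combinations:
        bits += 1
    return bits


for wanted in (2, 10, 26, 1_000_000):
    bits = bits_needed(wanted)
    unused = 2 ** bits - wanted
    print(f"{wanted} combinations: {bits} bits ({unused} patterns unused)")
```

For a decimal digit (10 combinations) this reports 4 bits with 6 patterns unused, and for a million combinations it reports 20 bits, matching the figures in the text.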
The table below shows how many combinations are available with a given number of bits. As can be seen, each additional bit doubles the number of combinations available.
| Number of bits | Number of combinations |
| 1 | 2 |
| 2 | 4 |
| 3 | 8 |
| 4 | 16 |
| 5 | 32 |
| 6 | 64 |
| 7 | 128 |
| 8 | 256 |
| 9 | 512 |
| 10 | 1,024 |
| ... | ... |
| 16 | 65,536 |
| ... | ... |
| 20 | 1,048,576 |
If we want to represent letters as well as digits, we need 36 combinations,
which is just 4 more than five bits allow, so we need to go to six bits
and 64 combinations. This will work, but it's also a waste, since almost
half of the combinations we would have available to represent alphanumeric
characters this way would have no meaning (the unused 28 combinations).
Fortunately we can employ these unused combinations for other special
characters and punctuation. If we also wanted both upper and lower case
alphabets to be represented, we would need seven bits: 52 letters and 10
digits take up 62 of the 64 six-bit combinations, leaving only 2 for
everything else.
We haven't yet seen which combinations of bits correspond to which characters.
This correspondence is arbitrary, and can be anything we want, as long
as we are consistent. But there are standard codes in existence such as
ASCII (American Standard Code for Information Interchange) and EBCDIC (Extended
Binary Coded Decimal Interchange Code) which define the relationships between
bit combinations and characters, and we should use one of those instead
of dreaming up our own.
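
As a small illustration of such a standard code, the Python sketch below prints the 7-bit ASCII pattern for a few characters and converts a pattern back again. The helper names `ascii_bits` and `ascii_char` are hypothetical, chosen only for this example; the conversions themselves use Python's built-in ASCII codec.

```python
def ascii_bits(ch: str) -> str:
    """Return the 7-bit ASCII pattern for a single character."""
    code = ch.encode("ascii")[0]   # raises an error for non-ASCII characters
    return format(code, "07b")     # the code written as 7 binary digits


def ascii_char(bits: str) -> str:
    """Return the character that a 7-bit ASCII pattern stands for."""
    return bytes([int(bits, 2)]).decode("ascii")


for ch in ("A", "Z", "a", "0", "9", "?"):
    print(ch, ascii_bits(ch))

print(ascii_char("1000001"))  # A
```

The point is consistency: as long as the writer and the reader both use the same code, the round trip from character to bits and back returns the original character.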
But pure integer numbers don't need to be encoded like this. A collection
of binary digits (bits) by their nature can directly represent a number,
just as decimal digits can. In both decimal and binary numbers, the rightmost
digit is the "units" position, and each position to its left has an increasingly
higher value. Each digit position is worth 10 times (decimal) or 2 times
(binary) the value of the position to its right. The number 10 or
2 here is called the "base" or "radix" of the numbering system being used.
To accommodate larger numbers, we simply extend the digit positions leftwards as far as required, just as we earlier added bits until we had enough combinations to represent our piece of information. Each new digit position on the left is worth radix times as much as the position on its right. To arrive at the value of the number overall, we multiply each digit
by the value of the position it occupies, and add all these products together.
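
The multiply-and-add rule just described can be written out directly. The following Python sketch is illustrative only: the function name `positional_value` is invented here, and it assumes a radix between 2 and 10 so that the ordinary digits "0" to "9" are sufficient.

```python
def positional_value(digits: str, radix: int) -> int:
    """Sum of digit * place value, working leftwards from the units position."""
    total = 0
    place = 1  # value of the rightmost ("units") position
    for symbol in reversed(digits):
        digit = int(symbol)
        if digit >= radix:
            raise ValueError(f"digit {symbol!r} is not valid in radix {radix}")
        total += digit * place
        place *= radix  # each position is worth `radix` times the one to its right
    return total


print(positional_value("7094", 10))  # 7094
print(positional_value("1101", 2))   # 13, i.e. 8 + 4 + 0 + 1
```

Python's built-in `int("1101", 2)` performs the same conversion; the loop is spelled out here only to show the place values at work.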
Notice that because the value of the digit positions increases much more slowly in binary than in decimal (by 2's instead of by 10's), many more digits are required to represent the same value. For example, the number "7094" in decimal requires 4 digits, whereas in binary it requires 13 digits ("1101110110110"). This is one of the biggest problems with binary data: while it's natural for computers, it is cumbersome and hard to read for humans. Writing and reading information represented as binary would be an awkward and error-prone process when carried out by humans.
No problem, why not just use decimal all the time? Decimal is a convenient way of describing groups of binary bits when they represent a numeric value, but in computers a group of bits may be representing all sorts of things, not just numbers. Furthermore, from the example above, can you say which of the bits in 1101110110110 correspond to the "9" in 7094? The answer is no. So although the decimal form is convenient, it's no good for identifying subgroups of the bits it represents.
Take for example the word "CAT". Its 3 characters could be stored as 24 bits, in three groups of eight, using the ASCII coding standard. The three groups would be 01000011 ("C"), 01000001 ("A") and 01010100 ("T"). Read as a single number, these 24 bits give 4407636, an unclear way to specify the word.
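
The three groups of bits for "CAT" can be reproduced with a short Python sketch. It assumes each ASCII character is stored in eight bits with a leading zero, and the helper name `to_bit_groups` is made up for this illustration.

```python
def to_bit_groups(text: str) -> list:
    """Return one 8-bit pattern per ASCII character in `text`."""
    return [format(byte, "08b") for byte in text.encode("ascii")]


groups = to_bit_groups("CAT")
print(groups)           # ['01000011', '01000001', '01010100']

joined = "".join(groups)
print(joined)           # 010000110100000101010100
print(int(joined, 2))   # 4407636, the same 24 bits read as one number
```

The same 24 bits therefore read either as three characters or as one rather unhelpful decimal number, depending on how we choose to interpret them.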
Alternatively, consider the number 21831. Its binary form is "101010101000111", but this could also be interpreted as two ASCII characters, giving "UG". Again, an unclear way to specify this number. If we know what type of data is being represented, we can say "CAT" instead of 010000110100000101010100 or 4407636. And we can say "21831" instead of 101010101000111 or UG. But to use the most correct and concise form, we have to know which way to interpret the data.
Sometimes the most concise form is the binary form. Suppose that we have seven bits that represent which rooms in a house currently have the light switched on. In this case "1000110" is more concise than "the lights are on in the kitchen, the lounge room and the master bedroom". Treating this bit pattern as a character gives us "F", and as a number gives us 70. Both of these interpretations are meaningless. What we need is a more concise way of writing down binary information, but without needing to know how that information has been encoded.
Decimal and binary are not the only numbering systems. Such systems can be based on any radix, but some radixes are more useful than others. The two most common alternatives are octal and hexadecimal, with radixes 8 and 16 respectively. Octal made some sense years ago, on computers that used 6 bits to represent a character, but these days hexadecimal, or simply "hex", is the most useful. Hex digits can have one of 16 values, corresponding to 0 through 15. The characters used to represent these 16 values are "0" to "9" and "A" to "F", i.e. "0123456789ABCDEF". Each hex digit corresponds to a group of four bits and their 16 combinations. The table below shows all the bit patterns discussed above using hexadecimal notation.
The advantages of hexadecimal notation are:-
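
To illustrate the correspondence between one hex digit and four bits, the Python sketch below converts the bit patterns used earlier on this page into hexadecimal. The function name `bits_to_hex` is hypothetical, and padding short patterns with zeros on the left is an assumption made for the example.

```python
HEX_DIGITS = "0123456789ABCDEF"


def bits_to_hex(bits: str) -> str:
    """Write a string of 0's and 1's as hex, one digit per group of four bits."""
    # Pad on the left with zeros so the length is a multiple of four.
    width = (len(bits) + 3) // 4 * 4
    padded = bits.zfill(width)
    digits = []
    for start in range(0, len(padded), 4):
        group = padded[start:start + 4]
        digits.append(HEX_DIGITS[int(group, 2)])
    return "".join(digits)


print(bits_to_hex("1101110110110"))             # 1BB6   (the number 7094)
print(bits_to_hex("010000110100000101010100"))  # 434154 (the word "CAT")
print(bits_to_hex("101010101000111"))           # 5547   (21831, or "UG")
print(bits_to_hex("1000110"))                   # 46     (the light switches)
```

Because every hex digit stands for exactly four bits, any subgroup of the bits can be picked out from the hex form without knowing how the data was encoded. In "434154", for instance, the pairs 43, 41 and 54 are the three characters of "CAT".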
It's easy to see that this can get complex and awkward if we use different
numbers of bits for each different type of data we wish to represent. To
keep things simple, bits are generally arranged and manipulated in groups
of eight, known as "bytes". This is a useful compromise between considerations
of simplicity, wastage and flexibility. The byte is also a practical unit
because the 256 combinations of its eight bits are enough to represent upper
and lower case letters, digits, punctuation and special symbols. All modern
computers work with byte-sized units.
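
As a rough sketch of working in byte-sized units, the Python below pads a bit pattern to a whole number of bytes and splits it into groups of eight. The helper name `to_byte_groups` is invented for this example, and padding with zeros on the left (most significant byte first) is an assumption, not something this page specifies.

```python
def to_byte_groups(bits: str) -> list:
    """Pad `bits` on the left to a multiple of eight and split into bytes."""
    width = (len(bits) + 7) // 8 * 8
    padded = bits.zfill(width)
    return [padded[start:start + 8] for start in range(0, len(padded), 8)]


groups = to_byte_groups("101010101000111")   # the 15 bits of 21831
print(groups)                                # ['01010101', '01000111']

raw = bytes(int(group, 2) for group in groups)
print(list(raw))                             # [85, 71]
print(raw.decode("ascii"))                   # UG
```

The 15-bit pattern occupies two bytes once padded, and each byte can hold any one of 256 values.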
In the days of the early machines, memory was more limited and expensive; the luxury of eight-bit units could not be afforded. Units of 6 bits were used to represent a character; however, this did not allow for lower case letters, and there were not even enough combinations left over to cater for all the special symbols and punctuation.
When a computer transfers information between its CPU and memory, the part of the CPU that sends or receives the data is called a "register", which is an array of bits. Different computers have different numbers of registers in their CPUs, and there may also be various types of registers in any given CPU. Some registers have special purposes, and some are called "general purpose registers", which can be used in whatever way the programmer chooses. The various special and general purpose registers may each have different numbers of bits, but computers are generally classified by the number of bits in their general purpose registers. This number of bits is called the "word length" of the computer.
Over the years many designs using various word lengths were built, the most common sizes being 8, 12, 16, 18, 32, 36 and 64 bits. With experience, it was realized that synergisms arose, and that "things just fitted together better" when the word length was a power of two. This is not surprising considering that the fundamental bit has two states, and groups of bits give numbers of combinations that are powers of two. Nowadays, virtually all computers have word lengths that are a power of two, i.e. 8, 16, 32 or 64 bits. It's also no accident that these sizes are multiples of eight, the size of a byte. In the old days, when six bits were used to represent a character, the 12, 18 and 36 bit machines made a little more sense, but they still lacked the synergism that came from using a power of two, which is mainly why they no longer exist today.
Copyright 2002 by Rob Storey