Binary Digital Data

 
The nature of digital data
The previous section refers to "data", but what is "data"? Data is information. It can be in the form of numbers, text, images, sound or a host of other forms, and somehow, all of these types of information can be manipulated by a computer. This is possible because ALL types of information can ultimately be represented as collections of the smallest unit of information, the "bit". Some items of data require billions of bits to represent them, while others need just a single bit. The point is that any item of information can be converted into a set of bits, and when in this form, the information can be manipulated by a digital computer in ways limited only by our imaginations. The conversion of all these forms of information into bits and back again is done by the computer's I/O devices. Such conversion mechanisms cover a large and complex field of study, well beyond the scope of this page. Here we are concerned with the theory and use of this information after conversion. 

A bit is the smallest possible unit of information; if you have less than one bit, you have no information at all. An English language letter can have one of 26 values from "A" to "Z"; an Arabic digit can have fewer, one of 10 values from "0" to "9". At the extreme is the bit, which can have only two values, usually referred to as "0" and "1". If we go one step further, we'd have a thing that is always "0", which coincidentally is how much information it would contain: none. A unit of information that is always the same tells us nothing. We need at least two possible states, which gives us the "bit".

The two values of the bit can be interpreted as anything we wish: 0 or 1, Yes or No, True or False, Black or White, or whatever. Sometimes this is all we need, but often we want to represent more information than a single bit can hold, so we use groups of bits. A group of bits can give many combinations of 0's and 1's; we just use as many bits as are required to give the number of combinations needed. One bit gives us two combinations, two bits give four, three bits give eight, and so on. If we need a million combinations to represent our information, we will need at least 20 bits. To represent a single decimal digit, for example, eight combinations (3 bits) are not enough; we want ten, so 4 bits are required, which gives sixteen combinations. This is more than we need, so we just don't use the extra 6 combinations of these four bits, and consider them invalid.
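The rule "use as many bits as are required" can be sketched in a few lines of Python. This is only an illustration; the function names bits_needed and unused_combinations are made up for this example and do not come from any library.

```python
def bits_needed(combinations: int) -> int:
    """Smallest number of bits whose 2**bits patterns cover the request."""
    if combinations < 1:
        raise ValueError("need at least one combination")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)), using integers only.
    return (combinations - 1).bit_length()


def unused_combinations(combinations: int) -> int:
    """How many bit patterns are left over and must be treated as invalid."""
    return 2 ** bits_needed(combinations) - combinations


for wanted in (2, 10, 1_000_000):
    print(wanted, bits_needed(wanted), unused_combinations(wanted))
# 2 1 0
# 10 4 6
# 1000000 20 48576
```

Note that asking for a single combination returns zero bits, which matches the observation that something with only one possible state carries no information.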

Just as decimal digits can be grouped together to form numbers, so too can groups of related bits. In this case each bit is called a "binary digit", with "binary" meaning "pertaining to two", because each digit can have only two values. In fact the word "bit" is a contraction of the words "Binary digIT". (The reader may care to work out what an information unit with 3 possible values might be called.)
Examples of binary notation are "101010", "01011101" and "1111111111111111".

The table below shows how many combinations are available with a given number of bits. As can be seen, each additional bit doubles the number of combinations available.


 
Number of bits    Number of combinations
1                 2
2                 4
3                 8
4                 16
5                 32
6                 64
7                 128
8                 256
9                 512
10                1,024
...               ...
16                65,536
...               ...
20                1,048,576

 
If we want to represent letters as well as digits, we need 36 combinations, which is just 4 more than five bits allow, so we need to go to six bits and 64 combinations. This will work, but it's also a waste, since almost half of the storage we would use to represent alphanumeric characters this way would have no meaning (the unused 28 combinations). Fortunately we can employ these unused combinations for other special characters and punctuation. If we also wanted both upper and lower case alphabets to be represented, we would need seven bits.

We haven't yet seen which combinations of bits correspond to which characters. This correspondence is arbitrary, and can be anything we want, as long as we are consistent. But there are standard codes in existence such as ASCII (American Standard Code for Information Interchange) and EBCDIC (Extended Binary Coded Decimal Interchange Code) which define the relationships between bit combinations and characters, and we should use one of those instead of dreaming up our own. 
For example, the bit combinations for a few characters in both ASCII and EBCDIC are shown below:-
 

Character    ASCII code    EBCDIC code    On Mars?
7            00110111      11110111       00000111
A            01000001      11000001       10000001
z            01111010      10101001       01000001
?            00111111      01101111       11001111
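A few of the codes in the table can be checked with Python's built-in codecs. This is a rough sketch under one assumption: EBCDIC exists in several code pages, and the example uses "cp037", one of the EBCDIC code pages that Python ships. The helper name bit_pattern is made up for this example.

```python
def bit_pattern(char: str, encoding: str) -> str:
    """Return the 8-bit pattern for one character in the given encoding."""
    (code,) = char.encode(encoding)  # unpacking fails unless exactly one byte
    return format(code, "08b")


# "cp037" is an assumption: one of several EBCDIC code pages.
for ch in "7A?":
    print(ch, bit_pattern(ch, "ascii"), bit_pattern(ch, "cp037"))
# 7 00110111 11110111
# A 01000001 11000001
# ? 00111111 01101111
```

The "On Mars?" column has no codec, of course, which is the point: any consistent mapping would work, but only the standard ones are understood by everybody else.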

But pure integer numbers don't need to be encoded like this. A collection of binary digits (bits) by their nature can directly represent a number, just as decimal digits can. In both Decimal and Binary numbers, the rightmost digit is the "units" position, and each position to its left has an increasingly higher value. Each digit position is worth 10 times (decimal) or 2 times (binary)  the value of the position to its right. The number 10 or 2 here is called the "base" or "radix" of the numbering system being used.
The values of these digit positions are:-
 

Radix           Values                              (Numerically)
10 (decimal)    Thousands  Hundreds  Tens  Units    1000  100  10  1
2 (binary)      Eights     Fours     Twos  Units    8     4    2   1

To accommodate larger numbers, we simply extend the digit positions leftwards as far as required, just as we earlier added bits until we had enough combinations to represent our piece of information. Each new digit position on the left is worth the radix times more than the position on its right.

To arrive at the value of the number overall, we multiply each digit by the value of the position it occupies, and add all these products together.
For example:-
 

Radix      Number    Expanded                           Sums of products    Total value
Decimal    1234      1x1000 + 2x100 + 3x10 + 4x1        1000+200+30+4       1234 (no surprise there!)
Binary     11011     1x16 + 1x8 + 0x4 + 1x2 + 1x1       16+8+0+2+1          27
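The multiply-and-add procedure can be written out as a small Python function. It is a minimal sketch that handles radixes up to 10 only; the name positional_value is invented for this example.

```python
def positional_value(number: str, radix: int) -> int:
    """Multiply each digit by the value of its position and sum the products."""
    total = 0
    place = 1  # the rightmost digit is the "units" position
    for digit_char in reversed(number):
        digit = int(digit_char)
        if digit >= radix:
            raise ValueError(f"digit {digit_char!r} is not valid in radix {radix}")
        total += digit * place
        place *= radix  # each position to the left is worth radix times more
    return total


print(positional_value("1234", 10))  # 1234
print(positional_value("11011", 2))  # 27
```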

Notice that because the values of the digit positions increase much more slowly in binary than in decimal (by 2's instead of by 10's), many more digits are required to represent the same value. For example, the number "7094" in decimal requires 4 digits, whereas in binary it requires 13 digits ("1101110110110"). This is one of the biggest problems with binary data: while it's natural for computers, it is cumbersome and hard to read for humans. Writing and reading information represented as binary would be an awkward and error-prone process when carried out by humans.
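Going the other way, from a value to its binary digits, can be done by repeatedly dividing by two and collecting the remainders. The sketch below is one common way to do it, not the only one; the name to_binary is invented for this example.

```python
def to_binary(value: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if value < 0:
        raise ValueError("only non-negative values are handled here")
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, remainder = divmod(value, 2)
        digits.append(str(remainder))  # remainders arrive units-first
    return "".join(reversed(digits))


binary = to_binary(7094)
print(binary, len(binary))  # 1101110110110 13
```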

No problem, why not just use decimal all the time?  Decimal is a convenient way of describing groups of binary bits when they represent a numeric value, but in computers a group of bits may be representing all sorts of things, not just numbers. Furthermore, from the example above, can you say which of the bits in 1101110110110 correspond to the "9" in 7094? The answer to that is no. So although the decimal form is convenient, it's no good for identifying subgroups of the bits it represents.

Take for example the word "CAT". Its 3 characters could be stored as bits, using the ASCII coding standard. The three groups would be:-
C=01000011, A=01000001 and T=01010100. Put together we get "010000110100000101010100", which is what the computer must use, but for us, it's much more cumbersome than "CAT". This pattern of bits could also be interpreted as a number, whose value is 4407636. While this number corresponds exactly to the bit pattern above, it is a very strange and unnatural way to say "CAT".

Alternatively, consider the number 21831. Its binary form is "101010101000111", but this could also be interpreted as two ASCII characters, giving "UG". Again, an unclear way to specify this number. If we know what type of data is being represented, we can say "CAT" instead of 010000110100000101010100 or 4407636. And we can say "21831" instead of 101010101000111 or UG. But to use the most correct and concise form, we have to know which way to interpret the data.
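The two readings of the same bits can be reproduced in Python. This is a sketch under the assumption that the bits are grouped into 8-bit ASCII characters from the right, with zeros added on the left where the pattern is short; the names as_number and as_text are invented for this example.

```python
def as_number(bits: str) -> int:
    """Read the bit pattern as a plain binary number."""
    return int(bits, 2)


def as_text(bits: str) -> str:
    """Read the same bit pattern as a run of 8-bit ASCII characters."""
    width = -(-len(bits) // 8) * 8       # round up to a whole number of bytes
    padded = bits.zfill(width)           # assumption: pad with zeros on the left
    raw = int(padded, 2).to_bytes(width // 8, "big")
    return raw.decode("ascii")


cat_bits = "010000110100000101010100"
print(as_text(cat_bits), as_number(cat_bits))  # CAT 4407636

ug_bits = "101010101000111"
print(as_text(ug_bits), as_number(ug_bits))    # UG 21831
```

Nothing in the bits themselves says which reading is the intended one; that knowledge has to come from outside.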

Sometimes the most concise form is the binary form. Suppose that we have seven bits that represent which rooms in a house currently have the light switched on. In this case "1000110" is more concise than "the lights are on in the kitchen, the lounge room and the master bedroom". Treating this bit pattern as characters gives us "F", and as a number gives us 70. Both of these interpretations are meaningless. 
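The house lights case might look like this in Python. The text does not say which bit belongs to which room, so the room order below is purely hypothetical, as are the names of the four rooms that are not mentioned; it was chosen so that "1000110" lights the three rooms named above.

```python
# Hypothetical room order, leftmost bit first. Only the kitchen, lounge room
# and master bedroom come from the text; the other names are placeholders.
ROOMS = [
    "kitchen",
    "bathroom",
    "hallway",
    "study",
    "lounge room",
    "master bedroom",
    "garage",
]


def rooms_lit(bits: str) -> list:
    """List the rooms whose bit is 1, one bit per room."""
    if len(bits) != len(ROOMS):
        raise ValueError("expected exactly one bit per room")
    return [room for room, bit in zip(ROOMS, bits) if bit == "1"]


print(rooms_lit("1000110"))
# ['kitchen', 'lounge room', 'master bedroom']
```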

What we need is a more concise way of writing down binary information, but without needing to know how that information has been encoded.

Decimal and binary are not the only numbering systems. Such systems can be based on any radix, but some radixes are more useful than others. The two most common others are octal and hexadecimal, with radixes 8 and 16 respectively. Octal made some sense years ago, on computers that used 6 bits to represent a character, but these days hexadecimal, or simply "hex", is the most useful.

Hex digits can have one of 16 values, corresponding to 0 through 15. The characters used to represent these 16 values are "0" to "9" and "A" to "F", i.e. "0123456789ABCDEF". Each hex digit corresponds to a group of four bits and their 16 combinations. The table below shows all the bit patterns discussed above using hexadecimal notation.
 

Actual meaning    Binary                      Hex       Character    Decimal
7094              1101110110110               1BB6      invalid      7094
CAT               010000110100000101010100    434154    CAT          4407636
21831             101010101000111             5547      UG           21831
House Lights      1000110                     46        F            70

The advantages of hexadecimal notation are:-

  • Values are much more concise than the binary, and even a little more than the decimal. 
  • They are never invalid. (Larger groups of bits would eventually exceed practical limits and become invalid in decimal.)
  • The individual bits that each hex digit represents are easily identified, and vice versa. 
  • When various actual data types are stored together, there is no need to know where the boundaries between the items are.
  • All data types become equally meaningless when represented as hex! (Just as the underlying binary bits usually are)
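The one-hex-digit-per-four-bits correspondence can be sketched in Python as follows. The function names are invented for this example, and padding short patterns with zeros on the left is an assumption, although it does reproduce the hex values used on this page.

```python
def bits_to_hex(bits: str) -> str:
    """Replace each group of four bits, counted from the right, by one hex digit."""
    width = -(-len(bits) // 4) * 4   # round up to a multiple of four bits
    padded = bits.zfill(width)       # assumption: pad with zeros on the left
    groups = (padded[i:i + 4] for i in range(0, width, 4))
    return "".join(format(int(group, 2), "X") for group in groups)


def hex_to_bits(hex_digits: str) -> str:
    """Expand each hex digit back into its four bits."""
    return "".join(format(int(digit, 16), "04b") for digit in hex_digits)


print(bits_to_hex("1101110110110"))             # 1BB6
print(bits_to_hex("010000110100000101010100"))  # 434154
print(bits_to_hex("101010101000111"))           # 5547
print(bits_to_hex("1000110"))                   # 46
print(hex_to_bits("1BB6"))                      # 0001101110110110
```

Because every digit is converted on its own, neither function needs to know whether the bits stand for a number, some text or a row of light switches.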

 
 
Digital data and computers
It's easy to see that this can get complex and awkward if we use different numbers of bits for each different type of data we wish to represent. To keep things simple, bits are generally arranged and manipulated in groups of eight, known as "bytes". This is a useful compromise between considerations of simplicity, wastage and flexibility. The byte is also a practical unit because the 256 combinations of its eight bits are enough to represent all characters and special symbols. All modern computers work with byte-sized units.

In the days of the early machines, memory was more limited and expensive; the luxury of eight bit units could not be afforded. Units of 6 bits were used to represent a character. However, this did not allow for lower case letters, and there were not even enough combinations left over to cater for all the special symbols and punctuation.

When a computer transfers information between its CPU and memory, the part of the CPU that sends or receives the data is called a "register", which is an array of bits. Different computers have different numbers of registers in their CPUs, and there may also be various types of registers in any given CPU. Some registers have special purposes, and some are called "general purpose registers", which can be used in whatever way the programmer chooses. The various special and general purpose registers may each have different numbers of bits, but computers are generally classified by the number of bits in their general purpose registers.

This number of bits in the general purpose registers is called the "word length" of the computer. Over the years many designs using various word lengths were built, the most common sizes being 8, 12, 16, 18, 32, 36 and 64 bits. With experience, it was realized that synergisms arose, and that "things just fitted together better" when the word length was a power of two. This is not surprising, considering that the fundamental bit has two states, and groups of bits give numbers of combinations that are powers of two. Nowadays, all computers have word lengths that are a power of two, i.e. 8, 16, 32 or 64 bits. It's also no accident that these sizes are multiples of eight, the size of a byte. In the old days, when six bits were used to represent a character, the 12, 18 and 36 bit machines made a little more sense, but they still lacked the synergism that came from using a power of two, which is mainly why they no longer exist today.

Copyright 2002 by Rob Storey
