How efficiently does Morse code encode letters?

Morse code was designed so that the most frequently used letters have the shortest codes. In general, code length increases as frequency decreases.

How efficient is Morse code? We’ll compare letter frequencies based on Google’s research with the length of each code, and make the standard assumption that a dash is three times as long as a dot.

|--------+------+--------+-----------|
| Letter | Code | Length | Frequency |
|--------+------+--------+-----------|
| E      | .    |      1 |    12.49% |
| T      | -    |      3 |     9.28% |
| A      | .-   |      4 |     8.04% |
| O      | ---  |      9 |     7.64% |
| I      | ..   |      2 |     7.57% |
| N      | -.   |      4 |     7.23% |
| S      | ...  |      3 |     6.51% |
| R      | .-.  |      5 |     6.28% |
| H      | .... |      4 |     5.05% |
| L      | .-.. |      6 |     4.07% |
| D      | -..  |      5 |     3.82% |
| C      | -.-. |      8 |     3.34% |
| U      | ..-  |      5 |     2.73% |
| M      | --   |      6 |     2.51% |
| F      | ..-. |      6 |     2.40% |
| P      | .--. |      8 |     2.14% |
| G      | --.  |      7 |     1.87% |
| W      | .--  |      7 |     1.68% |
| Y      | -.-- |     10 |     1.66% |
| B      | -... |      6 |     1.48% |
| V      | ...- |      6 |     1.05% |
| K      | -.-  |      7 |     0.54% |
| X      | -..- |      8 |     0.23% |
| J      | .--- |     10 |     0.16% |
| Q      | --.- |     10 |     0.12% |
| Z      | --.. |      8 |     0.09% |
|--------+------+--------+-----------|

There’s room for improvement. Assigning the letter O such a long code, for example, was clearly not optimal.

But how much difference does it make? If we were to rearrange the codes so that they corresponded to letter frequency, how much shorter would a typical text transmission be?

Multiplying each code length by its letter's frequency and summing, we find that an average letter, weighted by frequency, has a code length of 4.5268 dot units.
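Here is a minimal Python sketch of that calculation, assuming the codes and frequencies in the table above. The names `TABLE`, `code_length`, and `average_length` are only illustrative. The table's frequencies sum to 99.98% because of rounding; the sketch divides by 100 rather than renormalizing, which is what reproduces the figure quoted above.

```python
# Letter -> (Morse code, frequency in percent), copied from the table above.
TABLE = {
    "E": (".", 12.49),   "T": ("-", 9.28),    "A": (".-", 8.04),
    "O": ("---", 7.64),  "I": ("..", 7.57),   "N": ("-.", 7.23),
    "S": ("...", 6.51),  "R": (".-.", 6.28),  "H": ("....", 5.05),
    "L": (".-..", 4.07), "D": ("-..", 3.82),  "C": ("-.-.", 3.34),
    "U": ("..-", 2.73),  "M": ("--", 2.51),   "F": ("..-.", 2.40),
    "P": (".--.", 2.14), "G": ("--.", 1.87),  "W": (".--", 1.68),
    "Y": ("-.--", 1.66), "B": ("-...", 1.48), "V": ("...-", 1.05),
    "K": ("-.-", 0.54),  "X": ("-..-", 0.23), "J": (".---", 0.16),
    "Q": ("--.-", 0.12), "Z": ("--..", 0.09),
}


def code_length(code, dot=1, dash=3):
    """Length of a code in dot units, ignoring any gaps between elements."""
    return sum(dot if symbol == "." else dash for symbol in code)


def average_length(table):
    """Frequency-weighted mean code length; frequencies are percentages."""
    return sum(code_length(code) * freq for code, freq in table.values()) / 100


print(round(average_length(TABLE), 4))  # 4.5268
```

The `dot` and `dash` parameters make it easy to try a different dash-to-dot ratio than the 3:1 assumed here.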

What if we rearranged the codes, giving the shortest codes to the most frequent letters? Then the average length would be 4.1257, about 9% shorter. To put it another way, Morse code achieved 91% of the efficiency that it could have achieved with the same codes. This is relative to Google’s English corpus. A different corpus would give slightly different results.
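The rearrangement can be sketched in a few lines of Python: sort the code lengths in increasing order and pair them with the frequencies in decreasing order. By the rearrangement inequality, that pairing gives the smallest possible weighted sum for a fixed set of codes. The lists below are copied from the table; `weighted_mean` is an illustrative helper, not anything from the original analysis.

```python
# Frequencies (percent) and code lengths (dot units), in table order E, T, A, O, ...
FREQS = [12.49, 9.28, 8.04, 7.64, 7.57, 7.23, 6.51, 6.28, 5.05, 4.07,
         3.82, 3.34, 2.73, 2.51, 2.40, 2.14, 1.87, 1.68, 1.66, 1.48,
         1.05, 0.54, 0.23, 0.16, 0.12, 0.09]
LENGTHS = [1, 3, 4, 9, 2, 4, 3, 5, 4, 6,
           5, 8, 5, 6, 6, 8, 7, 7, 10, 6,
           6, 7, 8, 10, 10, 8]


def weighted_mean(lengths, freqs):
    """Frequency-weighted mean length; frequencies are percentages."""
    return sum(n * f for n, f in zip(lengths, freqs)) / 100


# Morse code as it is: each letter keeps its own code.
actual = weighted_mean(LENGTHS, FREQS)

# Best reassignment of the same codes: shortest code to the most frequent letter.
best = weighted_mean(sorted(LENGTHS), sorted(FREQS, reverse=True))

print(round(actual, 4), round(best, 4), round(best / actual, 3))
# 4.5268 4.1257 0.911
```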

Toward the bottom of the table above, letter frequencies correspond poorly to code lengths, though this hardly matters for efficiency. But some of the choices near the top of the table are puzzling. The relative frequency of the first few letters has remained stable over time and was well known long before Google. (See ETAOIN SHRDLU.) Maybe there were factors other than efficiency that influenced how the most frequently used characters were encoded.

Update: Some sources I looked at said that a dash is three times as long as a dot, including the space between dots or dashes. Others said there is a pause as long as a dot between elements. The latter is the official standard of the International Telecommunication Union.

If you use the official timing, it takes an average time equal to 6.0554 dots to transmit an English letter, and this could be improved to 5.6616. By that measure Morse code is about 93.5% efficient. (I only added time for space inside the code for a letter because the space between letters is the same no matter how they are coded.)
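As a rough check on the official timing, here is a Python sketch that adds a one-dot gap between the elements of each letter and then repeats the comparison. It assumes the codes and frequencies from the table above; `duration` and the other names are illustrative. Gaps between letters and between words are left out, as in the text.

```python
# Letter -> (Morse code, frequency in percent), copied from the table above.
TABLE = {
    "E": (".", 12.49),   "T": ("-", 9.28),    "A": (".-", 8.04),
    "O": ("---", 7.64),  "I": ("..", 7.57),   "N": ("-.", 7.23),
    "S": ("...", 6.51),  "R": (".-.", 6.28),  "H": ("....", 5.05),
    "L": (".-..", 4.07), "D": ("-..", 3.82),  "C": ("-.-.", 3.34),
    "U": ("..-", 2.73),  "M": ("--", 2.51),   "F": ("..-.", 2.40),
    "P": (".--.", 2.14), "G": ("--.", 1.87),  "W": (".--", 1.68),
    "Y": ("-.--", 1.66), "B": ("-...", 1.48), "V": ("...-", 1.05),
    "K": ("-.-", 0.54),  "X": ("-..-", 0.23), "J": (".---", 0.16),
    "Q": ("--.-", 0.12), "Z": ("--..", 0.09),
}


def duration(code, gap=1):
    """Time to send one letter in dot units: dot = 1, dash = 3,
    plus a gap of one dot between consecutive elements."""
    elements = sum(1 if symbol == "." else 3 for symbol in code)
    return elements + gap * (len(code) - 1)


codes, freqs = zip(*TABLE.values())
durations = [duration(code) for code in codes]

# Average time per letter with the codes as assigned.
actual = sum(d * f for d, f in zip(durations, freqs)) / 100

# Average time if the same codes were reassigned, shortest to most frequent.
best = sum(d * f for d, f in
           zip(sorted(durations), sorted(freqs, reverse=True))) / 100

print(round(actual, 4), round(best, 4), round(best / actual, 3))
# The ratio comes out at about 0.935, the 93.5% quoted above.
```

Setting `gap=0` in `duration` recovers the lengths in the table.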

14 thoughts on “How efficient is Morse code?”

Aaron Soley

8 February 2017 at 09:19

A dash is considered three times as long as a dot, but the spaces between dots/dashes within a letter are also considered as long as a dot. So N should be length 5 and H length 7, rather than both being 4.

Michael Haufe

8 February 2017 at 09:40

Why didn’t you include numbers?

Ed Davies

8 February 2017 at 09:54

You forgot the spaces: A (· —) is 5, not 4, dots long including the space between the dot and the dash.

John

8 February 2017 at 10:23

@Michael: I just wanted to look at text because that’s simpler. Some messages are dense with numbers, some have no numbers at all. The relative frequency of letters and digits would vary greatly by context.

@Aaron, @Ed: Some sources I looked at said that a dash is 3x as long as a dot, including the space between them. Others didn’t. I updated the post to include measuring the time as you suggested.

Jens Meyer

8 February 2017 at 10:35

Preface: I learned Morse code in the military (Navy) and used it for long-range communications (ship-to-ship and ship-to-shore).
I have to second the previous comments in that spaces between words need to be considered as well.
However, the analysis breaks apart because plain text communication via Morse code is actually not its primary function. A lot of Morse code based communications use what we would now call tokens, i.e. in my world it was three-character words sometimes followed by numeric values. Secondly, communications would typically happen in an encrypted fashion. In our case the encrypted text was based on five-character words, which will then mess up your letter frequency ranking. Lastly, you will need to differentiate between the American Morse code and the ITU standard codes, as they use slightly different code tables.

Ghassan chammaz

8 February 2017 at 12:47

Hi. As a radio ham I must object to the dash being length 3. It is actually length 2. You can count its time. A dot produces a DIH and a dash produces a DAH, not a DAAH.
Ghassan AC2RA

Bob Cunningham

8 February 2017 at 13:58

Another factor to consider is robustness in the presence of noise, and how well a code takes advantage of human hearing.

Are there indications of such optimization within Morse code? Is there data indicating which codes are most often missed and/or confused?

Despite years of effort, my ability to read Morse was terrible compared to my uncle’s. He could easily pull messages out of what sounded like white noise to me, despite my having more acute hearing and perfect pitch.

We did an experiment to compare our relative abilities to decode a blinking LED, and we both sucked equally. The eyes clearly have a lower sample rate.

Yet I could send much faster with a noticeably lower error rate. I could read my own keying at full rate (for error detection), but not other folks’. Go figure.

Ed Davies

8 February 2017 at 16:57

@AC2RA, I think this is about as official as it gets:

http://www.itu.int/rec/R-REC-M.1677-1-200910-I/

“2 Spacing and length of the signals
2.1 A dash is equal to three dots.
2.2 The space between the signals forming the same letter is equal to one dot.
2.3 The space between two letters is equal to three dots.
2.4 The space between two words is equal to seven dots.”

Bill Wear

23 February 2017 at 15:02

An earlier comment was correct, in that the letter frequency in Morse does not correspond in any way to the frequency in normal communication. Most messages start out with “CQ” (both four-pip letters), and almost always include such standard abbreviations as “WX” (e.g., “WX rainy and overcast”), “AR” (over), “BK” (go ahead, your turn), and the infamous “QR” signals (“QRG”, “QRH”). The big point, though, is that Morse isn’t as much about efficiency of signal as efficiency of *communication*. “CQ” is hard to mistake for something else. “SOS” is probably known to two-thirds of the general population. The Q codes are easily recognizable, because they’re never followed by a “U”. You can get a better idea of pip frequency at http://www.hamuniverse.com/qsignals.html. Maybe there’s a follow-up article from a different slant, like how the pips are arranged for clarity of message with fewer repeats?

John

23 February 2017 at 15:48

I hadn’t heard of Q codes until I read Neal Stephenson’s book Seveneves last year. They come up a few times in that book.

Russell Finn

23 February 2017 at 16:49

Regarding the relative inefficiency of the letter O in “International” Morse, it should be noted that in the older “American” Morse, O was one of the letters that contained a “long” internal gap; it was encoded as dit–dit and so its length was only 4. (Compare the letter I, which is dit-dit of length 3, and the “word” EE, which is dit—dit.)

When International Morse eliminated the long internal gap, O had to be re-encoded, and dahdahdah was chosen. Other such letters went to four-element codes, with the exception of R which “stole” F’s encoding. (See the Wikipedia article on American Morse for details.)

— K3RSF

Russell Finn

23 February 2017 at 16:53

The website re-encoded my consecutive hyphens as en- and em-dashes, obscuring my intent; let me try again (with * for dit and ” for a space of the same length):
Letter I = *”*
Letter O = *””*
“Word” EE = *”””*

Bill K

23 February 2017 at 20:31

I agree with many of the posts. But I do wonder about the point of such a discussion. Like at least one other person above I learnt CW in the military. It is Morse CODE, not Morse speak. Most of what was sent and received was in 5 letter code groups. Very strict procedural rules, Q codes for even further brevity (OPS BREVITY). Unless discussing machine generated Morse, the difference between a DOT and a DASH is rather dependent upon the individual. Anyway – thanks for generating this – CW is becoming or has become something of a lost art – long may it live! While military communications moved away from CW in the mid 1970s and became digitised, satellite based, even internet based, I still love it and have recently got into amateur radio for the very purpose of getting back on the bike. I do find the use of high speed plain language a challenge – so used to logging an incoming message, then logging then sending a brief reply. Having to think on my feet and send as though I am typing or speaking – well it will take practice – and whether it is more or less efficient, or whether my dashes are 3 times the length of my dots, well frankly as long as I can communicate and enjoy doing so, I don’t care!

But thanks for the question! Long Live CW! (Straight key of course!!!)

23 June 2023 at 12:28

Responding to Bill K, I see “Morse Code” as “Morse Encoding”, that is, an encoding of the English alphabet into sounds or symbols of sounds (. _), analogous to ASCII “encoding” (or “code”), one of many encodings of characters in the computer industry. The fact that it was used a lot to send encrypted messages is not, IMO, why it is called “code”, but rather because of its encoding of an alphabet and punctuation. However, the point that efficiency depends on what is sent is a good one. Encrypted messages tend to be frequency-neutral, whereas typical conversations would have a distinct frequency profile. But John Cook’s analysis is still very interesting pertaining to the content he was talking about.
