Putting spell check in 64k of RAM
Abhinav Upadhyay walks us through a wonderful bit of computer history. He talks about how Steve Johnson at AT&T wrote one of the first spell checkers. His method could encode a word in just 14 bits of memory; so a dictionary with 30,000 entries would take up a fraction under 52 kB. This is even better compression than gzip – and it can perform fast lookups.
Once the dictionary grew to 30,000 words, the Bloom filter approach became impractical. Douglas McIlroy’s solution was to store differences between sorted hash codes , after discovering these differences followed a geometric distribution. These followed a distribution that could be easily run length encoded with something called Golomb’s code.
It’s a fantastic examination of applied computer science. Definitely worth a read
Articles: