Friday, January 29, 2016

Alist vs. hash-table

An alist is a simple data structure that holds key-value pairs in a linked list. When a key is looked up, the list is searched to find it. The time it takes is proportional to the length of the list, or the number of entries.

A hash-table is a more complex data structure that holds key-value pairs in a set of "hash buckets". When a key is looked up, it is first "hashed" to find the correct bucket, then that bucket is searched for the entry. The time it takes depends on a number of things, the hash algorithm, the number of buckets, the number of entries in the bucket, etc. A hash-table can be faster than an alist because the hashing step is quick and the subsequent search step will have very few entries to search.

In theory, an alist takes time proportional to the number of entries, but a hash-table takes constant time independent of the number of entries. Let's find out if this is true for MIT/GNU Scheme.

I wrote a little program that measures how long it takes to look things up in an alist vs. a hash table. Here's what I measured:
It does indeed seem that alists are linear and hash tables are constant in lookup time. But the hashing step of a hash table does take time, so short alists end up being faster that hash tables. The breakeven point looks like a tad over 25 elements. So if you expect 25 or fewer entries, an alist will perform better than a hash table. (Of course different implementations will have different break even points.)

A tree data structure is slightly more complex than an alist, but simpler than a hash table. Looking up an entry in a tree takes time proportional to the logarithm of the number of entries. The logarithm function grows quite slowly, so a tree performs pretty well over a very large range of entries. A tree is slower than an alist until you have about 15 entries. At this point, the linear search of an alist cannot compete with the logarithmic search of a tree. The time it takes to search a tree grows, but quite slowly. It takes more than 100 entries before a tree becomes as slow as a hash table.
With a big enough tree, the growth is so slow that you can pretend it is constant.

1 comment:

Paul F. Dietz said...

I ran into the slowness of hashing in another context: the Common Lisp function INTERN. I was using this to convert strings to keywords when parsing json. I could speed up the conversion by special casing its for the common strings occurring as json field names, using someone's highly optimized STRING-CASE macro to build a fast decision tree.