I assumed that I wasn’t the first person to think that dynamic programming was a good way to match names, but perhaps I was the first to apply it to web forms in this way. I searched through published research papers and found a paper by Top, Dowla, and Gansemer called A Dynamic Programming Algorithm for Name Matching. I later found out that what I was trying to do was known as approximate string matching, and there was a note on Wikipedia which mentions that Sellers actually proposed using dynamic programming for approximate string matching in 1980.
The approach of Top, Dowla, and Gansemer was aimed at off-line use, so I was concerned that it would not be fast enough on moderately sized data sets to be a realistic option. It turns out that the comparison runs in
O(n * m) time and the backtracking runs in
O(n + m) time, which is not too bad. We will be comparing names, which tend to be short; if we assume that all of our names will be less than 32 characters (or whatever the size limit on the name field is), then we can take the upper bounds of the comparison time as a constant. I must admit that some of my peers did not like that I took what looks to be almost
O(n^2) (n * m = n^2; if n == m) and just turn it into
O(1). From my perspective the time it takes to make one comparison is bound by the time it takes to compare two strings which are the maximum length allowed in the name field; that time is constant depending upon which machine the comparison is run on. It’s no longer about the curve, it’s about how long it takes to compare the two longest possible strings. Perhaps I should work out a more rigorous mathematical proof though.
The real problem now is that we need to run that comparison for each name in our list of names, k number of times. So, now we have
If you would like more technical information, such as how to use this script, then go check out the fuzziac.js GitHub page.