[Week Nine] Programming Contest - Levenshtein Distance

Welcome back to week nine of the weekly programming contests! If you are new to the contest or need a refresher don’t forget to read up on the rules and information Posted Here

This weeks challenge is to implement the method public static int lev(String s1, String s2) (the method and vars can be renamed). This method returns the Levenshtein Distance of the two method parameters. Read the link if you don’t know what levenshtein distance is.

Copied from Wikipedia:

The Levenshtein distance between two words is the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other

Examples:
lev("fluff", "puff") -> 2 lev("hello", "goodbye") -> 7 lev("harry", "potter") -> 6 lev("mill", "bill") -> 1 lev("death", "smiles") -> 6 lev("blood", "red") -> 4 lev("ghost", "protocol") -> 7 lev("iron", "man") -> 3 lev("sponge", "powered") -> 5

Imports are allowed for this challenge!

Challenge Closes: Sunday 1/3/16

Good luck!

Using two matrices:

public static int d(String s, String t){int sL=s.length();int tL=t.length();if(sL*tL==0)return sL==0?tL:sL;int[]v0=new int[tL+1],v1=new int[tL+1];for(int i=0;i<v0.length;i++)v0[i]=i;for(int i=0;i<sL;i++){v1[0]=i+1;for(int j=0;j<tL;j++){int cost=(s.charAt(i)==t.charAt(j))?0:1;v1[j+1]=Math.min(v1[j]+1,Math.min(v0[j+1]+1,v0[j]+cost));}System.arraycopy(v1,0,v0,0,v0.length);}return v1[tL];}

388 characters

Legible version:

    public static int d(String s, String t) {
        int sL = s.length();
        int tL = t.length();
        if (sL * tL == 0) {
            return sL == 0 ? tL : sL;
        }
        int[] v0 = new int[tL + 1], v1 = new int[tL + 1];
        for (int i = 0; i < v0.length; i++) {
            v0[i] = i;
        }
        for (int i = 0; i < sL; i++) {
            v1[0] = i + 1;
            for (int j = 0; j < tL; j++) {
                int cost = (s.charAt(i) == t.charAt(j)) ? 0 : 1;
                v1[j + 1] = Math.min(v1[j] + 1, Math.min(v0[j + 1] + 1, v0[j] + cost));
            }
            System.arraycopy(v1, 0, v0, 0, v0.length);
        }
        return v1[tL];
    }

686 characters (including whitespace)

Unit testing:

import static org.hamcrest.core.Is.is;
import static org.junit.Assert.assertThat;

import org.junit.Test;

    @Test
    public void test() {
        final int distance = d("fluff", "puff");
        assertThat(distance, is(2));

        final int hello = d("hello", "goodbye");
        assertThat(hello, is(7));

        final int harry = d("harry", "potter");
        assertThat(harry, is(6));

        final int test = d("", "Test");
        assertThat(test, is(4));

        final int mill = d("mill", "bill");
        assertThat(mill, is(1));

        final int death = d("death", "smiles");
        assertThat(death, is(6));

        final int blood = d("blood", "red");
        assertThat(blood, is(4));

        final int ghost = d("ghost", "protocol");
        assertThat(ghost, is(7));

        final int iron = d("iron", "man");
        assertThat(iron, is(3));

        final int sponge = d("sponge", "powered");
        assertThat(sponge, is(5));
    }

Recursive method, 226 characters long:

public static int lev(String a,String b){int c=a.length(),d=b.length();if(c*d==0)return c==0?d:c;String e=a.substring(1),f=b.substring(1);return Math.min(Math.min(lev(e,b),lev(a,f))+1,lev(e,f)+(a.charAt(0)==b.charAt(0)?0:1));}

Legible version:

public static int lev(String a, String b) {
    int aLength = a.length(), bLength = b.length();
    // Shortened version of this (it works when the variable names are shortened to 1 character names):
    // if (aLength == 0) return bLength;
    // if (bLength == 0) return aLength;
    if (aLength * bLength == 0) return aLength == 0 ? bLength : aLength;

    String aShortened = a.substring(1), bShortened = b.substring(1);
    return Math.min(Math.min(lev(aShortened, b), lev(a, bShortened)) + 1, lev(aShortened, bShortened) + (a.charAt(0) == b.charAt(0) ? 0 : 1));
}

Where are you getting the variables e & f in lev(e, f)?

e and f are cached return values of String.substring().

String e=a.substring(1),f=b.substring(1);

In the legible version:

String aShortened = a.substring(1), bShortened = b.substring(1);

You left them in your legible version :wink:

Fixed.

Another recursive one, 212 characters:

public static int d(String a,String b){int s=a.length(),t=b.length(),m=s>t?s:t,d,i=0,j;for(;i<s;i++)for(j=0;j<t;j++)m=a.charAt(i)!=b.charAt(j)?m:(d=(i>j?i:j)+d(a.substring(i+1),b.substring(j+1)))<m?d:m;return m;}

It’s based off of this:

public static int minSteps(String a, String b) {
    int minSteps = Math.max(a.length(), b.length());
    for (int i = 0; i < a.length(); i++) {
        for (int j = 0; j < b.length(); j++) {
            if (a.charAt(i) == b.charAt(j)) {
                int numSteps = Math.max(i, j) + minSteps(a.substring(i + 1), b.substring(j + 1));
                if (numSteps < minSteps)
                    minSteps = numSteps;
            }
        }
    }
    return minSteps;
}

Worst case, the strings have no common characters, so the distance is computed by max(a.length, b.length). That is the purpose of the first line. It then loops through each pair of characters in the strings. When it finds a common character, the character becomes the “pivot point”: it’s a character that is not substituted nor deleted nor inserted. More pivot points are found by calling recursively and passing the substrings occurring after the pivot point. The characters between the pivot points are all substituted or deleted or inserted, so contribute max(a.length, b.length) to the distance. All combinations of pivot points are considered and the combination with the shortest distance wins. For example, the ‘r’ in “potter” could match with any one of the 'r’s in “harry” or none at all. The choice of pivot points makes a difference, so we need to try them all.

ha---rry
potter--
-----^--
distance: 7

har--ry
potter-
-----^-
distance: 6

harry-
potter
------
distance: 6

I believe it’s 1/3/16 btw ^^

Nah, this one ends in 99 years :wink: lol fixed.

1 Like

Congratulations to @eah for winning this weeks programming challenge!

The next one should be up shortly.