Join the Stack Overflow Community
Stack Overflow is a community of 6.9 million programmers, just like you, helping each other.
Join them; it only takes a minute:
Sign up

Is there any String Matching code or Algorithm which gives us the approximately matched string from the dictionay(contains pre-defined Set of Strings)?

For example: If there are 10 String in the dictionary (Set of Strings), if user input some String, then the algorithm should tell you the near matched string from the dictionary. Its would be great, if i get the matched string with the matched value (or percentage).

share|improve this question
    
Welcome to SO. How big is the lookup dictionary? – Vitaliy Sep 3 '12 at 9:02

I think it's better to use lucene library it has a package called org.apache.lucene.search.spell you can use it easily. it provide 3 algorithms NGramDistance, LevensteinDistance, JaroWinklerDistance. try this

share|improve this answer

You can calculate a Levenshtein distance between your String and strings in your dictionary to find the closest matches. This may not be the best for spell checking as it gives no favour to letter being swapped around or words which are phonetically similar. e.g. question is closer to resting than kwizchum.

For more examples, you can read up on http://en.wikipedia.org/wiki/Approximate_string_matching

share|improve this answer

I just wanted to add that StringUtils also has a convenient Levenshtein Distance method since version 3.0

public static int getLevenshteinDistance(CharSequence s,
                     CharSequence t)

After that it's just as simple as iterating through the collection and remembering the closest match:

public static Object findClosestMatch(Collection<?> collection, Object target) {
    int distance = Integer.MAX_VALUE;
    Object closest = null;
    for (Object compareObject : collection) {
        int currentDistance = StringUtils.getLevenshteinDistance(compareObject.toString(), target.toString());
        if(currentDistance < distance) {
            distance = currentDistance;
            closest = compareObject;
        }
    }
    return closest;
}

Note that the method above does require collection to be null-safe and toString() to be sensfully implemented.

share|improve this answer
    
Isn't there a "distance = currentDistance" missing just inside the if? – Adrián Navarro Oct 21 '15 at 9:40
    
Woops indeed, I've had to remove some specific code, seems I was too eager :) – Pieter De Bie Oct 21 '15 at 10:32

You can try Levenshtein Distance techinque.

The simple idea you have four basic operations:

  • Insertion (hell -> hell o)
  • Replacement (nice -> r ice)
  • Deletion (bowlin g -> bowlin)
  • Swapping (brohter -> bro th er)

You algorithm should calculate the distance between your word and every word in dictionary. The smallest distance means that this word matches more accurate with given input.

share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.