Murilo :P

C++, Computação, Programação, Web e afins :)

Posts Tagged ‘string

Fast and Easy Levenshtein distance using a Trie (in C++)

with 8 comments

I implemented this clever algorithm described on Steven Hanov’s blog in C++. Actually this algorithm is a bit different from the original because I wanted to know what is the least Levenshtein distance given a word.

#include <iostream>
#include <map>
#include <string>
#include <vector>
#include <algorithm>
#include <cctype>

/*
 * Algorithm: Edit distance using a trie-tree (Dynamic Programming)
 * Author: Murilo Adriano Vasconcelos <muriloufg@gmail.com>
 */

using namespace std;

// Trie's node
struct trie
{
	typedef map<char, trie*> next_t;

	// The set with all the letters which this node is prefix
	next_t next;

	// If word is equal to "" is because there is no word in the
	//	dictionary which ends here.
	string word;

	trie() : next(map<char, trie*>()) {}

	void insert(string w)
	{
		w = string("$") + w;
		
		int sz = w.size();
		
		trie* n = this;
		for (int i = 0; i < sz; ++i) {
			if (n->next.find(w[i]) == n->next.end()) {
				n->next[w[i]] = new trie();
			}

			n = n->next[w[i]];
		}

		n->word = w;
	}
};

// The tree
trie tree;

// The minimum cost of a given word to be changed to a word of the dictionary
int min_cost;

//
void search_impl(trie* tree, char ch, vector<int> last_row, const string& word)
{
	int sz = last_row.size();

	vector<int> current_row(sz);
	current_row[0] = last_row[0] + 1;

	// Calculate the min cost of insertion, deletion, match or substution
	int insert_or_del, replace;
	for (int i = 1; i < sz; ++i) {
		insert_or_del = min(current_row[i-1] + 1, last_row[i] + 1);
		replace = (word[i-1] == ch) ? last_row[i-1] : (last_row[i-1] + 1);

		current_row[i] = min(insert_or_del, replace);
	}

	// When we find a cost that is less than the min_cost, is because
	// it is the minimum until the current row, so we update
	if ((current_row[sz-1] < min_cost) && (tree->word != "")) {
		min_cost = current_row[sz-1];
	}

	// If there is an element wich is smaller than the current minimum cost,
	// 	we can have another cost smaller than the current minimum cost
	if (*min_element(current_row.begin(), current_row.end()) < min_cost) {
		for (trie::next_t::iterator it = tree->next.begin(); it != tree->next.end(); ++it) {
			search_impl(it->second, it->first, current_row, word);
		}
	}
}

int search(string word)
{
	word = string("$") + word;
	
	int sz = word.size();
	min_cost = 0x3f3f3f3f;

	vector<int> current_row(sz + 1);

	// Naive DP initialization
	for (int i = 0; i < sz; ++i) current_row[i] = i;
	current_row[sz] = sz;
	
	
	
	// For each letter in the root map wich matches with a
	// 	letter in word, we must call the search
	for (int i = 0 ; i < sz; ++i) {
		if (tree.next.find(word[i]) != tree.next.end()) {
			search_impl(tree.next[word[i]], word[i], current_row, word);
		}
	}

	return min_cost;
}

The modification for match all words within a given Leveshtein distance and returning that set is trivial. Just modify the line 72 for storing that word if its Leveshtein distance is less than or equal to a given distance.

For more details, read this: Fast and Easy Levenshtein distance using a Trie
See ya!

Advertisements

Written by Murilo Adriano

1 de February de 2011 at 21:46

Como converter um número para uma std::string e vice-versa

with 9 comments

Que a biblioteca <iostream> oferece inúmeras facilidades para manipular I/O nós já sabemos mas, como converter um valor para uma std::string?

A bliblioteca <iostream> nos permite converter facilmente qualquer coisa para uma std::string usando a seguinte sintaxe (o exemplo a seguir converte um double, mas você pode substituir por qualquer coisa imprimível usando o operador <<): Read the rest of this entry »

Written by Murilo Adriano

1 de October de 2008 at 00:41

Posted in C/C++, Programação

Tagged with , , , , ,