General guidance: Levenshtein distance versus other similarity algorithms

From	Rachel Owsley
Subject	General guidance: Levenshtein distance versus other similarity algorithms
Date	July 23, 2012, 20:17:12
Msg-id	81F2AED71E996746829AC866496B2EA361B364D956@MAIL-NASH01.edo.local
Responses	Re: General guidance: Levenshtein distance versus other similarity algorithms (Merlin Moncure <mmoncure@gmail.com>)
List	pgsql-general

Hi,

I am hoping you can give me some guidance here. I’m using PostgreSQL 9.1.

Basically, I’m trying to write a query against a table of businesses that returns all close matches to a given business name. The table is huge, and the names vary a lot. A name can be up to 255 characters long. I’ve tried regular expressions, but they always miss some variations of the name, so I decided to look at distance measures.

Has anyone compared the fuzzystrmatch package to pg_similarity?

Would the levenshtein function in PostgreSQL be the best way to go here? If so, should I use levenshtein from the contrib package (fuzzystrmatch) or install the pg_similarity package? Has anyone tried both implementations?

This would be my query:

SELECT *
FROM table_name
WHERE levenshtein(column_name, 'Name of the business') <= 3
ORDER BY levenshtein(column_name, 'Name of the business')
LIMIT 10;
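
To make the threshold concrete, here is a minimal Python sketch of what the query above asks for: compute the edit distance from each name to the target, keep those within 3, sort by distance, and take the first 10. The function names `levenshtein` and `closest_names` and the sample business names are made up for illustration. The distance uses unit costs for insert, delete, and substitute, which I assume matches the default behavior of the contrib function.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete, and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def closest_names(names, target, max_distance=3, limit=10):
    """Mimic WHERE distance <= max_distance ORDER BY distance LIMIT limit."""
    scored = [(levenshtein(name, target), name) for name in names]
    kept = [pair for pair in scored if pair[0] <= max_distance]
    kept.sort(key=lambda pair: pair[0])  # stable sort: ties keep input order
    return kept[:limit]


# Hypothetical sample data, not from any real table.
names = ["Acme Hardware", "Acme Hardwear", "ACME Hardware Inc.", "Apex Hardware"]
for distance, name in closest_names(names, "Acme Hardware"):
    print(distance, name)
```

In this sample the "ACME Hardware Inc." variant is filtered out: the comparison is case-sensitive, and the length difference alone is five characters, so a fixed threshold of 3 cannot reach it. That is the kind of variation a pure edit-distance cutoff may miss on long names.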

Thank you so much for your help.

Rachel
