Re: How to find double entries

From: Volkan YAZICI
Subject: Re: How to find double entries
Msg-id: 87zlru72b5.fsf@alamut.mobiliz.com.tr
In reply to: How to find double entries (Andreas <maps.on@gmx.net>)
List: pgsql-sql
On Wed, 16 Apr 2008, Andreas <maps.on@gmx.net> writes:
> how can I find double entries in varchar columns where the content is
> not 100% identical because of a spelling error or the person
> considered it "looked nicer" that way?
>
> I'd like to identify and then merge records of e.g. 'google',
> 'gogle', 'guugle'
>
> Then I want to match abbreviations like  'A-Company Ltd.', 'a company
> ltd.', 'A-Company Limited'
>
> Is there a way to do this?
> It would be OK just to list candidates to be manually checked
> afterwards.

You can try something like the example below. (The levenshtein(text, text)
function is supplied by the fuzzystrmatch module.)

    SELECT T1.col, T2.col
      FROM tbl AS T1
     INNER JOIN tbl AS T2
        ON T1.col <> T2.col
       AND levenshtein(T1.col, T2.col) < (length(T1.col) * 0.5);
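
To make the matching rule easier to see outside the database, here is a minimal Python sketch of what the join condition computes: a plain unit-cost edit distance, and the "distance below half the length of the first value" filter. It is only an illustration, not the fuzzystrmatch implementation; the helper name candidate_pairs and the sample list (including the unrelated 'postgres' entry) are made up for this example.

```python
from itertools import permutations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def candidate_pairs(values):
    """Hypothetical helper mirroring the join condition above:
    ordered pairs of different values whose distance is less than
    half the length of the first value."""
    return [
        (x, y)
        for x, y in permutations(values, 2)
        if x != y and levenshtein(x, y) < len(x) * 0.5
    ]


# Hypothetical sample data: three spellings from the question plus
# one unrelated value that should not be matched.
names = ["google", "gogle", "guugle", "postgres"]
pairs = candidate_pairs(names)

for first, second in pairs:
    print(first, second, levenshtein(first, second))
```

As with the query, each matching pair can show up in both directions, because the threshold depends on the length of the first value only. The comparison is also case-sensitive, so values that differ only in capitalization still cost one edit per differing letter.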
 


Regards.

