"Fuzzy" Matches on Nicknames

From: Michael Sheaver
Subject: "Fuzzy" Matches on Nicknames
Msg-id: 18DF7A91-78F6-4F63-8A7E-BEBE3AEE7AC6@me.com
Replies: Re: "Fuzzy" Matches on Nicknames (rob stone <floriparob@gmail.com>)
List: pgsql-general

Greetings,

I have two tables that are populated using large datasets from disparate external systems, and I am trying to match
records by customer name between these two tables. I do not have any authoritative key, such as customerID or
nationalID, by which I can match them up, and I have found many cases where the same customer has different first names
in the two datasets. A sampling of the differences is as follows:

Michael <=> Mike
Tom <=> Thomas
Liz <=> Elizabeth
Margaret <=> Maggie

How can I build a query in PostgreSQL (v. 9.6) that will find possible matches like these on nicknames? My initial
guess is that I would have to either find or build some sort of intermediary table that contains associated names like
those above. Sometimes, though, there will be more than just matching pairs, like:

Jim <=> James <=> Jimmy <=> Jimmie
Bill <=> Will <=> Willie <=> William

and so forth.
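
To make the intermediary-table idea concrete, here is a minimal sketch, not a tested solution. It uses Python's built-in sqlite3 module only so that it runs self-contained; the table names (nickname, customer_a, customer_b), the column names, and the find_matches helper are all hypothetical, and the query itself is plain SQL that I would expect to carry over to PostgreSQL with little change. Each name is stored once with a group number, so groups of any size work the same way as pairs.

```python
import sqlite3

# Hypothetical nickname groups: every name in a group is treated as equivalent.
NICKNAME_GROUPS = [
    ["michael", "mike"],
    ["thomas", "tom"],
    ["elizabeth", "liz"],
    ["margaret", "maggie"],
    ["james", "jim", "jimmy", "jimmie"],
    ["william", "bill", "will", "willie"],
]


def find_matches(rows_a, rows_b):
    """Return (id_a, id_b) pairs whose last names agree (ignoring case) and
    whose first names are either equal or members of the same nickname group.

    rows_a and rows_b are lists of (id, first_name, last_name) tuples.
    """
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE nickname   (group_id INTEGER, name TEXT);
        CREATE TABLE customer_a (id INTEGER, first_name TEXT, last_name TEXT);
        CREATE TABLE customer_b (id INTEGER, first_name TEXT, last_name TEXT);
        """
    )
    # One row per name; names sharing a group_id are considered the same person name.
    con.executemany(
        "INSERT INTO nickname VALUES (?, ?)",
        [(gid, name) for gid, names in enumerate(NICKNAME_GROUPS) for name in names],
    )
    con.executemany("INSERT INTO customer_a VALUES (?, ?, ?)", rows_a)
    con.executemany("INSERT INTO customer_b VALUES (?, ?, ?)", rows_b)

    query = """
        SELECT a.id, b.id
        FROM customer_a AS a
        JOIN customer_b AS b
          ON lower(a.last_name) = lower(b.last_name)
        WHERE lower(a.first_name) = lower(b.first_name)
           OR EXISTS (
                SELECT 1
                FROM nickname AS na
                JOIN nickname AS nb ON na.group_id = nb.group_id
                WHERE na.name = lower(a.first_name)
                  AND nb.name = lower(b.first_name)
              )
        ORDER BY a.id, b.id
    """
    matches = con.execute(query).fetchall()
    con.close()
    return matches
```

Calling find_matches with one list of (id, first_name, last_name) rows per dataset returns the candidate id pairs. These are only possible matches: two different people can share a last name and a nickname group, so the results would still need review.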

Has anyone used or developed PostgreSQL queries that will find matches like these? I am running all my database
queries on my local laptops (Win7 and macOS), so performance or uptime is no issue here. I am curious to see how others
in this community have creatively solved this common problem.

One of the PostgreSQL dictionaries (synonym, thesaurus etc.) might work here, but honestly I am clueless as to how to
set this up or use it in queries successfully.

Thanks,
Michael (aka Mike, aka Mikey)
