Re: extracting location info from string

Поиск
Список
Период
Сортировка
От Edward W. Rouse
Тема Re: extracting location info from string
Дата
Msg-id 0cb801cc195c$855493e0$8ffdbba0$@com
обсуждение исходный текст
Ответ на Re: extracting location info from string  (Lew <noone@lewscanon.com>)
Список pgsql-sql
I would think that changing the location column to hold a FK to a location table, and setting up the location table
withvarious columns for city, region, country and whatever else might be required would be the way to go. It reduces
columnbloat on the main table, provides reuse of location data and is easier to modify in the future. 

Edward W. Rouse


-----Original Message-----
From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Lew
Sent: Monday, May 23, 2011 12:25 AM
To: pgsql-sql@postgresql.org
Subject: Re: [SQL] extracting location info from string

On 05/22/2011 09:42 PM, Craig Ringer wrote:
> On 23/05/2011 9:11 AM, Andrej wrote:
>> On 23 May 2011 10:00, Tarlika Elisabeth Schmitz
>> <postgresql3@numerixtechnology.de> wrote:
>>> On Sun, 22 May 2011 21:05:26 +0100
>>> Tarlika Elisabeth Schmitz<postgresql3@numerixtechnology.de> wrote:
>>>
>>>> A column contains location information, which may contain any of the
>>>> following:
>>>>
>>>> 1) null
>>>> 2) country name (e.g. "France")
>>>> 3) city name, region name (e.g. "Bonn, Nordrhein-Westfalen")
>>>> 4) city name, Rg. region name (e.g. "Frankfurt, Rg. Hessen")
>>>> 5) city name, Rg region name (e.g. "Frankfurt, Rg Hessen")
>>>
>>>
>>> I also need to cope with variations of COUNTRY.NAME and REGION.NAME.

That isn't a table structure, that's a freeform text structure.  You didn't
state your question, Tarlika, but your database structure is terrible.  For
example, "region" and "country" should be different columns.  Really!

How you get your raw data into those columns can be interesting.

> This is a hard problem. You're dealing with free-form data that might be
> easily understood by humans, but relies on various contextual information and
> knowledge that makes it really hard for computers to understand.
>
> If you want to do a good job of this, your best bet is to plug in 3rd party
> address analysis software that is dedicated to this task. Most (all?) such

These aren't really addresses, as the OP presents them.

> packages are commercial, proprietary affairs. They exist because it's really,
> really hard to do this right.
>
>> Another thing of great import is whether the city can occur in the
>> data column all by itself; if yes, it's next to impossible to distinguish
>> it from a country.
>
> Not least because some places are both, eg:
>
> Luxembourg
> The Vatican
> Singapore
>
> (The Grand Duchy of Luxembourg has other cities, but still serves as an example).

And,of course, you have to distinguish the City of London from London.  New
York City comprises five boroughs (counties), each of which is itself a city.  (Brooklyn is one of the largest cities
inthe world all by itself.)  
"Region" has different meanings in different areas - it can mean part of a
county, or state / province, or nation, or continent.  "The Baltic region",
"the Northeast", "upstate", "the North Country", "Europe" are all regions.

The OP should share more about the semantics of their problem domain and
whether they really intend those table structures to be table structures.  Really?

--
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql



В списке pgsql-sql по дате отправления:

Предыдущее
От: Lew
Дата:
Сообщение: Re: extracting location info from string
Следующее
От: jasmin.dizdarevic@gmail.com
Дата:
Сообщение: Re: Which version of PostgreSQL should I use.