Re: [GENERAL] "Shared strings"-style table

Поиск
Список
Период
Сортировка
От Peter J. Holzer
Тема Re: [GENERAL] "Shared strings"-style table
Дата
Msg-id 20171013175935.wx4cscgn65nbwz3l@hjp.at
обсуждение исходный текст
Ответ на [GENERAL] "Shared strings"-style table  (Seamus Abshere <seamus@abshere.net>)
Список pgsql-general
On 2017-10-13 12:49:21 -0300, Seamus Abshere wrote:
> In the spreadsheet world, there is this concept of "shared strings," a
> simple way of compressing spreadsheets when the data is duplicated in
> many cells.
>
> In my database, I have a table with >200 million rows and >300 columns
> (all the households in the United States). For clarity of development
> and debugging, I have not made any effort to normalize its contents, so
> millions of rows have, for example, "SINGLE FAMILY RESIDENCE /
> TOWNHOUSE" (yes, that whole string!) instead of some code representing
> it.
>
> Theoretically / blue sky, could there be a table or column type that
> transparently handles "shared strings" like this, reducing size on disk
> at the cost of lookup overhead for all queries?

Theoretically it's certainly possible and I think some column-oriented
databases store data that way.

> (I guess maybe it's like TOAST, but content-hashed and de-duped and not
> only for large objects?)

Yes, but if you want to autmatically delete entries which are no longer
needed you need to keep track of that. So either a reference count or an
index lookup on the parent table. This is starting to look a lot like a
foreign key - just hidden from the user. Performance would probably be
similar, too.

We have done something similar (although for different reasons). We
ended up doing the "join" in the application. For most purposes we don't
need the descriptive strings and when we need them we can do a
dictionary lookup just before sending them to the client (of course the
dictionary has to be read from the database, too, but it doesn't change
that often, so it can be cached). And from a software maintainability
POV I think a dictionary lookup in Perl is a lot nicer than 50 joins
(or 300 in your case).
       hp

--   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>

В списке pgsql-general по дате отправления:

Предыдущее
От: Melvin Davidson
Дата:
Сообщение: Re: [GENERAL] "Shared strings"-style table
Следующее
От: "Igal @ Lucee.org"
Дата:
Сообщение: Re: [GENERAL] Permissions for Web App