Discussion: Composite type storage overhead


Composite type storage overhead

From:
Laiszner Tamás
Date:
Hey there,

I am currently exploring options for using 128-bit numeric primary keys. One option I am looking at is storing them as a composite of two 64-bit integers.

The documentation page on composite types does not say much about the internal storage, so I ran my own experiment:

CREATE TYPE entity_id AS
(
    high bigint,
    low bigint
);

CREATE TABLE composite_test
(
    entity_id entity_id NOT NULL,
    CONSTRAINT composite_test_pkey PRIMARY KEY (entity_id)
);

INSERT INTO composite_test (entity_id) VALUES (ROW(0, 0));

Now, as I am really interested in keeping the indexes compact, I tried pageinspect to find out what's going on internally:

SELECT * FROM bt_page_items(get_raw_page('composite_test_pkey', 1));

It seems wrapping the values in a composite type has a pretty significant storage overhead: the index entry has a total size of 48 bytes, and the data looks like this:

4b ff ff ff ff fa 40 00 00 ff ff ff ff 00 00 02 00 00 00 18 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

For comparison, when simply using a two-column composite primary key, each index entry has a length of only 24 bytes, so wrapping the values in a composite type adds 100% overhead.
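For reference, the two-column variant I compared against was declared along these lines (a sketch; table and column names are illustrative):

```sql
-- Two plain bigint columns as a composite primary key, for comparison.
-- 16 bytes of key data plus the 8-byte index tuple header presumably
-- account for the 24-byte entries mentioned above.
CREATE TABLE plain_test
(
    high bigint NOT NULL,
    low  bigint NOT NULL,
    CONSTRAINT plain_test_pkey PRIMARY KEY (high, low)
);

INSERT INTO plain_test (high, low) VALUES (0, 0);
```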

Now, I understand there might be valid reasons to store a structure header alongside the plain data, e.g. version information, so that when the type is altered there is no need to rewrite the whole table.

However, I also think this should be highlighted in the documentation. (If it already is I apologise.)

Also, I would like to ask if there is a way to instruct the storage engine to omit the housekeeping information and simply store the plain data, even if that comes with drawbacks.

I would highly appreciate any comments or additional information on this topic.

Best regards,
Tamas

Re: Composite type storage overhead

From:
Rob Sargent
Date:


On Oct 23, 2019, at 1:32 PM, Laiszner Tamás <t.laiszner@outlook.com> wrote:

> I am currently exploring the options to utilize 128-bit numeric primary keys. One of the options I am looking at is to store them as composites of two 64-bit integers. [...]
Why not use UUID type?

Re: Composite type storage overhead

From:
Laiszner Tamás
Date:
That's an absolutely reasonable suggestion. I am still in the exploration phase, so while this solution is not completely ruled out, I have some concerns about it:
  1. Although it does not enforce it, the UUID type kind of suggests a specific interpretation of the data. Of course, the documentation says you are free to use any algorithm to generate the values, but there are quite a few standard UUID variants and we are not planning to use any of them.
  2. The serialization format is different from what the application needs and, while once again this is not a hard technical barrier, it might cause slight additional complexity and confusion.
  3. The value is logically defined as a 128-bit integer that is in itself a compound value split into a few "bit groups". Extracting these parts can be done with simple (and presumably efficient) bitwise operators when stored as an integer, but becomes much more cumbersome with UUID, I guess.
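To make point 3 concrete, here is the kind of extraction I mean, sketched for a hypothetical 16-bit group at the top of the key (the group layout and the uuid table/column are made up):

```sql
-- With the composite of two bigints: ordinary bitwise arithmetic.
SELECT ((entity_id).high >> 48) & 65535 AS top_group
FROM composite_test;

-- uuid has no bitwise operators, so one workaround is to rebuild a
-- 128-bit bit string from the hex text form and truncate it; bit-to-bit
-- casts in PostgreSQL keep the leftmost bits.
SELECT ('x' || replace(id::text, '-', ''))::bit(128)::bit(16)::int AS top_group
FROM uuid_test;
```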

From: Rob Sargent <robjsargent@gmail.com>
Sent: Wednesday, 23 October 2019 22:58
To: Laiszner Tamás <t.laiszner@outlook.com>
Cc: pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Subject: Re: Composite type storage overhead
 


> Why not use UUID type?

Re: Composite type storage overhead

From:
Rob Sargent
Date:


On 10/23/19 3:24 PM, Laiszner Tamás wrote:
> 3. The value is logically defined as a 128-bit integer, that is in itself a compound value split into a few "bit groups". [...]


Putting logic and meaning into primary keys?  To what end, I wonder.

Re: Composite type storage overhead

From:
Thomas Kellerer
Date:
> 3. The value is logically defined as a 128-bit integer, that is in
> itself a compound value split into a few "bit groups". Extracting
> these parts can be done by simple (and supposedly efficient) bitwise
> operators when stored as integer, but becomes much more cumbersome
> with UUID, I guess.

This is usually a bad idea.

Putting logic into the primary key value and merging different types of information in a single column is typically not such a good idea.

(And it violates first normal form to begin with.)

I would strongly recommend revisiting this idea and thinking about e.g. a multi-column primary key instead, where each of these "groups" is stored in a separate column and the actual (business) value can be retrieved without any bit shifting or similar operations.
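As a sketch of that layout (all names invented for illustration):

```sql
-- Each logical "bit group" becomes its own column, so the business
-- values can be read directly, with no bit shifting.
CREATE TABLE entities
(
    shard_id smallint NOT NULL,
    created  bigint   NOT NULL,
    seq_no   integer  NOT NULL,
    PRIMARY KEY (shard_id, created, seq_no)
);
```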
 

Thomas





Re: Composite type storage overhead

From:
Laiszner Tamás
Date:

Thanks for the suggestion to split the primary key into components. But even going down this route, packing the components into one superstructure (a composite type) would be beneficial, as the same scheme is used across multiple tables. And then we are back at the original problem.

From: Thomas Kellerer <spam_eater@gmx.net>
Sent: Thursday, 24 October 2019 8:19
To: pgsql-general@lists.postgresql.org <pgsql-general@lists.postgresql.org>
Subject: Re: Composite type storage overhead
 
> I would strongly recommend to revisit this idea, and e.g. think about a multi-column primary key instead. [...]






Re: Composite type storage overhead

From:
Michael Lewis
Date:
On Thu, Oct 24, 2019 at 3:35 AM Laiszner Tamás <t.laiszner@outlook.com> wrote:

Thanks for the suggestion to split up the primary key into components. But even going down this way, packing the components into one superstructure (composite type) would be beneficial as the same scheme is used across multiple tables. And we are back at the original problem.


This is probably a completely naive question, but why not store this in a text field?

Re: Composite type storage overhead

From:
George Neuner
Date:
On Wed, 23 Oct 2019 21:24:58 +0000, Laiszner Tamás
<t.laiszner@outlook.com> wrote:

 Rob Sargent <robjsargent@gmail.com> wrote
>> Why not use UUID type?

> 1.
> Although it does not enforce, but the UUID type kind of suggests a
> specific interpretation of the data. Of course the documentation says
> you are free to use any algorithm to generate the values, but there
> are quite a few standard UUID types and we are not planning to use
> any of them.

The usual interpretation suggests a network address and a timestamp, but
there is nothing that /requires/ that interpretation.  The value is
just a large integer and you can treat it that way if you want.

Postgresql doesn't check UUID data for any particular format.  You can
store arbitrary integer values into UUID columns - with the caveat
that, as UUIDs, you can only compare them and can't (easily) do any
arithmetic.
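For example, one way to pack two bigint halves into a uuid is to go through the hex text form (a sketch; to_hex emits the two's-complement digits for negative inputs, which is fine here since only the raw 64 bits matter):

```sql
-- Concatenate two zero-padded 16-digit hex halves into 32 hex digits,
-- which PostgreSQL's uuid input routine accepts without hyphens.
SELECT (lpad(to_hex(258), 16, '0') || lpad(to_hex(7), 16, '0'))::uuid;
```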


> 2.
> The serialization format is different than needed by the application
> and, while once again this is not a hard technical barrier, that
> might cause slight additional complexity and confusion.

Not sure what the issue is here.  Admittedly I don't know how pglib
passes binary UUIDs[*], but certainly they can be sent as strings if
necessary.

[*] Off the top of my head, I would guess a struct or array of four
32-bit values, but truly I haven't looked.


>  3.
> The value is logically defined as a 128-bit integer, that is in itself
> a compound value split into a few "bit groups". Extracting these
> parts can be done by simple (and supposedly efficient) bitwise
> operators when stored as integer, but becomes much more cumbersome
> with UUID, I guess.

The groupings are only for display / printing, to make it easier for
humans to read.  



WRT mixing logic into the key (sharding, etc.), all UUIDs except type
4 can provide you with a reasonable static sharding key.  And even
type 4 works if you shard by time.

Also, keep in mind that a UUID can be generated by a client or a key
server and thus have no relationship to the DBMS server that stores
it. Depending on how you choose to generate and use it, a UUID doesn't
necessarily carry or reveal any exploitable information.


YMMV,
George