Обсуждение: Declarative partitioning and automatically generated row-IDs using BIGSERIAL

Поиск
Список
Период
Сортировка

Declarative partitioning and automatically generated row-IDs using BIGSERIAL

От
Thorsten Schöning
Дата:
Hi all,

I have the following table containing 100+ millions rows currently and
which needs to be queried by "captured_at" a lot. That table stores
rows for the last 6 years, but most of the queries focus on the last
15 months, 15 days or really only 15 minutes.

> CREATE TABLE public.clt_rec(
>         id bigserial NOT NULL,
>         oms_rec bigint NOT NULL,
>         captured_at timestamp with time zone NOT NULL,
>         rssi smallint NOT NULL,
>         CONSTRAINT pk_clt_rec PRIMARY KEY (id)
>          WITH (FILLFACTOR = 10),
>         CONSTRAINT uk_clt_rec_oms_rec UNIQUE (oms_rec)
> );

We want to test if performance of some queries can be improved by
using declarative partitioning with far less rows, e.g. one table per
year or half a year or alike.

One important thing to have in mind is handling of the column "id",
because for historical reasons that column is not only used to address
individual rows. It's used as a transaction ID for some rudimentary
exporting of those rows as well: So there's some app requesting NEW
rows to export and simply providing the LAST "id" it received from
former requests. Therefore I need to keep that ID when splitting the
table into individual partitions and as well need to guarantee that
IDs are unique across all partitions.

I've read a lot of similar examples using BIGSERIAL or alike already,
but would like to make sure I understood correctly how those SERIAL
values are generated.

https://alexey-soshin.medium.com/dealing-with-partitions-in-postgres-11-fa9cc5ecf466

Am I correct that after migrating the available table to a partitioned
one I keep INSERTing into the partitioned table only in my app?
Because of the declarative partitioninig used, the only thing I need
to assure is to have necessary partitions available when INSERT
happens? I would create them beforehand as part of some maintenance,
but read about a DEFAULT partition already as well.

Because I keep INSERTing into the partitioned table, the semantic of
my ID doesn't change, correct? There's a SEQUENCE associated with the
column "id" in the partitioned table and that generated my value,
which is AFTERWARDS stored with all other values of a row in whichever
partitions fits to the partition key.

Or is the target partition calculated first and AFTERWARDS a SEQUENCE
private to each partition table is used to calculate the ID? I don't
think so, but according the docs indexes etc. are inherited by
partitions as well. So maybe Postgres maintains multiple SEQUENCES in
the background for some reason as well.

> Unique constraints (and hence primary keys) on partitioned tables
> must include all the partition key columns. This limitation exists
> because PostgreSQL can only enforce uniqueness in each partition
> individually.

https://www.postgresql.org/docs/11/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-BEST-PRACTICES

Doesn't the above allow manually updating different rows in different
partition tables to contain the same ID in the end? The main benefit
of the PRIMARY KEY left is uniqueness per partition, correct?

> While primary keys are supported on partitioned tables, foreign keys
> referencing partitioned tables are not supported. (Foreign key
> references from a partitioned table to some other table are
> supported.)

The docs for Postgres 11 mention that partitioned tables can not be
used as target of foreign keys, while that statement is removed from
the docs of Postgres 13. How is a 1:1 relationship guaranteed in newer
version when PRIMARY KEYS are still local to their individual
partition table?

Thanks for your help!

Mit freundlichen Grüßen,

Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH

E-Mail:  Thorsten.Schoening@AM-SoFT.de
Web:     http://www.AM-SoFT.de/

Telefon: 05151-  9468-55
Fax:     05151-  9468-88
Mobil:    0178-8 9468-04

Firmensitz: Bitstore IT-Consulting, Frankfurter Allee 285, 10317 Berlin
Steuernummer 037/230/30566, HR 27198, Amtsgericht Potsdam Geschäftsführer Janine Galonska




Re: Declarative partitioning and automatically generated row-IDs using BIGSERIAL

От
Michael Lewis
Дата:
Why is your fillfactor so low? That seems pretty crazy, especially for a table with only 4 columns that are fixed width. 100 million rows with so little data in each row is not very much at all. You should be looking to other solutions before partitioning I expect.

Perhaps a silly question, but do you have an index on captured_on? You included the full DDL for the table it seems so I wanted to ask. Can you provide some example queries that you want to speed up? What are you configuration settings (select * from pg_settings;)?

Have you pursued adding a BRIN index, or partial index on captured_on perhaps? If using a partial index that covers the last 15+ days, you would both need to drop & recreate that occasionally, and also you would need to ensure your where clause on captured_on is a static value and not relative like now() - interval '15 days'. It should be simple enough to compute that date in whatever application or process is generating your query strings though.

As you have gathered already from the docs, implementing partitioning is not without hurdles or feature concessions. The planning time to consider all those partitions is also not free. Not to mention the manual maintenance to create new or combine old partitions.

One thing of note to correct from your email- Primary keys are NOT unique to individual partitions, and if you have a primary key or other unique index on a partitioned table, at least one of those columns must be in the partition key. That is, you could not have a primary key on ID and partition on captured_on for instance.

Re: Declarative partitioning and automatically generated row-IDs using BIGSERIAL

От
Thorsten Schöning
Дата:
Guten Tag Michael Lewis,
am Donnerstag, 31. Dezember 2020 um 18:20 schrieben Sie:

> Why is your fillfactor so low?[...]

I've just copied what my GUI-tool pgModeler generated as SQL right
now, that fill factor might have never been applied at all.

> Perhaps a silly question, but do you have an index on captured_on?

Yes and that changed a lot some years ago, but doesn't seem to be
sufficient anymore.

> You included the full DDL for the table it seems so I wanted to ask.

No, only the relevant part necessary for my question. How SEQUENCES,
PKs etc. are handled is the most important thing right now.

> Can you
> provide some example queries that you want to speed up? What are you
> configuration settings (select * from pg_settings;)?

No thanks, I don't want to discuss such things in this thread. I'll
asked lot's of other questions regarding my concrete bottlenecks,
schema, app logic etc. already and simply want to test with partitions
now. That has been suggested in the other questions as well and is
what I focus on now.

> [...]That is, you could not have a primary key on ID and
> partition on captured_on for instance.

That's what I understood as well and is the reason why I asked: That
means IDs could be duplicated manually within individual partition
tables, while I need them to be unique across all of those.
Additionally I wonder when IDs are generated by which SEQUENCE etc.

Mit freundlichen Grüßen,

Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH

E-Mail:  Thorsten.Schoening@AM-SoFT.de
Web:     http://www.AM-SoFT.de/

Telefon: 05151-  9468-55
Fax:     05151-  9468-88
Mobil:    0178-8 9468-04

Firmensitz: Bitstore IT-Consulting, Frankfurter Allee 285, 10317 Berlin
Steuernummer 037/230/30566, HR 27198, Amtsgericht Potsdam Geschäftsführer Janine Galonska




Re: Declarative partitioning and automatically generated row-IDs using BIGSERIAL

От
Michael Lewis
Дата:
On Thu, Dec 31, 2020 at 11:18 AM Thorsten Schöning <tschoening@am-soft.de> wrote:
Guten Tag Michael Lewis,
am Donnerstag, 31. Dezember 2020 um 18:20 schrieben Sie:

> Why is your fillfactor so low?[...]

I've just copied what my GUI-tool pgModeler generated as SQL right
now, that fill factor might have never been applied at all.

You should definitely check. Only 10% fill on each block is a bit crazy and has a ton of wasted space built into your table. If it is set low like that, I'd check the fillfactor on the indexes for that table as well.

select
       t.reloptions
from pg_class t
  join pg_namespace n on n.oid = t.relnamespace
where t.relname = 'clt_rec'
  and n.nspname = 'public';

 
> [...]That is, you could not have a primary key on ID and
> partition on captured_on for instance.

That's what I understood as well and is the reason why I asked: That
means IDs could be duplicated manually within individual partition
tables, while I need them to be unique across all of those.
Additionally I wonder when IDs are generated by which SEQUENCE etc.

Sequences are incremented and return the new value whenever they are called. If all partitions inherit the same sequence (behavior determined by which options you use when you create partitions LIKE parent), then they will all use the same series of values that cannot provide duplicates (until wrap around if allowed). If you want to ensure you don't get duplicates, then either do a proper IDENTITY which is generated as always, or ensure ID is a primary key/unique index. Since the partition key must include the ID column and partitions cannot overlap in the partition key space, each index on each partition would be part of a set.

Re: Declarative partitioning and automatically generated row-IDs using BIGSERIAL

От
Thorsten Schöning
Дата:
Guten Tag Michael Lewis,
am Donnerstag, 31. Dezember 2020 um 19:28 schrieben Sie:

> select
>        t.reloptions
> from pg_class t
>   join pg_namespace n on n.oid = t.relnamespace
> where t.relname = 'clt_rec'
>   and n.nspname = 'public';

That outputs NULL, as well for other tested indexes. Additionally, the
following two lines only apply to the index associtaed with the
primary key, not the table itself, don't they?

>         CONSTRAINT pk_clt_rec PRIMARY KEY (id)
>          WITH (FILLFACTOR = 10),

I've checked pgModeler and it seems to set such low fill factors
automatically on old primary keys only. Will remove those.

> Sequences are incremented and return the new value whenever they are
> called. If all partitions inherit the same sequence (behavior determined by
> which options you use when you create partitions LIKE parent)[...]

Am I allowed or is it necessary to use LIKE? The docs don't use that
and the description sounds wrong as well.

> CREATE TABLE measurement_y2006m02 PARTITION OF measurement
>     FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');

> Unlike INHERITS, the new table and original table are completely
> decoupled after creation is complete. Changes to the original table
> will not be applied to the new table, and it is not possible to
> include data of the new table in scans of the original table.

The option you mention is INCLUDING IDENTITY and that would need to be
AVOIDED to NOT get multiple SEQUENCES?

Mit freundlichen Grüßen,

Thorsten Schöning

--
Thorsten Schöning
AM-SoFT IT-Service - Bitstore Hameln GmbH

E-Mail:  Thorsten.Schoening@AM-SoFT.de
Web:     http://www.AM-SoFT.de/

Telefon: 05151-  9468-55
Fax:     05151-  9468-88
Mobil:    0178-8 9468-04

Firmensitz: Bitstore IT-Consulting, Frankfurter Allee 285, 10317 Berlin
Steuernummer 037/230/30566, HR 27198, Amtsgericht Potsdam Geschäftsführer Janine Galonska




Re: Declarative partitioning and automatically generated row-IDs using BIGSERIAL

От
Michael Lewis
Дата:
My apologies. You are correct. My brain may have already switched to holiday mode.

Hopefully others will chime in shortly.

Re: Declarative partitioning and automatically generated row-IDs using BIGSERIAL

От
Laurenz Albe
Дата:
On Thu, 2020-12-31 at 17:38 +0100, Thorsten Schöning wrote:
> I have the following table containing 100+ millions rows currently and
> which needs to be queried by "captured_at" a lot. That table stores
> rows for the last 6 years, but most of the queries focus on the last
> 15 months, 15 days or really only 15 minutes.
> 
> > CREATE TABLE public.clt_rec(
> >         id bigserial NOT NULL,
> >         oms_rec bigint NOT NULL,
> >         captured_at timestamp with time zone NOT NULL,
> >         rssi smallint NOT NULL,
> >         CONSTRAINT pk_clt_rec PRIMARY KEY (id)
> >          WITH (FILLFACTOR = 10),
> >         CONSTRAINT uk_clt_rec_oms_rec UNIQUE (oms_rec)
> > );
> 
> We want to test if performance of some queries can be improved by
> using declarative partitioning with far less rows, e.g. one table per
> year or half a year or alike.
> 
> [...] Therefore I need to keep that ID when splitting the
> table into individual partitions and as well need to guarantee that
> IDs are unique across all partitions.
> 
> I've read a lot of similar examples using BIGSERIAL or alike already,
> but would like to make sure I understood correctly how those SERIAL
> values are generated.

You should experiment with partitioned tables, that will answer most of
your questions.

> Am I correct that after migrating the available table to a partitioned
> one I keep INSERTing into the partitioned table only in my app?

That's the idea, yes.

> Because of the declarative partitioninig used, the only thing I need
> to assure is to have necessary partitions available when INSERT
> happens? I would create them beforehand as part of some maintenance,
> but read about a DEFAULT partition already as well.

Yes, you need the partitions created in advance.

Stay away from default partitions, that might be a trap in your case.
If there is a conflicting value in the default partition, it can prevent
the creation of a new partition.

> Because I keep INSERTing into the partitioned table, the semantic of
> my ID doesn't change, correct? There's a SEQUENCE associated with the
> column "id" in the partitioned table and that generated my value,
> which is AFTERWARDS stored with all other values of a row in whichever
> partitions fits to the partition key.

Yes.  All partitions should share the same sequence, so values are
automatically unique.

> Or is the target partition calculated first and AFTERWARDS a SEQUENCE
> private to each partition table is used to calculate the ID? I don't
> think so, but according the docs indexes etc. are inherited by
> partitions as well. So maybe Postgres maintains multiple SEQUENCES in
> the background for some reason as well.

Create a partitioned table and look at the definition, and you will see
that the default value (taken from the sequence) is calculated according
to how you created the partitioned table.  So it works like you expect.

> > Unique constraints (and hence primary keys) on partitioned tables
> > must include all the partition key columns. This limitation exists
> > because PostgreSQL can only enforce uniqueness in each partition
> > individually.
> 
> https://www.postgresql.org/docs/11/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-BEST-PRACTICES
> 
> Doesn't the above allow manually updating different rows in different
> partition tables to contain the same ID in the end? The main benefit
> of the PRIMARY KEY left is uniqueness per partition, correct?

You won't be able to enforce uniqueness with a constraint, period.
You have to make sure that your application always uses the sequence.

One way to make this more likely is to use an identity column rather
than "serial": GENERATED ALWAYS AS IDENTITY
Not only does this conform to the standard, but it will prevent
overriding the sequence value with a user supplied value.
(That is, you'd have to use special syntax to override the sequence
value.)

> > While primary keys are supported on partitioned tables, foreign keys
> > referencing partitioned tables are not supported. (Foreign key
> > references from a partitioned table to some other table are
> > supported.)
> 
> The docs for Postgres 11 mention that partitioned tables can not be
> used as target of foreign keys, while that statement is removed from
> the docs of Postgres 13. How is a 1:1 relationship guaranteed in newer
> version when PRIMARY KEYS are still local to their individual
> partition table?

There are foreign keys referencing partitioned tables from v12 on.

You can guarantee a 1:1 relationship with a unique constraint on the
source columns.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com