Обсуждение: Very long query planning times for database with lots of partitions
I have a PostgreSQL 10 database that contains two tables which both have two levels of partitioning (by list and using a single value). Meaning that a partitioned table gets repartitioned again.
- https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9fdb675fc5d2de825414e05939727de8b120ae81
Helas, it seems that the version update didn't change anything.
Now I know that PostgreSQL doesn't like having lots of partitions but I still would like to understand why the query planner is so slow in PostgreSQL 10 and PostgreSQL 11.
(I was also wondering what "a lot" of partitions is in PostgreSQL? When I look at use cases of extensions like TimescaleDB, I would expect that 5K to 10K partitions wouldn't be a whole lot.)
An example of a simple query that I run on both PostgreSQL version would be:
EXPLAIN ANALYZE
SELECT
table_a.a,
table_b.a
FROM
(
SELECT
a,
b
FROM
table_a
WHERE
partition_level_1_column = 'foo'
AND
partition_level_2_column = 'bar'
)
AS table_a
INNER JOIN
(
SELECT
a,
b
FROM
table_b
WHERE
partition_level_1_column = 'baz'
AND
partition_level_2_column = 'bat'
)
AS table_b
ON table_b.b = table_a.b
LIMIT
10;
Thank you in advance for your help!
Mickael
On Tue, Jan 22, 2019 at 02:44:29PM +0100, Mickael van der Beek wrote: > Hey everyone, > > I have a PostgreSQL 10 database that contains two tables which both have > two levels of partitioning (by list and using a single value). Meaning that > a partitioned table gets repartitioned again. > > The data in my use case is stored on 5K to 10K partitioned tables (children > and grand-children of the two tables mentioned above) depending on usage > levels. > > Three indexes are set on the grand-child partition. The partitioning > columns are not covered by them. > (I don't believe that it is needed to index partition columns no?) > > With this setup, I experience queries that have very slow planning times > but fast execution times. > Even for simple queries where only a couple partitions are searched on and > the partition values are hard-coded. > > Researching the issue, I thought that the linear search in use by > PostgreSQL 10 to find the partition table metadata was the cause. > > cf: https://blog.2ndquadrant.com/partition-elimination-postgresql-11/ > > So I decided to try ou PostgreSQL 11 which included the two aforementioned > fixes: > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=499be013de65242235ebdde06adb08db887f0ea5 > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9fdb675fc5d2de825414e05939727de8b120ae81 Those reduce the CPU time needed, but that's not the most significant issue. For postgres up through 11, including for relkind=p, planning requires 1) stat()ing every 1GB file for every partition, even those partitions which are eventually excluded by constraints or partition bounds ; AND, 2) open()ing every index on every partition, even if it's excluded later. Postgres 12 is expected to resolve this and allow "many" (perhaps 10k) of partitions: https://commitfest.postgresql.org/20/1778/ I think postgres through 11 would consider 1000 partitions to be "too many". You *might* be able to mitigate the high cost of stat()ing tables by ensuring that the table metadata stays in OS cache, by running something like: find /var/lib/pgsql /tablespace -ls You *might* be able to mitigate the high cost of open()ing the indices by keeping their first page in cache (preferably postgres buffer cache)..either by running a cronjob to run explain, or perhaps something like pg_prewarm on the indices. (I did something like this for our largest customers to improve performance as a stopgap). Justin
Do you have constraint_exclusion set correctly (i.e. ‘on’ or ‘partition’)?
If so, does the EXPLAIN output mention all of your parent partitions, or are some being successfully pruned?
Planning times can be sped up significantly if the planner can exclude parent partitions, without ever having to examine the constraints of the child (and grandchild) partitions. If this is not the case, take another look at your query and try to figure out why the planner might believe a parent partition cannot be outright disregarded from the query – does the query contain a filter on the parent partitions’ partition key, for example?
I believe Timescaledb has its own query planner optimisations for discarding partitions early at planning time.
Good luck,
Steve.
This email is confidential. If you are not the intended recipient, please advise us immediately and delete this message. The registered name of Cantab- part of GAM Systematic is Cantab Capital Partners LLP. See - http://www.gam.com/en/Legal/Email+disclosures+EU for further information on confidentiality, the risks of non-secure electronic communication, and certain disclosures which we are required to make in accordance with applicable legislation and regulations. If you cannot access this link, please notify us by reply message and we will send the contents to you.
GAM Holding AG and its subsidiaries (Cantab – GAM Systematic) will collect and use information about you in the course of your interactions with us. Full details about the data types we collect and what we use this for and your related rights is set out in our online privacy policy at https://www.gam.com/en/legal/privacy-policy. Please familiarise yourself with this policy and check it from time to time for updates as it supplements this notice
Re: Very long query planning times for database with lots of partitions
(The table_a and table_b in my example query. There is also a table_c which connect table_a and table_b but I wanted to keep it simple.)
Another reason was that we do regular, automated cleanups of the data and dropping all the data (hundreds of thousands of rows) for a tenant is very fast with DROP TABLE of a partition and rather slow with a regular DELETE query (even if indexed).
With the redesign of the database schema (that included the partitioning changes), I also dramatically reduced the amounts and size of data per row on the nodes and edges by storing the large and numerous metadata fields on separate tables that are not part of the graph traversal process.
Queries are always limited to one tenant's data which was one of the motivations behind partitioning in the first place.
Do you have constraint_exclusion set correctly (i.e. ‘on’ or ‘partition’)?
If so, does the EXPLAIN output mention all of your parent partitions, or are some being successfully pruned?
Planning times can be sped up significantly if the planner can exclude parent partitions, without ever having to examine the constraints of the child (and grandchild) partitions. If this is not the case, take another look at your query and try to figure out why the planner might believe a parent partition cannot be outright disregarded from the query – does the query contain a filter on the parent partitions’ partition key, for example?
I believe Timescaledb has its own query planner optimisations for discarding partitions early at planning time.
Good luck,
Steve.
This email is confidential. If you are not the intended recipient, please advise us immediately and delete this message. The registered name of Cantab- part of GAM Systematic is Cantab Capital Partners LLP. See - http://www.gam.com/en/Legal/Email+disclosures+EU for further information on confidentiality, the risks of non-secure electronic communication, and certain disclosures which we are required to make in accordance with applicable legislation and regulations. If you cannot access this link, please notify us by reply message and we will send the contents to you.
GAM Holding AG and its subsidiaries (Cantab – GAM Systematic) will collect and use information about you in the course of your interactions with us. Full details about the data types we collect and what we use this for and your related rights is set out in our online privacy policy at https://www.gam.com/en/legal/privacy-policy. Please familiarise yourself with this policy and check it from time to time for updates as it supplements this notice