Thread: Slow delete when many foreign tables are defined


Slow delete when many foreign tables are defined

From:
Giuseppe Sacco
Date:
Hello,
I have a main table and a lot of "details" tables that reference the
main one.

Every time I delete a record from the main table, a check is done on
every details table that contain a foreign key toward main table.

This is a simplified schema:

create table main (
type varchar,
serial numeric,
description varchar not null,
constraint "mainpk" primary key (type,serial));

create table details1 (
type varchar check (type = '1'),
serial numeric,
details1 varchar not null,
constraint "details1pk" primary key (type,serial),
constraint "details1fk" foreign key (type,serial) references
main(type,serial));

create table details2 (
type varchar check (type = '2'),
serial numeric,
details2 varchar not null,
constraint "details2pk" primary key (type,serial),
constraint "details2fk" foreign key (type,serial) references
main(type,serial));

and suppose I have about 50-100 of these detail tables, with about a
thousand records in each. Each detail table uses a different value for
the "type" column.

Now, when I delete a record, I delete it both from a detail table and
from the main table.

When I delete from the main table, PostgreSQL checks for references
from all the detail tables, while it would be enough to check only the
detail table whose "type" check constraint matches.

insert into main values ('1',1,'desc');
insert into main values ('2',1,'desc');
insert into details1 values ('1',1,'desc');
insert into details2 values ('2',1,'desc');

begin;
delete from details2;
explain analyze delete from main where type = '2';

                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Delete on main  (cost=4.17..11.28 rows=3 width=6) (actual time=0.015..0.015 rows=0 loops=1)
   ->  Bitmap Heap Scan on main  (cost=4.17..11.28 rows=3 width=6) (actual time=0.011..0.011 rows=1 loops=1)
         Recheck Cond: ((type)::text = '2'::text)
         Heap Blocks: exact=1
         ->  Bitmap Index Scan on mainpk  (cost=0.00..4.17 rows=3 width=0) (actual time=0.007..0.007 rows=1 loops=1)
               Index Cond: ((type)::text = '2'::text)
 Planning time: 0.035 ms
 Trigger for constraint details1fk: time=0.107 calls=1
 Trigger for constraint details2fk: time=0.197 calls=1
 Execution time: 0.331 ms

As you may see, the delete operation calls the details1fk trigger even
though data in table details1 cannot be affected by this delete.

You can imagine what happens with about 50 detail tables...

Is there any way to make it work faster?

Thank you very much,
Giuseppe



Re: Slow delete when many foreign tables are defined

From:
Andy Colson
Date:
On 12/1/2014 9:23 AM, Giuseppe Sacco wrote:
> Hello,
> I have a main table and a lot of "detail" tables that reference the
> main one.
>
> Every time I delete a record from the main table, a check is done on
> every detail table that contains a foreign key to the main table.
[...]


I can think of two options:

1) Don't use 50 different detail tables.  A single detail table with the
type column will work much faster.  Is there a good reason to break them
out?  (# rows is not a good reason, btw.)

2) Try inheritance.  I have no idea if it'll help, but I thought I'd
read someplace that the planner knows a little more about which types of
rows go into which tables.
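Just to illustrate option 1 against the simplified schema from your
mail (a sketch; the single "payload" column is a stand-in for whatever
per-type columns you actually have):

```sql
-- one detail table for all types; "payload" is a hypothetical column
create table details (
    type varchar,
    serial numeric,
    payload varchar not null,
    constraint "detailspk" primary key (type, serial),
    constraint "detailsfk" foreign key (type, serial)
        references main (type, serial)
);
```

With a single detail table, a delete from main fires one FK trigger
instead of 50-100.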

-Andy


Re: Slow delete when many foreign tables are defined

From:
Giuseppe Sacco
Date:
On Mon, 01/12/2014 at 09:49 -0600, Andy Colson wrote:
> On 12/1/2014 9:23 AM, Giuseppe Sacco wrote:
> > Hello,
> > I have a main table and a lot of "detail" tables that reference the
> > main one.
> >
> > Every time I delete a record from the main table, a check is done on
> > every detail table that contains a foreign key to the main table.
[...]
> I can think of two options:
>
> 1) Don't use 50 different detail tables.  A single detail table with the
> type column will work much faster.  Is there a good reason to break them
> out?  (# rows is not a good reason, btw).

Basically we do have a lot of different attributes on each detail
table. Let's say we use 20-30 specific columns in each of them, so why
should we waste disk space and CPU cycles handling all these columns
in one table? If I understand correctly, you are suggesting adding
about 25*50 columns to the main table and only setting values for 25
of them.

Moreover, our ORM would probably go crazy :-)

> 2) Try inheritance.  I have no idea if it'll help, but I thought I'd
> read someplace where the planner knew a little more about what types of
> rows go into which tables.

This would probably help, but we are restricted to ANSI SQL to keep
our application easily portable to other DBMSes.

Bye,
Giuseppe



Re: Slow delete when many foreign tables are defined

From:
Alban Hertroys
Date:
On 1 December 2014 at 17:21, Giuseppe Sacco
<giuseppe@eppesuigoccas.homedns.org> wrote:
> On Mon, 01/12/2014 at 09:49 -0600, Andy Colson wrote:
>> On 12/1/2014 9:23 AM, Giuseppe Sacco wrote:

>> 2) Try inheritance.  I have no idea if it'll help, but I thought I'd
>> read someplace where the planner knew a little more about what types of
>> rows go into which tables.

Andy is referring to a feature called "constraint exclusion". I'm not
sure why that doesn't kick in with your table definition though.

If you get that working with your schema, your problem should be
solved. It's possible that it only works correctly with table
inheritance though.

> This would probably help, but we are restricted to ANSI SQL to keep
> our application easily portable to other DBMSes.

One thing that could speed up the lookups a little is to reverse the
primary keys on the child tables. Since type is a constant in them,
there's not really much point in searching on it first each time a
value needs to be looked up, especially since in the child tables that
column has incredibly bad selectivity: all rows share the same value.
It's possible that this throws off the query planner.

That won't prevent those child tables from being checked against, but
it might alleviate the pain a bit.
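Keeping the schema from the first mail, the idea is just to swap the
column order in the child table's primary key (a sketch):

```sql
create table details2 (
    type varchar check (type = '2'),
    serial numeric,
    details2 varchar not null,
    -- serial first: it is the selective column within this table
    constraint "details2pk" primary key (serial, type),
    constraint "details2fk" foreign key (type, serial)
        references main (type, serial)
);
```

The FK check still matches both columns by equality, so it can use this
index; it just descends on the selective column first.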

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: Slow delete when many foreign tables are defined

From:
Andy Colson
Date:
On 12/1/2014 10:21 AM, Giuseppe Sacco wrote:
> On Mon, 01/12/2014 at 09:49 -0600, Andy Colson wrote:
>> On 12/1/2014 9:23 AM, Giuseppe Sacco wrote:
> [...]
>
> Basically we do have a lot of different attributes on each detail
> table. Let's say we use 20-30 specific columns in each of them, so why
> should we waste disk space and CPU cycles handling all these columns
> in one table? If I understand correctly, you are suggesting adding
> about 25*50 columns to the main table and only setting values for 25
> of them.
>
> Moreover, our ORM would probably go crazy :-)
> [...]


Oh, so the table structure of detail1 doesn't really match the structure
of detail2?  That'd be a pretty good reason to have lots of different
detail tables.

 > If I understand correctly, you are suggesting adding about 25*50
 > columns to the main table and only setting values for 25 of them.

Nope, I didn't realize they were all different.

Hmm.. how different, though?  Maybe you could combine the similar ones?
Maybe cutting the # of detail tables down from 50-100 to 25 would
increase speed enough?



-Andy


Re: Slow delete when many foreign tables are defined

From:
Andy Colson
Date:
On 12/1/2014 10:37 AM, Alban Hertroys wrote:
> On 1 December 2014 at 17:21, Giuseppe Sacco
> <giuseppe@eppesuigoccas.homedns.org> wrote:
>> On Mon, 01/12/2014 at 09:49 -0600, Andy Colson wrote:
>>> On 12/1/2014 9:23 AM, Giuseppe Sacco wrote:
>
>>> 2) Try inheritance.  I have no idea if it'll help, but I thought I'd
>>> read someplace where the planner knew a little more about what types of
>>> rows go into which tables.
>
> Andy is referring to a feature called "constraint exclusion". I'm not
> sure why that doesn't kick in with your table definition though.
>
> If you get that working with your schema, your problem should be
> solved. It's possible that it only works correctly with table
> inheritance though.
>

Yep, that's what I was thinking, and seeing this:

http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html

Which says:
"6.  Ensure that the constraint_exclusion configuration parameter is not
disabled in postgresql.conf. If it is, queries will not be optimized as
desired."

leads me to believe it only works with table inheritance.


> This would probably help, but we are restricted to ANSI SQL to keep
> our application easily portable to other DBMSes.

There is very little difference in syntax.  You'd still create many
detail tables; in PG they'd just need slightly different syntax.  If it
worked, it might be worth it.  Maybe?
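Roughly, the inheritance syntax from that page applied to the example
schema would look like this (a sketch, untested for this problem; note
that foreign keys are not inherited in PG, only CHECK and NOT NULL
constraints, so each child would still need its own FK to main):

```sql
-- parent carries the common columns and the CHECK-based partitioning
create table details (
    type varchar,
    serial numeric
);

-- child inherits the parent's columns and adds its own
create table details1 (
    details1 varchar not null,
    check (type = '1')
) inherits (details);
```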

-Andy


Re: Slow delete when many foreign tables are defined

From:
Bill Moran
Date:
On Mon, 01 Dec 2014 11:00:51 -0600
Andy Colson <andy@squeakycode.net> wrote:

> On 12/1/2014 10:21 AM, Giuseppe Sacco wrote:
> > [...]
> > Basically we do have a lot of different attributes on each detail
> > table. If I understand correctly, you are suggesting adding about
> > 25*50 columns to the main table and only setting values for 25 of
> > them.
>
> Oh, so the table structure of detail1 doesn't really match the structure
> of detail2?  That'd be a pretty good reason to have lots of different
> detail tables.
>
> Nope, I didn't realize they were all different.

This is a decent place to consider using the JSON data type. Combine all
those tables into a single one and put all the fields that differ from
one type to the next in a JSON field.

You can replicate this with other RDBMSes by using a TEXT field; you
just won't be able to use PG's JSON functions if you want to stay
compatible with lesser RDBMSes.
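A sketch of that combined table (the column name "extra" is made up),
reusing the key layout from the original mail:

```sql
create table details (
    type varchar,
    serial numeric,
    extra json,  -- or TEXT, for portability to other RDBMSes
    constraint "detailspk" primary key (type, serial),
    constraint "detailsfk" foreign key (type, serial)
        references main (type, serial)
);
```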

Another option is a two-level deal:

CREATE TABLE main (
 id SERIAL PRIMARY KEY
);

CREATE TABLE secondary (
 parent_id INT PRIMARY KEY REFERENCES main(id),
 ... fields common to all tables ...
);

CREATE TABLE tertiary1 (
 parent_id INT REFERENCES secondary(parent_id),
 ... additional fields ...
);

CREATE TABLE tertiary2 (
 parent_id INT REFERENCES secondary(parent_id),
 ... additional fields ...
);

This is only possible if there's only one secondary row per row in main,
so it may not work for you. But it means that a delete from main only
has to check secondary for PK references. Deletes from secondary will
be slow now, since they have to check a lot of tertiary tables, so that
might only move the problem to another table, depending on your app
design.

--
Bill Moran
I need your help to succeed:
http://gamesbybill.com