Обсуждение: pg_restore deadlocks with itself

Поиск
Список
Период
Сортировка

pg_restore deadlocks with itself

От
hubert depesz lubaczewski
Дата:
Hi,
I have Pg 14.4 on arm64, and I'm loading schema dump from pg12.

Specifically, command to run it is:
sudo -u postgres pg_restore -j 64 -d database -L /tmp/schema-post-data.nopkey.list /tmp/schema-post-data.dump

where /tmp/schema-post-data.dump is dump made using:
sudo -u postgres pg_dump --schema-only --compress=9 --format=c --section='post-data' --schema='*(something|else|xxx)*'
database

and /tmp/schema-post-data.nopkey.list is made using:

#v+
pkey_regexp='^\d+; \d+ \d+ CONSTRAINT \S+ \S+ \S+_pkey '
pg_restore -l "${local_dump_dir}/schema-post-data.dump" | grep -vP "${pkey_regexp}" >
"${local_dump_dir}/schema-post-data.nopkey.list"
#v-

The thing is that the list file has 7676 lines, out of which 2313 are
foreign keys.

database has basically 3 schemas, and fkeys NEVER cross schema.

When loading the dump I got reported 4 errors, but log shows 2 deadlocks.
perhaps all sides got cancelled, not sure, will investigate further, but logs
show:

2022-08-24 20:01:04.466 UTC,"postgres","database",3343477,"[local]",630624ad.330475,42,"ALTER TABLE waiting",2022-08-24
13:16:29UTC,21/1932,0,LOG,00000,"process 3343477 detected deadlock while waiting for ShareRowExclusiveLock on relation
742617610of database 16641 after 1000.647 ms","Process holding the lock: 3587718. Wait queue: .",,,,,"ALTER TABLE ONLY
some_schema.table_a_o
    ADD CONSTRAINT table_a_o_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id);
",,,"pg_restore","client backend",,3355460102417501954


2022-08-24 20:01:50.291 UTC,"postgres","database",3343477,"[local]",630624ad.330475,46,"ALTER TABLE waiting",2022-08-24
13:16:29UTC,21/1933,0,LOG,00000,"process 3343477 detected deadlock while waiting for ShareRowExclusiveLock on relation
742617610of database 16641 after 1000.030 ms","Process holding the lock: 3587718. Wait queue: .",,,,,"ALTER TABLE ONLY
some_schema.table_a
    ADD CONSTRAINT fk_rails_46718e626a FOREIGN KEY (migrate_from_id) REFERENCES some_schema.table_q(id);
",,,"pg_restore","client backend",,-2548896815899838768

Now, I know I can fix the situation by adding missing fkeys myself, but
I don't think pg_restore should be putting itself in deadlock.

Best regards,

depesz




Re: pg_restore deadlocks with itself

От
Alvaro Herrera
Дата:
On 2022-Aug-25, hubert depesz lubaczewski wrote:

> Specifically, command to run it is:
> sudo -u postgres pg_restore -j 64 -d database -L /tmp/schema-post-data.nopkey.list /tmp/schema-post-data.dump

> 2022-08-24 20:01:04.466 UTC,"postgres","database",3343477,"[local]",630624ad.330475,42,"ALTER TABLE
waiting",2022-08-2413:16:29 UTC,21/1932,0,LOG,00000,"process 3343477 detected deadlock while waiting for
ShareRowExclusiveLockon relation 742617610 of database 16641 after 1000.647 ms","Process holding the lock: 3587718.
Waitqueue: .",,,,,"ALTER TABLE ONLY some_schema.table_a_o
 
>     ADD CONSTRAINT table_a_o_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id);
> ",,,"pg_restore","client backend",,3355460102417501954
> 
> 
> 2022-08-24 20:01:50.291 UTC,"postgres","database",3343477,"[local]",630624ad.330475,46,"ALTER TABLE
waiting",2022-08-2413:16:29 UTC,21/1933,0,LOG,00000,"process 3343477 detected deadlock while waiting for
ShareRowExclusiveLockon relation 742617610 of database 16641 after 1000.030 ms","Process holding the lock: 3587718.
Waitqueue: .",,,,,"ALTER TABLE ONLY some_schema.table_a
 
>     ADD CONSTRAINT fk_rails_46718e626a FOREIGN KEY (migrate_from_id) REFERENCES some_schema.table_q(id);
> ",,,"pg_restore","client backend",,-2548896815899838768
> 
> Now, I know I can fix the situation by adding missing fkeys myself, but
> I don't think pg_restore should be putting itself in deadlock.

You're right, it shouldn't.  Parallel restore shouldn't run DDL commands
in parallel that would deadlock, but I suppose there must be holes in
that.

What was process 3587718 doing at the time?  Some DDL in
some_schema.table_q, I suspect, right?  Are any of these tables
partitioned?

The "Wait queue: ." bit is ugly.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"The problem with the facetime model is not just that it's demoralizing, but
that the people pretending to work interrupt the ones actually working."
                                                           (Paul Graham)



Re: pg_restore deadlocks with itself

От
hubert depesz lubaczewski
Дата:
On Tue, Aug 30, 2022 at 11:06:34AM +0200, Alvaro Herrera wrote:
> On 2022-Aug-25, hubert depesz lubaczewski wrote:
> 
> > Specifically, command to run it is:
> > sudo -u postgres pg_restore -j 64 -d database -L /tmp/schema-post-data.nopkey.list /tmp/schema-post-data.dump
> 
> > 2022-08-24 20:01:04.466 UTC,"postgres","database",3343477,"[local]",630624ad.330475,42,"ALTER TABLE
waiting",2022-08-2413:16:29 UTC,21/1932,0,LOG,00000,"process 3343477 detected deadlock while waiting for
ShareRowExclusiveLockon relation 742617610 of database 16641 after 1000.647 ms","Process holding the lock: 3587718.
Waitqueue: .",,,,,"ALTER TABLE ONLY some_schema.table_a_o
 
> >     ADD CONSTRAINT table_a_o_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id);
> > ",,,"pg_restore","client backend",,3355460102417501954
> > 
> > 
> > 2022-08-24 20:01:50.291 UTC,"postgres","database",3343477,"[local]",630624ad.330475,46,"ALTER TABLE
waiting",2022-08-2413:16:29 UTC,21/1933,0,LOG,00000,"process 3343477 detected deadlock while waiting for
ShareRowExclusiveLockon relation 742617610 of database 16641 after 1000.030 ms","Process holding the lock: 3587718.
Waitqueue: .",,,,,"ALTER TABLE ONLY some_schema.table_a
 
> >     ADD CONSTRAINT fk_rails_46718e626a FOREIGN KEY (migrate_from_id) REFERENCES some_schema.table_q(id);
> > ",,,"pg_restore","client backend",,-2548896815899838768
> > 
> > Now, I know I can fix the situation by adding missing fkeys myself, but
> > I don't think pg_restore should be putting itself in deadlock.
> 
> You're right, it shouldn't.  Parallel restore shouldn't run DDL commands
> in parallel that would deadlock, but I suppose there must be holes in
> that.
> 
> What was process 3587718 doing at the time?  Some DDL in
> some_schema.table_q, I suspect, right?  Are any of these tables
> partitioned?

So, more info from logs:

2022-08-24 20:01:04.466 UTC,"postgres","database",3343477,"[local]",630624ad.330475,43,"ALTER TABLE",2022-08-24
13:16:29UTC,21/1932,0,ERROR,40P01,"deadlock detected","Process 3343477 waits for ShareRowExclusiveLock on relation
742617610of database 16641; blocked by process 3587718.
 
Process 3587718 waits for RowExclusiveLock on relation 742615338 of database 16641; blocked by process 3343477.
Process 3343477: ALTER TABLE ONLY some_schema.table_ao
    ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id);



Process 3587718: <command string not enabled>","See server log for query details.",,,,"ALTER TABLE ONLY
some_schema.table_ao
    ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id);


",,,"pg_restore","client backend",,3355460102417501954

Which is weird, as it seems that they both were trying to do the same fkey?!

Neither table_ao nor table_q are partitioned, and used restore-list contained this fkey only once.

Best regards,

depesz




Re: pg_restore deadlocks with itself

От
Alvaro Herrera
Дата:
On 2022-Aug-30, hubert depesz lubaczewski wrote:

> 2022-08-24 20:01:04.466 UTC,"postgres","database",3343477,"[local]",630624ad.330475,43,"ALTER TABLE",2022-08-24
13:16:29UTC,21/1932,0,ERROR,40P01,"deadlock detected","Process 3343477 waits for ShareRowExclusiveLock on relation
742617610of database 16641; blocked by process 3587718.
 
> Process 3587718 waits for RowExclusiveLock on relation 742615338 of database 16641; blocked by process 3343477.
> Process 3343477: ALTER TABLE ONLY some_schema.table_ao
>     ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id);
> 
> 
> 
> Process 3587718: <command string not enabled>","See server log for query details.",,,,"ALTER TABLE ONLY
some_schema.table_ao
>     ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id);
> 
> 
> ",,,"pg_restore","client backend",,3355460102417501954
> 
> Which is weird, as it seems that they both were trying to do the same fkey?!
> 
> Neither table_ao nor table_q are partitioned, and used restore-list contained this fkey only once.

Hmm, you said table_a_o and table_a first, but now you say table_ao.  I
suspect that's just a typo resulting from anonymizing table names.  Right?

OK, so no partitioning.  But are there any inheritance relationship
between these tables?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"No es bueno caminar con un hombre muerto"



Re: pg_restore deadlocks with itself

От
hubert depesz lubaczewski
Дата:
On Tue, Aug 30, 2022 at 11:29:37AM +0200, Alvaro Herrera wrote:
> On 2022-Aug-30, hubert depesz lubaczewski wrote:
> 
> > 2022-08-24 20:01:04.466 UTC,"postgres","database",3343477,"[local]",630624ad.330475,43,"ALTER TABLE",2022-08-24
13:16:29UTC,21/1932,0,ERROR,40P01,"deadlock detected","Process 3343477 waits for ShareRowExclusiveLock on relation
742617610of database 16641; blocked by process 3587718.
 
> > Process 3587718 waits for RowExclusiveLock on relation 742615338 of database 16641; blocked by process 3343477.
> > Process 3343477: ALTER TABLE ONLY some_schema.table_ao
> >     ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id);
> > 
> > 
> > 
> > Process 3587718: <command string not enabled>","See server log for query details.",,,,"ALTER TABLE ONLY
some_schema.table_ao
> >     ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id);
> > 
> > 
> > ",,,"pg_restore","client backend",,3355460102417501954
> > 
> > Which is weird, as it seems that they both were trying to do the same fkey?!
> > 
> > Neither table_ao nor table_q are partitioned, and used restore-list contained this fkey only once.
> 
> Hmm, you said table_a_o and table_a first, but now you say table_ao.  I
> suspect that's just a typo resulting from anonymizing table names.  Right?
> 
> OK, so no partitioning.  But are there any inheritance relationship
> between these tables?

There were two deadlocks, I might have copy/pasted info from another
case.

These are *all* deadlock related messages from the log, all where
anonymized using regexps, no manual intervention, so the changes should
be consistent.

======================================================

2022-08-24 20:01:04.466 UTC,"postgres","database",3343477,"[local]",630624ad.330475,42,"ALTER TABLE waiting",2022-08-24
13:16:29UTC,21/1932,0,LOG,00000,"process 3343477 detected deadlock while waiting for ShareRowExclusiveLock on relation
742617610of database 16641 after 1000.647 ms","Process holding the lock: 3587718. Wait queue: .",,,,,"ALTER TABLE ONLY
some_schema.table_ao
    ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id);


",,,"pg_restore","client backend",,3355460102417501954


======================================================

2022-08-24 20:01:04.466 UTC,"postgres","database",3343477,"[local]",630624ad.330475,43,"ALTER TABLE",2022-08-24
13:16:29UTC,21/1932,0,ERROR,40P01,"deadlock detected","Process 3343477 waits for ShareRowExclusiveLock on relation
742617610of database 16641; blocked by process 3587718.
 
Process 3587718 waits for RowExclusiveLock on relation 742615338 of database 16641; blocked by process 3343477.
Process 3343477: ALTER TABLE ONLY some_schema.table_ao
    ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id);



Process 3587718: <command string not enabled>","See server log for query details.",,,,"ALTER TABLE ONLY
some_schema.table_ao
    ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id);


",,,"pg_restore","client backend",,3355460102417501954


======================================================

2022-08-24 20:01:50.291 UTC,"postgres","database",3343477,"[local]",630624ad.330475,46,"ALTER TABLE waiting",2022-08-24
13:16:29UTC,21/1933,0,LOG,00000,"process 3343477 detected deadlock while waiting for ShareRowExclusiveLock on relation
742617610of database 16641 after 1000.030 ms","Process holding the lock: 3587718. Wait queue: .",,,,,"ALTER TABLE ONLY
some_schema.table_a
    ADD CONSTRAINT fk_rails_46718e626a FOREIGN KEY (migrate_from_id) REFERENCES some_schema.table_q(id);


",,,"pg_restore","client backend",,-2548896815899838768


======================================================

2022-08-24 20:01:50.291 UTC,"postgres","database",3343477,"[local]",630624ad.330475,47,"ALTER TABLE",2022-08-24
13:16:29UTC,21/1933,0,ERROR,40P01,"deadlock detected","Process 3343477 waits for ShareRowExclusiveLock on relation
742617610of database 16641; blocked by process 3587718.
 
Process 3587718 waits for RowExclusiveLock on relation 742615348 of database 16641; blocked by process 3343477.
Process 3343477: ALTER TABLE ONLY some_schema.table_a
    ADD CONSTRAINT fk_rails_46718e626a FOREIGN KEY (migrate_from_id) REFERENCES some_schema.table_q(id);



Process 3587718: <command string not enabled>","See server log for query details.",,,,"ALTER TABLE ONLY
some_schema.table_a
    ADD CONSTRAINT fk_rails_46718e626a FOREIGN KEY (migrate_from_id) REFERENCES some_schema.table_q(id);


",,,"pg_restore","client backend",,-2548896815899838768

======================================================

As for inheritance - none of table_a, table_ao, table_q has any
inheritance. There are tables with inheritance in this db, but not
these.

It is, though, possible that one of table_a/ao/q has fkey pointing to
some inherited table, or some inherited table has fkey to _a/_ao/_q
- would that matter? i can dig details if it would be interesting.

Best regards,

depesz