Обсуждение: pg_restore deadlocks with itself
Hi, I have Pg 14.4 on arm64, and I'm loading schema dump from pg12. Specifically, command to run it is: sudo -u postgres pg_restore -j 64 -d database -L /tmp/schema-post-data.nopkey.list /tmp/schema-post-data.dump where /tmp/schema-post-data.dump is dump made using: sudo -u postgres pg_dump --schema-only --compress=9 --format=c --section='post-data' --schema='*(something|else|xxx)*' database and /tmp/schema-post-data.nopkey.list is made using: #v+ pkey_regexp='^\d+; \d+ \d+ CONSTRAINT \S+ \S+ \S+_pkey ' pg_restore -l "${local_dump_dir}/schema-post-data.dump" | grep -vP "${pkey_regexp}" > "${local_dump_dir}/schema-post-data.nopkey.list" #v- The thing is that the list file has 7676 lines, out of which 2313 are foreign keys. database has basically 3 schemas, and fkeys NEVER cross schema. When loading the dump I got reported 4 errors, but log shows 2 deadlocks. perhaps all sides got cancelled, not sure, will investigate further, but logs show: 2022-08-24 20:01:04.466 UTC,"postgres","database",3343477,"[local]",630624ad.330475,42,"ALTER TABLE waiting",2022-08-24 13:16:29UTC,21/1932,0,LOG,00000,"process 3343477 detected deadlock while waiting for ShareRowExclusiveLock on relation 742617610of database 16641 after 1000.647 ms","Process holding the lock: 3587718. Wait queue: .",,,,,"ALTER TABLE ONLY some_schema.table_a_o ADD CONSTRAINT table_a_o_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id); ",,,"pg_restore","client backend",,3355460102417501954 2022-08-24 20:01:50.291 UTC,"postgres","database",3343477,"[local]",630624ad.330475,46,"ALTER TABLE waiting",2022-08-24 13:16:29UTC,21/1933,0,LOG,00000,"process 3343477 detected deadlock while waiting for ShareRowExclusiveLock on relation 742617610of database 16641 after 1000.030 ms","Process holding the lock: 3587718. Wait queue: .",,,,,"ALTER TABLE ONLY some_schema.table_a ADD CONSTRAINT fk_rails_46718e626a FOREIGN KEY (migrate_from_id) REFERENCES some_schema.table_q(id); ",,,"pg_restore","client backend",,-2548896815899838768 Now, I know I can fix the situation by adding missing fkeys myself, but I don't think pg_restore should be putting itself in deadlock. Best regards, depesz
On 2022-Aug-25, hubert depesz lubaczewski wrote: > Specifically, command to run it is: > sudo -u postgres pg_restore -j 64 -d database -L /tmp/schema-post-data.nopkey.list /tmp/schema-post-data.dump > 2022-08-24 20:01:04.466 UTC,"postgres","database",3343477,"[local]",630624ad.330475,42,"ALTER TABLE waiting",2022-08-2413:16:29 UTC,21/1932,0,LOG,00000,"process 3343477 detected deadlock while waiting for ShareRowExclusiveLockon relation 742617610 of database 16641 after 1000.647 ms","Process holding the lock: 3587718. Waitqueue: .",,,,,"ALTER TABLE ONLY some_schema.table_a_o > ADD CONSTRAINT table_a_o_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id); > ",,,"pg_restore","client backend",,3355460102417501954 > > > 2022-08-24 20:01:50.291 UTC,"postgres","database",3343477,"[local]",630624ad.330475,46,"ALTER TABLE waiting",2022-08-2413:16:29 UTC,21/1933,0,LOG,00000,"process 3343477 detected deadlock while waiting for ShareRowExclusiveLockon relation 742617610 of database 16641 after 1000.030 ms","Process holding the lock: 3587718. Waitqueue: .",,,,,"ALTER TABLE ONLY some_schema.table_a > ADD CONSTRAINT fk_rails_46718e626a FOREIGN KEY (migrate_from_id) REFERENCES some_schema.table_q(id); > ",,,"pg_restore","client backend",,-2548896815899838768 > > Now, I know I can fix the situation by adding missing fkeys myself, but > I don't think pg_restore should be putting itself in deadlock. You're right, it shouldn't. Parallel restore shouldn't run DDL commands in parallel that would deadlock, but I suppose there must be holes in that. What was process 3587718 doing at the time? Some DDL in some_schema.table_q, I suspect, right? Are any of these tables partitioned? The "Wait queue: ." bit is ugly. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "The problem with the facetime model is not just that it's demoralizing, but that the people pretending to work interrupt the ones actually working." (Paul Graham)
On Tue, Aug 30, 2022 at 11:06:34AM +0200, Alvaro Herrera wrote: > On 2022-Aug-25, hubert depesz lubaczewski wrote: > > > Specifically, command to run it is: > > sudo -u postgres pg_restore -j 64 -d database -L /tmp/schema-post-data.nopkey.list /tmp/schema-post-data.dump > > > 2022-08-24 20:01:04.466 UTC,"postgres","database",3343477,"[local]",630624ad.330475,42,"ALTER TABLE waiting",2022-08-2413:16:29 UTC,21/1932,0,LOG,00000,"process 3343477 detected deadlock while waiting for ShareRowExclusiveLockon relation 742617610 of database 16641 after 1000.647 ms","Process holding the lock: 3587718. Waitqueue: .",,,,,"ALTER TABLE ONLY some_schema.table_a_o > > ADD CONSTRAINT table_a_o_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id); > > ",,,"pg_restore","client backend",,3355460102417501954 > > > > > > 2022-08-24 20:01:50.291 UTC,"postgres","database",3343477,"[local]",630624ad.330475,46,"ALTER TABLE waiting",2022-08-2413:16:29 UTC,21/1933,0,LOG,00000,"process 3343477 detected deadlock while waiting for ShareRowExclusiveLockon relation 742617610 of database 16641 after 1000.030 ms","Process holding the lock: 3587718. Waitqueue: .",,,,,"ALTER TABLE ONLY some_schema.table_a > > ADD CONSTRAINT fk_rails_46718e626a FOREIGN KEY (migrate_from_id) REFERENCES some_schema.table_q(id); > > ",,,"pg_restore","client backend",,-2548896815899838768 > > > > Now, I know I can fix the situation by adding missing fkeys myself, but > > I don't think pg_restore should be putting itself in deadlock. > > You're right, it shouldn't. Parallel restore shouldn't run DDL commands > in parallel that would deadlock, but I suppose there must be holes in > that. > > What was process 3587718 doing at the time? Some DDL in > some_schema.table_q, I suspect, right? Are any of these tables > partitioned? So, more info from logs: 2022-08-24 20:01:04.466 UTC,"postgres","database",3343477,"[local]",630624ad.330475,43,"ALTER TABLE",2022-08-24 13:16:29UTC,21/1932,0,ERROR,40P01,"deadlock detected","Process 3343477 waits for ShareRowExclusiveLock on relation 742617610of database 16641; blocked by process 3587718. Process 3587718 waits for RowExclusiveLock on relation 742615338 of database 16641; blocked by process 3343477. Process 3343477: ALTER TABLE ONLY some_schema.table_ao ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id); Process 3587718: <command string not enabled>","See server log for query details.",,,,"ALTER TABLE ONLY some_schema.table_ao ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id); ",,,"pg_restore","client backend",,3355460102417501954 Which is weird, as it seems that they both were trying to do the same fkey?! Neither table_ao nor table_q are partitioned, and used restore-list contained this fkey only once. Best regards, depesz
On 2022-Aug-30, hubert depesz lubaczewski wrote: > 2022-08-24 20:01:04.466 UTC,"postgres","database",3343477,"[local]",630624ad.330475,43,"ALTER TABLE",2022-08-24 13:16:29UTC,21/1932,0,ERROR,40P01,"deadlock detected","Process 3343477 waits for ShareRowExclusiveLock on relation 742617610of database 16641; blocked by process 3587718. > Process 3587718 waits for RowExclusiveLock on relation 742615338 of database 16641; blocked by process 3343477. > Process 3343477: ALTER TABLE ONLY some_schema.table_ao > ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id); > > > > Process 3587718: <command string not enabled>","See server log for query details.",,,,"ALTER TABLE ONLY some_schema.table_ao > ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id); > > > ",,,"pg_restore","client backend",,3355460102417501954 > > Which is weird, as it seems that they both were trying to do the same fkey?! > > Neither table_ao nor table_q are partitioned, and used restore-list contained this fkey only once. Hmm, you said table_a_o and table_a first, but now you say table_ao. I suspect that's just a typo resulting from anonymizing table names. Right? OK, so no partitioning. But are there any inheritance relationship between these tables? -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "No es bueno caminar con un hombre muerto"
On Tue, Aug 30, 2022 at 11:29:37AM +0200, Alvaro Herrera wrote: > On 2022-Aug-30, hubert depesz lubaczewski wrote: > > > 2022-08-24 20:01:04.466 UTC,"postgres","database",3343477,"[local]",630624ad.330475,43,"ALTER TABLE",2022-08-24 13:16:29UTC,21/1932,0,ERROR,40P01,"deadlock detected","Process 3343477 waits for ShareRowExclusiveLock on relation 742617610of database 16641; blocked by process 3587718. > > Process 3587718 waits for RowExclusiveLock on relation 742615338 of database 16641; blocked by process 3343477. > > Process 3343477: ALTER TABLE ONLY some_schema.table_ao > > ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id); > > > > > > > > Process 3587718: <command string not enabled>","See server log for query details.",,,,"ALTER TABLE ONLY some_schema.table_ao > > ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id); > > > > > > ",,,"pg_restore","client backend",,3355460102417501954 > > > > Which is weird, as it seems that they both were trying to do the same fkey?! > > > > Neither table_ao nor table_q are partitioned, and used restore-list contained this fkey only once. > > Hmm, you said table_a_o and table_a first, but now you say table_ao. I > suspect that's just a typo resulting from anonymizing table names. Right? > > OK, so no partitioning. But are there any inheritance relationship > between these tables? There were two deadlocks, I might have copy/pasted info from another case. These are *all* deadlock related messages from the log, all where anonymized using regexps, no manual intervention, so the changes should be consistent. ====================================================== 2022-08-24 20:01:04.466 UTC,"postgres","database",3343477,"[local]",630624ad.330475,42,"ALTER TABLE waiting",2022-08-24 13:16:29UTC,21/1932,0,LOG,00000,"process 3343477 detected deadlock while waiting for ShareRowExclusiveLock on relation 742617610of database 16641 after 1000.647 ms","Process holding the lock: 3587718. Wait queue: .",,,,,"ALTER TABLE ONLY some_schema.table_ao ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id); ",,,"pg_restore","client backend",,3355460102417501954 ====================================================== 2022-08-24 20:01:04.466 UTC,"postgres","database",3343477,"[local]",630624ad.330475,43,"ALTER TABLE",2022-08-24 13:16:29UTC,21/1932,0,ERROR,40P01,"deadlock detected","Process 3343477 waits for ShareRowExclusiveLock on relation 742617610of database 16641; blocked by process 3587718. Process 3587718 waits for RowExclusiveLock on relation 742615338 of database 16641; blocked by process 3343477. Process 3343477: ALTER TABLE ONLY some_schema.table_ao ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id); Process 3587718: <command string not enabled>","See server log for query details.",,,,"ALTER TABLE ONLY some_schema.table_ao ADD CONSTRAINT table_ao_q_id_fk FOREIGN KEY (q_id) REFERENCES some_schema.table_q(id); ",,,"pg_restore","client backend",,3355460102417501954 ====================================================== 2022-08-24 20:01:50.291 UTC,"postgres","database",3343477,"[local]",630624ad.330475,46,"ALTER TABLE waiting",2022-08-24 13:16:29UTC,21/1933,0,LOG,00000,"process 3343477 detected deadlock while waiting for ShareRowExclusiveLock on relation 742617610of database 16641 after 1000.030 ms","Process holding the lock: 3587718. Wait queue: .",,,,,"ALTER TABLE ONLY some_schema.table_a ADD CONSTRAINT fk_rails_46718e626a FOREIGN KEY (migrate_from_id) REFERENCES some_schema.table_q(id); ",,,"pg_restore","client backend",,-2548896815899838768 ====================================================== 2022-08-24 20:01:50.291 UTC,"postgres","database",3343477,"[local]",630624ad.330475,47,"ALTER TABLE",2022-08-24 13:16:29UTC,21/1933,0,ERROR,40P01,"deadlock detected","Process 3343477 waits for ShareRowExclusiveLock on relation 742617610of database 16641; blocked by process 3587718. Process 3587718 waits for RowExclusiveLock on relation 742615348 of database 16641; blocked by process 3343477. Process 3343477: ALTER TABLE ONLY some_schema.table_a ADD CONSTRAINT fk_rails_46718e626a FOREIGN KEY (migrate_from_id) REFERENCES some_schema.table_q(id); Process 3587718: <command string not enabled>","See server log for query details.",,,,"ALTER TABLE ONLY some_schema.table_a ADD CONSTRAINT fk_rails_46718e626a FOREIGN KEY (migrate_from_id) REFERENCES some_schema.table_q(id); ",,,"pg_restore","client backend",,-2548896815899838768 ====================================================== As for inheritance - none of table_a, table_ao, table_q has any inheritance. There are tables with inheritance in this db, but not these. It is, though, possible that one of table_a/ao/q has fkey pointing to some inherited table, or some inherited table has fkey to _a/_ao/_q - would that matter? i can dig details if it would be interesting. Best regards, depesz