Обсуждение: [HACKERS] valgrind error in subscription code
Hi, I enabled skink / the valgrind animal to run the tap tests too (hugely increasing the test time :(), and that paid of: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-04-22%2004%3A52%3A13 ==606== VALGRINDERROR-BEGIN ==606== Conditional jump or move depends on uninitialised value(s) ==606== at 0x46A207: logicalrep_rel_open (relation.c:361) ==606== by 0x472D12: copy_table (tablesync.c:669) ==606== by 0x473186: LogicalRepSyncTableStart (tablesync.c:803) ==606== by 0x475287: ApplyWorkerMain (worker.c:1530) ==606== by 0x440AFD: StartBackgroundWorker (bgworker.c:835) ==606== by 0x44E48A: do_start_bgworker (postmaster.c:5577) ==606== by 0x44E59F: maybe_start_bgworker (postmaster.c:5761) ==606== by 0x44F11D: sigusr1_handler (postmaster.c:5015) ==606== by 0x4E430BF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.24.so) ==606== by 0x6FB8212: __select_nocancel (syscall-template.S:84) ==606== by 0x44F868: ServerLoop (postmaster.c:1693) ==606== by 0x450C53: PostmasterMain (postmaster.c:1337) ==606== Uninitialised value was created by a heap allocation ==606== at 0x605382: MemoryContextAlloc (mcxt.c:729) ==606== by 0x5E4E6A: DynaHashAlloc (dynahash.c:266) ==606== by 0x5E4EEE: element_alloc (dynahash.c:1637) ==606== by 0x5E5018: get_hash_entry (dynahash.c:1248) ==606== by 0x5E5898: hash_search_with_hash_value (dynahash.c:1033) ==606== by 0x5E5A0D: hash_search (dynahash.c:890) ==606== by 0x469D38: logicalrep_relmap_update (relation.c:179) ==606== by 0x472D05: copy_table (tablesync.c:666) ==606== by 0x473186: LogicalRepSyncTableStart (tablesync.c:803) ==606== by 0x475287: ApplyWorkerMain (worker.c:1530) ==606== by 0x440AFD: StartBackgroundWorker (bgworker.c:835) ==606== by 0x44E48A: do_start_bgworker (postmaster.c:5577) ==606== ==606== VALGRINDERROR-END there's also another bleat: ==2161== VALGRINDERROR-BEGIN ==2161== Syscall param write(buf) points to uninitialised byte(s) ==2161== at 0x4E42190: __write_nocancel (syscall-template.S:84) ==2161== by 0x4676C7: CheckPointReplicationOrigin (origin.c:574) ==2161== by 0x25DB99: CheckPointGuts (xlog.c:9007) ==2161== by 0x2647CD: CreateCheckPoint (xlog.c:8792) ==2161== by 0x26734F: StartupXLOG (xlog.c:7576) ==2161== by 0x4510FA: StartupProcessMain (startup.c:217) ==2161== by 0x273832: AuxiliaryProcessMain (bootstrap.c:426) ==2161== by 0x44D8A0: StartChildProcess (postmaster.c:5256) ==2161== by 0x44DF33: PostmasterStateMachine (postmaster.c:3791) ==2161== by 0x44EE25: reaper (postmaster.c:3038) ==2161== by 0x4E430BF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.24.so) ==2161== by 0x6FB8212: __select_nocancel (syscall-template.S:84) ==2161== Location 0xffeffda42 is 2 bytes inside local var "disk_state" ==2161== declared at origin.c:555, in frame #1 of thread 1 ==2161== Uninitialised value was created by a stack allocation ==2161== at 0x467421: CheckPointReplicationOrigin (origin.c:505) ==2161== ==2161== VALGRINDERROR-END ==2161== VALGRINDERROR-BEGIN ==2161== Syscall param write(buf) points to uninitialised byte(s) ==2161== at 0x4E42190: __write_nocancel (syscall-template.S:84) ==2161== by 0x467712: CheckPointReplicationOrigin (origin.c:591) ==2161== by 0x25DB99: CheckPointGuts (xlog.c:9007) ==2161== by 0x2647CD: CreateCheckPoint (xlog.c:8792) ==2161== by 0x26734F: StartupXLOG (xlog.c:7576) ==2161== by 0x4510FA: StartupProcessMain (startup.c:217) ==2161== by 0x273832: AuxiliaryProcessMain (bootstrap.c:426) ==2161== by 0x44D8A0: StartChildProcess (postmaster.c:5256) ==2161== by 0x44DF33: PostmasterStateMachine (postmaster.c:3791) ==2161== by 0x44EE25: reaper (postmaster.c:3038) ==2161== by 0x4E430BF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.24.so) ==2161== by 0x6FB8212: __select_nocancel (syscall-template.S:84) ==2161== Location 0xffeffda58 is 0 bytes inside local var "crc" ==2161== declared at origin.c:511, in frame #1 of thread 1 ==2161== Uninitialised value was created by a stack allocation ==2161== at 0x467421: CheckPointReplicationOrigin (origin.c:505) ==2161== ==2161== VALGRINDERROR-END but that looks more like something that needs to be silenced, based on the 3s look I gave it. Greetings, Andres Freund
On 22/04/17 20:31, Andres Freund wrote: > Hi, > > I enabled skink / the valgrind animal to run the tap tests too (hugely > increasing the test time :(), and that paid of: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-04-22%2004%3A52%3A13 > > ==606== VALGRINDERROR-BEGIN > ==606== Conditional jump or move depends on uninitialised value(s) > ==606== at 0x46A207: logicalrep_rel_open (relation.c:361) > ==606== by 0x472D12: copy_table (tablesync.c:669) > ==606== by 0x473186: LogicalRepSyncTableStart (tablesync.c:803) > ==606== by 0x475287: ApplyWorkerMain (worker.c:1530) > ==606== by 0x440AFD: StartBackgroundWorker (bgworker.c:835) > ==606== by 0x44E48A: do_start_bgworker (postmaster.c:5577) > ==606== by 0x44E59F: maybe_start_bgworker (postmaster.c:5761) > ==606== by 0x44F11D: sigusr1_handler (postmaster.c:5015) > ==606== by 0x4E430BF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.24.so) > ==606== by 0x6FB8212: __select_nocancel (syscall-template.S:84) > ==606== by 0x44F868: ServerLoop (postmaster.c:1693) > ==606== by 0x450C53: PostmasterMain (postmaster.c:1337) > ==606== Uninitialised value was created by a heap allocation > ==606== at 0x605382: MemoryContextAlloc (mcxt.c:729) > ==606== by 0x5E4E6A: DynaHashAlloc (dynahash.c:266) > ==606== by 0x5E4EEE: element_alloc (dynahash.c:1637) > ==606== by 0x5E5018: get_hash_entry (dynahash.c:1248) > ==606== by 0x5E5898: hash_search_with_hash_value (dynahash.c:1033) > ==606== by 0x5E5A0D: hash_search (dynahash.c:890) > ==606== by 0x469D38: logicalrep_relmap_update (relation.c:179) > ==606== by 0x472D05: copy_table (tablesync.c:666) > ==606== by 0x473186: LogicalRepSyncTableStart (tablesync.c:803) > ==606== by 0x475287: ApplyWorkerMain (worker.c:1530) > ==606== by 0x440AFD: StartBackgroundWorker (bgworker.c:835) > ==606== by 0x44E48A: do_start_bgworker (postmaster.c:5577) > ==606== > ==606== VALGRINDERROR-END Thanks, here is patch to fix that - I also removed the individual settings of everything to NULL/0/InvalidOid etc and just replaced it all with memset. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
Hi, On 2017-04-22 21:08:18 +0200, Petr Jelinek wrote: > Thanks, here is patch to fix that - I also removed the individual > settings of everything to NULL/0/InvalidOid etc and just replaced it all > with memset. Cool. > diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c > index 875a081..5bc54dd 100644 > --- a/src/backend/replication/logical/relation.c > +++ b/src/backend/replication/logical/relation.c > @@ -141,19 +141,10 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry) > pfree(remoterel->attnames); > pfree(remoterel->atttyps); > } > - remoterel->attnames = NULL; > - remoterel->atttyps = NULL; > - > bms_free(remoterel->attkeys); > - remoterel->attkeys = NULL; > > if (entry->attrmap) > pfree(entry->attrmap); > - Btw, I think it's a good pattern to zero things like attrmap after freeing. Based on a minute of looking it's unclear to me if logicalrep_relmap_update() could be called again, if e.g. one of the pallocs after the logicalrep_relmap_free_entry() errors out. I think you essentially addressed that with the memset, so that's good. - Andres
On 22/04/17 21:16, Andres Freund wrote: > Hi, > > On 2017-04-22 21:08:18 +0200, Petr Jelinek wrote: >> Thanks, here is patch to fix that - I also removed the individual >> settings of everything to NULL/0/InvalidOid etc and just replaced it all >> with memset. > > Cool. > >> diff --git a/src/backend/replication/logical/relation.c b/src/backend/replication/logical/relation.c >> index 875a081..5bc54dd 100644 >> --- a/src/backend/replication/logical/relation.c >> +++ b/src/backend/replication/logical/relation.c >> @@ -141,19 +141,10 @@ logicalrep_relmap_free_entry(LogicalRepRelMapEntry *entry) >> pfree(remoterel->attnames); >> pfree(remoterel->atttyps); >> } >> - remoterel->attnames = NULL; >> - remoterel->atttyps = NULL; >> - >> bms_free(remoterel->attkeys); >> - remoterel->attkeys = NULL; >> >> if (entry->attrmap) >> pfree(entry->attrmap); >> - > > Btw, I think it's a good pattern to zero things like attrmap after > freeing. Based on a minute of looking it's unclear to me if > logicalrep_relmap_update() could be called again, if e.g. one of the > pallocs after the logicalrep_relmap_free_entry() errors out. I think > you essentially addressed that with the memset, so that's good. > Yes, memset is called immediately after logicalrep_relmap_free_entry() which is why I found it redundant to set things to NULL there. We could put the memset directly inside logicalrep_relmap_free_entry() and the call either logicalrep_relmap_free_entry() or memset in the caller if that would make you feel better about it. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2017-04-22 21:08:18 +0200, Petr Jelinek wrote: > On 22/04/17 20:31, Andres Freund wrote: > > Hi, > > > > I enabled skink / the valgrind animal to run the tap tests too (hugely > > increasing the test time :(), and that paid of: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-04-22%2004%3A52%3A13 > > > > ==606== VALGRINDERROR-BEGIN > > ==606== Conditional jump or move depends on uninitialised value(s) > > ==606== at 0x46A207: logicalrep_rel_open (relation.c:361) > > ==606== by 0x472D12: copy_table (tablesync.c:669) > > ==606== by 0x473186: LogicalRepSyncTableStart (tablesync.c:803) > > ==606== by 0x475287: ApplyWorkerMain (worker.c:1530) > > ==606== by 0x440AFD: StartBackgroundWorker (bgworker.c:835) > > ==606== by 0x44E48A: do_start_bgworker (postmaster.c:5577) > > ==606== by 0x44E59F: maybe_start_bgworker (postmaster.c:5761) > > ==606== by 0x44F11D: sigusr1_handler (postmaster.c:5015) > > ==606== by 0x4E430BF: ??? (in /lib/x86_64-linux-gnu/libpthread-2.24.so) > > ==606== by 0x6FB8212: __select_nocancel (syscall-template.S:84) > > ==606== by 0x44F868: ServerLoop (postmaster.c:1693) > > ==606== by 0x450C53: PostmasterMain (postmaster.c:1337) > > ==606== Uninitialised value was created by a heap allocation > > ==606== at 0x605382: MemoryContextAlloc (mcxt.c:729) > > ==606== by 0x5E4E6A: DynaHashAlloc (dynahash.c:266) > > ==606== by 0x5E4EEE: element_alloc (dynahash.c:1637) > > ==606== by 0x5E5018: get_hash_entry (dynahash.c:1248) > > ==606== by 0x5E5898: hash_search_with_hash_value (dynahash.c:1033) > > ==606== by 0x5E5A0D: hash_search (dynahash.c:890) > > ==606== by 0x469D38: logicalrep_relmap_update (relation.c:179) > > ==606== by 0x472D05: copy_table (tablesync.c:666) > > ==606== by 0x473186: LogicalRepSyncTableStart (tablesync.c:803) > > ==606== by 0x475287: ApplyWorkerMain (worker.c:1530) > > ==606== by 0x440AFD: StartBackgroundWorker (bgworker.c:835) > > ==606== by 0x44E48A: do_start_bgworker (postmaster.c:5577) > > ==606== > > ==606== VALGRINDERROR-END > > Thanks, here is patch to fix that - I also removed the individual > settings of everything to NULL/0/InvalidOid etc and just replaced it all > with memset. I've pushed this, and a fix for the (afaik harmless) replication origins bleat. Let's see what skink says - I wasn't patient enough to locally verify that your fix indeed fixes things. - Andres