RE: Forget close an open relation in ReorderBufferProcessTXN()
От | osumi.takamichi@fujitsu.com |
---|---|
Тема | RE: Forget close an open relation in ReorderBufferProcessTXN() |
Дата | |
Msg-id | OSBPR01MB48887473DB79D2B29A74CA41ED299@OSBPR01MB4888.jpnprd01.prod.outlook.com обсуждение исходный текст |
Ответ на | Re: Forget close an open relation in ReorderBufferProcessTXN() (Amit Langote <amitlangote09@gmail.com>) |
Ответы |
RE: Forget close an open relation in ReorderBufferProcessTXN()
("osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com>)
|
Список | pgsql-hackers |
On Friday, May 21, 2021 4:43 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Fri, May 21, 2021 at 3:55 PM osumi.takamichi@fujitsu.com > <osumi.takamichi@fujitsu.com> wrote: > > But, I've detected segmentation faults caused by the patch, which can > > happen during 100_bugs.pl in src/test/subscription. > > This happens more than one in ten times. > > > > This problem would be a timing issue and has been introduced by v3 > already. > > I used v5 for HEAD also and reproduced this failure, while OSS HEAD > > doesn't reproduce this, even when I executed 100_bugs.pl 200 times in a > tight loop. > > I aligned the commit id 4f586fe2 for all check. Below logs are ones I got from > v3. > > > > My first guess of the cause is that between the timing to get an entry > > from hash_search() in get_rel_sync_entry() and to set the map by > > convert_tuples_by_name() in maybe_send_schema(), we had invalidation > message, which tries to free unset descs in the entry ? > > Hmm, maybe get_rel_syn_entry() should explicitly set map to NULL when first > initializing an entry. It's possible that without doing so, the map remains set > to a garbage value, which causes the invalidation callback that runs into such > partially initialized entry to segfault upon trying to deference that garbage > pointer. Just in case, I prepared a new PG and did a check to make get_rel_sync_entry() print its first pointer value with elog. Here, when I executed 100_bugs.pl, I got some garbage below. * The change I did: @@ -1011,6 +1011,7 @@ get_rel_sync_entry(PGOutputData *data, Oid relid) entry->pubactions.pubinsert = entry->pubactions.pubupdate = entry->pubactions.pubdelete = entry->pubactions.pubtruncate = false; entry->publish_as_relid = InvalidOid; + elog(LOG, "**> the pointer's default value : %p", entry->map); } * Grep result of all logs from 100_bugs.pl 2021-05-21 09:05:56.132 UTC [29122] sub1 LOG: **> the pointer's default value : (nil) 2021-05-21 09:06:11.556 UTC [30198] testsub1 LOG: **> the pointer's default value : (nil) 2021-05-21 09:06:11.561 UTC [30200] pg_16389_sync_16384_6964667281140237667 LOG: **> the pointer's default value : 0x7f7f7f7f7f7f7f7f 2021-05-21 09:06:11.567 UTC [30191] testsub2 LOG: **> the pointer's default value : (nil) 2021-05-21 09:06:11.570 UTC [30194] pg_16387_sync_16384_6964667292923737489 LOG: **> the pointer's default value : 0x7f7f7f7f7f7f7f7f 2021-05-21 09:06:02.513 UTC [29809] testsub LOG: **> the pointer's default value : (nil) 2021-05-21 09:06:02.557 UTC [29809] testsub LOG: **> the pointer's default value : (nil) So, your solution is right, I think. > I've tried that in the attached v6 patches. Please check. With this fix, I don't get the failure. I executed 100_bugs.pl 100 times in a loop and didn't face that problem. Again, I conducted one make check-world for each combination * use OSS HEAD or PG13 * apply only the first patch or both two patches Those all passed. Best Regards, Takamichi Osumi
В списке pgsql-hackers по дате отправления:
Следующее
От: Guillaume LelargeДата:
Сообщение: Re: "Multiple table synchronizations are processed serially" still happens