Thread: [BUGS] BUG #14657: Server process segmentation fault in v10, May 10th dev snapshot
[BUGS] BUG #14657: Server process segmentation fault in v10, May 10th dev snapshot
From
sveinn.sveinsson@gmail.com
Date:
The following bug has been logged on the website:

Bug reference: 14657
Logged by: Sveinn Sveinsson
Email address: sveinn.sveinsson@gmail.com
PostgreSQL version: Unsupported/Unknown
Operating system: Linux x86_64
Description:

The following causes segmentation fault in v10, May 10th development snapshot:

create table test (sd timestamp,anb varchar(16)) partition by range (sd);
create table test_1 partition of test for values from ('2017-01-01') to ('2017-02-01');
create index test_1_a on test_1 (anb,sd);
insert into test values ('2017-01-01','12345');
select min(sd), max(sd) from test where anb='12345';

The server log file shows:

2017-05-16 09:36:13.243 UTC [1503] LOG: server process (PID 3474) was terminated by signal 11: Segmentation fault
2017-05-16 09:36:13.243 UTC [1503] DETAIL: Failed process was running: select min(sd), max(sd) from test where anb='12345';
2017-05-16 09:36:13.244 UTC [1503] LOG: terminating any other active server processes
2017-05-16 09:36:13.245 UTC [3463] WARNING: terminating connection because of crash of another server process

The stack trace is (no debug info):

Program terminated with signal 11, Segmentation fault.
#0 0x000000000061ab1b in list_nth ()
(gdb) bt
#0 0x000000000061ab1b in list_nth ()
#1 0x00000000005e4081 in ExecLockNonLeafAppendTables ()
#2 0x00000000005f4d52 in ExecInitMergeAppend ()
#3 0x00000000005e0365 in ExecInitNode ()
#4 0x00000000005f35a7 in ExecInitLimit ()
#5 0x00000000005e00f3 in ExecInitNode ()
#6 0x00000000005dd207 in standard_ExecutorStart ()
#7 0x00000000006f96d2 in PortalStart ()
#8 0x00000000006f5c7f in exec_simple_query ()
#9 0x00000000006f6fac in PostgresMain ()
#10 0x0000000000475cdc in ServerLoop ()
#11 0x0000000000692ffa in PostmasterMain ()
#12 0x0000000000476600 in main ()

Regards,
Sveinn Sveinsson.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Wed, May 17, 2017 at 7:41 PM, <sveinn.sveinsson@gmail.com> wrote:
> (gdb) bt
> #0 0x000000000061ab1b in list_nth ()
> #1 0x00000000005e4081 in ExecLockNonLeafAppendTables ()
> #2 0x00000000005f4d52 in ExecInitMergeAppend ()
> #3 0x00000000005e0365 in ExecInitNode ()
> #4 0x00000000005f35a7 in ExecInitLimit ()
> #5 0x00000000005e00f3 in ExecInitNode ()
> #6 0x00000000005dd207 in standard_ExecutorStart ()
> #7 0x00000000006f96d2 in PortalStart ()
> #8 0x00000000006f5c7f in exec_simple_query ()
> #9 0x00000000006f6fac in PostgresMain ()
> #10 0x0000000000475cdc in ServerLoop ()
> #11 0x0000000000692ffa in PostmasterMain ()
> #12 0x0000000000476600 in main ()

Seems like the issue is that the plans under multiple subroots are
pointing to the same partitioned_rels.

If I am not getting it wrong "set_plan_refs(PlannerInfo *root, Plan
*plan, int rtoffset)" the rtoffset is specific to the subroot. Now,
problem is that set_plan_refs called for different subroot is updating
the same partition_rel info and make this value completely wrong which
will ultimately make ExecLockNonLeafAppendTables to access the out of
bound "rte" index.

set_plan_refs
{
    [clipped]
    case T_MergeAppend:
    {
        [clipped]

        foreach(l, splan->partitioned_rels)
        {
            lfirst_int(l) += rtoffset;


I think the solution should be that create_merge_append_path make the
copy of partitioned_rels list?

Attached patch fixes the problem but I am not completely sure about the fix.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Attachments
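Editor's illustration of the analysis above: a minimal, self-contained C sketch of what happens when two plan nodes share one list of range-table indexes and each subroot's fix-up adds its own offset in place. This is not PostgreSQL code. `ToyPlan`, `toy_set_plan_refs`, and the other `toy_` names are hypothetical stand-ins, and the offsets (2 and 4) and range-table length (6) are illustrative numbers, not values taken from the crashing query.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a plan node carrying range-table indexes. */
typedef struct ToyPlan
{
    int    *partitioned_rels;   /* 1-based range-table indexes, possibly shared */
    size_t  nrels;
} ToyPlan;

/* Mimics the quoted loop: shift every index by the subroot's offset, in place. */
void
toy_set_plan_refs(ToyPlan *plan, int rtoffset)
{
    for (size_t i = 0; i < plan->nrels; i++)
        plan->partitioned_rels[i] += rtoffset;
}

/* Returns 1 if idx is a valid 1-based position in a range table of rtable_len entries. */
int
toy_index_in_range(int idx, int rtable_len)
{
    return idx >= 1 && idx <= rtable_len;
}

/*
 * Two plan nodes point at the same array. Each is fixed up with its own
 * offset, so the single shared entry receives both offsets.
 * Returns the index both nodes end up seeing.
 */
int
toy_shared_index_after_fixups(int base_index, int offset_a, int offset_b)
{
    int     shared[1];
    ToyPlan plan_a;
    ToyPlan plan_b;

    shared[0] = base_index;
    plan_a.partitioned_rels = shared;
    plan_a.nrels = 1;
    plan_b.partitioned_rels = shared;   /* same storage as plan_a */
    plan_b.nrels = 1;

    toy_set_plan_refs(&plan_a, offset_a);
    toy_set_plan_refs(&plan_b, offset_b);

    return shared[0];
}
```

With a base index of 1 and offsets 2 and 4, the intended results would be 3 and 5, but the shared entry ends up as 7, which is past the end of a six-entry range table. That is the kind of out-of-bounds index that would make a lookup such as `list_nth` fail.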
Re: [BUGS] BUG #14657: Server process segmentation fault in v10, May 10th dev snapshot
From
Amit Langote
Date:
On 2017/05/18 2:14, Dilip Kumar wrote:
> On Wed, May 17, 2017 at 7:41 PM, <sveinn.sveinsson@gmail.com> wrote:
>> (gdb) bt
>> #0 0x000000000061ab1b in list_nth ()
>> #1 0x00000000005e4081 in ExecLockNonLeafAppendTables ()
>> #2 0x00000000005f4d52 in ExecInitMergeAppend ()
>> #3 0x00000000005e0365 in ExecInitNode ()
>> #4 0x00000000005f35a7 in ExecInitLimit ()
>> #5 0x00000000005e00f3 in ExecInitNode ()
>> #6 0x00000000005dd207 in standard_ExecutorStart ()
>> #7 0x00000000006f96d2 in PortalStart ()
>> #8 0x00000000006f5c7f in exec_simple_query ()
>> #9 0x00000000006f6fac in PostgresMain ()
>> #10 0x0000000000475cdc in ServerLoop ()
>> #11 0x0000000000692ffa in PostmasterMain ()
>> #12 0x0000000000476600 in main ()

Thanks for the test case Sveinn and thanks Dilip for analyzing.

> Seems like the issue is that the plans under multiple subroots are
> pointing to the same partitioned_rels.

That's correct.

> If I am not getting it wrong "set_plan_refs(PlannerInfo *root, Plan
> *plan, int rtoffset)" the rtoffset is specific to the subroot. Now,
> problem is that set_plan_refs called for different subroot is updating
> the same partition_rel info and make this value completely wrong which
> will ultimately make ExecLockNonLeafAppendTables to access the out of
> bound "rte" index.

Yes.

> set_plan_refs
> {
> [clipped]
> case T_MergeAppend:
> {
> [clipped]
>
> foreach(l, splan->partitioned_rels)
> {
> lfirst_int(l) += rtoffset;
>
>
> I think the solution should be that create_merge_append_path make the
> copy of partitioned_rels list?

Yes, partitioned_rels should be copied.

> Attached patch fixes the problem but I am not completely sure about the fix.

Thanks for creating the patch, although I think a better fix would be to
make get_partitioned_child_rels() do the list_copy. That way, any other
users of partitioned_rels will not suffer the same issue. Attached patch
implements that, along with a regression test.

Added to the open items.

Thanks,
Amit
Attachments
Re: [BUGS] BUG #14657: Server process segmentation fault in v10, May 10th dev snapshot
From
Amit Langote
Date:
On 2017/05/18 10:49, Amit Langote wrote:
> On 2017/05/18 2:14, Dilip Kumar wrote:
>> On Wed, May 17, 2017 at 7:41 PM, <sveinn.sveinsson@gmail.com> wrote:
>>> (gdb) bt
>>> #0 0x000000000061ab1b in list_nth ()
>>> #1 0x00000000005e4081 in ExecLockNonLeafAppendTables ()
>>> #2 0x00000000005f4d52 in ExecInitMergeAppend ()
>>> #3 0x00000000005e0365 in ExecInitNode ()
>>> #4 0x00000000005f35a7 in ExecInitLimit ()
>>> #5 0x00000000005e00f3 in ExecInitNode ()
>>> #6 0x00000000005dd207 in standard_ExecutorStart ()
>>> #7 0x00000000006f96d2 in PortalStart ()
>>> #8 0x00000000006f5c7f in exec_simple_query ()
>>> #9 0x00000000006f6fac in PostgresMain ()
>>> #10 0x0000000000475cdc in ServerLoop ()
>>> #11 0x0000000000692ffa in PostmasterMain ()
>>> #12 0x0000000000476600 in main ()
>
> Thanks for the test case Sveinn and thanks Dilip for analyzing.
>
>> Seems like the issue is that the plans under multiple subroots are
>> pointing to the same partitioned_rels.
>
> That's correct.
>
>> If I am not getting it wrong "set_plan_refs(PlannerInfo *root, Plan
>> *plan, int rtoffset)" the rtoffset is specific to the subroot. Now,
>> problem is that set_plan_refs called for different subroot is updating
>> the same partition_rel info and make this value completely wrong which
>> will ultimately make ExecLockNonLeafAppendTables to access the out of
>> bound "rte" index.
>
> Yes.
>
>> set_plan_refs
>> {
>> [clipped]
>> case T_MergeAppend:
>> {
>> [clipped]
>>
>> foreach(l, splan->partitioned_rels)
>> {
>> lfirst_int(l) += rtoffset;
>>
>>
>> I think the solution should be that create_merge_append_path make the
>> copy of partitioned_rels list?
>
> Yes, partitioned_rels should be copied.
>
>> Attached patch fixes the problem but I am not completely sure about the fix.
>
> Thanks for creating the patch, although I think a better fix would be to
> make get_partitioned_child_rels() do the list_copy. That way, any other
> users of partitioned_rels will not suffer the same issue. Attached patch
> implements that, along with a regression test.
>
> Added to the open items.

Oops, forgot to cc -hackers. Patch attached again.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Attachments
On Thu, May 18, 2017 at 7:19 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Thanks for creating the patch, although I think a better fix would be to
> make get_partitioned_child_rels() do the list_copy. That way, any other
> users of partitioned_rels will not suffer the same issue. Attached patch
> implements that, along with a regression test.

Correct! This is generic fix.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: [BUGS] BUG #14657: Server process segmentation fault in v10, May 10th dev snapshot
From
Sveinn Sveinsson
Date:
The patch fixed the problem, thanks a lot.

Regards,
Sveinn.

On Thu, 18 May 2017 01:53, Amit Langote wrote:
> On 2017/05/18 10:49, Amit Langote wrote:
>> On 2017/05/18 2:14, Dilip Kumar wrote:
>>> On Wed, May 17, 2017 at 7:41 PM, <sveinn.sveinsson@gmail.com> wrote:
>>>> (gdb) bt
>>>> #0 0x000000000061ab1b in list_nth ()
>>>> #1 0x00000000005e4081 in ExecLockNonLeafAppendTables ()
>>>> #2 0x00000000005f4d52 in ExecInitMergeAppend ()
>>>> #3 0x00000000005e0365 in ExecInitNode ()
>>>> #4 0x00000000005f35a7 in ExecInitLimit ()
>>>> #5 0x00000000005e00f3 in ExecInitNode ()
>>>> #6 0x00000000005dd207 in standard_ExecutorStart ()
>>>> #7 0x00000000006f96d2 in PortalStart ()
>>>> #8 0x00000000006f5c7f in exec_simple_query ()
>>>> #9 0x00000000006f6fac in PostgresMain ()
>>>> #10 0x0000000000475cdc in ServerLoop ()
>>>> #11 0x0000000000692ffa in PostmasterMain ()
>>>> #12 0x0000000000476600 in main ()
>> Thanks for the test case Sveinn and thanks Dilip for analyzing.
>>
>>> Seems like the issue is that the plans under multiple subroots are
>>> pointing to the same partitioned_rels.
>> That's correct.
>>
>>> If I am not getting it wrong "set_plan_refs(PlannerInfo *root, Plan
>>> *plan, int rtoffset)" the rtoffset is specific to the subroot. Now,
>>> problem is that set_plan_refs called for different subroot is updating
>>> the same partition_rel info and make this value completely wrong which
>>> will ultimately make ExecLockNonLeafAppendTables to access the out of
>>> bound "rte" index.
>> Yes.
>>
>>> set_plan_refs
>>> {
>>> [clipped]
>>> case T_MergeAppend:
>>> {
>>> [clipped]
>>>
>>> foreach(l, splan->partitioned_rels)
>>> {
>>> lfirst_int(l) += rtoffset;
>>>
>>>
>>> I think the solution should be that create_merge_append_path make the
>>> copy of partitioned_rels list?
>> Yes, partitioned_rels should be copied.
>>
>>> Attached patch fixes the problem but I am not completely sure about the fix.
>> Thanks for creating the patch, although I think a better fix would be to
>> make get_partitioned_child_rels() do the list_copy. That way, any other
>> users of partitioned_rels will not suffer the same issue. Attached patch
>> implements that, along with a regression test.
>>
>> Added to the open items.
> Oops, forgot to cc -hackers. Patch attached again.
>
> Thanks,
> Amit
On Thu, May 18, 2017 at 7:23 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2017/05/18 10:49, Amit Langote wrote:
>> On 2017/05/18 2:14, Dilip Kumar wrote:
>>> On Wed, May 17, 2017 at 7:41 PM, <sveinn.sveinsson@gmail.com> wrote:
>>>> (gdb) bt
>>>> #0 0x000000000061ab1b in list_nth ()
>>>> #1 0x00000000005e4081 in ExecLockNonLeafAppendTables ()
>>>> #2 0x00000000005f4d52 in ExecInitMergeAppend ()
>>>> #3 0x00000000005e0365 in ExecInitNode ()
>>>> #4 0x00000000005f35a7 in ExecInitLimit ()
>>>> #5 0x00000000005e00f3 in ExecInitNode ()
>>>> #6 0x00000000005dd207 in standard_ExecutorStart ()
>>>> #7 0x00000000006f96d2 in PortalStart ()
>>>> #8 0x00000000006f5c7f in exec_simple_query ()
>>>> #9 0x00000000006f6fac in PostgresMain ()
>>>> #10 0x0000000000475cdc in ServerLoop ()
>>>> #11 0x0000000000692ffa in PostmasterMain ()
>>>> #12 0x0000000000476600 in main ()
>>
>> Thanks for the test case Sveinn and thanks Dilip for analyzing.
>>
>>> Seems like the issue is that the plans under multiple subroots are
>>> pointing to the same partitioned_rels.
>>
>> That's correct.
>>
>>> If I am not getting it wrong "set_plan_refs(PlannerInfo *root, Plan
>>> *plan, int rtoffset)" the rtoffset is specific to the subroot. Now,
>>> problem is that set_plan_refs called for different subroot is updating
>>> the same partition_rel info and make this value completely wrong which
>>> will ultimately make ExecLockNonLeafAppendTables to access the out of
>>> bound "rte" index.
>>
>> Yes.
>>
>>> set_plan_refs
>>> {
>>> [clipped]
>>> case T_MergeAppend:
>>> {
>>> [clipped]
>>>
>>> foreach(l, splan->partitioned_rels)
>>> {
>>> lfirst_int(l) += rtoffset;
>>>
>>>
>>> I think the solution should be that create_merge_append_path make the
>>> copy of partitioned_rels list?
>>
>> Yes, partitioned_rels should be copied.
>>
>>> Attached patch fixes the problem but I am not completely sure about the fix.
>>
>> Thanks for creating the patch, although I think a better fix would be to
>> make get_partitioned_child_rels() do the list_copy. That way, any other
>> users of partitioned_rels will not suffer the same issue. Attached patch
>> implements that, along with a regression test.
>>
>> Added to the open items.
>
> Oops, forgot to cc -hackers. Patch attached again.

Maybe we should add a comment as to why the copy is needed.

We still have the same copy shared across multiple append paths, and
set_plan_refs would change it underneath those. It may not be a
problem right now, but it may become one in the future.

Another option, which consumes a bit less memory, is to make a copy at
the time of planning if the path gets selected as the cheapest path.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
On Fri, May 19, 2017 at 6:07 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> We still have the same copy shared across multiple append paths and
> set_plan_refs would change it underneath those. May not be a
> problem right now but may be a problem in the future.

I agree. I think it's better for the path-creation functions to copy
the list, so that there is no surprising sharing of substructure.
set_plan_refs() obviously expects this data to be unshared, and this
seems like the best way to ensure that's true in all cases.

Committed that way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
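Editor's illustration of the design point settled above: a small C sketch in which the path-creation function takes its own copy of the caller's list, so that a later in-place fix-up of one path cannot affect another path or the caller's list. It is a toy model under stated assumptions, not the committed patch. `ToyPath`, `toy_create_merge_append_path`, `toy_fixup_path`, and `toy_free_path` are hypothetical names, and plain `malloc`/`memcpy` stand in for PostgreSQL's list and memory-context machinery.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a path node that owns its list of indexes. */
typedef struct ToyPath
{
    int    *partitioned_rels;   /* private copy, owned by this path */
    size_t  nrels;
} ToyPath;

/*
 * Path-creation function that copies the caller's list instead of
 * storing the caller's pointer. Allocation failures are only checked
 * with assert(), which is enough for a sketch.
 */
ToyPath *
toy_create_merge_append_path(const int *partitioned_rels, size_t nrels)
{
    ToyPath *path = malloc(sizeof(ToyPath));

    assert(path != NULL);
    path->nrels = nrels;
    /* Allocate at least one element so a zero-length list still gets storage. */
    path->partitioned_rels = malloc((nrels > 0 ? nrels : 1) * sizeof(int));
    assert(path->partitioned_rels != NULL);
    if (nrels > 0)
        memcpy(path->partitioned_rels, partitioned_rels, nrels * sizeof(int));
    return path;
}

/* In-place fix-up in the style of the quoted set_plan_refs loop. */
void
toy_fixup_path(ToyPath *path, int rtoffset)
{
    for (size_t i = 0; i < path->nrels; i++)
        path->partitioned_rels[i] += rtoffset;
}

/* Releases a path together with its private list. */
void
toy_free_path(ToyPath *path)
{
    free(path->partitioned_rels);
    free(path);
}
```

Creating two paths from the same source list and fixing them up with offsets 2 and 4 leaves the source list untouched and gives each path its own result. Because the copy lives in the constructor, callers do not need to remember to copy, regardless of where the list came from.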