Обсуждение: BUG #18170: Unexpected error: no relation entry for relid 3
The following bug has been logged on the website: Bug reference: 18170 Logged by: Zuming Jiang Email address: zuming.jiang@inf.ethz.ch PostgreSQL version: 16.0 Operating system: Ubuntu 20.04 Description: My fuzzer finds a bug in Postgres, which triggers an unexpected error. --- Set up database --- create table t0 (c1 text, c5 text, c6 text); create table t2 (vkey int4, c17 text, primary key(vkey)); ------------------------ The fuzzer generates a test case: --- Test case --- select rtrim(ref_4.c6, ref_2.c17) as c_0 from (t0 as ref_0 inner join ((t0 as ref_1 full outer join (t2 as ref_2 right outer join t2 as ref_3 on (ref_2.vkey = ref_3.vkey)) on (ref_1.c1 = ref_2.c17)) left outer join t0 as ref_4 on (ref_1.c5 = ref_4.c1)) on (ref_0.c6 = ref_4.c1)) except all select ref_6.c17 as c_0 from t2 as ref_6; ------------------------ --- Expected behavior --- The test case should not trigger any error. --- Actual behavior --- The test case trigger an error: ERROR: no relation entry for relid 3 --- Postgres version --- Github commit: 26f988212eada9c586223cbbf876c7eb455044d9 Version: PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit --- Platform information --- Platform: Ubuntu 20.04 Kernel: Linux 5.4.0-147-generic
On 10/26/23 16:01, PG Bug reporting form wrote: > The following bug has been logged on the website: > > Bug reference: 18170 > Logged by: Zuming Jiang > Email address: zuming.jiang@inf.ethz.ch > PostgreSQL version: 16.0 This should be 17devel > Operating system: Ubuntu 20.04 > Description: > > My fuzzer finds a bug in Postgres, which triggers an unexpected error. This bisects to d3d55ce571369dad6e1d582f1655e5a3fbd8594a, Remove useless self-joins. -- Vik Fearing
Vik Fearing <vik@postgresfriends.org> writes: > On 10/26/23 16:01, PG Bug reporting form wrote: >> My fuzzer finds a bug in Postgres, which triggers an unexpected error. > This bisects to d3d55ce571369dad6e1d582f1655e5a3fbd8594a, Remove useless > self-joins. I wonder if that new code thinks it can remove ref_2 from the query, even though ref_2 is used in the targetlist. I'm not seeing control reach remove_leftjoinrel_from_query, though. Also, while nosing around in this, I tried to pprint(root) at the point of the error, and got 2023-10-26 12:48:37.852 EDT [1186007] WARNING: could not dump unrecognized node type: 37413808 This happens because the patch changed RelOptInfo.unique_for_rels from a list of Bitmapsets into a list of UniqueRelInfo, even though it did not bother to make UniqueRelInfo be a Node type (much less document the change in globally-visible data structures: pathnodes.h still says it's a list of Relid sets). This is not acceptable. I'm getting the distinct impression that this patch wasn't ready for prime time. regards, tom lane
On Thu, Oct 26, 2023 at 6:21 PM Vik Fearing <vik@postgresfriends.org> wrote: > On 10/26/23 16:01, PG Bug reporting form wrote: > > The following bug has been logged on the website: > > > > Bug reference: 18170 > > Logged by: Zuming Jiang > > Email address: zuming.jiang@inf.ethz.ch > > PostgreSQL version: 16.0 > > This should be 17devel > > > Operating system: Ubuntu 20.04 > > Description: > > > > My fuzzer finds a bug in Postgres, which triggers an unexpected error. > > This bisects to d3d55ce571369dad6e1d582f1655e5a3fbd8594a, Remove useless > self-joins. Thank you for bisecting! I'm currently digging into this. ------ Regards, Alexander Korotkov
On Thu, Oct 26, 2023 at 8:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Vik Fearing <vik@postgresfriends.org> writes: > > On 10/26/23 16:01, PG Bug reporting form wrote: > >> My fuzzer finds a bug in Postgres, which triggers an unexpected error. > > > This bisects to d3d55ce571369dad6e1d582f1655e5a3fbd8594a, Remove useless > > self-joins. > > I wonder if that new code thinks it can remove ref_2 from the query, > even though ref_2 is used in the targetlist. I'm not seeing > control reach remove_leftjoinrel_from_query, though. > > Also, while nosing around in this, I tried to pprint(root) at the > point of the error, and got > > 2023-10-26 12:48:37.852 EDT [1186007] WARNING: could not dump unrecognized node type: 37413808 > > This happens because the patch changed RelOptInfo.unique_for_rels from > a list of Bitmapsets into a list of UniqueRelInfo, even though it did > not bother to make UniqueRelInfo be a Node type (much less document > the change in globally-visible data structures: pathnodes.h still says > it's a list of Relid sets). This is not acceptable. > > I'm getting the distinct impression that this patch wasn't > ready for prime time. Please, give me a chance to fix this shortly. If that wouldn't be an easy fix, I will revert. ------ Regards, Alexander Korotkov
On 27/10/2023 00:12, Tom Lane wrote: > Vik Fearing <vik@postgresfriends.org> writes: >> On 10/26/23 16:01, PG Bug reporting form wrote: >>> My fuzzer finds a bug in Postgres, which triggers an unexpected error. > >> This bisects to d3d55ce571369dad6e1d582f1655e5a3fbd8594a, Remove useless >> self-joins. > > I wonder if that new code thinks it can remove ref_2 from the query, > even though ref_2 is used in the targetlist. I'm not seeing > control reach remove_leftjoinrel_from_query, though. As I see, this join can be removed: in the explain you can see that OUTER JOIN is replaced with the INNER JOIN(ref_2, ref_3) ON a key column. In my opinion, the origin of the problem is that the parent_root contains a link to ref_2 in its simple_rte_array[]->subquery->targetList. I am still looking for a general solution right now, but it doesn't look too complicated at first sight. > Also, while nosing around in this, I tried to pprint(root) at the > point of the error, and got Yeah, it is my blunder. Alexander already fixed that, as I see. The only question is whether it is worth making the UniqueRelInfo structure as a node (for debug purposes - I see only one reason) or we should exclude this field from read/write operations at all. -- regards, Andrei Lepikhov Postgres Professional
On Fri, Oct 27, 2023 at 9:31 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > On 27/10/2023 00:12, Tom Lane wrote: > > Vik Fearing <vik@postgresfriends.org> writes: > >> On 10/26/23 16:01, PG Bug reporting form wrote: > >>> My fuzzer finds a bug in Postgres, which triggers an unexpected error. > > > >> This bisects to d3d55ce571369dad6e1d582f1655e5a3fbd8594a, Remove useless > >> self-joins. > > > > I wonder if that new code thinks it can remove ref_2 from the query, > > even though ref_2 is used in the targetlist. I'm not seeing > > control reach remove_leftjoinrel_from_query, though. > > As I see, this join can be removed: in the explain you can see that > OUTER JOIN is replaced with the INNER JOIN(ref_2, ref_3) ON a key column. > In my opinion, the origin of the problem is that the parent_root > contains a link to ref_2 in its > simple_rte_array[]->subquery->targetList. I am still looking for a > general solution right now, but it doesn't look too complicated at first > sight. Yes, I came to the same conclusion. We process root->parse. But I didn't get why parent_root->simple_rte_array[]->subquery is not the same as root->parse. They look the same, but they are distinct copies. If they were the same pointers, there would be no problem. > > Also, while nosing around in this, I tried to pprint(root) at the > > point of the error, and got > > Yeah, it is my blunder. Alexander already fixed that, as I see. The only > question is whether it is worth making the UniqueRelInfo structure as a > node (for debug purposes - I see only one reason) or we should exclude > this field from read/write operations at all. I think we just shouldn't break the general rule, that everything inside the node should be nodes too. ------ Regards, Alexander Korotkov
On 27/10/2023 15:17, Alexander Korotkov wrote: > On Fri, Oct 27, 2023 at 9:31 AM Andrei Lepikhov > <a.lepikhov@postgrespro.ru> wrote: >> On 27/10/2023 00:12, Tom Lane wrote: >>> Vik Fearing <vik@postgresfriends.org> writes: >>>> On 10/26/23 16:01, PG Bug reporting form wrote: >>>>> My fuzzer finds a bug in Postgres, which triggers an unexpected error. >>> >>>> This bisects to d3d55ce571369dad6e1d582f1655e5a3fbd8594a, Remove useless >>>> self-joins. >>> >>> I wonder if that new code thinks it can remove ref_2 from the query, >>> even though ref_2 is used in the targetlist. I'm not seeing >>> control reach remove_leftjoinrel_from_query, though. >> >> As I see, this join can be removed: in the explain you can see that >> OUTER JOIN is replaced with the INNER JOIN(ref_2, ref_3) ON a key column. >> In my opinion, the origin of the problem is that the parent_root >> contains a link to ref_2 in its >> simple_rte_array[]->subquery->targetList. I am still looking for a >> general solution right now, but it doesn't look too complicated at first >> sight. > > Yes, I came to the same conclusion. We process root->parse. But I > didn't get why parent_root->simple_rte_array[]->subquery is not the > same as root->parse. They look the same, but they are distinct > copies. If they were the same pointers, there would be no problem. As I see, the copy of the parse tree is induced by the same feature as usual in the last few months: 2489d76. It introduced remove_nulling_relids, and it altered our parse tree. Right now, I don't have an answer: it should be fixed in SJE, or this is a more general issue just discovered by the SJE. -- regards, Andrei Lepikhov Postgres Professional
On 27/10/2023 16:07, Andrei Lepikhov wrote: > On 27/10/2023 15:17, Alexander Korotkov wrote: >> On Fri, Oct 27, 2023 at 9:31 AM Andrei Lepikhov >> <a.lepikhov@postgrespro.ru> wrote: >>> On 27/10/2023 00:12, Tom Lane wrote: >>>> Vik Fearing <vik@postgresfriends.org> writes: >>>>> On 10/26/23 16:01, PG Bug reporting form wrote: >>>>>> My fuzzer finds a bug in Postgres, which triggers an unexpected >>>>>> error. >>>> >>>>> This bisects to d3d55ce571369dad6e1d582f1655e5a3fbd8594a, Remove >>>>> useless >>>>> self-joins. >>>> >>>> I wonder if that new code thinks it can remove ref_2 from the query, >>>> even though ref_2 is used in the targetlist. I'm not seeing >>>> control reach remove_leftjoinrel_from_query, though. >>> >>> As I see, this join can be removed: in the explain you can see that >>> OUTER JOIN is replaced with the INNER JOIN(ref_2, ref_3) ON a key >>> column. >>> In my opinion, the origin of the problem is that the parent_root >>> contains a link to ref_2 in its >>> simple_rte_array[]->subquery->targetList. I am still looking for a >>> general solution right now, but it doesn't look too complicated at first >>> sight. >> >> Yes, I came to the same conclusion. We process root->parse. But I >> didn't get why parent_root->simple_rte_array[]->subquery is not the >> same as root->parse. They look the same, but they are distinct >> copies. If they were the same pointers, there would be no problem. > > As I see, the copy of the parse tree is induced by the same feature as > usual in the last few months: 2489d76. It introduced > remove_nulling_relids, and it altered our parse tree. Right now, I don't > have an answer: it should be fixed in SJE, or this is a more general > issue just discovered by the SJE. So, I can propose two options. First - don't clean only the current root structure, but also make cleanup of the parent. Although it looks safe, I am not happy with this approach - it seems too simple: we should have a genuine reason for such a cleaning because it potentially adds overhead. The second option is to add a flag for not altering queries in remove_nulling_relids() - it looks like a mistake when we have two different query trees in the root and its parent. Also, it reduces memory usage a bit. So, if my analysis is correct, it is better to use the second way (see attachment). -- regards, Andrei Lepikhov Postgres Professional
Вложения
On Fri, Oct 27, 2023 at 7:00 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
So, I can propose two options. First - don't clean only the current root
structure, but also make cleanup of the parent. Although it looks safe,
I am not happy with this approach - it seems too simple: we should have
a genuine reason for such a cleaning because it potentially adds overhead.
The second option is to add a flag for not altering queries in
remove_nulling_relids() - it looks like a mistake when we have two
different query trees in the root and its parent. Also, it reduces
memory usage a bit.
So, if my analysis is correct, it is better to use the second way (see
attachment).
Alternatively, can we look at subroot->parse->targetList instead of
subquery->targetList where we call estimate_num_groups on the output of
the subquery?
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -341,7 +341,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
*pNumGroups = subpath->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
subpath->rows,
NULL,
NULL);
BTW, I'm a little surprised that QTW_DONT_COPY_QUERY doesn't seem to be
used anywhere currently.
Thanks
Richard
subquery->targetList where we call estimate_num_groups on the output of
the subquery?
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -341,7 +341,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root,
*pNumGroups = subpath->rows;
else
*pNumGroups = estimate_num_groups(subroot,
- get_tlist_exprs(subquery->targetList, false),
+ get_tlist_exprs(subroot->parse->targetList, false),
subpath->rows,
NULL,
NULL);
BTW, I'm a little surprised that QTW_DONT_COPY_QUERY doesn't seem to be
used anywhere currently.
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes: > Alternatively, can we look at subroot->parse->targetList instead of > subquery->targetList where we call estimate_num_groups on the output of > the subquery? Please read the comment just above that. regards, tom lane
On Fri, Oct 27, 2023 at 5:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Richard Guo <guofenglinux@gmail.com> writes: > > Alternatively, can we look at subroot->parse->targetList instead of > > subquery->targetList where we call estimate_num_groups on the output of > > the subquery? > > Please read the comment just above that. Thank you, Tom, This is clear. Could you please comment on the second option by Andrei [1]? Was the copying of the parse trees an intention of 2489d76 or just a side effect? Do you think removing copying of trees is safe? Links 1. https://www.postgresql.org/message-id/2eb87c2e-7e6f-4002-8df3-8fac3aa6a037%40postgrespro.ru ------ Regards, Alexander Korotkov
On 27/10/2023 21:10, Richard Guo wrote: > > On Fri, Oct 27, 2023 at 7:00 PM Andrei Lepikhov > <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote: > > So, I can propose two options. First - don't clean only the current > root > structure, but also make cleanup of the parent. Although it looks safe, > I am not happy with this approach - it seems too simple: we should have > a genuine reason for such a cleaning because it potentially adds > overhead. > The second option is to add a flag for not altering queries in > remove_nulling_relids() - it looks like a mistake when we have two > different query trees in the root and its parent. Also, it reduces > memory usage a bit. > So, if my analysis is correct, it is better to use the second way (see > attachment). > > > Alternatively, can we look at subroot->parse->targetList instead of > subquery->targetList where we call estimate_num_groups on the output of > the subquery? It is a solution. But does it mask the real problem? In my mind, we copy node trees to use somewhere else or probe a conjecture. Here, we have two different representations of the same subquery. Keeping aside the memory consumption issue, is it correct? Make sense to apply both options: switch the groups estimation to subroot targetList and keep one version of a subquery. In attachment - second (combined) version of the change. Here I added assertions to check identity of root->parse and incoming query tree. > > --- a/src/backend/optimizer/prep/prepunion.c > +++ b/src/backend/optimizer/prep/prepunion.c > @@ -341,7 +341,7 @@ recurse_set_operations(Node *setOp, PlannerInfo *root, > *pNumGroups = subpath->rows; > else > *pNumGroups = estimate_num_groups(subroot, > - > get_tlist_exprs(subquery->targetList, false), > + > get_tlist_exprs(subroot->parse->targetList, false), > subpath->rows, > NULL, > NULL); > > BTW, I'm a little surprised that QTW_DONT_COPY_QUERY doesn't seem to be > used anywhere currently. I too. But we use this flag in the enterprise fork to reduce memory consumption. It could be proposed for upstream, but looks a bit unsafe. I guess, some extensions could do the same. -- regards, Andrei Lepikhov Postgres Professional
Вложения
On 28/10/2023 08:59, Andrei Lepikhov wrote: >> BTW, I'm a little surprised that QTW_DONT_COPY_QUERY doesn't seem to be >> used anywhere currently. > > I too. But we use this flag in the enterprise fork to reduce memory > consumption. It could be proposed for upstream, but looks a bit unsafe. > I guess, some extensions could do the same. What is "the enterprise fork"? This is a PostgreSQL mailing list. -- Vik Fearing
On Sat, Oct 28, 2023 at 9:59 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > > On 27/10/2023 21:10, Richard Guo wrote: > > > > On Fri, Oct 27, 2023 at 7:00 PM Andrei Lepikhov > > <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote: > > > > So, I can propose two options. First - don't clean only the current > > root > > structure, but also make cleanup of the parent. Although it looks safe, > > I am not happy with this approach - it seems too simple: we should have > > a genuine reason for such a cleaning because it potentially adds > > overhead. > > The second option is to add a flag for not altering queries in > > remove_nulling_relids() - it looks like a mistake when we have two > > different query trees in the root and its parent. Also, it reduces > > memory usage a bit. > > So, if my analysis is correct, it is better to use the second way (see > > attachment). > > > > > > Alternatively, can we look at subroot->parse->targetList instead of > > subquery->targetList where we call estimate_num_groups on the output of > > the subquery? > > It is a solution. But does it mask the real problem? In my mind, we copy > node trees to use somewhere else or probe a conjecture. Here, we have > two different representations of the same subquery. Keeping aside the > memory consumption issue, is it correct? > Make sense to apply both options: switch the groups estimation to > subroot targetList and keep one version of a subquery. > In attachment - second (combined) version of the change. Here I added > assertions to check identity of root->parse and incoming query tree. Andrei, did you read the comment just before the groups estimation as pointed by Tom [1]? * XXX you don't really want to know about this: we do the estimation * using the subquery's original targetlist expressions, not the * subroot->processed_tlist which might seem more appropriate. The * reason is that if the subquery is itself a setop, it may return a * processed_tlist containing "varno 0" Vars generated by * generate_append_tlist, and those would confuse estimate_num_groups * mightily. We ought to get rid of the "varno 0" hack, but that * requires a redesign of the parsetree representation of setops, so * that there can be an RTE corresponding to each setop's output. As I understand, it requires much more work to correctly switch the groups estimation to subroot targetList. +1 for asserts that parse trees are the same. Links 1. https://www.postgresql.org/message-id/1551957.1698417686%40sss.pgh.pa.us ------ Regards, Alexander Korotkov
On 28/10/2023 17:10, Alexander Korotkov wrote: >> >> It is a solution. But does it mask the real problem? In my mind, we copy >> node trees to use somewhere else or probe a conjecture. Here, we have >> two different representations of the same subquery. Keeping aside the >> memory consumption issue, is it correct? >> Make sense to apply both options: switch the groups estimation to >> subroot targetList and keep one version of a subquery. >> In attachment - second (combined) version of the change. Here I added >> assertions to check identity of root->parse and incoming query tree. > > Andrei, did you read the comment just before the groups estimation as > pointed by Tom [1]? Yes, and I am a bit confused. We use here subroot->parse->targetList. The processed_tlist, where we can find the "Varno 0" value, is based on it, but it is different. As I see, forming processed_tlist, we make a new node and don't change the original targetList. Am I wrong? > * XXX you don't really want to know about this: we do the estimation > * using the subquery's original targetlist expressions, not the > * subroot->processed_tlist which might seem more appropriate. The > * reason is that if the subquery is itself a setop, it may return a > * processed_tlist containing "varno 0" Vars generated by > * generate_append_tlist, and those would confuse estimate_num_groups > * mightily. We ought to get rid of the "varno 0" hack, but that > * requires a redesign of the parsetree representation of setops, so > * that there can be an RTE corresponding to each setop's output. > > As I understand, it requires much more work to correctly switch the > groups estimation to subroot targetList. "Varno 0" is quite an irritating problem, which has beaten me a lot before, during the development of the GROUP-BY optimization feature and not only. I'd be glad to redesign this part of the planner. But I didn't find an easy way to implement that yet. -- regards, Andrei Lepikhov Postgres Professional
On Sat, Oct 28, 2023 at 1:10 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > On Sat, Oct 28, 2023 at 9:59 AM Andrei Lepikhov > <a.lepikhov@postgrespro.ru> wrote: > > > > On 27/10/2023 21:10, Richard Guo wrote: > > > > > > On Fri, Oct 27, 2023 at 7:00 PM Andrei Lepikhov > > > <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote: > > > > > > So, I can propose two options. First - don't clean only the current > > > root > > > structure, but also make cleanup of the parent. Although it looks safe, > > > I am not happy with this approach - it seems too simple: we should have > > > a genuine reason for such a cleaning because it potentially adds > > > overhead. > > > The second option is to add a flag for not altering queries in > > > remove_nulling_relids() - it looks like a mistake when we have two > > > different query trees in the root and its parent. Also, it reduces > > > memory usage a bit. > > > So, if my analysis is correct, it is better to use the second way (see > > > attachment). > > > > > > > > > Alternatively, can we look at subroot->parse->targetList instead of > > > subquery->targetList where we call estimate_num_groups on the output of > > > the subquery? > > > > It is a solution. But does it mask the real problem? In my mind, we copy > > node trees to use somewhere else or probe a conjecture. Here, we have > > two different representations of the same subquery. Keeping aside the > > memory consumption issue, is it correct? > > Make sense to apply both options: switch the groups estimation to > > subroot targetList and keep one version of a subquery. > > In attachment - second (combined) version of the change. Here I added > > assertions to check identity of root->parse and incoming query tree. > > Andrei, did you read the comment just before the groups estimation as > pointed by Tom [1]? > > * XXX you don't really want to know about this: we do the estimation > * using the subquery's original targetlist expressions, not the > * subroot->processed_tlist which might seem more appropriate. The > * reason is that if the subquery is itself a setop, it may return a > * processed_tlist containing "varno 0" Vars generated by > * generate_append_tlist, and those would confuse estimate_num_groups > * mightily. We ought to get rid of the "varno 0" hack, but that > * requires a redesign of the parsetree representation of setops, so > * that there can be an RTE corresponding to each setop's output. > > As I understand, it requires much more work to correctly switch the > groups estimation to subroot targetList. > > +1 for asserts that parse trees are the same. I made some beautification of the patch by Andrei. I also removed the part which changes the target list for estimate_num_groups(). Any objections to pushing this? ------ Regards, Alexander Korotkov
Вложения
Alexander Korotkov <aekorotkov@gmail.com> writes: > I made some beautification of the patch by Andrei. I also removed the > part which changes the target list for estimate_num_groups(). Any > objections to pushing this? It seems moderately likely that this will break as much as it fixes. I've not studied the original patch enough to understand why you need to be playing strange games with tree mutation rules, but I suspect that this is band-aiding over some rather fundamentally bad code. regards, tom lane
On Fri, Oct 27, 2023 at 10:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> Alternatively, can we look at subroot->parse->targetList instead of
> subquery->targetList where we call estimate_num_groups on the output of
> the subquery?
Please read the comment just above that.
Hmm, I understand that we cannot look at subroot->processed_tlist here
because of the "varno 0" hack explained in the comment. But what I
proposed is to look at subroot->parse->targetList, which should be fine
because the tlist generated by generate_append_tlist would only be
returned into processed_tlist not the original parse->targetList. For
instance, the query below still works if we look at
subroot->parse->targetList, even though the subquery is itself a setop.
(SELECT 1,2,3 UNION SELECT 4,5,6 ORDER BY 1,2) INTERSECT SELECT 4,5,6;
?column? | ?column? | ?column?
----------+----------+----------
4 | 5 | 6
(1 row)
Thanks
Richard
because of the "varno 0" hack explained in the comment. But what I
proposed is to look at subroot->parse->targetList, which should be fine
because the tlist generated by generate_append_tlist would only be
returned into processed_tlist not the original parse->targetList. For
instance, the query below still works if we look at
subroot->parse->targetList, even though the subquery is itself a setop.
(SELECT 1,2,3 UNION SELECT 4,5,6 ORDER BY 1,2) INTERSECT SELECT 4,5,6;
?column? | ?column? | ?column?
----------+----------+----------
4 | 5 | 6
(1 row)
Thanks
Richard
On Sun, Oct 29, 2023 at 11:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <aekorotkov@gmail.com> writes:
> I made some beautification of the patch by Andrei. I also removed the
> part which changes the target list for estimate_num_groups(). Any
> objections to pushing this?
It seems moderately likely that this will break as much as it fixes.
I've not studied the original patch enough to understand why you need
to be playing strange games with tree mutation rules, but I suspect
that this is band-aiding over some rather fundamentally bad code.
I also have some concerns about this patch. It requires that
root->parse remains unchanged during the whole subquery_planner() in
order to work, which is an implicit constraint we did not have before.
Thanks
Richard
root->parse remains unchanged during the whole subquery_planner() in
order to work, which is an implicit constraint we did not have before.
Thanks
Richard
On 29/10/2023 22:56, Tom Lane wrote: > Alexander Korotkov <aekorotkov@gmail.com> writes: >> I made some beautification of the patch by Andrei. I also removed the >> part which changes the target list for estimate_num_groups(). Any >> objections to pushing this? > > It seems moderately likely that this will break as much as it fixes. > > I've not studied the original patch enough to understand why you need > to be playing strange games with tree mutation rules, but I suspect > that this is band-aiding over some rather fundamentally bad code. Thank you for the feedback. I need to confess that it is not clear why keeping two links on the same query in a consistent state is a bad idea. If it really is, it would be better for future development to explain the reason as a comment in the code (because the current problematic procedure was introduced only in [1]). The doubt I had had about SJE was the cleaning procedure. We are afraid of hidden corner cases when deleting relid can be placed elsewhere, or a new feature could add more places for this relid. But as an opposite reason, we have a better understanding of the planner, a kind of self-check of hidden potential bugs. And for me personally, the latter is a strong reason. Also, as I see, in the e9a20e4 you wrote the code keeping in mind the same idea. [1] Make Vars be outer-join-aware -- regards, Andrei Lepikhov Postgres Professional
On 30/10/2023 09:24, Richard Guo wrote: > > On Sun, Oct 29, 2023 at 11:56 PM Tom Lane <tgl@sss.pgh.pa.us > <mailto:tgl@sss.pgh.pa.us>> wrote: > > Alexander Korotkov <aekorotkov@gmail.com > <mailto:aekorotkov@gmail.com>> writes: > > I made some beautification of the patch by Andrei. I also > removed the > > part which changes the target list for estimate_num_groups(). Any > > objections to pushing this? > > It seems moderately likely that this will break as much as it fixes. > > I've not studied the original patch enough to understand why you need > to be playing strange games with tree mutation rules, but I suspect > that this is band-aiding over some rather fundamentally bad code. > > > I also have some concerns about this patch. It requires that > root->parse remains unchanged during the whole subquery_planner() in > order to work, which is an implicit constraint we did not have before. It is not about unchanged; it is about referencing the same query at the parent and child query blocks. Am I missing something? -- regards, Andrei Lepikhov Postgres Professional
On Mon, Oct 30, 2023 at 10:47 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
On 30/10/2023 09:24, Richard Guo wrote:
> I also have some concerns about this patch. It requires that
> root->parse remains unchanged during the whole subquery_planner() in
> order to work, which is an implicit constraint we did not have before.
It is not about unchanged; it is about referencing the same query at the
parent and child query blocks. Am I missing something?
Yeah, that's what I meant. We need to ensure that root->parse
references the same Query structure during the whole subquery_planner()
for this patch to work, which seems hacky, and error-prone for future
development. If we really want to do so, at least we need to emphasize
this point in the comment of subquery_planner().
Thanks
Richard
references the same Query structure during the whole subquery_planner()
for this patch to work, which seems hacky, and error-prone for future
development. If we really want to do so, at least we need to emphasize
this point in the comment of subquery_planner().
Thanks
Richard
On 30/10/2023 13:24, Richard Guo wrote: > > On Mon, Oct 30, 2023 at 10:47 AM Andrei Lepikhov > <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote: > > On 30/10/2023 09:24, Richard Guo wrote: > > I also have some concerns about this patch. It requires that > > root->parse remains unchanged during the whole subquery_planner() in > > order to work, which is an implicit constraint we did not have > before. > > It is not about unchanged; it is about referencing the same query at > the > parent and child query blocks. Am I missing something? > > > Yeah, that's what I meant. We need to ensure that root->parse > references the same Query structure during the whole subquery_planner() > for this patch to work, which seems hacky, and error-prone for future > development. If we really want to do so, at least we need to emphasize > this point in the comment of subquery_planner(). Okay, let's go another way and try to imagine what additional opportunities it will give us in the future? I mean, save unchanged a rte->subquery tree and, simultaneously, have some transformed version in the subroot->parse field. I can imagine kind of a subquery replanning procedure in the case we realized during higher-level query planning that the underlying subquery is not optimal. Such a picture makes sense, and I can agree with your words. From this point of view, all links to the subquery must reference the subroot structures, and the patch you have proposed is the best solution. And the SJE feature works correctly. If we don't imagine the picture I have drawn above, it is reasonable to save memory and not alter the parse tree while removing unneeded joins. I think, in that case, we can use the walker procedure instead of a mutator. -- regards, Andrei Lepikhov Postgres Professional
On Tue, Oct 31, 2023 at 1:05 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
On 30/10/2023 13:24, Richard Guo wrote:
> Yeah, that's what I meant. We need to ensure that root->parse
> references the same Query structure during the whole subquery_planner()
> for this patch to work, which seems hacky, and error-prone for future
> development. If we really want to do so, at least we need to emphasize
> this point in the comment of subquery_planner().
Okay, let's go another way and try to imagine what additional
opportunities it will give us in the future? I mean, save unchanged a
rte->subquery tree and, simultaneously, have some transformed version in
the subroot->parse field.
I can imagine kind of a subquery replanning procedure in the case we
realized during higher-level query planning that the underlying subquery
is not optimal.
Such a picture makes sense, and I can agree with your words. From this
point of view, all links to the subquery must reference the subroot
structures, and the patch you have proposed is the best solution. And
the SJE feature works correctly.
If we don't imagine the picture I have drawn above, it is reasonable to
save memory and not alter the parse tree while removing unneeded joins.
I think, in that case, we can use the walker procedure instead of a mutator.
Well, I think what happens here is that the new SJE code alters the
subquery (by removing ref_2) and that change is not propagated into the
parent level. So in the parent level, when we look at the subquery's
original targetlist (which remains unchanged) against 'subroot' (which
has been changed accordingly), we'd have problem.
AFAICS there are two solutions being discussed:
1) We propagate the change to the subquery into the parent level by
ensuring that root->parse references the same Query structure during the
whole subquery_planner().
2) In the parent level, we look at the changed subquery instead of the
original rte->subquery. IOW, we look at subroot->parse->targetList
instead of subquery->targetList.
Personally I prefer the second solution because it seems more natural to
look at 'subroot->parse' together with 'subroot' in estimate_num_groups
and it does not introduce new constraint to subquery_planner().
Thanks
Richard
subquery (by removing ref_2) and that change is not propagated into the
parent level. So in the parent level, when we look at the subquery's
original targetlist (which remains unchanged) against 'subroot' (which
has been changed accordingly), we'd have problem.
AFAICS there are two solutions being discussed:
1) We propagate the change to the subquery into the parent level by
ensuring that root->parse references the same Query structure during the
whole subquery_planner().
2) In the parent level, we look at the changed subquery instead of the
original rte->subquery. IOW, we look at subroot->parse->targetList
instead of subquery->targetList.
Personally I prefer the second solution because it seems more natural to
look at 'subroot->parse' together with 'subroot' in estimate_num_groups
and it does not introduce new constraint to subquery_planner().
Thanks
Richard
On Tue, Oct 31, 2023 at 3:37 PM Richard Guo <guofenglinux@gmail.com> wrote:
Well, I think what happens here is that the new SJE code alters the
subquery (by removing ref_2) and that change is not propagated into the
parent level. So in the parent level, when we look at the subquery's
original targetlist (which remains unchanged) against 'subroot' (which
has been changed accordingly), we'd have problem.
AFAICS there are two solutions being discussed:
1) We propagate the change to the subquery into the parent level by
ensuring that root->parse references the same Query structure during the
whole subquery_planner().
2) In the parent level, we look at the changed subquery instead of the
original rte->subquery. IOW, we look at subroot->parse->targetList
instead of subquery->targetList.
Personally I prefer the second solution because it seems more natural to
look at 'subroot->parse' together with 'subroot' in estimate_num_groups
and it does not introduce new constraint to subquery_planner().
FWIW, here is a simplified query that can reproduce this error.
explain (costs off)
select ref_1.c17 from t2 ref_1 left join
t2 ref_2 on ref_1.vkey = ref_2.vkey
where ref_2.vkey is not null
except all
select c17 from t2 ref_3;
ERROR: no relation entry for relid 1
Thanks
Richard
explain (costs off)
select ref_1.c17 from t2 ref_1 left join
t2 ref_2 on ref_1.vkey = ref_2.vkey
where ref_2.vkey is not null
except all
select c17 from t2 ref_3;
ERROR: no relation entry for relid 1
Thanks
Richard
On Mon, Oct 30, 2023 at 4:24 AM Richard Guo <guofenglinux@gmail.com> wrote: > > On Sun, Oct 29, 2023 at 11:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> Alexander Korotkov <aekorotkov@gmail.com> writes: >> > I made some beautification of the patch by Andrei. I also removed the >> > part which changes the target list for estimate_num_groups(). Any >> > objections to pushing this? >> >> It seems moderately likely that this will break as much as it fixes. >> >> I've not studied the original patch enough to understand why you need >> to be playing strange games with tree mutation rules, but I suspect >> that this is band-aiding over some rather fundamentally bad code. > > > I also have some concerns about this patch. It requires that > root->parse remains unchanged during the whole subquery_planner() in > order to work, which is an implicit constraint we did not have before. I wouldn't say this is exactly what is required. Actually, the patch requires root->parse and corresponding parent_root->simple_rte_array[]->subquery to be the same. So, it's still possible to change both of them, but just don't make them distinct copies. ------ Regards, Alexander Korotkov
On Tue, Oct 31, 2023 at 9:37 AM Richard Guo <guofenglinux@gmail.com> wrote: > On Tue, Oct 31, 2023 at 1:05 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: >> >> On 30/10/2023 13:24, Richard Guo wrote: >> > Yeah, that's what I meant. We need to ensure that root->parse >> > references the same Query structure during the whole subquery_planner() >> > for this patch to work, which seems hacky, and error-prone for future >> > development. If we really want to do so, at least we need to emphasize >> > this point in the comment of subquery_planner(). >> >> Okay, let's go another way and try to imagine what additional >> opportunities it will give us in the future? I mean, save unchanged a >> rte->subquery tree and, simultaneously, have some transformed version in >> the subroot->parse field. >> I can imagine kind of a subquery replanning procedure in the case we >> realized during higher-level query planning that the underlying subquery >> is not optimal. >> Such a picture makes sense, and I can agree with your words. From this >> point of view, all links to the subquery must reference the subroot >> structures, and the patch you have proposed is the best solution. And >> the SJE feature works correctly. >> If we don't imagine the picture I have drawn above, it is reasonable to >> save memory and not alter the parse tree while removing unneeded joins. >> I think, in that case, we can use the walker procedure instead of a mutator. > > > Well, I think what happens here is that the new SJE code alters the > subquery (by removing ref_2) and that change is not propagated into the > parent level. So in the parent level, when we look at the subquery's > original targetlist (which remains unchanged) against 'subroot' (which > has been changed accordingly), we'd have problem. > > AFAICS there are two solutions being discussed: > > 1) We propagate the change to the subquery into the parent level by > ensuring that root->parse references the same Query structure during the > whole subquery_planner(). Even if we don't pick this option to resolve the current problem, I would say I'm not particularly happy that root->parse and corresponding parent_root->simple_rte_array[]->subquery are distinct copies without a strong reason for that. > 2) In the parent level, we look at the changed subquery instead of the > original rte->subquery. IOW, we look at subroot->parse->targetList > instead of subquery->targetList. > > Personally I prefer the second solution because it seems more natural to > look at 'subroot->parse' together with 'subroot' in estimate_num_groups > and it does not introduce new constraint to subquery_planner(). 3) We may decide to introduce "alias" relids. Instead of deleting the relid completely, we may decide to make it an alias of another relid. See the attached patch to illustrate this approach. I guess it could allow us to get rid of a significant part of relid replacement login in d3d55ce571. I'm not sure about potential problems with this approach though. ------ Regards, Alexander Korotkov
Вложения
On 31/10/2023 15:46, Alexander Korotkov wrote: > On Tue, Oct 31, 2023 at 9:37 AM Richard Guo <guofenglinux@gmail.com> wrote: >> On Tue, Oct 31, 2023 at 1:05 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: >>> >>> On 30/10/2023 13:24, Richard Guo wrote: >>>> Yeah, that's what I meant. We need to ensure that root->parse >>>> references the same Query structure during the whole subquery_planner() >>>> for this patch to work, which seems hacky, and error-prone for future >>>> development. If we really want to do so, at least we need to emphasize >>>> this point in the comment of subquery_planner(). >>> >>> Okay, let's go another way and try to imagine what additional >>> opportunities it will give us in the future? I mean, save unchanged a >>> rte->subquery tree and, simultaneously, have some transformed version in >>> the subroot->parse field. >>> I can imagine kind of a subquery replanning procedure in the case we >>> realized during higher-level query planning that the underlying subquery >>> is not optimal. >>> Such a picture makes sense, and I can agree with your words. From this >>> point of view, all links to the subquery must reference the subroot >>> structures, and the patch you have proposed is the best solution. And >>> the SJE feature works correctly. >>> If we don't imagine the picture I have drawn above, it is reasonable to >>> save memory and not alter the parse tree while removing unneeded joins. >>> I think, in that case, we can use the walker procedure instead of a mutator. >> >> >> Well, I think what happens here is that the new SJE code alters the >> subquery (by removing ref_2) and that change is not propagated into the >> parent level. So in the parent level, when we look at the subquery's >> original targetlist (which remains unchanged) against 'subroot' (which >> has been changed accordingly), we'd have problem. >> >> AFAICS there are two solutions being discussed: >> >> 1) We propagate the change to the subquery into the parent level by >> ensuring that root->parse references the same Query structure during the >> whole subquery_planner(). > > Even if we don't pick this option to resolve the current problem, I > would say I'm not particularly happy that root->parse and > corresponding parent_root->simple_rte_array[]->subquery are distinct > copies without a strong reason for that. I don't happy with this approach too. > >> 2) In the parent level, we look at the changed subquery instead of the >> original rte->subquery. IOW, we look at subroot->parse->targetList >> instead of subquery->targetList. >> >> Personally I prefer the second solution because it seems more natural to >> look at 'subroot->parse' together with 'subroot' in estimate_num_groups >> and it does not introduce new constraint to subquery_planner(). > > 3) We may decide to introduce "alias" relids. Instead of deleting the > relid completely, we may decide to make it an alias of another relid. > See the attached patch to illustrate this approach. I guess it could > allow us to get rid of a significant part of relid replacement login > in d3d55ce571. I'm not sure about potential problems with this > approach though. By and large, the idea of an alias looks productive. It is really cheaper and simpler to make an alias. But we still need to replace relids in clauses. This approach requires a lot of additional work, as I see. But right now, we should agree on some more straightforward solution. In Richard's approach No.2, I see some rational grain: living with such a quick fix for some time, we can find additional misleading references to the 'rte->subquery' in the upper query. Can we postpone the subquery trees sync part of the patch and commit Richard's fix? -- regards, Andrei Lepikhov Postgres Professional
On Sat, Oct 28, 2023 at 12:12 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > On Fri, Oct 27, 2023 at 5:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Richard Guo <guofenglinux@gmail.com> writes: > > > Alternatively, can we look at subroot->parse->targetList instead of > > > subquery->targetList where we call estimate_num_groups on the output of > > > the subquery? > > > > Please read the comment just above that. > > Thank you, Tom, This is clear. Hmm... I just found that I wasn't attentive enough. The proposed change is to use subroot->parse->targetList, not subroot->processed_tlist. I don't know what could be the problem here. ------ Regards, Alexander Korotkov
On Wed, Nov 1, 2023 at 12:29 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > On Sat, Oct 28, 2023 at 12:12 AM Alexander Korotkov > <aekorotkov@gmail.com> wrote: > > > > On Fri, Oct 27, 2023 at 5:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > Richard Guo <guofenglinux@gmail.com> writes: > > > > Alternatively, can we look at subroot->parse->targetList instead of > > > > subquery->targetList where we call estimate_num_groups on the output of > > > > the subquery? > > > > > > Please read the comment just above that. > > > > Thank you, Tom, This is clear. > > > Hmm... I just found that I wasn't attentive enough. The proposed > change is to use subroot->parse->targetList, not > subroot->processed_tlist. I don't know what could be the problem > here. My proposal is to use the attached patch as a hotfix for the bug considered. And for future I propose to consider two questions (each should probably go to its own thread): 1) Getting rid of having two distinct copies of parse tree (have one copy instead). 2) Introduce relation aliases to simplify self-join removal. Any thoughts? ------ Regards, Alexander Korotkov
Вложения
On 2/11/2023 05:29, Alexander Korotkov wrote: > On Wed, Nov 1, 2023 at 12:29 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: >> On Sat, Oct 28, 2023 at 12:12 AM Alexander Korotkov >> <aekorotkov@gmail.com> wrote: >>> >>> On Fri, Oct 27, 2023 at 5:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> Richard Guo <guofenglinux@gmail.com> writes: >>>>> Alternatively, can we look at subroot->parse->targetList instead of >>>>> subquery->targetList where we call estimate_num_groups on the output of >>>>> the subquery? >>>> >>>> Please read the comment just above that. >>> >>> Thank you, Tom, This is clear. >> >> >> Hmm... I just found that I wasn't attentive enough. The proposed >> change is to use subroot->parse->targetList, not >> subroot->processed_tlist. I don't know what could be the problem >> here. > > My proposal is to use the attached patch as a hotfix for the bug considered. Looks good, no objections. > > And for future I propose to consider two questions (each should > probably go to its own thread): > 1) Getting rid of having two distinct copies of parse tree (have one > copy instead). +1. It can be done somewhere near the beta release. Or, in the next release, with the sake of finding other issues (like fixing with the patch proposed), if they have a place in the planner's code. > 2) Introduce relation aliases to simplify self-join removal. In my opinion, it should be another type of RelOptInfo, not a simple reference to the entry that has been kept. And we need to invent a kind of relid translation procedure. Would it be worth it? I'm not sure, but it can make implementation of optimizations of that type simpler. -- regards, Andrei Lepikhov Postgres Professional
On Fri, Oct 27, 2023 at 1:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Also, while nosing around in this, I tried to pprint(root) at the
point of the error, and got
2023-10-26 12:48:37.852 EDT [1186007] WARNING: could not dump unrecognized node type: 37413808
I came across a similar warning when I tried to pprint(innerrel):
WARNING: could not dump unrecognized node type: 0
... even though we've made UniqueRelInfo be a Node type in commit
2b26a69455.
This happens because we use palloc to allocate UniqueRelInfo node in
innerrel_is_unique_ext(), which I think should be replaced by makeNode.
Attached is a trivial patch to do so.
Thanks
Richard
WARNING: could not dump unrecognized node type: 0
... even though we've made UniqueRelInfo be a Node type in commit
2b26a69455.
This happens because we use palloc to allocate UniqueRelInfo node in
innerrel_is_unique_ext(), which I think should be replaced by makeNode.
Attached is a trivial patch to do so.
Thanks
Richard
Вложения
On Mon, Nov 6, 2023 at 9:27 AM Richard Guo <guofenglinux@gmail.com> wrote: > On Fri, Oct 27, 2023 at 1:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> >> Also, while nosing around in this, I tried to pprint(root) at the >> point of the error, and got >> >> 2023-10-26 12:48:37.852 EDT [1186007] WARNING: could not dump unrecognized node type: 37413808 > > > I came across a similar warning when I tried to pprint(innerrel): > > WARNING: could not dump unrecognized node type: 0 > > ... even though we've made UniqueRelInfo be a Node type in commit > 2b26a69455. > > This happens because we use palloc to allocate UniqueRelInfo node in > innerrel_is_unique_ext(), which I think should be replaced by makeNode. > Attached is a trivial patch to do so. Pushed, thank you for catching this. ------ Regards, Alexander Korotkov