Обсуждение: Properly pathify the union planner
The upper planner was pathified many years ago now. That was a large chunk of work and because of that, the union planner was not properly pathified in that effort. A small note was left in recurse_set_operations() to mention about future work. You can see this lack of pathification in make_union_unique() and choose_hashed_setop(). There are heuristics in there to decide the method to use instead of creating paths and letting add_path() decide what's faster. I've been working on improving this for the past few weeks and I'm not quite as far along as I'd hoped, but what I have is perhaps worthy of sharing. For now, I've only improved UNIONs. A UNION plan can now look like: # explain (costs off) select * from a union select * from a; QUERY PLAN --------------------------------------------------- Unique -> Merge Append Sort Key: a.a -> Index Only Scan using a_pkey on a -> Index Only Scan using a_pkey on a a_1 Previously we'd have only considered Append -> Hash Aggregate or via Append -> Sort -> Unique To make this work, the query_planner() needs to know about setops, so I've passed those down via the standard_qp_extra struct so that we can choose pathkeys for the setops. One part that still needs work is the EquivalanceClass building. Because we only build the final targetlist for the Append after planning all the append child queries, I ended up having to populate the EquivalanceClasses backwards, i.e children first. add_eq_member() determines if you're passing a child member by checking if parent != NULL. Since I don't have a parent EquivalenceMember to pass, em_is_child gets set wrongly, and that causes problems because ec_has_const can get set to true when it shouldn't. This is a problem as it can make a PathKey redundant when it isn't. I wonder if I'll need to change the signature of add_eq_member() and add an "is_child" bool to force the EM to be a child em... Needs more thought... I've not worked on the creation of Incremental Sort paths yet, or done any path plumbing work to have UNION consider Gather Merge -> Unique on already sorted paths. I think to make similar improvements to EXCEPT and INTERSECT we'd need a node executor node. Perhaps nodeMergeAppendSetops.c which can be configured to do EXCEPT or INTERSECT. It could also perhaps handle UNION too then we can use that instead of a Merge Append -> Unique. That might save doing some slot copying and improve performance further. I'm not planning on doing that for the first stage. I only intend to improve UNION for that and we have all the executor nodes to make that work already. Anyway, I've attached my WIP patch for this. David
Вложения
On Thu, 2 Nov 2023 at 12:42, David Rowley <dgrowleyml@gmail.com> wrote: > One part that still needs work is the EquivalanceClass building. > Because we only build the final targetlist for the Append after > planning all the append child queries, I ended up having to populate > the EquivalanceClasses backwards, i.e children first. add_eq_member() > determines if you're passing a child member by checking if parent != > NULL. Since I don't have a parent EquivalenceMember to pass, > em_is_child gets set wrongly, and that causes problems because > ec_has_const can get set to true when it shouldn't. This is a problem > as it can make a PathKey redundant when it isn't. I wonder if I'll > need to change the signature of add_eq_member() and add an "is_child" > bool to force the EM to be a child em... Needs more thought... I've spent more time working on this and I ended up solving the above problem by delaying the subquery path creation on the union children until after we've built the top-level targetlist. This allows the parent eclasses to be correctly added before adding members for the union children. (See build_setop_child_paths() in the patch) There's been quite a bit of progress in other areas too. Incremental sorts now work: # create table t1(a int primary key, b int not null); # create table t2(a int primary key, b int not null); # insert into t1 select x,x from generate_Series(1,1000000)x; # insert into t2 select x,x from generate_Series(1,1000000)x; # vacuum analyze t1,t2; # explain (costs off) select * from t1 union select * from t2; QUERY PLAN -------------------------------------------------- Unique -> Merge Append Sort Key: t1.a, t1.b -> Incremental Sort Sort Key: t1.a, t1.b Presorted Key: t1.a -> Index Scan using t1_pkey on t1 -> Incremental Sort Sort Key: t2.a, t2.b Presorted Key: t2.a -> Index Scan using t2_pkey on t2 (11 rows) However, I've not yet made the MergeAppend UNIONs work when the datatypes don't match on either side of the UNION. For now, the reason this does not work is due to convert_subquery_pathkeys() being unable to find the pathkey targets in the targetlist. The actual targets can't be found due to the typecast. I wondered if this could be fixed by adding an additional projection path to the subquery when the output columns don't match the setop->colTypes, but I'm a bit put off by the comment in transformSetOperationTree: > * For all non-UNKNOWN-type cases, we verify coercibility but we > * don't modify the child's expression, for fear of changing the > * child query's semantics. I assume that's worried about the semantics of things like WHERE clauses, so maybe the projection path in the subquery would be ok. I need to spend more time on that. Another problem I hit was add_path() pfreeing a Path that I needed. This was happening due to how I'm building the final paths in the subquery when setop_pathkeys are set. Because I want to always include the cheapest_input_path to allow that path to be used in hash-based UNIONs, I also want to provide sorted paths so that MergeAppend has something to work with. I found cases where I'd add_path() the cheapest_input_path to the final rel then also attempt to sort that path. Unfortunately, add_path() found the unsorted path and the sorted path fuzzily the same cost and opted to keep the sorted one due to it having better pathkeys. add_path() then pfree'd the cheapest_input_path which meant the Sort's subpath was gone which obviously caused issues in createplan.c. For now, as a temporary fix, I've just #ifdef'd out the code in add_path() that's pfreeing the old path. I have drafted a patch that refcounts Paths, but I'm unsure if that's the correct solution as I'm only maintaining the refcounts in add_path() and add_partial_path(). I think a true correct solution would bump the refcount when a path is used as some other path's subpath. That would mean having to recursively pfree paths up until we find one with a refcount>0. Seems a bit expensive for add_path() to do. I've attached the updated patch. This one is probably ready for someone to test out. There will be more work to do, however. David
Вложения
On Fri, Nov 24, 2023 at 3:59 AM David Rowley <dgrowleyml@gmail.com> wrote: > > Another problem I hit was add_path() pfreeing a Path that I needed. > This was happening due to how I'm building the final paths in the > subquery when setop_pathkeys are set. Because I want to always > include the cheapest_input_path to allow that path to be used in > hash-based UNIONs, I also want to provide sorted paths so that > MergeAppend has something to work with. I found cases where I'd > add_path() the cheapest_input_path to the final rel then also attempt > to sort that path. Unfortunately, add_path() found the unsorted path > and the sorted path fuzzily the same cost and opted to keep the sorted > one due to it having better pathkeys. add_path() then pfree'd the > cheapest_input_path which meant the Sort's subpath was gone which > obviously caused issues in createplan.c. > > For now, as a temporary fix, I've just #ifdef'd out the code in > add_path() that's pfreeing the old path. I have drafted a patch that > refcounts Paths, but I'm unsure if that's the correct solution as I'm > only maintaining the refcounts in add_path() and add_partial_path(). I > think a true correct solution would bump the refcount when a path is > used as some other path's subpath. That would mean having to > recursively pfree paths up until we find one with a refcount>0. Seems > a bit expensive for add_path() to do. Please find my proposal to refcount paths at [1]. I did that to reduce the memory consumed by partitionwise joins. I remember another thread where freeing a path that was referenced by upper sort path created minor debugging problem. [2]. I paused my work on my proposal since there didn't seem enough justification. But it looks like the requirement is coming up repeatedly. I am willing to resume my work if it's needed. The email lists next TODOs. As to making the add_path() expensive, I didn't find any noticeable impact on planning time. [1] https://www.postgresql.org/message-id/CAExHW5tUcVsBkq9qT%3DL5vYz4e-cwQNw%3DKAGJrtSyzOp3F%3DXacA%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAM2%2B6%3DUC1mcVtM0Y_LEMBEGHTM58HEkqHPn7vau_V_YfuZjEGg%40mail.gmail.com -- Best Wishes, Ashutosh Bapat
On Fri, 24 Nov 2023 at 18:43, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Fri, Nov 24, 2023 at 3:59 AM David Rowley <dgrowleyml@gmail.com> wrote: > > For now, as a temporary fix, I've just #ifdef'd out the code in > > add_path() that's pfreeing the old path. I have drafted a patch that > > refcounts Paths, but I'm unsure if that's the correct solution as I'm > > only maintaining the refcounts in add_path() and add_partial_path(). I > > think a true correct solution would bump the refcount when a path is > > used as some other path's subpath. That would mean having to > > recursively pfree paths up until we find one with a refcount>0. Seems > > a bit expensive for add_path() to do. > > Please find my proposal to refcount paths at [1]. I did that to reduce > the memory consumed by partitionwise joins. I remember another thread > where freeing a path that was referenced by upper sort path created > minor debugging problem. [2]. I paused my work on my proposal since > there didn't seem enough justification. But it looks like the > requirement is coming up repeatedly. I am willing to resume my work if > it's needed. The email lists next TODOs. As to making the add_path() > expensive, I didn't find any noticeable impact on planning time. I missed that thread. Thanks for pointing it out. I skim read your patch and I see it does seem to have the workings for tracking refcounts when the pack is a subpath of another path. I imagine that would allow the undocumented hack that is "if (!IsA(old_path, IndexPath))" in add_path() to disappear. I wondered if the problem of pfreeing paths that are in the pathlist of another relation could be fixed in another way. If we have an AdoptedPath path type that just inherits the costs from its single subpath and we wrap a Path up in one of these before we do add_path() a Path which is not parented by the relation we're adding the path to, since we don't recursively pfree() Paths in add_path(), we'd only ever pfree the AdoptedPath rather than pfreeing a Path that directly exists in another relations pathlist. Another simpler option would be just don't pfree the Path if the Path parent is not the add_path rel. David > [1] https://www.postgresql.org/message-id/CAExHW5tUcVsBkq9qT%3DL5vYz4e-cwQNw%3DKAGJrtSyzOp3F%3DXacA%40mail.gmail.com > [2] https://www.postgresql.org/message-id/CAM2%2B6%3DUC1mcVtM0Y_LEMBEGHTM58HEkqHPn7vau_V_YfuZjEGg%40mail.gmail.com
On Fri, Nov 24, 2023 at 6:29 AM David Rowley <dgrowleyml@gmail.com> wrote:
I've attached the updated patch. This one is probably ready for
someone to test out. There will be more work to do, however.
I just started reviewing this patch and haven't looked through all the
details yet. Here are some feedbacks that came to my mind. Post them
first so that I don’t forget them after the holidays.
* I think we should update truncate_useless_pathkeys() to account for
the ordering requested by the query's set operation; otherwise we may
not get a subquery's path with the expected pathkeys. For instance,
create table t (a int, b int);
create index on t (a, b);
set enable_hashagg to off;
-- on v1 patch
explain (costs off)
(select * from t order by a) UNION (select * from t order by a);
QUERY PLAN
------------------------------------------------------------
Unique
-> Merge Append
Sort Key: t.a, t.b
-> Incremental Sort
Sort Key: t.a, t.b
Presorted Key: t.a
-> Index Only Scan using t_a_b_idx on t
-> Incremental Sort
Sort Key: t_1.a, t_1.b
Presorted Key: t_1.a
-> Index Only Scan using t_a_b_idx on t t_1
(11 rows)
-- after accounting for setop_pathkeys in truncate_useless_pathkeys()
explain (costs off)
(select * from t order by a) UNION (select * from t order by a);
QUERY PLAN
------------------------------------------------------
Unique
-> Merge Append
Sort Key: t.a, t.b
-> Index Only Scan using t_a_b_idx on t
-> Index Only Scan using t_a_b_idx on t t_1
(5 rows)
details yet. Here are some feedbacks that came to my mind. Post them
first so that I don’t forget them after the holidays.
* I think we should update truncate_useless_pathkeys() to account for
the ordering requested by the query's set operation; otherwise we may
not get a subquery's path with the expected pathkeys. For instance,
create table t (a int, b int);
create index on t (a, b);
set enable_hashagg to off;
-- on v1 patch
explain (costs off)
(select * from t order by a) UNION (select * from t order by a);
QUERY PLAN
------------------------------------------------------------
Unique
-> Merge Append
Sort Key: t.a, t.b
-> Incremental Sort
Sort Key: t.a, t.b
Presorted Key: t.a
-> Index Only Scan using t_a_b_idx on t
-> Incremental Sort
Sort Key: t_1.a, t_1.b
Presorted Key: t_1.a
-> Index Only Scan using t_a_b_idx on t t_1
(11 rows)
-- after accounting for setop_pathkeys in truncate_useless_pathkeys()
explain (costs off)
(select * from t order by a) UNION (select * from t order by a);
QUERY PLAN
------------------------------------------------------
Unique
-> Merge Append
Sort Key: t.a, t.b
-> Index Only Scan using t_a_b_idx on t
-> Index Only Scan using t_a_b_idx on t t_1
(5 rows)
* I understand that we need to sort (full or incremental) the paths of
the subqueries to meet the ordering required for setop_pathkeys, so that
MergeAppend has something to work with. Currently in the v1 patch this
sorting is performed during the planning phase of the subqueries (in
grouping_planner).
And we want to add the subquery's cheapest_total_path as-is to allow
that path to be used in hash-based UNIONs, and we also want to add a
sorted path on top of cheapest_total_path. And then we may encounter
the issue you mentioned earlier regarding add_path() potentially freeing
the cheapest_total_path, leaving the Sort's subpath gone.
I'm thinking that maybe it'd be better to move the work of sorting the
subquery's paths to the outer query level, specifically within the
build_setop_child_paths() function, just before we stick SubqueryScanPath
on top of the subquery's paths. I think this is better because:
1. This minimizes the impact on subquery planning and reduces the
footprint within the grouping_planner() function as much as possible.
2. This can help avoid the aforementioned add_path() issue because the
two involved paths will be structured as:
cheapest_path -> subqueryscan
and
cheapest_path -> sort -> subqueryscan
If the two paths cost fuzzily the same and add_path() decides to keep
the second one due to it having better pathkeys and pfree the first one,
it would not be a problem.
BTW, I haven't looked through the part involving partial paths, but I
think we can do the same to partial paths.
* I noticed that in generate_union_paths() we use a new function
build_setop_pathkeys() to build the 'union_pathkeys'. I wonder why we
don't simply utilize make_pathkeys_for_sortclauses() since we already
have the grouplist for the setop's output columns.
the subqueries to meet the ordering required for setop_pathkeys, so that
MergeAppend has something to work with. Currently in the v1 patch this
sorting is performed during the planning phase of the subqueries (in
grouping_planner).
And we want to add the subquery's cheapest_total_path as-is to allow
that path to be used in hash-based UNIONs, and we also want to add a
sorted path on top of cheapest_total_path. And then we may encounter
the issue you mentioned earlier regarding add_path() potentially freeing
the cheapest_total_path, leaving the Sort's subpath gone.
I'm thinking that maybe it'd be better to move the work of sorting the
subquery's paths to the outer query level, specifically within the
build_setop_child_paths() function, just before we stick SubqueryScanPath
on top of the subquery's paths. I think this is better because:
1. This minimizes the impact on subquery planning and reduces the
footprint within the grouping_planner() function as much as possible.
2. This can help avoid the aforementioned add_path() issue because the
two involved paths will be structured as:
cheapest_path -> subqueryscan
and
cheapest_path -> sort -> subqueryscan
If the two paths cost fuzzily the same and add_path() decides to keep
the second one due to it having better pathkeys and pfree the first one,
it would not be a problem.
BTW, I haven't looked through the part involving partial paths, but I
think we can do the same to partial paths.
* I noticed that in generate_union_paths() we use a new function
build_setop_pathkeys() to build the 'union_pathkeys'. I wonder why we
don't simply utilize make_pathkeys_for_sortclauses() since we already
have the grouplist for the setop's output columns.
To assist the discussion I've attached a diff file that includes all the
changes above.
Thanks
Richard
changes above.
Thanks
Richard
Вложения
Hi, > * I think we should update truncate_useless_pathkeys() to account for > the ordering requested by the query's set operation; Nice catch. > I'm thinking that maybe it'd be better to move the work of sorting the > subquery's paths to the outer query level, specifically within the > build_setop_child_paths() function, just before we stick SubqueryScanPath > on top of the subquery's paths. I think this is better because: > > 1. This minimizes the impact on subquery planning and reduces the > footprint within the grouping_planner() function as much as possible. > > 2. This can help avoid the aforementioned add_path() issue because the > two involved paths will be structured as: > > cheapest_path -> subqueryscan > and > cheapest_path -> sort -> subqueryscan > > If the two paths cost fuzzily the same and add_path() decides to keep > the second one due to it having better pathkeys and pfree the first one, > it would not be a problem. This is a smart idea, it works because you create a two different subqueryscan for the cheapest_input_path. FWIW, I found we didn't create_sort_path during building a merge join path, instead it just cost the sort and add it to the cost of mergejoin path only and note this path needs a presorted data. At last during the create_mergejoin_*plan*, it create the sort_plan really. As for the mergeappend case, could we use the similar strategy? with this way, we might simpliy the code to use MergeAppend node since the caller just need to say I want to try MergeAppend with the given pathkeys without really creating the sort by themselves. (Have a quick glance of initial_cost_mergejoin and create_mergejoin_plan, looks incremental sort doesn't work with mergejoin?) > > To assist the discussion I've attached a diff file that includes all the > changes above. + */ +static int +pathkeys_useful_for_setop(PlannerInfo *root, List *pathkeys) +{ + int n_common_pathkeys; + + if (root->setop_pathkeys == NIL) + return 0; /* no special setop ordering requested */ + + if (pathkeys == NIL) + return 0; /* unordered path */ + + (void) pathkeys_count_contained_in(root->setop_pathkeys, pathkeys, + &n_common_pathkeys); + + return n_common_pathkeys; +} The two if-clauses looks unnecessary, it should be handled by pathkeys_count_contained_in already. The same issue exists in pathkeys_useful_for_ordering as well. Attached patch fix it in master. -- Best Regards Andy Fan
Вложения
On Tue, 6 Feb 2024 at 22:05, Richard Guo <guofenglinux@gmail.com> wrote: > I'm thinking that maybe it'd be better to move the work of sorting the > subquery's paths to the outer query level, specifically within the > build_setop_child_paths() function, just before we stick SubqueryScanPath > on top of the subquery's paths. I think this is better because: > > 1. This minimizes the impact on subquery planning and reduces the > footprint within the grouping_planner() function as much as possible. > > 2. This can help avoid the aforementioned add_path() issue because the > two involved paths will be structured as: Yes, this is a good idea. I agree with both of your points. I've taken your suggested changes with minor fixups and expanded on it to do the partial paths too. I've also removed almost all of the changes to planner.c. I fixed a bug where I was overwriting the union child's TargetEntry.ressortgroupref without consideration that it might be set for some other purpose in the subquery. I wrote generate_setop_child_grouplist() to handle this which is almost like generate_setop_grouplist() except it calls assignSortGroupRef() to figure out the next free tleSortGroupRef, (or reuse the existing one if the TargetEntry already has one set). Earlier, I pushed a small comment change to pathnode.c in order to shrink this patch down a little. It was also a chance that could be made in isolation of this work. v2 attached. David
Вложения
On Wed, 7 Feb 2024 at 12:05, Andy Fan <zhihuifan1213@163.com> wrote: > +static int > +pathkeys_useful_for_setop(PlannerInfo *root, List *pathkeys) > +{ > + int n_common_pathkeys; > + > + if (root->setop_pathkeys == NIL) > + return 0; /* no special setop ordering requested */ > + > + if (pathkeys == NIL) > + return 0; /* unordered path */ > + > + (void) pathkeys_count_contained_in(root->setop_pathkeys, pathkeys, > + &n_common_pathkeys); > + > + return n_common_pathkeys; > +} > > The two if-clauses looks unnecessary, it should be handled by > pathkeys_count_contained_in already. The same issue exists in > pathkeys_useful_for_ordering as well. Attached patch fix it in master. I agree. I'd rather not have those redundant checks in pathkeys_useful_for_setop(), and I do want those functions to be as similar as possible. So I think adjusting it in master is a good idea. I've pushed your patch. David
David Rowley <dgrowleyml@gmail.com> writes: >> >> The two if-clauses looks unnecessary, it should be handled by >> pathkeys_count_contained_in already. The same issue exists in >> pathkeys_useful_for_ordering as well. Attached patch fix it in master. > > I agree. I'd rather not have those redundant checks in > pathkeys_useful_for_setop(), and I do want those functions to be as > similar as possible. So I think adjusting it in master is a good > idea. > > I've pushed your patch. > Thanks for the pushing! -- Best Regards Andy Fan
On Thu, 15 Feb 2024 at 17:30, David Rowley <dgrowleyml@gmail.com> wrote: > > On Tue, 6 Feb 2024 at 22:05, Richard Guo <guofenglinux@gmail.com> wrote: > > I'm thinking that maybe it'd be better to move the work of sorting the > > subquery's paths to the outer query level, specifically within the > > build_setop_child_paths() function, just before we stick SubqueryScanPath > > on top of the subquery's paths. I think this is better because: > > > > 1. This minimizes the impact on subquery planning and reduces the > > footprint within the grouping_planner() function as much as possible. > > > > 2. This can help avoid the aforementioned add_path() issue because the > > two involved paths will be structured as: > > Yes, this is a good idea. I agree with both of your points. > v2 attached. If anyone else or if you want to take another look, let me know soon. Otherwise, I'll assume that's the reviews over and I can take another look again. If nobody speaks up before Monday next week (11th), New Zealand time, I'm going to be looking at this again from the point of view of committing it. Thanks David
On Thu, Mar 7, 2024 at 7:16 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 15 Feb 2024 at 17:30, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 6 Feb 2024 at 22:05, Richard Guo <guofenglinux@gmail.com> wrote:
> > I'm thinking that maybe it'd be better to move the work of sorting the
> > subquery's paths to the outer query level, specifically within the
> > build_setop_child_paths() function, just before we stick SubqueryScanPath
> > on top of the subquery's paths. I think this is better because:
> >
> > 1. This minimizes the impact on subquery planning and reduces the
> > footprint within the grouping_planner() function as much as possible.
> >
> > 2. This can help avoid the aforementioned add_path() issue because the
> > two involved paths will be structured as:
>
> Yes, this is a good idea. I agree with both of your points.
> v2 attached.
If anyone else or if you want to take another look, let me know soon.
Otherwise, I'll assume that's the reviews over and I can take another
look again.
Hi David,
I would like to have another look, but it might take several days.
Would that be too late?
Thanks
Richard
I would like to have another look, but it might take several days.
Would that be too late?
Thanks
Richard
On Fri, 8 Mar 2024 at 00:39, Richard Guo <guofenglinux@gmail.com> wrote: > I would like to have another look, but it might take several days. > Would that be too late? Please look. Several days is fine. I'd like to start looking on Monday or Tuesday next week. Thanks David
On Fri, Mar 8, 2024 at 11:31 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Fri, 8 Mar 2024 at 00:39, Richard Guo <guofenglinux@gmail.com> wrote:
> I would like to have another look, but it might take several days.
> Would that be too late?
Please look. Several days is fine. I'd like to start looking on Monday
or Tuesday next week.
I've had another look, and here are some comments that came to my mind.
* There are cases where the setop_pathkeys of a subquery does not match
the union_pathkeys generated in generate_union_paths() for sorting by
the whole target list. In such cases, even if we have explicitly sorted
the paths of the subquery to meet the ordering required for its
setop_pathkeys, we cannot find appropriate ordered paths for
MergeAppend. For instance,
explain (costs off)
(select a, b from t t1 where a = b) UNION (select a, b from t t2 where a = b);
QUERY PLAN
-----------------------------------------------------------
Unique
-> Sort
Sort Key: t1.a, t1.b
-> Append
-> Index Only Scan using t_a_b_idx on t t1
Filter: (a = b)
-> Index Only Scan using t_a_b_idx on t t2
Filter: (a = b)
(8 rows)
(Assume t_a_b_idx is a btree index on t(a, b))
In this query, the setop_pathkeys of the subqueries includes only one
PathKey because 'a' and 'b' are in the same EC inside the subqueries,
while the union_pathkeys of the whole query includes two PathKeys, one
for each target entry. After we convert the setop_pathkeys to outer
representation, we'd notice that it does not match union_pathkeys.
Consequently, we are unable to recognize that the index scan paths are
already appropriately sorted, leading us to miss the opportunity to
utilize MergeAppend.
Not sure if this case is common enough to be worth paying attention to.
* In build_setop_child_paths() we also create properly sorted partial
paths, which seems not necessary because we do not support parallel
merge append, right?
* Another is minor and relates to cosmetic matters. When we unique-ify
the result of a UNION, we take the number of distinct groups as equal to
the total input size. For the Append path and Gather path, we use
'dNumGroups', which is 'rows' of the Append path. For the MergeAppend
we use 'rows' of the MergeAppend path. I believe they are supposed to
be the same, but I think it'd be better to keep them consistent: either
use 'dNumGroups' for all the three kinds of paths, or use 'path->rows'
for each path.
Thanks
Richard
* There are cases where the setop_pathkeys of a subquery does not match
the union_pathkeys generated in generate_union_paths() for sorting by
the whole target list. In such cases, even if we have explicitly sorted
the paths of the subquery to meet the ordering required for its
setop_pathkeys, we cannot find appropriate ordered paths for
MergeAppend. For instance,
explain (costs off)
(select a, b from t t1 where a = b) UNION (select a, b from t t2 where a = b);
QUERY PLAN
-----------------------------------------------------------
Unique
-> Sort
Sort Key: t1.a, t1.b
-> Append
-> Index Only Scan using t_a_b_idx on t t1
Filter: (a = b)
-> Index Only Scan using t_a_b_idx on t t2
Filter: (a = b)
(8 rows)
(Assume t_a_b_idx is a btree index on t(a, b))
In this query, the setop_pathkeys of the subqueries includes only one
PathKey because 'a' and 'b' are in the same EC inside the subqueries,
while the union_pathkeys of the whole query includes two PathKeys, one
for each target entry. After we convert the setop_pathkeys to outer
representation, we'd notice that it does not match union_pathkeys.
Consequently, we are unable to recognize that the index scan paths are
already appropriately sorted, leading us to miss the opportunity to
utilize MergeAppend.
Not sure if this case is common enough to be worth paying attention to.
* In build_setop_child_paths() we also create properly sorted partial
paths, which seems not necessary because we do not support parallel
merge append, right?
* Another is minor and relates to cosmetic matters. When we unique-ify
the result of a UNION, we take the number of distinct groups as equal to
the total input size. For the Append path and Gather path, we use
'dNumGroups', which is 'rows' of the Append path. For the MergeAppend
we use 'rows' of the MergeAppend path. I believe they are supposed to
be the same, but I think it'd be better to keep them consistent: either
use 'dNumGroups' for all the three kinds of paths, or use 'path->rows'
for each path.
Thanks
Richard
On Mon, 11 Mar 2024 at 19:56, Richard Guo <guofenglinux@gmail.com> wrote: > * There are cases where the setop_pathkeys of a subquery does not match > the union_pathkeys generated in generate_union_paths() for sorting by > the whole target list. In such cases, even if we have explicitly sorted > the paths of the subquery to meet the ordering required for its > setop_pathkeys, we cannot find appropriate ordered paths for > MergeAppend. For instance, > > explain (costs off) > (select a, b from t t1 where a = b) UNION (select a, b from t t2 where a = b); > QUERY PLAN > ----------------------------------------------------------- > Unique > -> Sort > Sort Key: t1.a, t1.b > -> Append > -> Index Only Scan using t_a_b_idx on t t1 > Filter: (a = b) > -> Index Only Scan using t_a_b_idx on t t2 > Filter: (a = b) > (8 rows) > > (Assume t_a_b_idx is a btree index on t(a, b)) > > In this query, the setop_pathkeys of the subqueries includes only one > PathKey because 'a' and 'b' are in the same EC inside the subqueries, > while the union_pathkeys of the whole query includes two PathKeys, one > for each target entry. After we convert the setop_pathkeys to outer > representation, we'd notice that it does not match union_pathkeys. > Consequently, we are unable to recognize that the index scan paths are > already appropriately sorted, leading us to miss the opportunity to > utilize MergeAppend. > > Not sure if this case is common enough to be worth paying attention to. I've spent almost all day looking into this, which is just enough work to satisfy myself this *is* future work rather than v17 work. The reason I feel this is future work rather than work for this patch is that this is already a limitation of subqueries in general and it's not unique to union child queries. For example: create table ab(a int, b int, primary key(a,b)); insert into ab select x,y from generate_series(1,100)x,generate_series(1,100)y; vacuum analyze ab; explain select * from (select a,b from ab where a = 1 order by a,b limit 10) order by a,b; The current plan for this is: QUERY PLAN ------------------------------------------------- Sort Sort Key: ab.a, ab.b -> Limit -> Index Only Scan using ab_pkey on ab Index Cond: (a = 1) (5 rows) The additional sort isn't required but is added because the outer query requires the pathkeys {a,b} and the inner query only has the pathkey {b}. {a} is removed due to it being redundant because of the const member. The outer query does not know about the redundant pathkeys so think the subquery is only sorted by {b} therefore adds the sort on "a", "b". The attached 0001 patch (renamed as .txt so it's ignored by the CFbot) adjusts convert_subquery_pathkeys() to have it look a bit deeper and try harder to match the path to the outer query's query_pathkeys. After patching with that, the plan becomes: QUERY PLAN ------------------------------------------- Limit -> Index Only Scan using ab_pkey on ab Index Cond: (a = 1) (3 rows) The patch is still incomplete as the matching is quite complex for the case you mentioned with a=b. It's a bit late here to start making that work, but I think the key to make that work is to give subquery_matches_pathkeys() an extra parameter or 2 for where to start working on each list and recursively call itself where I've left the TODO comment in the function and on the recursive call, try the next query_pathkeys and the same sub_pathkey. If the recursive call function returns false, continue on trying to match the normal way. If it returns true, return true. There'd be a bit more work elsewhere to do to make this work for the general case. For example: explain (costs off) select * from (select a,b from ab where a = 1 offset 0) order by a,b; still produces the following plan with the patched version: QUERY PLAN ------------------------------------------- Sort Sort Key: ab.a, ab.b -> Index Only Scan using ab_pkey on ab Index Cond: (a = 1) (4 rows) the reason for this is that the subquery does not know the outer query would like the index path to have the pathkeys {a,b}. I've not looked at this issue in detail but I suspect we could do something somewhere like set_subquery_pathlist() to check if the outer query's query_pathkeys are all Vars from the subquery. I think we'd need to trawl through the eclasses to ensure that each query_pathkeys eclasses contains a member mentioning only a Var from the subquery and if so, get the subquery to set those pathkeys. Anyway, this is all at least PG18 work so I'll raise a thread about it around June. The patch is included in case you want to mess around with it. I'd be happy if you want to look into the subquery pathkey issue portion of the work. I won't be giving this much more focus until after the freeze. > * In build_setop_child_paths() we also create properly sorted partial > paths, which seems not necessary because we do not support parallel > merge append, right? Yeah. Thanks for noticing that. Removing all that saves quite a bit more code. > * Another is minor and relates to cosmetic matters. When we unique-ify > the result of a UNION, we take the number of distinct groups as equal to > the total input size. For the Append path and Gather path, we use > 'dNumGroups', which is 'rows' of the Append path. For the MergeAppend > we use 'rows' of the MergeAppend path. I believe they are supposed to > be the same, but I think it'd be better to keep them consistent: either > use 'dNumGroups' for all the three kinds of paths, or use 'path->rows' > for each path. Yeah, that should use dNumGroups. Well spotted. I've attached v3. David
Вложения
On Tue, 12 Mar 2024 at 23:21, David Rowley <dgrowleyml@gmail.com> wrote: > I've attached v3. I spent quite a bit more time looking at this. I discovered that the dNumGroups wasn't being set as it should have been for INTERSECT and EXCEPT queries. There was a plan change as a result of this. I've fixed this by adjusting where dNumGroups is set. It must be delayed until after the setop child paths have been created. Aside from this, the changes I made were mostly cosmetic. However, I did notice that I wasn't setting the union child RelOptInfo's ec_indexes in add_setop_child_rel_equivalences(). I also discovered that we're not doing that properly for the top-level RelOptInfo for the UNION query prior to this change. The reason is that due to the Var.varno==0 for the top-level UNION targetlist. The code in get_eclass_for_sort_expr() effectively misses this relation due to "while ((i = bms_next_member(newec->ec_relids, i)) > 0)". This happens to be good because there is no root->simple_rel_array[0] entry, so happens to prevent that code crashing. It seems ok that the ec_indexes are not set for the top-level set RelOptInfo as get_eclass_for_sort_expr() does not make use of ec_indexes while searching for an existing EquivalenceClass. Maybe we should fix this varno == 0 hack and adjust get_eclass_for_sort_expr() so that it makes use of the ec_indexes. It's possible to see this happen with a query such as: SELECT oid FROM pg_class UNION SELECT oid FROM pg_class ORDER BY oid; I didn't see that as a reason not to push this patch as this occurs both with and without this change, so I've now pushed this patch. Thank you and Andy for reviewing this. David
On Mon, Mar 25, 2024 at 9:44 AM David Rowley <dgrowleyml@gmail.com> wrote:
It seems ok that
the ec_indexes are not set for the top-level set RelOptInfo as
get_eclass_for_sort_expr() does not make use of ec_indexes while
searching for an existing EquivalenceClass. Maybe we should fix this
varno == 0 hack and adjust get_eclass_for_sort_expr() so that it makes
use of the ec_indexes.
It's possible to see this happen with a query such as:
SELECT oid FROM pg_class UNION SELECT oid FROM pg_class ORDER BY oid;
I see what you said. Yeah, there might be some optimization
possibilities in this area. And I agree that this should not be a
blocker in pushing this patch.
possibilities in this area. And I agree that this should not be a
blocker in pushing this patch.
I didn't see that as a reason not to push this patch as this occurs
both with and without this change, so I've now pushed this patch.
Great to see this patch has been pushed!
Thanks
Richard
Thanks
Richard
Hello David, 25.03.2024 04:43, David Rowley wrote: > I didn't see that as a reason not to push this patch as this occurs > both with and without this change, so I've now pushed this patch. Please look at a new assertion failure, that is triggered by the following query: SELECT count(*) FROM ( WITH q1(x) AS (SELECT 1) SELECT FROM q1 UNION SELECT FROM q1 ) qu; TRAP: failed Assert("lg != NULL"), File: "planner.c", Line: 7941, PID: 1133017 Best regards, Alexander
On Wed, 27 Mar 2024 at 06:00, Alexander Lakhin <exclusion@gmail.com> wrote: > SELECT count(*) FROM ( > WITH q1(x) AS (SELECT 1) > SELECT FROM q1 UNION SELECT FROM q1 > ) qu; > > TRAP: failed Assert("lg != NULL"), File: "planner.c", Line: 7941, PID: 1133017 Thanks for finding that. There's something weird going on with the UNION child subquery's setOperations field. As far as I understand, and from reading the existing comments, this should only be set for the top-level union. Because this field is set, it plans the CTE thinking it's a UNION child and breaks when it can't find a SortGroupClause for the CTE's target list item. I'll keep digging. As far as I see the setOperations field is only set in transformSetOperationStmt(). I'm guessing we must be doing a copyObject() somewhere and accidentally picking up the parent's setOperations. David
On Wed, Mar 27, 2024 at 6:23 AM David Rowley <dgrowleyml@gmail.com> wrote:
Because this field is set, it plans the CTE thinking it's a UNION
child and breaks when it can't find a SortGroupClause for the CTE's
target list item.
Right. The problem here is that we mistakenly think that the CTE query
is a subquery for the set operation and thus store the SetOperationStmt
in its qp_extra. Currently the code for the check is:
/*
* Check if we're a subquery for a set operation. If we are, store
* the SetOperationStmt in qp_extra.
*/
if (root->parent_root != NULL &&
root->parent_root->parse->setOperations != NULL &&
IsA(root->parent_root->parse->setOperations, SetOperationStmt))
qp_extra.setop =
(SetOperationStmt *) root->parent_root->parse->setOperations;
else
qp_extra.setop = NULL;
This check cannot tell if the subquery is for a set operation or a CTE,
because its parent might have setOperations set in both cases. Hmm, is
there any way to differentiate between the two?
Thanks
Richard
is a subquery for the set operation and thus store the SetOperationStmt
in its qp_extra. Currently the code for the check is:
/*
* Check if we're a subquery for a set operation. If we are, store
* the SetOperationStmt in qp_extra.
*/
if (root->parent_root != NULL &&
root->parent_root->parse->setOperations != NULL &&
IsA(root->parent_root->parse->setOperations, SetOperationStmt))
qp_extra.setop =
(SetOperationStmt *) root->parent_root->parse->setOperations;
else
qp_extra.setop = NULL;
This check cannot tell if the subquery is for a set operation or a CTE,
because its parent might have setOperations set in both cases. Hmm, is
there any way to differentiate between the two?
Thanks
Richard
On Wed, 27 Mar 2024 at 16:15, Richard Guo <guofenglinux@gmail.com> wrote: > if (root->parent_root != NULL && > root->parent_root->parse->setOperations != NULL && > IsA(root->parent_root->parse->setOperations, SetOperationStmt)) > qp_extra.setop = > (SetOperationStmt *) root->parent_root->parse->setOperations; > else > qp_extra.setop = NULL; > > This check cannot tell if the subquery is for a set operation or a CTE, > because its parent might have setOperations set in both cases. Hmm, is > there any way to differentiate between the two? As far as I see, there's nothing to go on... well unless you counted canSetTag, which is false for the CTE (per analyzeCTE())... but that's certainly not the fix. I did wonder when first working on this patch if subquery_planner() should grow an extra parameter, or maybe consolidate some existing ones by passing some struct that provides the planner with a bit more context about the query. A few of the existing parameters are likely candidates for being in such a struct. e.g. hasRecursion and tuple_fraction. A SetOperationStmt could go in there too. The other CTE thread about the PathKey change you worked on highlights that something like this could be useful. I posted in [1] about this. David [1] https://postgr.es/m/CAApHDvrF53ErmonnpO77eDiJm7PyReZ+nD=4FSsSOmaKx6+MuQ@mail.gmail.com
On Wed, 27 Mar 2024 at 22:47, David Rowley <dgrowleyml@gmail.com> wrote: > I did wonder when first working on this patch if subquery_planner() > should grow an extra parameter, or maybe consolidate some existing > ones by passing some struct that provides the planner with a bit more > context about the query. A few of the existing parameters are likely > candidates for being in such a struct. e.g. hasRecursion and > tuple_fraction. A SetOperationStmt could go in there too. The attached is roughly what I had in mind. I've not taken the time to see what comments need to be updated, so the attached aims only to assist discussion. David
Вложения
On Wed, Mar 27, 2024 at 6:34 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 27 Mar 2024 at 22:47, David Rowley <dgrowleyml@gmail.com> wrote:
> I did wonder when first working on this patch if subquery_planner()
> should grow an extra parameter, or maybe consolidate some existing
> ones by passing some struct that provides the planner with a bit more
> context about the query. A few of the existing parameters are likely
> candidates for being in such a struct. e.g. hasRecursion and
> tuple_fraction. A SetOperationStmt could go in there too.
The attached is roughly what I had in mind. I've not taken the time
to see what comments need to be updated, so the attached aims only to
assist discussion.
I like this idea. And there may be future applications for having such
a struct if we want to pass down additional information to subquery
planning, such as ordering requirements from outer query.
Thanks
Richard
a struct if we want to pass down additional information to subquery
planning, such as ordering requirements from outer query.
Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes: > On Wed, Mar 27, 2024 at 6:34 PM David Rowley <dgrowleyml@gmail.com> wrote: >> The attached is roughly what I had in mind. I've not taken the time >> to see what comments need to be updated, so the attached aims only to >> assist discussion. > I like this idea. I haven't studied the underlying problem yet, so I'm not quite buying into whether we need this struct at all ... but assuming we do, I feel like "PlannerContext" is a pretty poor name. There's basically nothing to distinguish it from "PlannerInfo", not to mention that readers would likely assume it's a memory context of some sort. Perhaps "SubqueryContext" or the like would be better? It still has the conflict with memory contexts though. regards, tom lane
On Thu, 28 Mar 2024 at 15:56, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I haven't studied the underlying problem yet, so I'm not quite > buying into whether we need this struct at all ... The problem is, when planning a UNION child query, we want to try and produce some Paths that would suit the top-level UNION query so that a Merge Append -> Unique can be used rather than a Append -> Sort -> Unique or Append -> Hash Aggregate. The problem is informing the UNION child query about what it is. I thought I could do root->parent_root->parse->setOperations for a UNION child to know what it is, but that breaks for a query such as: WITH q1(x) AS (SELECT 1) SELECT FROM q1 UNION SELECT FROM q1 as the CTE also has root->parent_root->parse->setOperations set and in the above case, that's a problem as there's some code that tries to match the non-resjunk child targetlists up with the SetOperationStmt's SortGroupClauses, but there's a mismatch for the CTE. The actual UNION children should have a 1:1 match for non-junk columns. > but assuming > we do, I feel like "PlannerContext" is a pretty poor name. > There's basically nothing to distinguish it from "PlannerInfo", > not to mention that readers would likely assume it's a memory > context of some sort. > > Perhaps "SubqueryContext" or the like would be better? It > still has the conflict with memory contexts though. Maybe something with "Parameters" in the name? David
David Rowley <dgrowleyml@gmail.com> writes: > The problem is informing the UNION child query about what it is. I > thought I could do root->parent_root->parse->setOperations for a UNION > child to know what it is, but that breaks for a query such as: Yeah, having grouping_planner poke into the parent level doesn't seem like a great idea here. I continue to not like the name "PlannerContext" but I agree passing down the setop explicitly is the way to go. >> Perhaps "SubqueryContext" or the like would be better? It >> still has the conflict with memory contexts though. > Maybe something with "Parameters" in the name? SubqueryParameters might be OK. Or SubqueryPlannerExtra? Since this is a bespoke struct that will probably only ever be used with subquery_planner, naming it after that function seems like a good idea. (And, given that fact and the fact that it's not a Node, I'm not sure it belongs in pathnodes.h. We could just declare it in planner.h.) Some minor comments now that I've looked at 66c0185a3 a little: * Near the head of grouping_planner is this bit: if (parse->setOperations) { /* * If there's a top-level ORDER BY, assume we have to fetch all the * tuples. This might be too simplistic given all the hackery below * to possibly avoid the sort; but the odds of accurate estimates here * are pretty low anyway. XXX try to get rid of this in favor of * letting plan_set_operations generate both fast-start and * cheapest-total paths. */ if (parse->sortClause) root->tuple_fraction = 0.0; I'm pretty sure this comment is mine, but it's old enough that I don't recall exactly what I had in mind. Still, it seems like your patch has addressed precisely the issue of generating fast-start plans for setops. Should we now remove this reset of tuple_fraction? * generate_setop_child_grouplist does this: /* assign a tleSortGroupRef, or reuse the existing one */ sgc->tleSortGroupRef = assignSortGroupRef(tle, targetlist); tle->ressortgroupref = sgc->tleSortGroupRef; That last line is redundant and confusing. It is not this code's charter to change ressortgroupref. regards, tom lane
I wrote: > David Rowley <dgrowleyml@gmail.com> writes: >> Maybe something with "Parameters" in the name? > SubqueryParameters might be OK. Or SubqueryPlannerExtra? > Since this is a bespoke struct that will probably only ever > be used with subquery_planner, naming it after that function > seems like a good idea. On third thought, I'm not at all convinced that we even want to invent this struct as compared to just adding another parameter to subquery_planner. The problem with a struct is what happens the next time we need to add a parameter? If we add yet another function parameter, we can count on the compiler to complain about call sites that didn't get the memo. Adding a field within an existing struct provokes no such warning, leading to situations with uninitialized fields that might accidentally work during testing, but fail the minute they get to the field. If you do want to go this direction, a minimum safety requirement would be to have an ironclad rule that callers memset the whole struct to zero before filling it, so that any not-set fields will at least have predictable values. But I don't see the point really. regards, tom lane
On Fri, 29 Mar 2024 at 08:53, Tom Lane <tgl@sss.pgh.pa.us> wrote: > On third thought, I'm not at all convinced that we even want to > invent this struct as compared to just adding another parameter > to subquery_planner. The problem with a struct is what happens > the next time we need to add a parameter? If we add yet another > function parameter, we can count on the compiler to complain > about call sites that didn't get the memo. Adding a field > within an existing struct provokes no such warning, leading > to situations with uninitialized fields that might accidentally > work during testing, but fail the minute they get to the field. I agree it's best to break callers that don't update their code to consider passing or not passing a SetOperationStmt. I've just committed a fix to do it that way. This also seems to be the path of least resistance, which also appeals. I opted to add a new test alongside the existing tests which validate set operations with an empty SELECT list work. The new tests include the variation that the set operation has both a materialized and non-materialized CTE as a child. This was only a problem with a materialized CTE, but I opted to include a non-materialized one as I don't expect that we'll get this exact problem again. I was just keen on getting more coverage with a couple of cheap tests. Thanks for your input on this. I'll review your other comments shortly. David