Discussion: pg16: XX000: could not find pathkey item to sort
This fails since 1349d2790b

commit 1349d2790bf48a4de072931c722f39337e72055e
Author: David Rowley <drowley@postgresql.org>
Date:   Tue Aug 2 23:11:45 2022 +1200

    Improve performance of ORDER BY / DISTINCT aggregates

ts=# CREATE TABLE t (a int, b text) PARTITION BY RANGE (a);
ts=# CREATE TABLE td PARTITION OF t DEFAULT;
ts=# INSERT INTO t SELECT 1 AS a, '' AS b;
ts=# SET enable_partitionwise_aggregate=on;
ts=# explain SELECT a, COUNT(DISTINCT b) FROM t GROUP BY a;
ERROR:  XX000: could not find pathkey item to sort
LOCATION:  prepare_sort_from_pathkeys, createplan.c:6235

--
Justin
On Mon, Sep 18, 2023 at 10:02 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
This fails since 1349d2790b
commit 1349d2790bf48a4de072931c722f39337e72055e
Author: David Rowley <drowley@postgresql.org>
Date: Tue Aug 2 23:11:45 2022 +1200
Improve performance of ORDER BY / DISTINCT aggregates
ts=# CREATE TABLE t (a int, b text) PARTITION BY RANGE (a);
ts=# CREATE TABLE td PARTITION OF t DEFAULT;
ts=# INSERT INTO t SELECT 1 AS a, '' AS b;
ts=# SET enable_partitionwise_aggregate=on;
ts=# explain SELECT a, COUNT(DISTINCT b) FROM t GROUP BY a;
ERROR: XX000: could not find pathkey item to sort
LOCATION: prepare_sort_from_pathkeys, createplan.c:6235
Thanks for the report!  I've looked at it a little bit.  In the function
adjust_group_pathkeys_for_groupagg we add the pathkeys of ordered
aggregates to root->group_pathkeys.  But if the newly added pathkeys do
not have EC members that match the targetlist or can be computed from
the targetlist, prepare_sort_from_pathkeys has trouble computing sort
column info for them.  In the given example, the pathkey representing
'b' cannot match or be computed from the current targetlist, so
prepare_sort_from_pathkeys emits the error.
My first thought about a fix is to artificially add resjunk target
entries to parse->targetList for the ordered aggregates' arguments that
are ORDER BY expressions, as attached.  While this can fix the given
query, it causes an Assert failure for the query in sql/triggers.sql.
-- inserts only
insert into my_table values (1, 'AAA'), (2, 'BBB')
on conflict (a) do
update set b = my_table.b || ':' || excluded.b;
I haven't looked into how that happens.
Any thoughts?
Thanks
Richard
Attachments
On Tue, 19 Sept 2023 at 23:45, Richard Guo <guofenglinux@gmail.com> wrote:
> My first thought about the fix is that we artificially add resjunk
> target entries to parse->targetList for the ordered aggregates'
> arguments that are ORDER BY expressions, as attached.  While this can
> fix the given query, it would cause Assert failure for the query in
> sql/triggers.sql.
> Any thoughts?

Unfortunately, we can't do that, as it'll lead to target entries existing in the GroupAggregate's target list that have been aggregated.

postgres=# explain verbose SELECT a, COUNT(DISTINCT b) FROM rp GROUP BY a;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Append  (cost=88.17..201.39 rows=400 width=44)
   ->  GroupAggregate  (cost=88.17..99.70 rows=200 width=44)
         Output: rp.a, count(DISTINCT rp.b), rp.b

Your patch adds rp.b as an output column of the GroupAggregate.  Logically, that column cannot exist there, as there is no correct single value of rp.b after aggregation.

I think the fix needs to go into create_agg_path().  The problem is that for AGG_SORTED we do:

    if (aggstrategy == AGG_SORTED)
        pathnode->path.pathkeys = subpath->pathkeys;    /* preserves order */

which assumes that all of the columns before the aggregate will be available after the aggregate.  That likely used to work ok before 1349d2790, as the planner wouldn't have requested any pathkeys for columns that were not available below the Agg node.

We can no longer take the subpath's pathkeys verbatim.  We need to strip off pathkeys for columns that are not in the pathnode's targetlist.  I've attached a patch which adds a new function to pathkeys.c to strip off any PathKeys in a list that don't have a corresponding item in the given PathTarget, returning just the prefix of the input pathkey list up until the first expr that can't be found.

I'm concerned that this patch will be too much overhead when creating paths when a PathKey's EquivalenceClass has a large number of members from partitioned tables.  I wondered if we should instead just check whether the subpath's pathkeys match root->group_pathkeys and, if they do, set the AggPath's pathkeys to list_copy_head(subpath->pathkeys, root->num_groupby_pathkeys).  That'd be much cheaper, but it just feels a bit too much like a special case.

David
Attachments
On Tue, 3 Oct 2023 at 09:11, David Rowley <dgrowleyml@gmail.com> wrote:
> I'm concerned that this patch will be too much overhead when creating
> paths when a PathKey's EquivalenceClass has a large number of members
> from partitioned tables.

I just tried out the patch to see how much it affects the performance of the planner.  I think we need to find a better way to strip off the pathkeys for the columns that have been aggregated.

Setup:

create table lp (a int, b int) partition by list (a);
select 'create table lp'||x||' partition of lp for values in('||x||')' from generate_series(0,999)x; \gexec
\pset pager off
set enable_partitionwise_aggregate=1;

Benchmark query:

explain (summary on) select a,count(*) from lp group by a;

Master:
Planning Time: 23.945 ms
Planning Time: 23.887 ms
Planning Time: 23.927 ms

perf top:
  7.39%  libc.so.6  [.] __memmove_avx_unaligned_erms
  6.98%  [kernel]   [k] clear_page_rep
  5.69%  postgres   [.] bms_is_subset
  5.07%  postgres   [.] fetch_upper_rel
  4.41%  postgres   [.] bms_equal

Patched:
Planning Time: 41.410 ms
Planning Time: 41.474 ms
Planning Time: 41.488 ms

perf top:
 19.02%  postgres   [.] bms_is_subset
  6.91%  postgres   [.] find_ec_member_matching_expr
  5.93%  libc.so.6  [.] __memmove_avx_unaligned_erms
  5.55%  [kernel]   [k] clear_page_rep
  4.07%  postgres   [.] fetch_upper_rel
  3.46%  postgres   [.] bms_equal

> I wondered if we should instead just check
> if the subpath's pathkeys match root->group_pathkeys and if they do
> set the AggPath's pathkeys to list_copy_head(subpath->pathkeys,
> root->num_groupby_pathkeys), that'll be much cheaper, but it just
> feels a bit too much like a special case.

I tried this approach (patch attached) and it does perform better than the other patch:

create_agg_path_fix2.patch:
Planning Time: 24.357 ms
Planning Time: 24.293 ms
Planning Time: 24.259 ms

  7.45%  libc.so.6  [.] __memmove_avx_unaligned_erms
  6.90%  [kernel]   [k] clear_page_rep
  5.56%  postgres   [.] bms_is_subset
  5.38%  postgres   [.] bms_equal

I wonder if the attached patch is too much of a special case fix.  I guess from the lack of complaints previously that there are no other cases where we could possibly have pathkeys that belong to columns that are aggregated.  I've not gone to too much effort to see if I can craft a case that hits this without the ORDER BY/DISTINCT aggregate optimisation, however.

David
Attachments
On Tue, 3 Oct 2023 at 20:16, David Rowley <dgrowleyml@gmail.com> wrote:
> I wonder if the attached patch is too much of a special case fix.  I
> guess from the lack of complaints previously that there are no other
> cases where we could possibly have pathkeys that belong to columns
> that are aggregated.  I've not gone to much effort to see if I can
> craft a case that hits this without the ORDER BY/DISTINCT aggregate
> optimisation, however.

I spent more time on this today.  I'd been wondering if there was any reason why create_agg_path() would receive a subpath with pathkeys that were anything but the PlannerInfo's group_pathkeys.  I mean, how could we do Group Aggregate if it wasn't?  I wondered if grouping sets might change that, but it seems the group_pathkeys will be set to the initial grouping set.

Given that, it would seem it's safe to just trim off any pathkey that was added to the group_pathkeys by adjust_group_pathkeys_for_groupagg().  PlannerInfo.num_groupby_pathkeys marks the number of pathkeys that existed in group_pathkeys before adjust_group_pathkeys_for_groupagg() made any additions, so we can just trim the list length back to that.  I've done this in the attached patch.

I also considered whether it was worth adding a regression test for this and concluded that there are better ways to test for this: I considered whether we should add some code to createplan.c to check that all Path pathkeys have corresponding items in the PathTarget.  I've included an additional patch which adds some code in USE_ASSERT_CHECKING builds to verify this.  Without the fix it's simple enough to trigger this with a query such as:

select two,count(distinct four) from tenk1 group by two order by two;

Without the fix the additional asserts cause the regression tests to fail, but with the fix everything passes.

Justin's case is quite an obscure way to hit this as it requires partitionwise aggregation plus a single partition, so that the Append is removed due to only having a single subplan in setrefs.c.  If there had been 2 partitions, then the AppendPath wouldn't have inherited the subpath's pathkeys per the code at the end of create_append_path().

So in short, I propose the attached fix without any regression tests because I feel that any regression test would just mark that there was a bug in create_agg_path() and not really help with ensuring we don't end up with some similar problem in the future.

I have some concerns that the assert_pathkeys_in_target() function might be a little heavyweight for USE_ASSERT_CHECKING builds.  So I'm not proposing to commit that without further discussion.  Does anyone feel differently?  If not, I plan to push the attached strip_aggregate_pathkeys_from_aggpaths_v2.patch early next week.

David
Attachments
On Thu, Oct 5, 2023 at 2:26 PM David Rowley <dgrowleyml@gmail.com> wrote:
So in short, I propose the attached fix without any regression tests
because I feel that any regression test would just mark that there was
a bug in create_agg_path() and not really help with ensuring we don't
end up with some similar problem in the future.
If the pathkeys that were added by adjust_group_pathkeys_for_groupagg()
are computable from the targetlist, it seems that we do not need to trim
them off, because prepare_sort_from_pathkeys() will add resjunk target
entries for them.  But it also does no harm to trim them off.  So I
think the patch is a pretty safe fix.  +1 to it.
I have some concerns that the assert_pathkeys_in_target() function
might be a little heavyweight for USE_ASSERT_CHECKING builds. So I'm
not proposing to commit that without further discussion.
Yeah, it looks somewhat heavy to call assert_pathkeys_in_target() for
each path node.  Can we run some benchmarks to see how much overhead it
would bring to a USE_ASSERT_CHECKING build?
Thanks
Richard
On Sun, 8 Oct 2023 at 23:52, Richard Guo <guofenglinux@gmail.com> wrote:
> On Thu, Oct 5, 2023 at 2:26 PM David Rowley <dgrowleyml@gmail.com> wrote:
>> So in short, I propose the attached fix without any regression tests
>> because I feel that any regression test would just mark that there was
>> a bug in create_agg_path() and not really help with ensuring we don't
>> end up with some similar problem in the future.
>
> If the pathkeys that were added by adjust_group_pathkeys_for_groupagg()
> are computable from the targetlist, it seems that we do not need to trim
> them off, because prepare_sort_from_pathkeys() will add resjunk target
> entries for them.  But it's also no harm if we trim them off.  So I
> think the patch is a pretty safe fix.  +1 to it.

hmm, I think one of us does not understand what is going on here.  I tried to explain in [1] why we *need* to strip off the pathkeys added by adjust_group_pathkeys_for_groupagg().  Given the following example:

create table ab (a int, b int);

explain (costs off) select a,count(distinct b) from ab group by a;
         QUERY PLAN
----------------------------
 GroupAggregate
   Group Key: a
   ->  Sort
         Sort Key: a, b
         ->  Seq Scan on ab
(5 rows)

adjust_group_pathkeys_for_groupagg() will add the pathkey for the "b" column and that results in the Sort node sorting on {a,b}.  It's simply not at all valid to have the GroupAggregate path claim that its pathkeys are also (effectively) {a,b}, as "b" does not and cannot legally exist after the aggregation takes place.  We cannot put a resjunk "b" in the targetlist of the GroupAggregate either, as there could be any number of "b" values aggregated.

Can you explain why you think we can put a resjunk "b" in the target list of the GroupAggregate in the above case?

>> I have some concerns that the assert_pathkeys_in_target() function
>> might be a little heavyweight for USE_ASSERT_CHECKING builds.  So I'm
>> not proposing to commit that without further discussion.
>
> Yeah, it looks like some heavy to call assert_pathkeys_in_target() for
> each path node.  Can we run some benchmarks to see how much overhead it
> would bring to USE_ASSERT_CHECKING build?

I think it'll be easy to show that there is an overhead to it.  It'll be in the realm of the ~41ms patched vs ~24ms unpatched that I showed in [2].  That's quite an extreme case.  Maybe it's worth checking the total planning time spent in a run of the regression tests with and without the patch to see how much overhead it adds to the "average case".

David

[1] https://postgr.es/m/CAApHDvpJJigQRW29TppTOPYp+Aui0mtd3MpfRxyKv=N-tB62jQ@mail.gmail.com
[2] https://postgr.es/m/CAApHDvo7RzcQYw-gnkZr6QCijCqf8vJLkJ4XFk-KawvyAw109Q@mail.gmail.com
On Mon, Oct 9, 2023 at 7:42 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Sun, 8 Oct 2023 at 23:52, Richard Guo <guofenglinux@gmail.com> wrote:
> If the pathkeys that were added by adjust_group_pathkeys_for_groupagg()
> are computable from the targetlist, it seems that we do not need to trim
> them off, because prepare_sort_from_pathkeys() will add resjunk target
> entries for them. But it's also no harm if we trim them off. So I
> think the patch is a pretty safe fix. +1 to it.
hmm, I think one of us does not understand what is going on here. I
tried to explain in [1] why we *need* to strip off the pathkeys added
by adjust_group_pathkeys_for_groupagg().
Sorry, I didn't make myself clear.  I understand why we need to trim off
the pathkeys added by adjust_group_pathkeys_for_groupagg().  What I
meant was that if the newly added pathkeys are *computable* from the
existing target entries, then prepare_sort_from_pathkeys() will add
resjunk target entries for them, so there seems to be no problem even if
we do not trim them off.  For example:
explain (verbose, costs off)
select a, count(distinct a+1) from prt1 group by a order by a;
QUERY PLAN
------------------------------------------------------------------------------------
Result
Output: prt1.a, (count(DISTINCT ((prt1.a + 1))))
-> Merge Append
Sort Key: prt1.a, ((prt1.a + 1))
-> GroupAggregate
Output: prt1.a, count(DISTINCT ((prt1.a + 1))), ((prt1.a + 1))
Group Key: prt1.a
-> Sort
Output: prt1.a, ((prt1.a + 1))
Sort Key: prt1.a, ((prt1.a + 1))
-> Seq Scan on public.prt1_p1 prt1
Output: prt1.a, (prt1.a + 1)
...
Expression 'a+1' is *computable* from the existing entry 'a', so we just
add a new resjunk target entry for 'a+1', and there is no error planning
this query. But if we change 'a+1' to something that is not computable,
then we would have problems (without your fix), and the reason has been
well explained by your messages.
explain (verbose, costs off)
select a, count(distinct b) from prt1 group by a order by a;
ERROR: could not find pathkey item to sort
Having said that, I think it's the right thing to do to trim off the
newly added pathkeys, even if they are *computable*.  In the plan above,
the '(prt1.a + 1)' in the GroupAggregate's targetlist and the
MergeAppend's pathkeys is actually redundant.  It's good to remove it.
Can you explain why you think we can put a resjunk "b" in the target
list of the GroupAggregate in the above case?
Hmm, I don't think we can do that, because 'b' is not *computable* from
the existing target entries, as I explained above.
Thanks
Richard
On Mon, 9 Oct 2023 at 12:42, David Rowley <dgrowleyml@gmail.com> wrote:
> Maybe it's worth checking the total planning time spent in a run of
> the regression tests with and without the patch to see how much
> overhead it adds to the "average case".

I've now pushed the patch that trims off the pathkeys for the ORDER BY / DISTINCT aggregates.

As for the patch to verify the pathkeys during create plan, I patched master with the attached plan_times.patch.txt and used the following to check the time spent in the planner for 3 runs of make installcheck.

$ for i in {1..3}; do pg_ctl start -D pgdata -l plantime.log > /dev/null && cd pg_src && make installcheck > /dev/null && cd .. && grep "planning time in" plantime.log|sed -E -e 's/.*planning time in (.*) nanoseconds/\1/'|awk '{nanoseconds += $1} END{print nanoseconds}' && pg_ctl stop -D pgdata > /dev/null && rm plantime.log; done

Master:
1855788104
1839655412
1740769066

Patched:
1917797221
1766606115
1881322655

Those results are a bit noisy.  Perhaps a few more runs might yield more consistency, but it seems that there's not too much overhead to it.  If I take the minimum value out of the 3 runs from each, it comes to about 1.5% extra time spent in planning.  Perhaps that's OK.

David
Attachments
On Mon, Oct 9, 2023 at 12:13 PM David Rowley <dgrowleyml@gmail.com> wrote:
I've now pushed the patch that trims off the Pathkeys for the ORDER BY
/ DISTINCT aggregates.
Thanks for pushing!
Those results are a bit noisy. Perhaps a few more runs might yield
more consistency, but it seems that there's not too much overhead to
it. If I take the minimum value out of the 3 runs from each, it comes
to about 1.5% extra time spent in planning. Perhaps that's OK.
I agree that the overhead is acceptable, especially as it only happens
in USE_ASSERT_CHECKING builds.
Thanks
Richard
Hello David,

09.10.2023 07:13, David Rowley wrote:
> On Mon, 9 Oct 2023 at 12:42, David Rowley <dgrowleyml@gmail.com> wrote:
>> Maybe it's worth checking the total planning time spent in a run of
>> the regression tests with and without the patch to see how much
>> overhead it adds to the "average case".
> I've now pushed the patch that trims off the Pathkeys for the ORDER BY
> / DISTINCT aggregates.

I've stumbled upon the same error, but this time it apparently has another cause.  It can be produced (on REL_16_STABLE and master) as follows:

CREATE TABLE t (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE td PARTITION OF t DEFAULT;
CREATE TABLE tp1 PARTITION OF t FOR VALUES FROM (1) TO (2);
SET enable_partitionwise_aggregate = on;
SET parallel_setup_cost = 0;
SELECT a, sum(b order by b) FROM t GROUP BY a ORDER BY a;

ERROR:  could not find pathkey item to sort

`git bisect` for this anomaly blames the same commit 1349d2790.

Best regards,
Alexander
On Thu, 14 Mar 2024 at 06:00, Alexander Lakhin <exclusion@gmail.com> wrote:
> I've stumbled upon the same error, but this time it apparently has another
> cause. It can be produced (on REL_16_STABLE and master) as follows:
> CREATE TABLE t (a int, b int) PARTITION BY RANGE (a);
> CREATE TABLE td PARTITION OF t DEFAULT;
> CREATE TABLE tp1 PARTITION OF t FOR VALUES FROM (1) TO (2);
> SET enable_partitionwise_aggregate = on;
> SET parallel_setup_cost = 0;
> SELECT a, sum(b order by b) FROM t GROUP BY a ORDER BY a;
>
> ERROR: could not find pathkey item to sort
>
> `git bisect` for this anomaly blames the same commit 1349d2790.

Thanks for finding this and for the recreator script.  I've attached a patch which fixes the problem for me.

On debugging this I uncovered some other stuff that looks broken which seems to be caused by partition-wise aggregates.  With your example query, in get_useful_pathkeys_for_relation(), we call relation_can_be_sorted_early() to check if the pathkey can be used as a set of pathkeys in useful_pathkeys_list.  The problem is that in your query the 'rel' is the base relation belonging to the partitioned table, and relation_can_be_sorted_early() looks through the targetlist for that relation and finds columns "a" and "b" in there.  The problem is that "b" has been aggregated away, as partial aggregation has taken place due to the partition-wise aggregation.  I believe whichever rel we should be using there should have an Aggref in the target exprs rather than the plain unaggregated column.  I've added Robert and Ashutosh to see what their thoughts are on this.

David
Attachments
On Thu, Mar 14, 2024 at 4:30 AM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 14 Mar 2024 at 06:00, Alexander Lakhin <exclusion@gmail.com> wrote:
> I've stumbled upon the same error, but this time it apparently has another
> cause. It can be produced (on REL_16_STABLE and master) as follows:
> CREATE TABLE t (a int, b int) PARTITION BY RANGE (a);
> CREATE TABLE td PARTITION OF t DEFAULT;
> CREATE TABLE tp1 PARTITION OF t FOR VALUES FROM (1) TO (2);
> SET enable_partitionwise_aggregate = on;
> SET parallel_setup_cost = 0;
> SELECT a, sum(b order by b) FROM t GROUP BY a ORDER BY a;
>
> ERROR: could not find pathkey item to sort
>
> `git bisect` for this anomaly blames the same commit 1349d2790.
Thanks for finding and for the recreator script.
I've attached a patch which fixes the problem for me.
On debugging this I uncovered some other stuff that looks broken which
seems to be caused by partition-wise aggregates. With your example
query, in get_useful_pathkeys_for_relation(), we call
relation_can_be_sorted_early() to check if the pathkey can be used as
a set of pathkeys in useful_pathkeys_list. The problem is that in
your query the 'rel' is the base relation belonging to the partitioned
table and relation_can_be_sorted_early() looks through the targetlist
for that relation and finds columns "a" and "b" in there. The problem
is "b" has been aggregated away as partial aggregation has taken place
due to the partition-wise aggregation. I believe whichever rel we
should be using there should have an Aggref in the target exprs rather
than the plain unaggregated column. I've added Robert and Ashutosh to
see what their thoughts are on this.
I don't understand why root->query_pathkeys has both "a" and "b".  "a" is there because of the GROUP BY and ORDER BY clauses.  But why "b"?
Under the debugger this is what I observed: generate_useful_gather_paths() gets called twice, once for the base relation and a second time for the upper relation.
When it's called for the base relation, it includes both "a" and "b" in the useful pathkeys.  The plan doesn't use sortedness on "b".  But I don't think that's a problem with the relation used.  It looks like root->query_pathkeys containing "b" may be a problem.
When it's called for the upper relation, the reltarget has "a" and an Aggref(), and it includes only "a" in the useful pathkeys, which is as per your expectation.
--
Best Wishes,
Ashutosh Bapat
On Thu, 14 Mar 2024 at 18:23, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> I don't understand why root->query_pathkeys has both a and b. "a" is there because of GROUP BY and ORDER BY clause. But why "b"?

So that the ORDER BY aggregate function can be evaluated without nodeAgg.c having to perform the sort.  See adjust_group_pathkeys_for_groupagg().

David
On Thu, 14 Mar 2024 at 12:00, David Rowley <dgrowleyml@gmail.com> wrote:
> I've attached a patch which fixes the problem for me.

I've pushed the patch to fix gather_grouping_paths().  The issue with the RelOptInfo having the incorrect PathTarget->exprs after the partial phase of partition-wise aggregate remains.

David
On Thu, Mar 14, 2024 at 3:45 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Thu, 14 Mar 2024 at 18:23, Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
> I don't understand why root->query_pathkeys has both a and b. "a" is there because of GROUP BY and ORDER BY clause. But why "b"?
So that the ORDER BY aggregate function can be evaluated without
nodeAgg.c having to perform the sort. See
adjust_group_pathkeys_for_groupagg().
Thanks.  To me, it looks like we are gathering pathkeys which, if used to sort the result of the overall join, would avoid sorting in as many aggregates as possible.
relation_can_be_sorted_early() finds pathkeys which, if used to sort the given relation, would help sort the overall join.  Contrary to what I said earlier, it might help if the base relation is sorted on "a" and "b".  What I find weird is that the sorting is not pushed down to the partitions, where it would help most.
#explain verbose SELECT a, sum(b order by b) FROM t GROUP BY a ORDER BY a;
QUERY PLAN
------------------------------------------------------------------------------------
GroupAggregate (cost=362.21..398.11 rows=200 width=12)
Output: t.a, sum(t.b ORDER BY t.b)
Group Key: t.a
-> Sort (cost=362.21..373.51 rows=4520 width=8)
Output: t.a, t.b
Sort Key: t.a, t.b
-> Append (cost=0.00..87.80 rows=4520 width=8)
-> Seq Scan on public.tp1 t_1 (cost=0.00..32.60 rows=2260 width=8)
Output: t_1.a, t_1.b
-> Seq Scan on public.td t_2 (cost=0.00..32.60 rows=2260 width=8)
Output: t_2.a, t_2.b
(11 rows)
and that's the case even without parallel plans
#explain verbose SELECT a, sum(b order by b) FROM t GROUP BY a ORDER BY a;
QUERY PLAN
------------------------------------------------------------------------------------
GroupAggregate (cost=362.21..398.11 rows=200 width=12)
Output: t.a, sum(t.b ORDER BY t.b)
Group Key: t.a
-> Sort (cost=362.21..373.51 rows=4520 width=8)
Output: t.a, t.b
Sort Key: t.a, t.b
-> Append (cost=0.00..87.80 rows=4520 width=8)
-> Seq Scan on public.tp1 t_1 (cost=0.00..32.60 rows=2260 width=8)
Output: t_1.a, t_1.b
-> Seq Scan on public.td t_2 (cost=0.00..32.60 rows=2260 width=8)
Output: t_2.a, t_2.b
(11 rows)
But it could just be because the corresponding plan was not found to be optimal, maybe because there isn't enough data in those tables.
If the problem you speculate about is different from this one, I am not able to see it.  It might help to give an example query or explain more.
--
Best Wishes,
Ashutosh Bapat
On Mon, 18 Mar 2024 at 18:50, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> If the problem you speculate is different from this one, I am not able to see it. It might help give an example query or explain more.

I looked at this again and I might have been wrong about there being a problem.  I set a breakpoint in create_gather_merge_path() and adjusted the startup and total cost to 1 when I saw the pathkeys containing {a,b}.  It turns out this is the non-partitionwise aggregate path, and of course, the targetlist there does contain the "b" column, so it's fine in that case that the pathkeys are {a,b}.

I had previously thought that this was for the partition-wise aggregate plan, in which case the targetlist would contain a, sum(b order by b), of which there's no single value of "b" that we can legally sort by.

Here's the full plan.

postgres=# explain verbose SELECT a, sum(b order by b) FROM t GROUP BY a ORDER BY a;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=1.00..25.60 rows=200 width=12)
   Output: t.a, sum(t.b ORDER BY t.b)
   Group Key: t.a
   ->  Gather Merge  (cost=1.00..1.00 rows=4520 width=8)
         Output: t.a, t.b
         Workers Planned: 2
         ->  Sort  (cost=158.36..163.07 rows=1882 width=8)
               Output: t.a, t.b
               Sort Key: t.a, t.b
               ->  Parallel Append  (cost=0.00..56.00 rows=1882 width=8)
                     ->  Parallel Seq Scan on public.tp1 t_1  (cost=0.00..23.29 rows=1329 width=8)
                           Output: t_1.a, t_1.b
                     ->  Parallel Seq Scan on public.td t_2  (cost=0.00..23.29 rows=1329 width=8)
                           Output: t_2.a, t_2.b
(14 rows)

David