Обсуждение: Expand applicability of aggregate's sortop optimization

Поиск
Список
Период
Сортировка

Expand applicability of aggregate's sortop optimization

От
Matthias van de Meent
Дата:
Hi,

As you may know, aggregates like SELECT MIN(unique1) FROM tenk1; are
rewritten as SELECT unique1 FROM tenk1 ORDER BY unique1 USING < LIMIT
1; by using the optional sortop field in the aggregator.
However, this optimization is disabled for clauses that in itself have
an ORDER BY clause such as `MIN(unique1 ORDER BY <anything>), because
<anything> can cause reordering of distinguisable values like 1.0 and
1.00, which then causes measurable differences in the output. In the
general case, that's a good reason to not apply this optimization, but
in some cases, we could still apply the index optimization.

One of those cases is fixed in the attached patch: if we order by the
same column that we're aggregating, using the same order class as the
aggregate's sort operator (i.e. the aggregate's sortop is in the same
btree opclass as the ORDER BY's sort operator), then we can still use
the index operation: The sort behaviour isn't changed, thus we can
apply the optimization.

PFA the small patch that implements this.

Note that we can't blindly accept just any ordering by the same
column: If we had an opclass that sorted numeric values by the length
of the significant/mantissa, then that'd provide a different (and
distinct) ordering from that which is expected by the normal
min()/max() aggregates for numeric, which could cause us to return
arguably incorrect results for the aggregate expression.

Alternatively, the current code could be changed to build indexed
paths that append the SORT BY paths to the aggregate's sort operator,
but that'd significantly increase complexity here.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Вложения

Re: Expand applicability of aggregate's sortop optimization

От
Dagfinn Ilmari Mannsåker
Дата:
Matthias van de Meent <boekewurm+postgres@gmail.com> writes:

> PFA the small patch that implements this.

I don't have enough knowledge to have an opinion on most of the patch
other than it looks okay at a glance, but the list API usage could be
updated to more modern variants:

> diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
> index afb5445b77..d8479fe286 100644
> --- a/src/backend/optimizer/plan/planagg.c
> +++ b/src/backend/optimizer/plan/planagg.c
> @@ -253,6 +253,16 @@ can_minmax_aggs(PlannerInfo *root, List **context)
>          if (list_length(aggref->args) != 1)
>              return false;        /* it couldn't be MIN/MAX */
>  
> +        /*
> +         * We might implement the optimization when a FILTER clause is present
> +         * by adding the filter to the quals of the generated subquery.  For
> +         * now, just punt.
> +         */
> +        if (aggref->aggfilter != NULL)
> +            return false;
> +
> +        curTarget = (TargetEntry *) linitial(aggref->args);

This could be linitial_node(TargetEntry, aggref->args).

> +            if (list_length(aggref->aggorder) > 1)
> +                return false;
> +
> +            orderClause = castNode(SortGroupClause, linitial(aggref->aggorder));

This could be linitial_node(SortGroupClause, aggref->aggorder).

> +            if (orderClause->sortop != aggsortop)
> +            {
> +                List   *btclasses;
> +                ListCell *cell;
> +                bool    match = false;
> +
> +                btclasses = get_op_btree_interpretation(orderClause->sortop);
> +
> +                foreach(cell, btclasses)
> +                {
> +                    OpBtreeInterpretation *interpretation;
> +                    interpretation = (OpBtreeInterpretation *) lfirst(cell);

This could be foreach_ptr(OpBtreeInterpretation, interpretation, btclasses),
which also eliminates the need for the explicit `ListCell *` variable
and lfirst() call.

- ilmari



Re: Expand applicability of aggregate's sortop optimization

От
David Rowley
Дата:
On Wed, 8 May 2024 at 22:13, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> As you may know, aggregates like SELECT MIN(unique1) FROM tenk1; are
> rewritten as SELECT unique1 FROM tenk1 ORDER BY unique1 USING < LIMIT
> 1; by using the optional sortop field in the aggregator.
> However, this optimization is disabled for clauses that in itself have
> an ORDER BY clause such as `MIN(unique1 ORDER BY <anything>), because
> <anything> can cause reordering of distinguisable values like 1.0 and
> 1.00, which then causes measurable differences in the output. In the
> general case, that's a good reason to not apply this optimization, but
> in some cases, we could still apply the index optimization.

I wonder if we should also consider as an alternative to this to just
have an aggregate support function, similar to
SupportRequestOptimizeWindowClause that just nullifies the aggorder /
aggdistinct fields for Min/Max aggregates on types where there's no
possible difference in output when calling the transition function on
rows in a different order.

Would that apply in enough cases for you?

I think it would rule out Min(numeric) and Max(numeric). We were
careful not to affect the number of decimal places in the numeric
output when using the moving aggregate inverse transition
infrastructure for WindowFuncs, so I agree we should maintain an
ability to control the aggregate transition order for numeric. (See
do_numeric_discard's maxScale if check)

I don't think floating point types have the same issues here. At least
+1.0 is greater than -1.0.

Are there any strange collation rules that would cause issues if we
did this with the text types?

David



Re: Expand applicability of aggregate's sortop optimization

От
David Rowley
Дата:
On Thu, 9 May 2024 at 12:26, David Rowley <dgrowleyml@gmail.com> wrote:
> I wonder if we should also consider as an alternative to this to just
> have an aggregate support function, similar to
> SupportRequestOptimizeWindowClause that just nullifies the aggorder /
> aggdistinct fields for Min/Max aggregates on types where there's no
> possible difference in output when calling the transition function on
> rows in a different order.
>
> Would that apply in enough cases for you?

One additional thought is that the above method would also help
eliminate redundant sorting in queries with a GROUP BY clause.
Whereas, the can_minmax_aggs optimisation is not applied in that case.

David



Re: Expand applicability of aggregate's sortop optimization

От
David Rowley
Дата:
On Thu, 9 May 2024 at 13:08, David Rowley <dgrowleyml@gmail.com> wrote:
> One additional thought is that the above method would also help
> eliminate redundant sorting in queries with a GROUP BY clause.
> Whereas, the can_minmax_aggs optimisation is not applied in that case.

Another argument for using this method is that
SupportRequestOptimizeAggref could allow future unrelated
optimisations such as swapping count(<non-nullable-col>) for count(*).
Where <non-nullable-col> is a NOT NULL column and isn't nullable by
any outer join.  Doing that could speed some queries up quite a bit as
it may mean fewer columns to deform from the tuple. You could imagine
a fact table with many columns and a few dimensions, often the
dimension columns that you'd expect to use in GROUP BY would appear
before the columns you'd aggregate on.

David