Обсуждение: Bug with aggregate Window Functions when using Order By. Elements dropped if order by provided

Поиск
Список
Период
Сортировка

Bug with aggregate Window Functions when using Order By. Elements dropped if order by provided

От
Walker Philips
Дата:
Postgres 14.8
Ubuntu 22.04 Jellyfish

Applying an order by clause changes the contents of an array_agg (and string_agg) when performing a window function.

select  right(a.telephone1,3), a.accountid, array_agg(accountid) over (partition by lower(a.telephone1) order by a.accountid desc) as hashed_household

from air_spotter.account a;


generates the results:
889      1C14927F-C28A-E311-A35E-6C3BE5A84FE4      {1C14927F-C28A-E311-A35E-6C3BE5A84FE4}
889      8552B5D9-95ED-E311-96E3-6C3BE5A86DF8      {1C14927F-C28A-E311-A35E-6C3BE5A84FE4,8552B5D9-95ED-E311-96E3-6C3BE5A86DF8}
[NULL]  B25520B1-A08B-E011-8DC7-1CC1DEE8AA5F      {B25520B1-A08B-E011-8DC7-1CC1DEE8AA5F}

select  right(a.telephone1,3), a.accountid, array_agg(accountid) over (partition by lower(a.telephone1)) as hashed_household

from air_spotter.account a;



889     1C14927F-C28A-E311-A35E-6C3BE5A84FE4      {1C14927F-C28A-E311-A35E-6C3BE5A84FE4,8552B5D9-95ED-E311-96E3-6C3BE5A86DF8}
889     8552B5D9-95ED-E311-96E3-6C3BE5A86DF8      {1C14927F-C28A-E311-A35E-6C3BE5A84FE4,8552B5D9-95ED-E311-96E3-6C3BE5A86DF8}
[NULL] B25520B1-A08B-E011-8DC7-1CC1DEE8AA5F      {B25520B1-A08B-E011-8DC7-1CC1DEE8AA5F}

Expected results would be the same total content, but the content being sorted as specified by the order by. Specifically
 {8552B5D9-95ED-E311-96E3-6C3BE5A86DF8,1C14927F-C28A-E311-A35E-6C3BE5A84FE4} for both telephones ending in 889.


Logo-1.png
Walker Philips
Director of Data Engineering and Analytics
T:  205.443.4617   C:  205.585.7668  
Commercial and Private Banking | Wealth Management | Advisory Services
Timeless Values for a Modern World
www.oakworth.com


Oakworth Capital Bank will never ask for personal or account information by e-mail. To protect yourself from fraud, never divulge sensitive information, such as passwords, account numbers, credit card numbers or your PIN in response to an e-mail. When in doubt, feel free to contact us at 205.263.4700.

This email and any files transmitted with it are confidential and are intended solely for the use of the individual or entity to whom they are addressed. This communication represents the originator's personal views and opinions, which do not necessarily reflect those of Oakworth Capital Bank. If you are not the original recipient or the person responsible for delivering the email to the intended recipient, be advised that you have received this email in error, and that any use, dissemination, forwarding, printing, or copying of this email is strictly prohibited. If you received this email in error, please immediately notify postmaster@oakworth.com.

On Mon, 2023-07-17 at 21:41 +0000, Walker Philips wrote:
> Applying an order by clause changes the contents of an array_agg (and string_agg) when performing a window function.
>
>
select right(a.telephone1,3),a.accountid,array_agg(accountid)over (partition by lower(a.telephone1)order by a.accountid
desc)as hashed_household
> from air_spotter.account a;
>
> generates the results:
> 889      1C14927F-C28A-E311-A35E-6C3BE5A84FE4      {1C14927F-C28A-E311-A35E-6C3BE5A84FE4}
> 889    
 8552B5D9-95ED-E311-96E3-6C3BE5A86DF8      {1C14927F-C28A-E311-A35E-6C3BE5A84FE4,8552B5D9-95ED-E311-96E3-6C3BE5A86DF8}
> [NULL]  B25520B1-A08B-E011-8DC7-1CC1DEE8AA5F      {B25520B1-A08B-E011-8DC7-1CC1DEE8AA5F}
>
> Expected results would be the same total content, but the content being sorted as specified by the order by.
Specifically
>  {8552B5D9-95ED-E311-96E3-6C3BE5A86DF8,1C14927F-C28A-E311-A35E-6C3BE5A84FE4} for both telephones ending in 889.

I don't get it: '8552B5D9-95ED-E311-96E3-6C3BE5A86DF8' is greater than '1C14927F-C28A-E311-A35E-6C3BE5A84FE4'.

Yours,
Laurenz Albe



On Tue, 18 Jul 2023 at 17:38, Walker Philips
<walker.philips@oakworth.com> wrote:
> select  right(a.telephone1,3), a.accountid, array_agg(accountid) over (partition by lower(a.telephone1) order by
a.accountiddesc) as hashed_household 
>
> from air_spotter.account a;
>
>
> generates the results:
> 889      1C14927F-C28A-E311-A35E-6C3BE5A84FE4      {1C14927F-C28A-E311-A35E-6C3BE5A84FE4}
> 889   
8552B5D9-95ED-E311-96E3-6C3BE5A86DF8      {1C14927F-C28A-E311-A35E-6C3BE5A84FE4,8552B5D9-95ED-E311-96E3-6C3BE5A86DF8}
> [NULL]  B25520B1-A08B-E011-8DC7-1CC1DEE8AA5F      {B25520B1-A08B-E011-8DC7-1CC1DEE8AA5F}
>
> select  right(a.telephone1,3), a.accountid, array_agg(accountid) over (partition by lower(a.telephone1)) as
hashed_household
>
> from air_spotter.account a;
>
>
>
> 889   
1C14927F-C28A-E311-A35E-6C3BE5A84FE4      {1C14927F-C28A-E311-A35E-6C3BE5A84FE4,8552B5D9-95ED-E311-96E3-6C3BE5A86DF8}
> 889   
8552B5D9-95ED-E311-96E3-6C3BE5A86DF8      {1C14927F-C28A-E311-A35E-6C3BE5A84FE4,8552B5D9-95ED-E311-96E3-6C3BE5A86DF8}
> [NULL] B25520B1-A08B-E011-8DC7-1CC1DEE8AA5F      {B25520B1-A08B-E011-8DC7-1CC1DEE8AA5F}
>
> Expected results would be the same total content, but the content being sorted as specified by the order by.
Specifically
>  {8552B5D9-95ED-E311-96E3-6C3BE5A86DF8,1C14927F-C28A-E311-A35E-6C3BE5A84FE4} for both telephones ending in 889.

This is intended and how the SQL standard defines how it's meant to
work.  The default frame options for which rows are in frame is RANGE
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.  ORDER BY, when used in
the window clause just defines which rows are peers of each other. If
you don't have an ORDER BY then all rows in the partition are peers of
each other, that's why you see all rows in the partition aggregated in
the latter of your two queries above.  In the first of your queries,
since you've not adjusted the frame options only rows from the start
of the frame (UNBOUNDED PRECEDING) to the current row (CURRENT ROW)
(and the current row's peers, i.e rows with the same value according
to the ORDER BY clause) are in the frame. That's why you see new
values being aggregated as the window advances.

If you want all rows in the partition to be in the frame at once then
you can either leave off the ORDER BY as you've done in the first of
your queries or you can change the frame visibility options to ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. You can also have
an ORDER BY to control the order of aggregation.  I think the
following likely will give you what you want:

select  right(a.telephone1,3), a.accountid, array_agg(accountid) over
(partition by lower(a.telephone1) order by a.accountid desc ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as
hashed_household from air_spotter.account a;

David