BUG #18125: Ordered set aggregate results (MD5 hashing) vary between postgres versions

Поиск
Список
Период
Сортировка
От PG Bug reporting form
Тема BUG #18125: Ordered set aggregate results (MD5 hashing) vary between postgres versions
Дата
Msg-id 18125-73d02632bed248bb@postgresql.org
обсуждение исходный текст
Список pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18125
Logged by:          Richard Wollhofen
Email address:      wollhofen@unisoftwareplus.com
PostgreSQL version: 16.0
Operating system:   Debian 16.0-1, Red Hat 8.5.0-10, Red Hat 8.5.0-18
Description:

I tried to implement a validation script, to compare table data before and
after an upgrade from Postgres 13.5 to Postgres 15.4. Following the idea
described by Hans-Jürgen Schönig here
https://www.cybertec-postgresql.com/en/postgresql-creating-checksums-for-tables/
I tried to generate an MD5 hash for every device_id in my raw data table.
However, the hashes after the upgrade showed differences (compared to the
hashes created before the upgrade) for around half the devices, while the
raw data was identical. Please find below a minimum example to show the
behavior. This script has to be executed with different Postgres versions,
the grouped hashes will show different results for different major versions.
Interestingly, the hashes are identical for versions 11,12,13,14. Tested
major versions on dbfiddle: 10 - 16.
However, this can only be observed, when the row number of the table is >=
79022, below this number the hashes are equal across versions. It is also
independent of the number of groups in the last query.

Link to test script
https://dbfiddle.uk/AxZPWLur

-- different hashes for Postgres 11-14 vs 15 vs 16
select version();

-- md5 aggregate sfunc
CREATE FUNCTION md5_agg_sfunc(text, anyelement)
       RETURNS text
       LANGUAGE sql
AS
$$
  SELECT md5($1 || $2::text)
$$;

-- md5 aggregate function
CREATE AGGREGATE md5_agg (ORDER BY anyelement)
(
  STYPE = text,
  SFUNC = md5_agg_sfunc,
  INITCOND = ''
);

-- test data. hashes change for a row number >= 79022 
CREATE TABLE t_test (a int, b int);
INSERT INTO t_test
       SELECT   x, x + 10
       FROM     generate_series(1, 79022) AS x;

-- whole table hash H1
SELECT md5_agg() WITHIN GROUP (ORDER BY t_test ASC)
FROM t_test;

-- ordered set grouped hashes differ across postgres versions
-- grouped hash H2, all in one group: H2 identical with H1 for PG version >=
15
SELECT
  a % 1 AS grp,
  md5_agg() WITHIN GROUP (ORDER BY t_test)
FROM t_test
GROUP BY a % 1;

-- hashes change for different postgres versions
SELECT
  a % 3 AS grp,
  md5_agg() WITHIN GROUP (ORDER BY t_test)
FROM t_test
GROUP BY a % 3;


В списке pgsql-bugs по дате отправления:

Предыдущее
От: "刘相(佑熙)"
Дата:
Сообщение: 回复:Re: BUG #18118: bug report for COMMIT AND CHAIN feature
Следующее
От: PG Bug reporting form
Дата:
Сообщение: BUG #18126: Ordered set aggregate: result does not depend on sort order