Обсуждение: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7

Поиск
Список
Период
Сортировка

Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7

От
"Daniel Westermann (DWE)"
Дата:
Hi,

I am not sure if this qualifies as  bug, but anyway:

Source instance: PostgreSQL 13.7 on RHEL 7.9
Target instance PostgreSQL 13.7 on RHEL 8.7

This is the statement:
SELECT * FROM "rsu_adm"."data_2d_clb_global_product" f2
LEFT OUTER JOIN "rsu_adm"."clb_global_product" f1 on f1.cprd=f2.cprd
WHERE f1.cprd is null;

Per default we see a merge anti join, and this gives results, which is wrong:

rsup1=# SELECT * FROM "rsu_adm"."data_2d_clb_global_product" f2
LEFT OUTER JOIN "rsu_adm"."clb_global_product" f1 on f1.cprd=f2.cprd
WHERE f1.cprd is null;
     cprd      | xtc_id |  rprd   | prdgalsts_id |   dlz_last_transaction_ts    | cprd | xtc_id | rprd | prdgalsts_id |
dlz_last_tran

---------------+--------+---------+--------------+------------------------------+------+--------+------+--------------+--------------
 0027033       |        |  179722 |            1 | 2023-04-03 06:15:09.45135+02 |      |        |      |              |

 0112113       |        | 3199208 |            1 | 2023-04-03 06:15:09.45135+02 |      |        |      |              |

 0116713       |        | 2071012 |            1 | 2023-04-03 06:15:09.45135+02 |      |        |      |              |

 0116953       |        | 2070136 |            1 | 2023-04-03 06:15:09.45135+02 |      |        |      |              |

...


                                                                                                                  QUERY
PLAN                                                                                                                  

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Anti Join  (cost=100.84..67203.45 rows=50713 width=122)
   Output: f2.cprd, f2.xtc_id, f2.rprd, f2.prdgalsts_id, f2.dlz_last_transaction_ts, clb_global_product.cprd,
clb_global_product.xtc_id,clb_global_product.rprd, clb_global_product.prdgalsts_id,
clb_global_product.dlz_last_transaction_ts
   Merge Cond: ((f2.cprd)::text = (clb_global_product.cprd)::text)
   ->  Index Scan using data_2d_clb_global_product_pkey on rsu_adm.data_2d_clb_global_product f2  (cost=0.42..2898.56
rows=101426width=34) 
         Output: f2.cprd, f2.xtc_id, f2.rprd, f2.prdgalsts_id, f2.dlz_last_transaction_ts
   ->  Foreign Scan on ro_dlz.clb_global_product  (cost=100.42..52506.16 rows=923613 width=34)
         Output: clb_global_product.cprd, clb_global_product.xtc_id, clb_global_product.rprd,
clb_global_product.prdgalsts_id,clb_global_product.dlz_last_transaction_ts 
         Remote SQL: SELECT cprd, xtc_id, rprd, prdgalsts_id, dlz_last_transaction_ts FROM ro_rsu.clb_global_product
ORDERBY cprd ASC NULLS LAST 
(8 rows)


Disabling merge join gives the correct result:
rsup1=# set enable_mergejoin = off;
SET
rsup1=# explain verbose SELECT * FROM "rsu_adm"."data_2d_clb_global_product" f2
LEFT OUTER JOIN "rsu_adm"."clb_global_product" f1 on f1.cprd=f2.cprd
WHERE f1.cprd is null;
                                                                                                                  QUERY
PLAN                                                                                                                  

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Anti Join  (cost=60274.55..681118.72 rows=50713 width=122)
   Output: f2.cprd, f2.xtc_id, f2.rprd, f2.prdgalsts_id, f2.dlz_last_transaction_ts, clb_global_product.cprd,
clb_global_product.xtc_id,clb_global_product.rprd, clb_global_product.prdgalsts_id,
clb_global_product.dlz_last_transaction_ts
   Hash Cond: ((f2.cprd)::text = (clb_global_product.cprd)::text)
   ->  Seq Scan on rsu_adm.data_2d_clb_global_product f2  (cost=0.00..1768.26 rows=101426 width=34)
         Output: f2.cprd, f2.xtc_id, f2.rprd, f2.prdgalsts_id, f2.dlz_last_transaction_ts
   ->  Hash  (cost=41513.39..41513.39 rows=923613 width=34)
         Output: clb_global_product.cprd, clb_global_product.xtc_id, clb_global_product.rprd,
clb_global_product.prdgalsts_id,clb_global_product.dlz_last_transaction_ts 
         ->  Foreign Scan on ro_dlz.clb_global_product  (cost=100.00..41513.39 rows=923613 width=34)
               Output: clb_global_product.cprd, clb_global_product.xtc_id, clb_global_product.rprd,
clb_global_product.prdgalsts_id,clb_global_product.dlz_last_transaction_ts 
               Remote SQL: SELECT cprd, xtc_id, rprd, prdgalsts_id, dlz_last_transaction_ts FROM
ro_rsu.clb_global_product
(10 rows)

rsup1=# SELECT * FROM "rsu_adm"."data_2d_clb_global_product" f2
LEFT OUTER JOIN "rsu_adm"."clb_global_product" f1 on f1.cprd=f2.cprd
WHERE f1.cprd is null;
 cprd | xtc_id | rprd | prdgalsts_id | dlz_last_transaction_ts | cprd | xtc_id | rprd | prdgalsts_id |
dlz_last_transaction_ts 

------+--------+------+--------------+-------------------------+------+--------+------+--------------+-------------------------
(0 rows)

This is the server definition:
rsup1=# \des+
                                                                                       List of foreign servers
  Name   |  Owner   | Foreign-data wrapper | Access privileges | Type | Version |
     FDW options                                              | Description  

---------+----------+----------------------+-------------------+------+---------+------------------------------------------------------------------------------------------------------+-------------
 tgt_srv | postgres | postgres_fdw         |                   |      |         | (host '192.168.100.245', dbname
'dlzp1',port '5432', use_remote_estimate 'true', fetch_size '5000') |  
(1 row)

I am aware that the version of glibc is not the same between those red hats. Is this expected?

Thanks in advance
Daniel






Re: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7

От
Tom Lane
Дата:
"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:
> I am not sure if this qualifies as  bug, but anyway:

> Source instance: PostgreSQL 13.7 on RHEL 7.9
> Target instance PostgreSQL 13.7 on RHEL 8.7

> This is the statement:
> SELECT * FROM "rsu_adm"."data_2d_clb_global_product" f2
> LEFT OUTER JOIN "rsu_adm"."clb_global_product" f1 on f1.cprd=f2.cprd
> WHERE f1.cprd is null;

> Per default we see a merge anti join, and this gives results, which is wrong:

You didn't provide anything useful like the table schemas, but
correctness of a merge join depends on the servers having the same
ideas about sort ordering, and if "cprd" is a text-type column then
inconsistent collations could break that.

The given plan is at hazard for that because it intends to do
one sort locally and the other remotely:

>    Merge Cond: ((f2.cprd)::text = (clb_global_product.cprd)::text)
>    ->  Index Scan using data_2d_clb_global_product_pkey on rsu_adm.data_2d_clb_global_product f2  (cost=0.42..2898.56
rows=101426width=34) 
>          Output: f2.cprd, f2.xtc_id, f2.rprd, f2.prdgalsts_id, f2.dlz_last_transaction_ts
>    ->  Foreign Scan on ro_dlz.clb_global_product  (cost=100.42..52506.16 rows=923613 width=34)
>          Output: clb_global_product.cprd, clb_global_product.xtc_id, clb_global_product.rprd,
clb_global_product.prdgalsts_id,clb_global_product.dlz_last_transaction_ts 
>          Remote SQL: SELECT cprd, xtc_id, rprd, prdgalsts_id, dlz_last_transaction_ts FROM ro_rsu.clb_global_product
ORDERBY cprd ASC NULLS LAST 

> I am aware that the version of glibc is not the same between those red hats. Is this expected?

That's certainly a hazard, but do the servers even have the same
collation settings for these columns?

            regards, tom lane



Re: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7

От
"Daniel Westermann (DWE)"
Дата:
>You didn't provide anything useful like the table schemas, but
>correctness of a merge join depends on the servers having the same
>deas about sort ordering, and if "cprd" is a text-type column then
>inconsistent collations could break that.

rsup1=# \d "rsu_adm"."data_2d_clb_global_product"
                    Table "rsu_adm.data_2d_clb_global_product"
        Column          |           Type           | Collation | Nullable | Default  
-------------------------+--------------------------+-----------+----------+---------
cprd                    | character varying        |           | not null |  
xtc_id                  | numeric                  |           |          |  
rprd                    | numeric                  |           |          |  
prdgalsts_id            | integer                  |           |          |  
dlz_last_transaction_ts | timestamp with time zone |           |          |  
Indexes:
   "data_2d_clb_global_product_pkey" PRIMARY KEY, btree (cprd)
   "data_2d_clb_global_product_idx4" btree (dlz_last_transaction_ts)


rsup1=# \d  "rsu_adm"."clb_global_product"
                          View "rsu_adm.clb_global_product"
        Column          |            Type             | Collation | Nullable | Default  
-------------------------+-----------------------------+-----------+----------+---------
cprd                    | character varying           |           |          |  
xtc_id                  | numeric(10,0)               |           |          |  
rprd                    | numeric(10,0)               |           |          |  
prdgalsts_id            | numeric(10,0)               |           |          |  
dlz_last_transaction_ts | timestamp(5) with time zone |           |          |  


Both instances use the same collation;

rsup1=# \l
                                 List of databases
  Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges    
-----------+----------+----------+-------------+-------------+-----------------------
postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |  
rsup1     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |  
template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
          |          |          |             |             | postgres=CTc/postgres
template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
          |          |          |             |             | postgres=CTc/postgres
(4 rows)


dlzp1=# \l
                                 List of databases
  Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges    
-----------+----------+----------+-------------+-------------+-----------------------
dlzp1     | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |  
postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |  
template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
          |          |          |             |             | postgres=CTc/postgres
template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
          |          |          |             |             | postgres=CTc/postgres
(4 rows)


>The given plan is at hazard for that because it intends to do
>one sort locally and the other remotely:

Remote is a view:

dlzp1=# \d ro_rsu.clb_global_product
                           View "ro_rsu.clb_global_product"
        Column          |            Type             | Collation | Nullable | Default  
-------------------------+-----------------------------+-----------+----------+---------
cprd                    | character varying           |           |          |  
xtc_id                  | numeric(10,0)               |           |          |  
rprd                    | numeric(10,0)               |           |          |  
prdgalsts_id            | numeric(10,0)               |           |          |  
dlz_last_transaction_ts | timestamp(5) with time zone |           |          |


dlzp1=# select definition from pg_views where viewname = 'clb_global_product';
              definition                
----------------------------------------
 SELECT clb_flbgpr.cprd,              +
    clb_flbgpr.xtc_id,                +
    clb_flbgpr.rprd,                  +
    clb_flbgpr.prdgalsts_id,          +
    clb_flbgpr.dlz_last_transaction_ts+
   FROM dlz_clb.clb_flbgpr;
(1 row)

dlzp1=# \d dlz_clb.clb_flbgpr
                              Table "dlz_clb.clb_flbgpr"
        Column          |            Type             | Collation | Nullable | Default  
-------------------------+-----------------------------+-----------+----------+---------
cprd                    | character varying           |           | not null |  
fp_lwr_celsius          | numeric(3,0)                |           |          |  
fp_lwr_farenheit        | numeric(3,0)                |           |          |  
fp_last_upd             | date                        |           |          |  
win_indicator           | numeric(6,0)                |           |          |  
gpr_theme               | character varying(6)        |           |          |  
comparison_symbol       | character(1)                |           |          |  
cprd_as_previous        | character varying           |           |          |  
emp_user_id_as_pcfm     | character varying           |           |          |  
xtc_id                  | numeric(10,0)               |           |          |  
gpr_oral_care           | character(1)                |           |          |  
bunit_id                | numeric(10,0)               |           |          |  
remark_id               | numeric(10,0)               |           |          |  
win_them_indicator      | numeric(6,0)                |           |          |  
plr_id                  | numeric(10,0)               |           |          |  
stability_status        | character(1)                |           |          |  
cmr_frm_tested          | character(1)                |           |          |  
rprd                    | numeric(10,0)               |           |          |  
like_level_id           | numeric(10,0)               |           |          |  
lhpr_id                 | numeric(10,0)               |           |          |  
like_perf_lvl_id        | numeric(10,0)               |           |          |  
lhdncpr_id              | numeric(10,0)               |           |          |  
like_hdnc_lvl_id        | numeric(10,0)               |           |          |  
prdgalsts_id            | numeric(10,0)               |           |          |  
oil_stock_sts           | character(1)                |           |          |  
archive_status          | character(2)                |           |          |  
dlz_last_transaction_ts | timestamp(5) with time zone |           |          |  
Indexes:
   "clb_flbgpr_pk" PRIMARY KEY, btree (cprd)
   "clb_flbgpr_lst_trn" btree (dlz_last_transaction_ts)
Triggers:
   clb_flbgpr_ins BEFORE INSERT ON dlz_clb.clb_flbgpr FOR EACH ROW EXECUTE FUNCTION dlz_adm.fct_trg_setup_transaction_ts()
   clb_flbgpr_upd BEFORE UPDATE ON dlz_clb.clb_flbgpr FOR EACH ROW WHEN (old.* IS DISTINCT FROM new.*) EXECUTE FUNCTION dlz_adm.fct
_trg_setup_transaction_ts()


Regards
Daniel


Re: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7

От
Tom Lane
Дата:
"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:
> Both instances use the same collation;

Yeah, doesn't look like you've made any configuration mistakes.
So either the two OSes sort differently, or there's index corruption
causing the indexscan to give bogus output.

The sample data you showed seemed to only involve numeric-ish strings,
which would be highly unlikely to change sort order across locale
updates.  But maybe there are weirder entries elsewhere in the column?

Anyway, the first thing I'd try is reindexing both tables --- doesn't
look like they're large enough to make that painful.  If that doesn't
fix it you must have a collation difference.  (Asking both systems
for a sorted dump of their cprd columns could help confirm that.)
You could probably hack around that, if an OS update isn't feasible,
by labelling the foreign table's column with some collation you aren't
using anywhere else in the local database.

            regards, tom lane



Re: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7

От
"Daniel Westermann (DWE)"
Дата:
>Yeah, doesn't look like you've made any configuration mistakes.
>So either the two OSes sort differently, or there's index corruption
>causing the indexscan to give bogus output.

We did a rebuild of the indexes and also vacuum full, just to be sure. Did not change anything.

>The sample data you showed seemed to only involve numeric-ish strings,
>which would be highly unlikely to change sort order across locale
>updates.  But maybe there are weirder entries elsewhere in the column?

I can probably provide a dump, but I've to ask. Would that help?

>Anyway, the first thing I'd try is reindexing both tables --- doesn't
>look like they're large enough to make that painful.  If that doesn't
>fix it you must have a collation difference.  (Asking both systems
>for a sorted dump of their cprd columns could help confirm that.)
>You could probably hack around that, if an OS update isn't feasible,
>by labelling the foreign table's column with some collation you aren't
>using anywhere else in the local database.

I'll try do that tomorrow.

Thanks
Daniel






Re: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7

От
Jim Mlodgenski
Дата:


On Wed, Apr 5, 2023 at 2:41 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:
> I am not sure if this qualifies as  bug, but anyway:

> Source instance: PostgreSQL 13.7 on RHEL 7.9
> Target instance PostgreSQL 13.7 on RHEL 8.7


glibc on those 2 versions of RHEL have very different ideas of what the sort order should be. Try running the following and you'll likely see different results on RHEL 7.9 vs 8.7

CREATE TABLE t1 (c1 varchar PRIMARY KEY);
INSERT INTO t1 VALUES ('1-a'), ('1a'), ('1-aa');
SELECT * FROM t1 ORDER BY c1;

Re: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7

От
Tom Lane
Дата:
"Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com> writes:
>> The sample data you showed seemed to only involve numeric-ish strings,
>> which would be highly unlikely to change sort order across locale
>> updates.  But maybe there are weirder entries elsewhere in the column?

> I can probably provide a dump, but I've to ask. Would that help?

I wasn't volunteering to check the data for you ;-), just suggesting
that you might find it interesting to do so yourself.

            regards, tom lane



Re: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7

От
Joe Conway
Дата:
On 4/5/23 15:46, Jim Mlodgenski wrote:
> 
> 
> On Wed, Apr 5, 2023 at 2:41 PM Tom Lane <tgl@sss.pgh.pa.us 
> <mailto:tgl@sss.pgh.pa.us>> wrote:
> 
>     "Daniel Westermann (DWE)" <daniel.westermann@dbi-services.com
>     <mailto:daniel.westermann@dbi-services.com>> writes:
>      > I am not sure if this qualifies as  bug, but anyway:
> 
>      > Source instance: PostgreSQL 13.7 on RHEL 7.9
>      > Target instance PostgreSQL 13.7 on RHEL 8.7
> 
> 
> glibc on those 2 versions of RHEL have very different ideas of what the 
> sort order should be. Try running the following and you'll likely see 
> different results on RHEL 7.9 vs 8.7
> 
> CREATE TABLE t1 (c1 varchar PRIMARY KEY);
> INSERT INTO t1 VALUES ('1-a'), ('1a'), ('1-aa');
> SELECT * FROM t1 ORDER BY c1;

Yep -- RHEL 7.9 is glibc 2.17 (likely -326) and 8.7 is glibc 2.28 (-211 
seems to be latest), and they are well known to sort differently even 
for "common" characters (e.g. "-")

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Wrong results with postgres_fdw and merge anti join from RHEL 7.9 to RHEL 8.7

От
"Daniel Westermann (DWE)"
Дата:
>>
>>
>> glibc on those 2 versions of RHEL have very different ideas of what the
>> sort order should be. Try running the following and you'll likely see
>> different results on RHEL 7.9 vs 8.7
>>
>> CREATE TABLE t1 (c1 varchar PRIMARY KEY);
>> INSERT INTO t1 VALUES ('1-a'), ('1a'), ('1-aa');
>> SELECT * FROM t1 ORDER BY c1;

>Yep -- RHEL 7.9 is glibc 2.17 (likely -326) and 8.7 is glibc 2.28 (-211
>seems to be latest), and they are well known to sort differently even
>for "common" characters (e.g. "-")

Got it, thank you all for your help.

Regards
Daniel