Обсуждение: Query much slower from php than psql or dbeaver

Поиск
Список
Период
Сортировка

Query much slower from php than psql or dbeaver

От
Ekaterina Amez
Дата:
Hi,

After receiving an Unknown Address error with pgsql-php@postgresql.org I've discovered this mailing list is catalogued as Inactive, so I'm sending my question to this list.

I've made a php cli script that downloads a file from FTP, loads it in a table and compares against the same table in 2 different databases that are connected to the main one with pgsql foreign data wrapper (so there are 3 DB involved via linked servers).

I've tested the query with psql and DBeaver and it takes only milliseconds: it returns 39 records and now there's only 16000 records on the table but I've tested it with <100K. When I've tested my php script the same query takes 14 minutes to return (more or less). I've checked there are no blocking processes while runnig this query, I've analyzed the table, dropped and recreated, added indexes... and every time I run my script it takes those 14 minutes.

The script opens the connection and after that makes all of the operations: some checks, load the file with COPY sentence, move data with some additional columns to final table, and after this and without dropping the connection it runs this query. After this long query, script iterates over the returned records to make some other queries for every record. And finally there are 2 more queries, that makes the reverse search: the slow query checks diferences or non-existing records from file data in the 2 linked databases, and the las 2 queries check non-existing records from their tables in the file. Everything goes smooth except this long-lasting query.

This is the query:
select t1.aaaa as maindb_aaaa, t1.bbb as maindb_bbb, t1.ccccc as maindb_ccccc, t1.timestamp_create as maindb_create,
    t1.timestamp_closed as maindb_close, t1.ddddddddd as maindb_ddddddddd, null::text as db1_sth,
    t2.eeeeeeee as db1_eeeeeeee, t2.ffffffff as db1_ffffffff, null::text as db2_sth,
    t3.eeeeeeee as db2_eeeeeeee, t3.ffffffff as db2_ffffffff
from table1 t1
left join database1_fdw.table2 t2 on t1.aaaa = t2.btatpd_aaaa
    and t1.file = 'file_name.csv'
    and t2.btatpd_fecha = '20220119120000'
    and substring(t1.bbb from 1 for 3) in (<some_values>)
left join database2_fdw.table2 t3 on t1.aaaa = t3.btatpd_aaaa
    and t1.fichero_origen = 'file_name.csv'
    and t3.btatpd_fecha = '20220119120000'
    and substring(t1.bbb from 1 for 3) in (<some_different_values)
where t1.ccccc = 'ACTIVE'
and t1.fichero_origen = 'file_name.csv'
and ((t2.eeeeeeee is null and t3.eeeeeeee is null)
    or
    (t2.eeeeeeee is not null and t1.ddddddddd <> t2.ffffffff)
    or
    (t3.eeeeeeee is not null and t1.ddddddddd <> t3.ffffffff)
    )
order by t1.bbb nulls last;

I'm using PHP 5.3.3, with PG 9.6 over CentOS 6.8. I'm connecting to PG with pg_connect and pg_query because here nobody is using PDO. And I have to warn you that I'm not a PHP expert. Also I'm aware how old these versions are, I'm trying to finish things to start with server migration.

This is the explain plan, in case it gives more info:
Sort  (cost=5345.39..5385.35 rows=15986 width=212) (actual time=112.315..112.318 rows=39 loops=1)
  Output: t1.aaaa, t1.bbb, t1.ccccc, t1.timestamp_create, t1.timestamp_closed, t1.ddddddddd, NULL::text, t2.eeeeeeee, t2.ffffffff, NULL::text, t3.eeeeeeee, t3.ffffffff
  Sort Key: t1.bbb
  Sort Method: quicksort  Memory: 29kB
  Buffers: shared hit=1255
  ->  Hash Left Join  (cost=237.54..2587.70 rows=15986 width=212) (actual time=81.272..112.248 rows=39 loops=1)
        Output: t1.aaaa, t1.bbb, t1.ccccc, t1.timestamp_create, t1.timestamp_closed, t1.ddddddddd, NULL::text, t2.eeeeeeee, t2.ffffffff, NULL::text, t3.eeeeeeee, t3.ffffffff
        Hash Cond: ((t1.aaaa)::text = (t3.btatpd_aaaa)::text)
        Join Filter: (((t1.file)::text = 'file_name.csv'::text) AND ("substring"((t1.bbb)::text, 1, 3) = ANY ('{<some_values>}'::text[])))
        Rows Removed by Join Filter: 1
        Filter: (((t2.eeeeeeee IS NULL) AND (t3.eeeeeeee IS NULL)) OR ((t2.eeeeeeee IS NOT NULL) AND (t1.ddddddddd <> t2.ffffffff)) OR ((t3.eeeeeeee IS NOT NULL) AND (t1.ddddddddd <> t3.ffffffff)))
        Rows Removed by Filter: 15794
        Buffers: shared hit=1252
        ->  Hash Left Join  (cost=118.77..2408.88 rows=15988 width=149) (actual time=71.890..101.261 rows=15820 loops=1)
              Output: t1.aaaa, t1.bbb, t1.ccccc, t1.timestamp_create, t1.timestamp_closed, t1.ddddddddd, t1.file, t2.eeeeeeee, t2.ffffffff
              Hash Cond: ((t1.aaaa)::text = (t2.btatpd_aaaa)::text)
              Join Filter: (((t1.file)::text = 'file_name.csv'::text) AND ("substring"((t1.bbb)::text, 1, 3) = ANY ('{<some_different_values>}'::text[])))
              Buffers: shared hit=1252
              ->  Seq Scan on public.table1 t1  (cost=0.00..2230.06 rows=15988 width=103) (actual time=0.176..19.882 rows=15817 loops=1)
                    Output: t1.id, t1.aaaa, t1.bbb, t1.ccccc, t1.timestamp_create, t1.timestamp_closed, t1.sbd_bundle_id, t1.ddddddddd, t1.file, t1.fecha_carga
                    Filter: (((t1.file)::text = 'file_name.csv'::text) AND ((t1.ccccc)::text = 'ACTIVE'::text))
                    Rows Removed by Filter: 49387
                    Buffers: shared hit=1252
              ->  Hash  (cost=118.73..118.73 rows=3 width=94) (actual time=71.699..71.699 rows=14244 loops=1)
                    Output: t2.eeeeeeee, t2.ffffffff, t2.btatpd_aaaa
                    Buckets: 16384 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 1074kB
                    ->  Foreign Scan on database1_fdw.table2 t2  (cost=100.00..118.73 rows=3 width=94) (actual time=3.961..67.271 rows=14244 loops=1)
                          Output: t2.eeeeeeee, t2.ffffffff, t2.btatpd_aaaa
                          Remote SQL: SELECT eeeeeeee, btatpd_aaaa, ffffffff FROM public.table2 WHERE ((btatpd_fecha = '20220119120000'::text))
        ->  Hash  (cost=118.73..118.73 rows=3 width=94) (actual time=6.512..6.513 rows=3051 loops=1)
              Output: t3.eeeeeeee, t3.ffffffff, t3.btatpd_aaaa
              Buckets: 4096 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 235kB
              ->  Foreign Scan on database2_fdw.table2 t3  (cost=100.00..118.73 rows=3 width=94) (actual time=1.494..5.857 rows=3051 loops=1)
                    Output: t3.eeeeeeee, t3.ffffffff, t3.btatpd_aaaa
                    Remote SQL: SELECT eeeeeeee, btatpd_aaaa, ffffffff FROM public.table2 WHERE ((btatpd_fecha = '20220119120000'::text))
Planning time: 2.137 ms
Execution time: 123.077 ms


Anyone can tell me why is this happening and if is there a solution to this?

Thank you for your time.

--

Ekaterina Amez González

ekaterina.amez@zunibal.com

 

 

ZUNIBAL | Idorsolo 1, 48160-Derio, Spain

Tel: +34 944 977 010 | Fax: +34 944 522 81| www.zunibal.com

 

Advertencia Legal. El contenido de este mensaje y de toda la documentación anexa es confidencial y va dirigido únicamente al destinatario del mismo. Si usted no fuera el destinatario le solicitamos que nos informe y no comunique su contenido a terceros, procediendo a su destrucción. / Legal advice. This message contains confidential information for the exclusive use of the recipient. Any unauthorised disclosure, use or dissemination, either whole or partial, is prohibited. If you are not the intended recipient of the message, please notify the sender as soon as possible. En cumplimiento de la Ley Orgánica 15/1999 de Protección de Datos de Carácter Personal, le informamos que los Datos Personales de usted que están en nuestra Base de Datos recabados con su consentimiento, forman parte de un fichero automatizado registrado en la Agencia Española de Protección de Datos. Estos datos sólo serán utilizados para realizar una correcta gestión de nuestra relación comercial. Si lo desea podrá ejercitar en todo momento los derechos de acceso, cancelación u oposición, remitiendo un correo electrónico a esta dirección. / According to the Law 15/1999 of Personal Data Protectionwe inform your data are included in our data base with your approval are registered in the spanish Agency of Data ProtectionThese data are only used for our usual business relationship. If you want you are in the right of accessing or cancelling them, just sending an e-mail to this address.

 

Antes de imprimir este mensajeasegúrese de que es necesario hacerlo. / Before printing this email, assess if it is really needed.

Re: Query much slower from php than psql or dbeaver

От
Tom Lane
Дата:
Ekaterina Amez <ekaterina.amez@zunibal.com> writes:
> I've tested the query with psql and DBeaver and it takes only milliseconds:
> it returns 39 records and now there's only 16000 records on the table but
> I've tested it with <100K. When I've tested my php script the same query
> takes 14 minutes to return (more or less).

A plausible theory is that it's not really the same query, but differs
in having some values presented as parameters not literal constants.
Depending on exactly how you do it, that can confuse the planner
leading to a poor execution plan and a long runtime.

I'd first try enabling log_statement on the server to see if the
query is really being presented exactly the same way.  Another
thing worth trying is auto_explain, to capture the plans actually
being used.

            regards, tom lane



Re: Query much slower from php than psql or dbeaver

От
Michael Lewis
Дата:
When dealing with foreign tables, I believe planning is not the same because of access to statistics (maybe has improved since 9.6 though). I just wonder... Would it be a viable option to create a materialized view using the FDW but then use the PHP script against the local tables only? Materialized views are not maintained automatically, but you have local statistics and can create indexes. Just a thought in case the data is not changing constantly and this might fit the need.

Also, it seems like perhaps the foreign queries might be more targeted if some data was encouraged to be pre-computed. What would be the expected row count from just table1?

Note- your explain plan doesn't mention "fichero_origen" field name. Is that just a copy/paste error?


with cte_interesting_t1_rows_precomputed AS materialized(
select
t1.*,
substring(t1.bbb from 1 for 3) in (<some_values>) AND t1.file = 'file_name.csv' AS needs_t2,
substring(t1.bbb from 1 for 3) in (<some_different_values>) AS needs_t3
FROM
table1 t1
where t1.ccccc = 'ACTIVE'
and t1.fichero_origen = 'file_name.csv'
)

 select t1.aaaa as maindb_aaaa, t1.bbb as maindb_bbb, t1.ccccc as maindb_ccccc, t1.timestamp_create as maindb_create,
    t1.timestamp_closed as maindb_close, t1.ddddddddd as maindb_ddddddddd, null::text as db1_sth,
    t2.eeeeeeee as db1_eeeeeeee, t2.ffffffff as db1_ffffffff, null::text as db2_sth,
    t3.eeeeeeee as db2_eeeeeeee, t3.ffffffff as db2_ffffffff
from cte_interesting_t1_rows_precomputed AS t1
left join database1_fdw.table2 AS t2 on t1.aaaa = t2.btatpd_aaaa and t2.btatpd_fecha = '20220119120000' AND needs_t2
left join database2_fdw.table2 AS t3 on t1.aaaa = t3.btatpd_aaaa and t3.btatpd_fecha = '20220119120000' AND needs_t3
where
    (t2.eeeeeeee is null and t3.eeeeeeee is null)
    or
    (t2.eeeeeeee is not null and t1.ddddddddd <> t2.ffffffff)
    or
    (t3.eeeeeeee is not null and t1.ddddddddd <> t3.ffffffff)
order by t1.bbb nulls last;

Re: Query much slower from php than psql or dbeaver

От
Ekaterina Amez
Дата:


El vie, 21 ene 2022 a las 5:04, Michael Lewis (<mlewis@entrata.com>) escribió:
When dealing with foreign tables, I believe planning is not the same because of access to statistics (maybe has improved since 9.6 though). I just wonder... Would it be a viable option to create a materialized view using the FDW but then use the PHP script against the local tables only? Materialized views are not maintained automatically, but you have local statistics and can create indexes. Just a thought in case the data is not changing constantly and this might fit the need.

Also, it seems like perhaps the foreign queries might be more targeted if some data was encouraged to be pre-computed. What would be the expected row count from just table1?

Note- your explain plan doesn't mention "fichero_origen" field name. Is that just a copy/paste error?

It's a column renaming mistake.
t1.fichero_origen = t1.file (it's the column where incoming file name is stored).




with cte_interesting_t1_rows_precomputed AS materialized(
select
t1.*,
substring(t1.bbb from 1 for 3) in (<some_values>) AND t1.file = 'file_name.csv' AS needs_t2,
substring(t1.bbb from 1 for 3) in (<some_different_values>) AS needs_t3
FROM
table1 t1
where t1.ccccc = 'ACTIVE'
and t1.fichero_origen = 'file_name.csv'
)

 select t1.aaaa as maindb_aaaa, t1.bbb as maindb_bbb, t1.ccccc as maindb_ccccc, t1.timestamp_create as maindb_create,
    t1.timestamp_closed as maindb_close, t1.ddddddddd as maindb_ddddddddd, null::text as db1_sth,
    t2.eeeeeeee as db1_eeeeeeee, t2.ffffffff as db1_ffffffff, null::text as db2_sth,
    t3.eeeeeeee as db2_eeeeeeee, t3.ffffffff as db2_ffffffff
from cte_interesting_t1_rows_precomputed AS t1
left join database1_fdw.table2 AS t2 on t1.aaaa = t2.btatpd_aaaa and t2.btatpd_fecha = '20220119120000' AND needs_t2
left join database2_fdw.table2 AS t3 on t1.aaaa = t3.btatpd_aaaa and t3.btatpd_fecha = '20220119120000' AND needs_t3
where
    (t2.eeeeeeee is null and t3.eeeeeeee is null)
    or
    (t2.eeeeeeee is not null and t1.ddddddddd <> t2.ffffffff)
    or
    (t3.eeeeeeee is not null and t1.ddddddddd <> t3.ffffffff)
order by t1.bbb nulls last;

I'm doing some tests after reading carefully your answer and Tom's, and I'll post back.
Thank you both.