Обсуждение: refreshing query id for pg_stat_statements based on comment in sql

Поиск
Список
Период
Сортировка

refreshing query id for pg_stat_statements based on comment in sql

От
Zhihong Yu
Дата:
Hi,
Currently the query id for pg_stat_statements gets calculated based on the parse nodes specifics. 
This means that the user cannot add a comment to a SQL query to test something. (though some other RDBMS allows this practice).

Consider this use case: for query q, admin looks at stats and performs some optimization (without changing the query). Admin adds / modifies the comment for q - now the query becomes q'. If query id doesn't change, there still would be one row in pg_stat_statements which makes it difficult to gauge the effectiveness of the tuning.

I want to get opinion from the community whether adding / changing comment in SQL query should result in new query id for pg_stat_statements.

Cheers

Re: refreshing query id for pg_stat_statements based on comment in sql

От
Bruce Momjian
Дата:
On Mon, Mar  7, 2022 at 09:42:26AM -0800, Zhihong Yu wrote:
> Hi,
> Currently the query id for pg_stat_statements gets calculated based on the
> parse nodes specifics. 
> This means that the user cannot add a comment to a SQL query to test something.
> (though some other RDBMS allows this practice).
> 
> Consider this use case: for query q, admin looks at stats and performs some
> optimization (without changing the query). Admin adds / modifies the comment
> for q - now the query becomes q'. If query id doesn't change, there still would
> be one row in pg_stat_statements which makes it difficult to gauge the
> effectiveness of the tuning.
> 
> I want to get opinion from the community whether adding / changing comment in
> SQL query should result in new query id for pg_stat_statements.

Uh, we don't have a parse node for comments, and I didn't think comments
were part of the query id, and my testing confirms that:

    psql -c "SET log_statement = 'all'" -c "select pg_sleep(10000) -- test1;" test
    psql -c "SET log_statement = 'all'" -c "select pg_sleep(10000) -- test2;" test

shows the comment in the logs:


    2022-03-07 19:02:19.509 EST [1075860] LOG:  statement: select pg_sleep(10000) -- test1;
    2022-03-07 19:02:24.389 EST [1075860] ERROR:  canceling statement due to user request
    2022-03-07 19:02:24.389 EST [1075860] STATEMENT:  select pg_sleep(10000) -- test1;
    2022-03-07 19:02:27.029 EST [1075893] LOG:  statement: select pg_sleep(10000) -- test2;
    2022-03-07 19:02:47.915 EST [1075893] ERROR:  canceling statement due to user request
    2022-03-07 19:02:47.915 EST [1075893] STATEMENT:  select pg_sleep(10000) -- test2;

and I see the same query_id for both:

    test=> select query, query_id from pg_stat_activity;
                         query                     |       query_id
    -----------------------------------------------+----------------------
                                                   |
                                                   |
-->     select pg_sleep(10000) -- test1;              |  2920433178127795318
     select query, query_id from pg_stat_activity; | -8032661921273433383
                                                   |
                                                   |
                                                   |
    (7 rows)
    
    test=> select query, query_id from pg_stat_activity;
                         query                     |       query_id
    -----------------------------------------------+----------------------
                                                   |
                                                   |
-->     select pg_sleep(10000) -- test2;              |  2920433178127795318
     select query, query_id from pg_stat_activity; | -8032661921273433383

I think you need to show us the problem you are having.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: refreshing query id for pg_stat_statements based on comment in sql

От
Julien Rouhaud
Дата:
Hi,

On Mon, Mar 07, 2022 at 09:42:26AM -0800, Zhihong Yu wrote:
> Hi,
> Currently the query id for pg_stat_statements gets calculated based on the
> parse nodes specifics.
> This means that the user cannot add a comment to a SQL query to test
> something. (though some other RDBMS allows this practice).
>
> Consider this use case: for query q, admin looks at stats and performs some
> optimization (without changing the query). Admin adds / modifies the
> comment for q - now the query becomes q'. If query id doesn't change, there
> still would be one row in pg_stat_statements which makes it difficult to
> gauge the effectiveness of the tuning.
>
> I want to get opinion from the community whether adding / changing comment
> in SQL query should result in new query id for pg_stat_statements.

Are you talking about optimizer hint with something like pg_hint_plan, or just
random comment like "/* we now added index blabla */ SELECT ..."?

If the former, then such an extension can already provide its own queryid
generator which can chose to ignore part or all of the comments or not.

If the latter, then it seems shortsighted to me.  At the very least not all
application can be modified to have a specific comment attached to a query.

Also, if you want check how a query if performing after doing some
modifications, you should start with some EXPLAIN ANALYZE first (or even a
simple EXPLAIN if you want to validate some new index using hypothetical
indexes).  If this is some more general change (e.g. shared_buffers,
work_mem...) then the whole system is going to perform differently, and you
certainly won't add a new comment to every single query executed.

So again it seems to me that doing pg_stat_statement snapshots and comparing
the diff between each to see how the whole workload, or specific queries, is
behaving is still the best answer here.