Обсуждение: Improve pg_stat_statements by making jumble handle savepoint names better

Поиск
Список
Период
Сортировка

Improve pg_stat_statements by making jumble handle savepoint names better

От
Greg Sabino Mullane
Дата:
Please find attached a patch to jumble savepoint name, to prevent certain transactional commands from filling up pg_stat_statements.
This has been a problem with some busy systems that use django, which likes to wrap everything in uniquely named savepoints. Soon, over 50% of
your pg_stat_statements buffer is filled with savepoint stuff, pushing out the more useful queries. As each query is unique, it looks like this:

postgres=# select calls, query, queryid from pg_stat_statements where query ~ 'save|release|rollback' order by 2;

 calls |                    query                     |       queryid
-------+----------------------------------------------+----------------------
     1 | release b900150983cd24fb0d6963f7d28e17f7     |  8797482500264589878
     1 | release ed9407630eb1000c0f6b63842defa7de     | -9206510099095862114
     1 | rollback                                     | -2049453941623996126
     1 | rollback to c900150983cd24fb0d6963f7d28e17f7 | -5335832667999552746
     1 | savepoint b900150983cd24fb0d6963f7d28e17f7   | -1888817254996647181
     1 | savepoint c47bce5c74f589f4867dbd57e9ca9f80   |   355123032993044571
     1 | savepoint c900150983cd24fb0d6963f7d28e17f7   | -5921314469994822125
     1 | savepoint d8f8e0260c64418510cefb2b06eee5cd   |  -981090856656063578
     1 | savepoint ed9407630eb1000c0f6b63842defa7de   |   -25952890433218603

As the actual name of the savepoint is not particularly useful, the patch will basically ignore the savepoint name and allow things to be collapsed:

 calls |     query      |       queryid
-------+----------------+----------------------
     2 | release $1     | -7998168840889089775
     1 | rollback       |  3749380189022910195
     1 | rollback to $1 | -1816677871228308673
     5 | savepoint $1   |  6160699978368237767

Without the patch, the only solution is to keep raising pg_stat_statements.max to larger and larger values to compensate for the pollution of the
statement pool.

Cheers,
Greg

Вложения

Re: Improve pg_stat_statements by making jumble handle savepoint names better

От
Michael Paquier
Дата:
On Mon, Jul 24, 2023 at 04:09:23PM -0400, Greg Sabino Mullane wrote:
> Without the patch, the only solution is to keep raising
> pg_stat_statements.max to larger and larger values to compensate for the
> pollution of the
> statement pool.

Yes, that can be painful depending on your workload.

    bool        chain;          /* AND CHAIN option */
+   int     location;       /* token location, or -1 if unknown */
 } TransactionStmt;
[...]
+       if ($f eq 'savepoint_name') {
+           print $jff "\tJUMBLE_LOCATION(location);\n";
+           next;
+       }

Shouldn't this new field be marked as query_jumble_location instead of
forcing the perl script to do that?  Or is there something in the
structure of TransactionStmt that makes the move difficult because of
the other transaction commands supported?

Testing for new query patterns is very important for such patches.
Could you add something in pg_stat_statements's utility.sql?  I think
that you should check the compilations of at least two savepoints with
different names to see that they compile the same query ID, for a
bunch of patterns in the grammar, say:
BEGIN;
SAVEPOINT p1;
ROLLBACK TO SAVEPOINT p1;
ROLLBACK TRANSACTION TO SAVEPOINT p1;
RELEASE SAVEPOINT p1;
SAVEPOINT p2;
ROLLBACK TO SAVEPOINT p2;
ROLLBACK TRANSACTION TO SAVEPOINT p2;
RELEASE SAVEPOINT p2;
COMMIT;
--
Michael

Вложения

Re: Improve pg_stat_statements by making jumble handle savepoint names better

От
Greg Sabino Mullane
Дата:
On Mon, Jul 24, 2023 at 6:46 PM Michael Paquier <michael@paquier.xyz> wrote:
Shouldn't this new field be marked as query_jumble_location
 
Yes, it should. I had some trouble getting it to work that way in the first place, but now I realize it was just my unfamiliarity with this part of the code. So thanks for the hint: v2 of the patch is much simplified by adding two attributes to theTransactionStmt node. I've also added some tests per your suggestion.

Unrelated to this patch, I'm struggling with meson testing. Why doesn't this update the postgres test binary?:

meson test --suite pg_stat_statements

It runs "ninja" as expected, but it does not put a new 
build/tmp_install/home/greg/pg/17/bin/postgres in place until I do a "meson test"

Cheers,
Greg

Вложения

Re: Improve pg_stat_statements by making jumble handle savepoint names better

От
Michael Paquier
Дата:
On Tue, Jul 25, 2023 at 12:37:18PM -0400, Greg Sabino Mullane wrote:
> Yes, it should. I had some trouble getting it to work that way in the first
> place, but now I realize it was just my unfamiliarity with this part of the
> code. So thanks for the hint: v2 of the patch is much simplified by adding
> two attributes to theTransactionStmt node. I've also added some tests per
> your suggestion.

-   char       *savepoint_name; /* for savepoint commands */
+   char       *savepoint_name  pg_node_attr(query_jumble_ignore);
    char       *gid;            /* for two-phase-commit related commands */
    bool        chain;          /* AND CHAIN option */
+   int     location            pg_node_attr(query_jumble_location);
 } TransactionStmt;

You got that right.  Applying query_jumble_ignore makes sure that we
don't finish with different query IDs for different savepoint names.

> Unrelated to this patch, I'm struggling with meson testing. Why doesn't
> this update the postgres test binary?:
>
> meson test --suite pg_stat_statements

Yes, it should do so, I assume.  I've seen that myself.  That's
contrary to what make check does, but perhaps things are better
integrated this way with meson.

I think that I'm OK with your proposal as savepoint names are in
defined places in these queries (contrary to some of the craziness
with BEGIN and the node structure of TransactionStmt, for example).

Has somebody an opinion or a comment to share?
--
Michael

Вложения

Re: Improve pg_stat_statements by making jumble handle savepoint names better

От
Michael Paquier
Дата:
On Wed, Jul 26, 2023 at 07:53:02AM +0900, Michael Paquier wrote:
> I think that I'm OK with your proposal as savepoint names are in
> defined places in these queries (contrary to some of the craziness
> with BEGIN and the node structure of TransactionStmt, for example).
>
> Has somebody an opinion or a comment to share?

Well, on second look this is really simple, so I've applied that after
adding back some comments that ought to be and simplifying a bit the
tests (less queries, same coverage).
--
Michael

Вложения