Обсуждение: pg_stat_statements query jumbling question

Поиск

Список

Период

Сортировка

pg_stat_statements query jumbling question

От

Satoshi Nagayasu

Дата:

01 сентября 2015 г., 06:32:39

Hi,

I have a question on jumbling queries in pg_stat_statements.

I found that JumbleRangeTable() uses relation oid in
RangeTblEntry.

Obviously, this would result different queryid when the table
gets re-created (dropped and created).

Why don't we use relation name (with looking up the catalog)
on query jumbling? For performance reason?

Regards,
-- 
NAGAYASU Satoshi <snaga@uptime.jp>

Re: pg_stat_statements query jumbling question

От

Peter Geoghegan

Дата:

01 сентября 2015 г., 06:36:56

On Mon, Aug 31, 2015 at 8:32 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> Why don't we use relation name (with looking up the catalog)
> on query jumbling? For performance reason?

I think that there is a good case for preferring this behavior. While
it is a little confusing that pg_stat_statements does not change the
representative query string, renaming a table does not make it a
substantively different table.

There is, IIRC, one case where a string is jumbled directly (CTE
name). It's usually not the right thing, IMV.

-- 
Peter Geoghegan

Re: pg_stat_statements query jumbling question

От

Satoshi Nagayasu

Дата:

01 сентября 2015 г., 07:29:38

On 2015/09/01 12:36, Peter Geoghegan wrote:
> On Mon, Aug 31, 2015 at 8:32 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>> Why don't we use relation name (with looking up the catalog)
>> on query jumbling? For performance reason?
>
> I think that there is a good case for preferring this behavior. While
> it is a little confusing that pg_stat_statements does not change the
> representative query string, renaming a table does not make it a
> substantively different table.
>
> There is, IIRC, one case where a string is jumbled directly (CTE
> name). It's usually not the right thing, IMV.
>

Thanks for the comment. I've never considered that. Interesting.
From the users point of view, IMHO, it would be better to avoid
confusing if queryid is determined by only visible values -- userid,
dbid and query string itself.

BTW, I'm interested in improving the queryid portability now because
I'd like to use it in other extensions. :)
That's the reason why I'm looking at query jumbling here.

Thoughts?

Regards,
-- 
NAGAYASU Satoshi <snaga@uptime.jp>

Re: pg_stat_statements query jumbling question

От

Peter Geoghegan

Дата:

01 сентября 2015 г., 07:41:19

On Mon, Aug 31, 2015 at 9:29 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> BTW, I'm interested in improving the queryid portability now because
> I'd like to use it in other extensions. :)
> That's the reason why I'm looking at query jumbling here.

Are you interested in having the query fingerprinting/jumbling
infrastructure available to all backend code? That seems like a good
idea to me generally. I would like to be able to put queryId in
log_line_prefix, or to display it within EXPLAIN, and have it
available everywhere. I like the idea of per-query
log_min_duration_statement settings.

If you want to use the queryId field directly, which I recall you
mentioning before, then that's harder. There is simply no contract
among extensions for "owning" a queryId. But when the fingerprinting
code is moved into core, then I think at that point queryId may cease
to be even a thing that pg_stat_statements theoretically has the right
to write into. Rather, it just asks the core system to do the
fingerprinting, and finds it within queryId. At the same time, other
extensions may do the same, and don't need to care about each other.

Does that work for you?

-- 
Peter Geoghegan

Re: pg_stat_statements query jumbling question

От

Satoshi Nagayasu

Дата:

01 сентября 2015 г., 07:56:11

On 2015/09/01 13:41, Peter Geoghegan wrote:
> On Mon, Aug 31, 2015 at 9:29 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>> BTW, I'm interested in improving the queryid portability now because
>> I'd like to use it in other extensions. :)
>> That's the reason why I'm looking at query jumbling here.
>
> Are you interested in having the query fingerprinting/jumbling
> infrastructure available to all backend code? That seems like a good
> idea to me generally.

Yes. I've been working on the sql_firewall extension[1], which is
totally built on top of the pg_stat_statements.

[1] http://pgsnaga.blogspot.jp/2015/08/postgresql-sql-firewall.html

As of today, sql_firewall has duplicated code for query jumbling.
But if it goes into the core, it looks fantastic.

> I would like to be able to put queryId in
> log_line_prefix, or to display it within EXPLAIN, and have it
> available everywhere. I like the idea of per-query
> log_min_duration_statement settings.

Sounds cool. :)

> If you want to use the queryId field directly, which I recall you
> mentioning before, then that's harder. There is simply no contract
> among extensions for "owning" a queryId. But when the fingerprinting
> code is moved into core, then I think at that point queryId may cease
> to be even a thing that pg_stat_statements theoretically has the right
> to write into. Rather, it just asks the core system to do the
> fingerprinting, and finds it within queryId. At the same time, other
> extensions may do the same, and don't need to care about each other.
>
> Does that work for you?

Yes. I think so.

I need some query fingerprint to determine query group. I want queryid
to keep the same value when query strings are the same (except literal 
values).

Another reason is just because I need to import/export query ids.

Regards,
-- 
NAGAYASU Satoshi <snaga@uptime.jp>

Re: pg_stat_statements query jumbling question

От

Tom Lane

Дата:

01 сентября 2015 г., 08:01:22

Satoshi Nagayasu <snaga@uptime.jp> writes:
> On 2015/09/01 13:41, Peter Geoghegan wrote:
>> If you want to use the queryId field directly, which I recall you
>> mentioning before, then that's harder. There is simply no contract
>> among extensions for "owning" a queryId. But when the fingerprinting
>> code is moved into core, then I think at that point queryId may cease
>> to be even a thing that pg_stat_statements theoretically has the right
>> to write into. Rather, it just asks the core system to do the
>> fingerprinting, and finds it within queryId. At the same time, other
>> extensions may do the same, and don't need to care about each other.
>> 
>> Does that work for you?

> Yes. I think so.

> I need some query fingerprint to determine query group. I want queryid
> to keep the same value when query strings are the same (except literal 
> values).

The problem I've got with this is the unquestioned assumption that every
application for query IDs will have exactly the same requirements for
what the ID should include or ignore.
        regards, tom lane

Re: pg_stat_statements query jumbling question

От

Satoshi Nagayasu

Дата:

01 сентября 2015 г., 08:39:58

On 2015/09/01 14:01, Tom Lane wrote:
> Satoshi Nagayasu <snaga@uptime.jp> writes:
>> On 2015/09/01 13:41, Peter Geoghegan wrote:
>>> If you want to use the queryId field directly, which I recall you
>>> mentioning before, then that's harder. There is simply no contract
>>> among extensions for "owning" a queryId. But when the fingerprinting
>>> code is moved into core, then I think at that point queryId may cease
>>> to be even a thing that pg_stat_statements theoretically has the right
>>> to write into. Rather, it just asks the core system to do the
>>> fingerprinting, and finds it within queryId. At the same time, other
>>> extensions may do the same, and don't need to care about each other.
>>>
>>> Does that work for you?
> 
>> Yes. I think so.
> 
>> I need some query fingerprint to determine query group. I want queryid
>> to keep the same value when query strings are the same (except literal
>> values).
> 
> The problem I've got with this is the unquestioned assumption that every
> application for query IDs will have exactly the same requirements for
> what the ID should include or ignore.

I'm not confident about that too, but at least, I think we will be able
to collect most common use cases as of today. (aka best guess. :)

And IMHO it would be ok to change the spec in future release.

Regards,
-- 
NAGAYASU Satoshi <snaga@uptime.jp>

Re: pg_stat_statements query jumbling question

От

Satoshi Nagayasu

Дата:

03 сентября 2015 г., 05:41:25

On 2015/09/01 14:39, Satoshi Nagayasu wrote:
> On 2015/09/01 14:01, Tom Lane wrote:
>> Satoshi Nagayasu <snaga@uptime.jp> writes:
>>> On 2015/09/01 13:41, Peter Geoghegan wrote:
>>>> If you want to use the queryId field directly, which I recall you
>>>> mentioning before, then that's harder. There is simply no contract
>>>> among extensions for "owning" a queryId. But when the fingerprinting
>>>> code is moved into core, then I think at that point queryId may cease
>>>> to be even a thing that pg_stat_statements theoretically has the right
>>>> to write into. Rather, it just asks the core system to do the
>>>> fingerprinting, and finds it within queryId. At the same time, other
>>>> extensions may do the same, and don't need to care about each other.
>>>>
>>>> Does that work for you?
>>
>>> Yes. I think so.
>>
>>> I need some query fingerprint to determine query group. I want queryid
>>> to keep the same value when query strings are the same (except literal
>>> values).
>>
>> The problem I've got with this is the unquestioned assumption that every
>> application for query IDs will have exactly the same requirements for
>> what the ID should include or ignore.
>
> I'm not confident about that too, but at least, I think we will be able
> to collect most common use cases as of today. (aka best guess. :)
>
> And IMHO it would be ok to change the spec in future release.

I know this still needs to be discussed, but I would like to submit
a patch for further discussion at the next CF, 2015-11.

Regards,
--
NAGAYASU Satoshi <snaga@uptime.jp>

Вложения

queryid.diff

Re: pg_stat_statements query jumbling question

От

Peter Geoghegan

Дата:

03 октября 2015 г., 00:18:52

On Wed, Sep 2, 2015 at 7:41 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> I know this still needs to be discussed, but I would like to submit
> a patch for further discussion at the next CF, 2015-11.

I think I already expressed this in my explanation of the current
behavior, but to be clear: -1 from me to this proposal.

-- 
Peter Geoghegan

Re: pg_stat_statements query jumbling question

От

Satoshi Nagayasu

Дата:

10 октября 2015 г., 09:46:53

On 2015/10/03 6:18, Peter Geoghegan wrote:
> On Wed, Sep 2, 2015 at 7:41 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>> I know this still needs to be discussed, but I would like to submit
>> a patch for further discussion at the next CF, 2015-11.
>
> I think I already expressed this in my explanation of the current
> behavior, but to be clear: -1 from me to this proposal.

I would like to introduce queryId to pgaudit and sql_firewall extensions
to determine query groups. queryId could be useful if available in those
modules.

I think users may want to do that based on object names, because they
issue queries with the object names, not the internal object ids.

Changing queryId after re-creating the same table may make the user
gets confused, because the query string the user issue is not changed.

At least, I would like to give some options to be chosen by the user.
Is it possible and/or reasonable?

Regards,
-- 
NAGAYASU Satoshi <snaga@uptime.jp>

Re: pg_stat_statements query jumbling question

От

Julien Rouhaud

Дата:

31 октября 2015 г., 20:04:08

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

On 10/10/2015 08:46, Satoshi Nagayasu wrote:
> On 2015/10/03 6:18, Peter Geoghegan wrote:
>> On Wed, Sep 2, 2015 at 7:41 PM, Satoshi Nagayasu
>> <snaga@uptime.jp> wrote:
>>> I know this still needs to be discussed, but I would like to
>>> submit a patch for further discussion at the next CF, 2015-11.
>> 
>> I think I already expressed this in my explanation of the
>> current behavior, but to be clear: -1 from me to this proposal.
> 
> I would like to introduce queryId to pgaudit and sql_firewall
> extensions to determine query groups. queryId could be useful if
> available in those modules.
> 
> I think users may want to do that based on object names, because
> they issue queries with the object names, not the internal object
> ids.
> 
> Changing queryId after re-creating the same table may make the
> user gets confused, because the query string the user issue is not
> changed.
> 
> At least, I would like to give some options to be chosen by the
> user. Is it possible and/or reasonable?
> 

I'm also rather sceptical about this change.

In any case, the change you propose in this patch is not good, as you
only consider the relation name and ignore the relation's schema. You
need to also add the namespace name in the jumble state.

Also, the documentation would need to be updated, at least on this part:

"As a rule of thumb, queryid values can be assumed to be stable and
comparable only so long as the underlying server version and catalog
metadata details stay exactly the same. Two servers participating in
replication based on physical WAL replay can be expected to have
identical queryid values for the same query. However, logical
replication schemes do not promise to keep replicas identical in all
relevant details, so queryid will not be a useful identifier for
accumulating costs across a set of logical replicas. If in doubt,
direct testing is recommended."

Regards.

> Regards,

- -- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)

iQEcBAEBAgAGBQJWNPR+AAoJELGaJ8vfEpOqP3AH/j8Eob8jaXnhXNGsjJ7NyBkc
91pn87iW+FgyRGNH8K7wvOyPnBl3JWjlaIVShdImoph6agn6HhyXqbEceKXKKBS5
7B2fhY0xjhzFrKbQ9NpwxAvnPKA4ChOFsFmN/YtJMZzBEUHMDBcZpixhoGXy6iKB
Q44blhmZswXtJ744p3oslUNJX0eKoIZuqxRihVvUJYo/7Ogh0kWje5W4btA+CT/k
/IkQcMgUZj8jhyNYqBTaxlgNRJvTmk+iSuieAL6Fjjwq/VhR9QyD+4viKC+q0Nf6
puL74qPvwLzvy3+MXGc0dg+8m+PwZWcDg/pZSka+5EwJaHUv5bjouDZqimHH978=
=jlQP
-----END PGP SIGNATURE-----

Re: pg_stat_statements query jumbling question

От

Peter Geoghegan

Дата:

04 ноября 2015 г., 05:01:27

On Sat, Oct 31, 2015 at 10:03 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
>> At least, I would like to give some options to be chosen by the
>> user. Is it possible and/or reasonable?
>>
>
> I'm also rather sceptical about this change.

Is anyone willing to argue for it, apart from Satoshi?


-- 
Peter Geoghegan

Re: pg_stat_statements query jumbling question

От

Michael Paquier

Дата:

14 ноября 2015 г., 17:06:03

On Sun, Nov 1, 2015 at 2:03 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello,
>
> On 10/10/2015 08:46, Satoshi Nagayasu wrote:
>> On 2015/10/03 6:18, Peter Geoghegan wrote:
>>> On Wed, Sep 2, 2015 at 7:41 PM, Satoshi Nagayasu
>>> <snaga@uptime.jp> wrote:
>>>> I know this still needs to be discussed, but I would like to
>>>> submit a patch for further discussion at the next CF, 2015-11.
>>>
>>> I think I already expressed this in my explanation of the
>>> current behavior, but to be clear: -1 from me to this proposal.
>>
>> I would like to introduce queryId to pgaudit and sql_firewall
>> extensions to determine query groups. queryId could be useful if
>> available in those modules.
>>
>> I think users may want to do that based on object names, because
>> they issue queries with the object names, not the internal object
>> ids.
>>
>> Changing queryId after re-creating the same table may make the
>> user gets confused, because the query string the user issue is not
>> changed.
>>
>> At least, I would like to give some options to be chosen by the
>> user. Is it possible and/or reasonable?
>>
>
> I'm also rather sceptical about this change.

Hm. Thinking a bit about this patch, it presents the advantage to be
able to track the same queries easily across systems even if those
tables have been created with a different OID. Say for example a
production server and a test server, the test server having a bunch of
other relations that messed up with OID consistency. So you could
compare more easily statistics across those servers with postgres_fdw
for example, and I guess that's what Nagayasu-san had in mind.

> In any case, the change you propose in this patch is not good, as you
> only consider the relation name and ignore the relation's schema. You
> need to also add the namespace name in the jumble state.

Yep, pg_class ensures the uniqueness of relations using (relname,
relnamespace), so we would need as well the namespace. Now, looking at
the code of pg_stat_statements the database OID is not used to
generate the queryid, right, hence don't we need to add as well the
database name of this relation in the set?

Also, isn't this patch actually broken if we rename relations and/or
its namespace? It seems that we would begin to generate inconsistent
query IDs that happens.
-- 
Michael

Re: pg_stat_statements query jumbling question

От

Tom Lane

Дата:

14 ноября 2015 г., 19:35:05

Michael Paquier <michael.paquier@gmail.com> writes:
> On Sun, Nov 1, 2015 at 2:03 AM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> I'm also rather sceptical about this change.

> Hm. Thinking a bit about this patch, it presents the advantage to be
> able to track the same queries easily across systems even if those
> tables have been created with a different OID.

That argument would only hold if *every* use of OIDs in the jumbles
were replaced by names --- not only tables, but types, operators,
functions, etc.  I'm already concerned that the additional name
lookups will cost a lot of performance, and I think adding so many
more would probably be disastrous.

> Also, isn't this patch actually broken if we rename relations and/or
> its namespace?

Well, it would mean that the query would no longer be considered "the
same".  You could argue either way whether that's good or bad.  But
yeah, this approach will break one set of use-cases to enable another
set.

On the whole, I think my vote is to reject this patch.
        regards, tom lane

Re: pg_stat_statements query jumbling question

От

Michael Paquier

Дата:

15 ноября 2015 г., 14:27:40

On Sun, Nov 15, 2015 at 1:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Sun, Nov 1, 2015 at 2:03 AM, Julien Rouhaud
>> <julien.rouhaud@dalibo.com> wrote:
>>> I'm also rather sceptical about this change.
>
>> Hm. Thinking a bit about this patch, it presents the advantage to be
>> able to track the same queries easily across systems even if those
>> tables have been created with a different OID.
>
> That argument would only hold if *every* use of OIDs in the jumbles
> were replaced by names --- not only tables, but types, operators,
> functions, etc.  I'm already concerned that the additional name
> lookups will cost a lot of performance, and I think adding so many
> more would probably be disastrous.

Yeah, I was thinking about a GUC switch to change from one mode to
another yesterday night before sleeping. Now if there was a patch
actually implementing that, and proving that this has no performance
impact, well I think that this may be worth considering for
integration. But we're far from that

>> Also, isn't this patch actually broken if we rename relations and/or
>> its namespace?
>
> Well, it would mean that the query would no longer be considered "the
> same".  You could argue either way whether that's good or bad.  But
> yeah, this approach will break one set of use-cases to enable another
> set.
>
> On the whole, I think my vote is to reject this patch.

Agreed. It's clear that the patch as-is is not enough.
-- 
Michael

Re: pg_stat_statements query jumbling question

От

legrand legrand

Дата:

11 сентября 2018 г., 00:32:36

Yes, I

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: pg_stat_statements query jumbling question

Вложения