Обсуждение: pg_stat_statements query jumbling question
Hi, I have a question on jumbling queries in pg_stat_statements. I found that JumbleRangeTable() uses relation oid in RangeTblEntry. Obviously, this would result different queryid when the table gets re-created (dropped and created). Why don't we use relation name (with looking up the catalog) on query jumbling? For performance reason? Regards, -- NAGAYASU Satoshi <snaga@uptime.jp>
On Mon, Aug 31, 2015 at 8:32 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote: > Why don't we use relation name (with looking up the catalog) > on query jumbling? For performance reason? I think that there is a good case for preferring this behavior. While it is a little confusing that pg_stat_statements does not change the representative query string, renaming a table does not make it a substantively different table. There is, IIRC, one case where a string is jumbled directly (CTE name). It's usually not the right thing, IMV. -- Peter Geoghegan
On 2015/09/01 12:36, Peter Geoghegan wrote: > On Mon, Aug 31, 2015 at 8:32 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote: >> Why don't we use relation name (with looking up the catalog) >> on query jumbling? For performance reason? > > I think that there is a good case for preferring this behavior. While > it is a little confusing that pg_stat_statements does not change the > representative query string, renaming a table does not make it a > substantively different table. > > There is, IIRC, one case where a string is jumbled directly (CTE > name). It's usually not the right thing, IMV. > Thanks for the comment. I've never considered that. Interesting. From the users point of view, IMHO, it would be better to avoid confusing if queryid is determined by only visible values -- userid, dbid and query string itself. BTW, I'm interested in improving the queryid portability now because I'd like to use it in other extensions. :) That's the reason why I'm looking at query jumbling here. Thoughts? Regards, -- NAGAYASU Satoshi <snaga@uptime.jp>
On Mon, Aug 31, 2015 at 9:29 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote: > BTW, I'm interested in improving the queryid portability now because > I'd like to use it in other extensions. :) > That's the reason why I'm looking at query jumbling here. Are you interested in having the query fingerprinting/jumbling infrastructure available to all backend code? That seems like a good idea to me generally. I would like to be able to put queryId in log_line_prefix, or to display it within EXPLAIN, and have it available everywhere. I like the idea of per-query log_min_duration_statement settings. If you want to use the queryId field directly, which I recall you mentioning before, then that's harder. There is simply no contract among extensions for "owning" a queryId. But when the fingerprinting code is moved into core, then I think at that point queryId may cease to be even a thing that pg_stat_statements theoretically has the right to write into. Rather, it just asks the core system to do the fingerprinting, and finds it within queryId. At the same time, other extensions may do the same, and don't need to care about each other. Does that work for you? -- Peter Geoghegan
On 2015/09/01 13:41, Peter Geoghegan wrote: > On Mon, Aug 31, 2015 at 9:29 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote: >> BTW, I'm interested in improving the queryid portability now because >> I'd like to use it in other extensions. :) >> That's the reason why I'm looking at query jumbling here. > > Are you interested in having the query fingerprinting/jumbling > infrastructure available to all backend code? That seems like a good > idea to me generally. Yes. I've been working on the sql_firewall extension[1], which is totally built on top of the pg_stat_statements. [1] http://pgsnaga.blogspot.jp/2015/08/postgresql-sql-firewall.html As of today, sql_firewall has duplicated code for query jumbling. But if it goes into the core, it looks fantastic. > I would like to be able to put queryId in > log_line_prefix, or to display it within EXPLAIN, and have it > available everywhere. I like the idea of per-query > log_min_duration_statement settings. Sounds cool. :) > If you want to use the queryId field directly, which I recall you > mentioning before, then that's harder. There is simply no contract > among extensions for "owning" a queryId. But when the fingerprinting > code is moved into core, then I think at that point queryId may cease > to be even a thing that pg_stat_statements theoretically has the right > to write into. Rather, it just asks the core system to do the > fingerprinting, and finds it within queryId. At the same time, other > extensions may do the same, and don't need to care about each other. > > Does that work for you? Yes. I think so. I need some query fingerprint to determine query group. I want queryid to keep the same value when query strings are the same (except literal values). Another reason is just because I need to import/export query ids. Regards, -- NAGAYASU Satoshi <snaga@uptime.jp>
Satoshi Nagayasu <snaga@uptime.jp> writes: > On 2015/09/01 13:41, Peter Geoghegan wrote: >> If you want to use the queryId field directly, which I recall you >> mentioning before, then that's harder. There is simply no contract >> among extensions for "owning" a queryId. But when the fingerprinting >> code is moved into core, then I think at that point queryId may cease >> to be even a thing that pg_stat_statements theoretically has the right >> to write into. Rather, it just asks the core system to do the >> fingerprinting, and finds it within queryId. At the same time, other >> extensions may do the same, and don't need to care about each other. >> >> Does that work for you? > Yes. I think so. > I need some query fingerprint to determine query group. I want queryid > to keep the same value when query strings are the same (except literal > values). The problem I've got with this is the unquestioned assumption that every application for query IDs will have exactly the same requirements for what the ID should include or ignore. regards, tom lane
On 2015/09/01 14:01, Tom Lane wrote: > Satoshi Nagayasu <snaga@uptime.jp> writes: >> On 2015/09/01 13:41, Peter Geoghegan wrote: >>> If you want to use the queryId field directly, which I recall you >>> mentioning before, then that's harder. There is simply no contract >>> among extensions for "owning" a queryId. But when the fingerprinting >>> code is moved into core, then I think at that point queryId may cease >>> to be even a thing that pg_stat_statements theoretically has the right >>> to write into. Rather, it just asks the core system to do the >>> fingerprinting, and finds it within queryId. At the same time, other >>> extensions may do the same, and don't need to care about each other. >>> >>> Does that work for you? > >> Yes. I think so. > >> I need some query fingerprint to determine query group. I want queryid >> to keep the same value when query strings are the same (except literal >> values). > > The problem I've got with this is the unquestioned assumption that every > application for query IDs will have exactly the same requirements for > what the ID should include or ignore. I'm not confident about that too, but at least, I think we will be able to collect most common use cases as of today. (aka best guess. :) And IMHO it would be ok to change the spec in future release. Regards, -- NAGAYASU Satoshi <snaga@uptime.jp>
On 2015/09/01 14:39, Satoshi Nagayasu wrote: > On 2015/09/01 14:01, Tom Lane wrote: >> Satoshi Nagayasu <snaga@uptime.jp> writes: >>> On 2015/09/01 13:41, Peter Geoghegan wrote: >>>> If you want to use the queryId field directly, which I recall you >>>> mentioning before, then that's harder. There is simply no contract >>>> among extensions for "owning" a queryId. But when the fingerprinting >>>> code is moved into core, then I think at that point queryId may cease >>>> to be even a thing that pg_stat_statements theoretically has the right >>>> to write into. Rather, it just asks the core system to do the >>>> fingerprinting, and finds it within queryId. At the same time, other >>>> extensions may do the same, and don't need to care about each other. >>>> >>>> Does that work for you? >> >>> Yes. I think so. >> >>> I need some query fingerprint to determine query group. I want queryid >>> to keep the same value when query strings are the same (except literal >>> values). >> >> The problem I've got with this is the unquestioned assumption that every >> application for query IDs will have exactly the same requirements for >> what the ID should include or ignore. > > I'm not confident about that too, but at least, I think we will be able > to collect most common use cases as of today. (aka best guess. :) > > And IMHO it would be ok to change the spec in future release. I know this still needs to be discussed, but I would like to submit a patch for further discussion at the next CF, 2015-11. Regards, -- NAGAYASU Satoshi <snaga@uptime.jp>
Вложения
On Wed, Sep 2, 2015 at 7:41 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote: > I know this still needs to be discussed, but I would like to submit > a patch for further discussion at the next CF, 2015-11. I think I already expressed this in my explanation of the current behavior, but to be clear: -1 from me to this proposal. -- Peter Geoghegan
On 2015/10/03 6:18, Peter Geoghegan wrote: > On Wed, Sep 2, 2015 at 7:41 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote: >> I know this still needs to be discussed, but I would like to submit >> a patch for further discussion at the next CF, 2015-11. > > I think I already expressed this in my explanation of the current > behavior, but to be clear: -1 from me to this proposal. I would like to introduce queryId to pgaudit and sql_firewall extensions to determine query groups. queryId could be useful if available in those modules. I think users may want to do that based on object names, because they issue queries with the object names, not the internal object ids. Changing queryId after re-creating the same table may make the user gets confused, because the query string the user issue is not changed. At least, I would like to give some options to be chosen by the user. Is it possible and/or reasonable? Regards, -- NAGAYASU Satoshi <snaga@uptime.jp>
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello, On 10/10/2015 08:46, Satoshi Nagayasu wrote: > On 2015/10/03 6:18, Peter Geoghegan wrote: >> On Wed, Sep 2, 2015 at 7:41 PM, Satoshi Nagayasu >> <snaga@uptime.jp> wrote: >>> I know this still needs to be discussed, but I would like to >>> submit a patch for further discussion at the next CF, 2015-11. >> >> I think I already expressed this in my explanation of the >> current behavior, but to be clear: -1 from me to this proposal. > > I would like to introduce queryId to pgaudit and sql_firewall > extensions to determine query groups. queryId could be useful if > available in those modules. > > I think users may want to do that based on object names, because > they issue queries with the object names, not the internal object > ids. > > Changing queryId after re-creating the same table may make the > user gets confused, because the query string the user issue is not > changed. > > At least, I would like to give some options to be chosen by the > user. Is it possible and/or reasonable? > I'm also rather sceptical about this change. In any case, the change you propose in this patch is not good, as you only consider the relation name and ignore the relation's schema. You need to also add the namespace name in the jumble state. Also, the documentation would need to be updated, at least on this part: "As a rule of thumb, queryid values can be assumed to be stable and comparable only so long as the underlying server version and catalog metadata details stay exactly the same. Two servers participating in replication based on physical WAL replay can be expected to have identical queryid values for the same query. However, logical replication schemes do not promise to keep replicas identical in all relevant details, so queryid will not be a useful identifier for accumulating costs across a set of logical replicas. If in doubt, direct testing is recommended." Regards. > Regards, - -- Julien Rouhaud http://dalibo.com - http://dalibo.org -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (GNU/Linux) iQEcBAEBAgAGBQJWNPR+AAoJELGaJ8vfEpOqP3AH/j8Eob8jaXnhXNGsjJ7NyBkc 91pn87iW+FgyRGNH8K7wvOyPnBl3JWjlaIVShdImoph6agn6HhyXqbEceKXKKBS5 7B2fhY0xjhzFrKbQ9NpwxAvnPKA4ChOFsFmN/YtJMZzBEUHMDBcZpixhoGXy6iKB Q44blhmZswXtJ744p3oslUNJX0eKoIZuqxRihVvUJYo/7Ogh0kWje5W4btA+CT/k /IkQcMgUZj8jhyNYqBTaxlgNRJvTmk+iSuieAL6Fjjwq/VhR9QyD+4viKC+q0Nf6 puL74qPvwLzvy3+MXGc0dg+8m+PwZWcDg/pZSka+5EwJaHUv5bjouDZqimHH978= =jlQP -----END PGP SIGNATURE-----
On Sat, Oct 31, 2015 at 10:03 AM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote: >> At least, I would like to give some options to be chosen by the >> user. Is it possible and/or reasonable? >> > > I'm also rather sceptical about this change. Is anyone willing to argue for it, apart from Satoshi? -- Peter Geoghegan
On Sun, Nov 1, 2015 at 2:03 AM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hello, > > On 10/10/2015 08:46, Satoshi Nagayasu wrote: >> On 2015/10/03 6:18, Peter Geoghegan wrote: >>> On Wed, Sep 2, 2015 at 7:41 PM, Satoshi Nagayasu >>> <snaga@uptime.jp> wrote: >>>> I know this still needs to be discussed, but I would like to >>>> submit a patch for further discussion at the next CF, 2015-11. >>> >>> I think I already expressed this in my explanation of the >>> current behavior, but to be clear: -1 from me to this proposal. >> >> I would like to introduce queryId to pgaudit and sql_firewall >> extensions to determine query groups. queryId could be useful if >> available in those modules. >> >> I think users may want to do that based on object names, because >> they issue queries with the object names, not the internal object >> ids. >> >> Changing queryId after re-creating the same table may make the >> user gets confused, because the query string the user issue is not >> changed. >> >> At least, I would like to give some options to be chosen by the >> user. Is it possible and/or reasonable? >> > > I'm also rather sceptical about this change. Hm. Thinking a bit about this patch, it presents the advantage to be able to track the same queries easily across systems even if those tables have been created with a different OID. Say for example a production server and a test server, the test server having a bunch of other relations that messed up with OID consistency. So you could compare more easily statistics across those servers with postgres_fdw for example, and I guess that's what Nagayasu-san had in mind. > In any case, the change you propose in this patch is not good, as you > only consider the relation name and ignore the relation's schema. You > need to also add the namespace name in the jumble state. Yep, pg_class ensures the uniqueness of relations using (relname, relnamespace), so we would need as well the namespace. Now, looking at the code of pg_stat_statements the database OID is not used to generate the queryid, right, hence don't we need to add as well the database name of this relation in the set? Also, isn't this patch actually broken if we rename relations and/or its namespace? It seems that we would begin to generate inconsistent query IDs that happens. -- Michael
Michael Paquier <michael.paquier@gmail.com> writes: > On Sun, Nov 1, 2015 at 2:03 AM, Julien Rouhaud > <julien.rouhaud@dalibo.com> wrote: >> I'm also rather sceptical about this change. > Hm. Thinking a bit about this patch, it presents the advantage to be > able to track the same queries easily across systems even if those > tables have been created with a different OID. That argument would only hold if *every* use of OIDs in the jumbles were replaced by names --- not only tables, but types, operators, functions, etc. I'm already concerned that the additional name lookups will cost a lot of performance, and I think adding so many more would probably be disastrous. > Also, isn't this patch actually broken if we rename relations and/or > its namespace? Well, it would mean that the query would no longer be considered "the same". You could argue either way whether that's good or bad. But yeah, this approach will break one set of use-cases to enable another set. On the whole, I think my vote is to reject this patch. regards, tom lane
On Sun, Nov 15, 2015 at 1:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Paquier <michael.paquier@gmail.com> writes: >> On Sun, Nov 1, 2015 at 2:03 AM, Julien Rouhaud >> <julien.rouhaud@dalibo.com> wrote: >>> I'm also rather sceptical about this change. > >> Hm. Thinking a bit about this patch, it presents the advantage to be >> able to track the same queries easily across systems even if those >> tables have been created with a different OID. > > That argument would only hold if *every* use of OIDs in the jumbles > were replaced by names --- not only tables, but types, operators, > functions, etc. I'm already concerned that the additional name > lookups will cost a lot of performance, and I think adding so many > more would probably be disastrous. Yeah, I was thinking about a GUC switch to change from one mode to another yesterday night before sleeping. Now if there was a patch actually implementing that, and proving that this has no performance impact, well I think that this may be worth considering for integration. But we're far from that >> Also, isn't this patch actually broken if we rename relations and/or >> its namespace? > > Well, it would mean that the query would no longer be considered "the > same". You could argue either way whether that's good or bad. But > yeah, this approach will break one set of use-cases to enable another > set. > > On the whole, I think my vote is to reject this patch. Agreed. It's clear that the patch as-is is not enough. -- Michael
Yes, I Regards PAscal -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html