Обсуждение: Re: [PERFORM] Query much slower when run from postgres function
Tom Lane <tgl 'at' sss.pgh.pa.us> writes: > Mario Splivalo <mario.splivalo@megafon.hr> writes: >> Now I'm confused, why is 'sql' function much slower than 'direct' SELECT? > > Usually the reason for this is that the planner chooses a different plan > when it has knowledge of the particular value you are searching for than > when it does not. Yes, and since Mario is coming from JDBC, I'll share my part on this: I also noticed some very wrong plans in JDBC because of the "optimization" in prepared statements consisting of planning once for all runs, e.g. without any parameter values to help planning. My understanding is that practically, it's difficult for the planner to opt for an index (or not) because the selectivity of a parameter value may be much different when the actual value changes. Normally, the planner "thinks" that planning is so costly that it's better to plan once for all runs, but practically for our use, this is very wrong (it may be very good for some uses, though it would be interesting to know the actual uses share). Until it's possible to specifically tell the JDBC driver (and/or PG?) to not plan once for all runs (or is there something better to think of?), or the whole thing would be more clever (off the top of my head, PG could try to replan with the first actual values - or first xx actual values - and if the plan is different, then flag that prepared statement for replanning each time if the overall time estimate is different enough), I've opted to tell the JDBC driver to use the protocol version 2, as prepared statements were not so much prepared back then (IIRC parameter interpolation is performed in driver and the whole SQL query is passed each time, parsed, and planned) using protocolVersion=2 in the JDBC URL. So far it worked very well for us. -- Guillaume Cottenceau
On Mon, Mar 9, 2009 at 5:51 PM, Guillaume Cottenceau <gc@mnc.ch> wrote: > Until it's possible to specifically tell the JDBC driver (and/or > PG?) to not plan once for all runs (or is there something better > to think of?), or the whole thing would be more clever (off the > top of my head, PG could try to replan with the first actual > values - or first xx actual values - and if the plan is > different, then flag that prepared statement for replanning each > time if the overall time estimate is different enough), I've > opted to tell the JDBC driver to use the protocol version 2, as > prepared statements were not so much prepared back then (IIRC > parameter interpolation is performed in driver and the whole SQL > query is passed each time, parsed, and planned) using > protocolVersion=2 in the JDBC URL. So far it worked very well for > us. Unnamed prepared statements are planned after binding the values, starting with 8.3, or more precisely starting with 8.3.2 as early 8.3 versions were partially broken on this behalf. It's not always possible to use protocol version 2 as it's quite limited (especially considering the exceptions returned). -- Guillaume
Guillaume Smet <guillaume.smet@gmail.com> writes: > Unnamed prepared statements are planned after binding the values, > starting with 8.3, or more precisely starting with 8.3.2 as early 8.3 > versions were partially broken on this behalf. No, 8.2 did it too (otherwise we wouldn't have considered 8.3.0 to be broken...). The thing I'm not too clear about is what "use of an unnamed statement" translates to for a JDBC user. regards, tom lane
Tom Lane schrieb: > Guillaume Smet <guillaume.smet@gmail.com> writes: >> Unnamed prepared statements are planned after binding the values, >> starting with 8.3, or more precisely starting with 8.3.2 as early 8.3 >> versions were partially broken on this behalf. > > No, 8.2 did it too (otherwise we wouldn't have considered 8.3.0 to be > broken...). The thing I'm not too clear about is what "use of an > unnamed statement" translates to for a JDBC user. > > regards, tom lane > I followed another post in the PHP List. Andrew McMillan was talking about his experiences with udf's in Oracle and PG (--> look for subject: Re: [PHP] pl/php for windows). He was writing that, by using udf's, the planner sometimes uses strange and not performant plans. So generally I understood that using udf's is a good idea - compared with the work I have to do when I code that e.g in PHP and also compared to the better resulting performance with udf's. So what is your experience with using udf's (plpgsql)? Is there something like "use it in this case but not in that case"? Your answers are very welcome ... Cheers Andy
Guillaume Cottenceau wrote: >>> Now I'm confused, why is 'sql' function much slower than 'direct' SELECT? >> Usually the reason for this is that the planner chooses a different plan >> when it has knowledge of the particular value you are searching for than >> when it does not. > > Yes, and since Mario is coming from JDBC, I'll share my part on > this: I also noticed some very wrong plans in JDBC because of the > "optimization" in prepared statements consisting of planning once > for all runs, e.g. without any parameter values to help planning. > For what is worth: When I call postgres function via JDBC, I have almost the same execution time as when calling function from psql. When I call SELECT COUNT(*)... WHERE... query from JDBC, I again have almost the same execution time as when executing query from psql. Postgres function takes around 200ms, and SELECT query takes around 2-4ms. Mike
On Mon, Mar 9, 2009 at 1:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Tom,
The driver will use unnamed statements for all statements until it sees the same statement N times where N is 5 I believe, after that it uses a named statement.
Dave
Guillaume Smet <guillaume.smet@gmail.com> writes:No, 8.2 did it too (otherwise we wouldn't have considered 8.3.0 to be
> Unnamed prepared statements are planned after binding the values,
> starting with 8.3, or more precisely starting with 8.3.2 as early 8.3
> versions were partially broken on this behalf.
broken...). The thing I'm not too clear about is what "use of an
unnamed statement" translates to for a JDBC user.
Tom,
The driver will use unnamed statements for all statements until it sees the same statement N times where N is 5 I believe, after that it uses a named statement.
Dave
> > The driver will use unnamed statements for all statements until it > sees the same statement N times where N is 5 I believe, after that it > uses a named statement. > > Shame there's no syntax for it to pass the a table of the parameters to the server when it creates the named statement as planner hints. James
- And how do you do that from JDBC? There is no standard concept of ‘unnamed’ prepared statements in most database APIs, and if there were the behavior would be db specific. Telling PG to plan after binding should be more flexible than unnamed prepared statements — or at least more transparent to standard APIs. E.g. SET plan_prepared_postbind=’true’.
- How do you use those on a granularity other than global from jdbc? ( — I tried setting max_prepared_transactions to 0 but this didn’t seem to work either, and it would be global if it did). Prepared statements are still great for things like selecting off a primary key, or especially inserts. Controls at the connection or user level would be significantly more valuable than global ones.
- Is it possible to test them from psql? (documentation is weak, PREPARE requires a name, functions require names, etc .. C api has docs but that’s not of use for most).
I’d love to know if there were answers to the above that were workable.
In the end, we had to write our own client side code to deal with sql injection safely and avoid jdbc prepared statements to get acceptable performance in many cases (all cases involving partitioned tables, a few others). At least dollar-quotes are powerful and useful for dealing with this. Since the most important benefit of prepared statements is code clarity and sql injection protection, its sad to see weakness in control/configuration over prepared statement behavior at the parse/plan level get in the way of using them for those benefits.
On 3/9/09 9:04 AM, "Guillaume Smet" <guillaume.smet@gmail.com> wrote:
On Mon, Mar 9, 2009 at 5:51 PM, Guillaume Cottenceau <gc@mnc.ch> wrote:
> Until it's possible to specifically tell the JDBC driver (and/or
> PG?) to not plan once for all runs (or is there something better
> to think of?), or the whole thing would be more clever (off the
> top of my head, PG could try to replan with the first actual
> values - or first xx actual values - and if the plan is
> different, then flag that prepared statement for replanning each
> time if the overall time estimate is different enough), I've
> opted to tell the JDBC driver to use the protocol version 2, as
> prepared statements were not so much prepared back then (IIRC
> parameter interpolation is performed in driver and the whole SQL
> query is passed each time, parsed, and planned) using
> protocolVersion=2 in the JDBC URL. So far it worked very well for
> us.
Unnamed prepared statements are planned after binding the values,
starting with 8.3, or more precisely starting with 8.3.2 as early 8.3
versions were partially broken on this behalf.
It's not always possible to use protocol version 2 as it's quite
limited (especially considering the exceptions returned).
--
Guillaume
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Dave Cramer wrote: > > > On Mon, Mar 9, 2009 at 1:16 PM, Tom Lane <tgl@sss.pgh.pa.us > <mailto:tgl@sss.pgh.pa.us>> wrote: > > Guillaume Smet <guillaume.smet@gmail.com > <mailto:guillaume.smet@gmail.com>> writes: > > Unnamed prepared statements are planned after binding the values, > > starting with 8.3, or more precisely starting with 8.3.2 as early 8.3 > > versions were partially broken on this behalf. > > No, 8.2 did it too (otherwise we wouldn't have considered 8.3.0 to be > broken...). The thing I'm not too clear about is what "use of an > unnamed statement" translates to for a JDBC user. > > > Tom, > > The driver will use unnamed statements for all statements until it sees > the same statement N times where N is 5 I believe, after that it uses a > named statement. Right, with the caveat that "the same statement" means "exactly the same PreparedStatement object". If you happen to run the same (textual) query via two different PreparedStatement objects, they're still considered different queries for the purposes of this threshold. You can also tune the threshold via the prepareThreshold parameter in the driver URL, or use org.postgresql.PGStatement.setPrepareThreshold (an extension interface implemented by the driver on its Statement objects) on a per-statement basis. prepareThreshold=0 is a special value that means "never use a named statement". The idea behind the threshold is that if a PreparedStatement object is reused, that's a fairly good indication that the application wants to run the same query many times with different parameters (since it's going to the trouble of preserving the statement object for reuse). But it's all tunable if needed. Also see http://jdbc.postgresql.org/documentation/head/server-prepare.html -O
Scott Carey wrote: > > 1. And how do you do that from JDBC? There is no standard concept of > ‘unnamed’ prepared statements in most database APIs, and if there > were the behavior would be db specific. Telling PG to plan after > binding should be more flexible than unnamed prepared statements — > or at least more transparent to standard APIs. E.g. SET > plan_prepared_postbind=’true’. I've suggested that as a protocol-level addition in the past, but it would mean a new protocol version. The named vs. unnamed statement behaviour was an attempt to crowbar it into the protocol without requiring a version change. If it's really a planner behaviour thing, maybe it does belong at the SET level, but I believe that there's usually an aversion to having to SET anything per query to get reasonable plans. > 2. How do you use those on a granularity other than global from jdbc? prepareThreshold=N (as part of a connection URL), org.postgresql.PGConnection.setPrepareThreshold() (connection-level granularity), org.postgresql.PGStatement.setPrepareThreshold() (statement-level granularity). See the driver docs. > ( — I tried setting max_prepared_transactions to 0 but this > didn’t seem to work either, and it would be global if it did). max_prepared_transactions is to do with two-phase commit, not prepared statements. > In the end, we had to write our own client side code to deal with sql > injection safely and avoid jdbc prepared statements to get acceptable > performance in many cases (all cases involving partitioned tables, a few > others). At least dollar-quotes are powerful and useful for dealing > with this. Since the most important benefit of prepared statements is > code clarity and sql injection protection, its sad to see weakness in > control/configuration over prepared statement behavior at the parse/plan > level get in the way of using them for those benefits. It's unfortunate that ended up doing this, because it >is< all configurable on the JDBC side. Did you ask on pgsql-jdbc? -O
On 3/9/09 1:40 PM, "Oliver Jowett" <oliver@opencloud.com> wrote:
There’s a strong aversion, but I find myself re-writing queries to get good plans, a de-facto hint really. Its mandatory in the land of partitioned tables and large aggregates, much more rare elsewhere. I have a higher aversion to rewriting queries then telling the planner to use more information or to provide it with more information.
I’m still not sure that unnamed prepared statements will help my case. If the driver is using unnamed prepared statements for the first 5 uses of a query then naming it, I should see the first 5 uses significantly faster than those after. I’ll keep an eye out for that in the places where we are still using prepared statements that can cause problems and in the old log files. Until another issue comes up, there isn’t sufficient motivation to fix what is no longer broken for us.
Thanks for the good info on dealing with configuring unnamed prepared statements with the jdbc driver. That may come in very handy later.
Scott Carey wrote:
>
> 1. And how do you do that from JDBC? There is no standard concept of
I've suggested that as a protocol-level addition in the past, but it
would mean a new protocol version. The named vs. unnamed statement
behaviour was an attempt to crowbar it into the protocol without
requiring a version change. If it's really a planner behaviour thing,
maybe it does belong at the SET level, but I believe that there's
usually an aversion to having to SET anything per query to get
reasonable plans.
There’s a strong aversion, but I find myself re-writing queries to get good plans, a de-facto hint really. Its mandatory in the land of partitioned tables and large aggregates, much more rare elsewhere. I have a higher aversion to rewriting queries then telling the planner to use more information or to provide it with more information.
I know I’ve tried the connection URL thing one time and that did not fix the performance problem. I did not know if it was user error. Without knowing how to trace what the query really was or if the setting was working properly, or having any other easy avenue to see if an unnamed prepared statement even fixed my problem, I had to resort to what would clearly fix it (there was only 1 day to fix it, and there was one proven way to fix it). I would love to be able to try out an unnamed prepared statement in psql, to prove that it even works to solve the query planning issue or not. In the end, it was simpler to change the code and probably less time consuming than all the options other than the connection URL setting.
> 2. How do you use those on a granularity other than global from jdbc?
prepareThreshold=N (as part of a connection URL),
org.postgresql.PGConnection.setPrepareThreshold() (connection-level
granularity), org.postgresql.PGStatement.setPrepareThreshold()
(statement-level granularity). See the driver docs.
Thanks! Good to know, the configuration documentation could be more clear... I got the two prepares confused.
> ( — I tried setting max_prepared_transactions to 0 but this
> didn’t seem to work either, and it would be global if it did).
max_prepared_transactions is to do with two-phase commit, not prepared
statements.
I searched the archives, and did find a reference to the connection URL setting and recall trying that but not seeing the expected result. Rather than debugging, a decision was made to go with the solution that worked and be done with it. This was also when we were in production on 8.3.1 or 8.3.2 or so, so the bugs there might have caused some confusion in the rush to solve the issue.
> In the end, we had to write our own client side code to deal with sql
> injection safely and avoid jdbc prepared statements to get acceptable
It's unfortunate that ended up doing this, because it >is< all
configurable on the JDBC side. Did you ask on pgsql-jdbc?
-O
I’m still not sure that unnamed prepared statements will help my case. If the driver is using unnamed prepared statements for the first 5 uses of a query then naming it, I should see the first 5 uses significantly faster than those after. I’ll keep an eye out for that in the places where we are still using prepared statements that can cause problems and in the old log files. Until another issue comes up, there isn’t sufficient motivation to fix what is no longer broken for us.
Thanks for the good info on dealing with configuring unnamed prepared statements with the jdbc driver. That may come in very handy later.
Oliver Jowett <oliver 'at' opencloud.com> writes: > The idea behind the threshold is that if a PreparedStatement object is > reused, that's a fairly good indication that the application wants to > run the same query many times with different parameters (since it's > going to the trouble of preserving the statement object for reuse). But Or it may just need the safeness of driver/database parameter "interpolation", to get a "free" efficient safeguard against SQL injection. As for myself, I have found no other way to obtain driver/database parameter interpolation. So sometimes I use prepared statements even for running a query only once. I am unsure it is a widely used pattern, but SQL injection being quite important to fight against, I think I may not be the only one. -- Guillaume Cottenceau
Guillaume Cottenceau wrote: > Oliver Jowett <oliver 'at' opencloud.com> writes: > >> The idea behind the threshold is that if a PreparedStatement object is >> reused, that's a fairly good indication that the application wants to >> run the same query many times with different parameters (since it's >> going to the trouble of preserving the statement object for reuse). But > > Or it may just need the safeness of driver/database parameter > "interpolation", to get a "free" efficient safeguard against SQL > injection. In which case, the application usually throws the PreparedStatement object away after executing it once, and the threshold is never reached. As I said, the application has to do extra work to preserve exactly the same PreparedStatement object for reuse before the threshold applies, at which point it's reasonable to assume that it could be a performance-sensitive query that would benefit from preserving the query plan and avoiding parse/plan costs on every execution. It's just a heuristic because there *is* a tradeoff and many/most applications are not going to be customized specifically to know about that tradeoff. And it's configurable because the tradeoff is not the same in every case. Do you have a suggestion for a better way to decide when to use a named statement? -O
Oliver Jowett <oliver 'at' opencloud.com> writes: > Guillaume Cottenceau wrote: >> Oliver Jowett <oliver 'at' opencloud.com> writes: >> >>> The idea behind the threshold is that if a PreparedStatement object is >>> reused, that's a fairly good indication that the application wants to >>> run the same query many times with different parameters (since it's >>> going to the trouble of preserving the statement object for reuse). But >> >> Or it may just need the safeness of driver/database parameter >> "interpolation", to get a "free" efficient safeguard against SQL >> injection. > > In which case, the application usually throws the PreparedStatement > object away after executing it once, and the threshold is never reached. > As I said, the application has to do extra work to preserve exactly the > same PreparedStatement object for reuse before the threshold applies, at > which point it's reasonable to assume that it could be a > performance-sensitive query that would benefit from preserving the query > plan and avoiding parse/plan costs on every execution. Thanks for the clarification! That may just be me, but I see two issues here: first, parsing and planning are tied together, but parsing should be always done first time only as I see no point in reparsing in subsequent uses of the PreparedStatement?; second, it's still questionable that a "performance-sensitive" query should mean benefiting from preserving the query plan: I have seen dramatic use cases where the preserved query plan opted for a seqscan and then the query was orders of magnitude slower than it should because the actual then used values would have qualified for an indexscan. > It's just a heuristic because there *is* a tradeoff and many/most > applications are not going to be customized specifically to know about > that tradeoff. And it's configurable because the tradeoff is not the > same in every case. Yes, and it's well documented, actually. I obviously didn't read it carefully enough last time :/ I guess my approach of using the protocol version 2 should be replaced by unsetting the prepared threshold.. I think I came up with that workaround after that post from Kris: http://archives.postgresql.org/pgsql-jdbc/2008-03/msg00070.php because strangely, you and I intervened in that thread, but the prepared threshold issue was not raised, so I followed the protocolVersion=2 path. Did I miss something - e.g. is the topic today different from the topic back then, for some reason? Am I wrong in assuming that your "please replan this statement every time you get new parameters" suggestion is nearly-achievable with unsetting the prepared threshold ("nearly" being the difference between replanning always, and replanning only when parameters are new)? Anyway, documentation-wise, I've tried to think of how the documentation could be a little more aggressive with the warning: http://zarb.org/~gc/t/jdbc-more-cautious-preparedstatements.diff That said, there's something more: when the documentation says: There are a number of ways to enable server side prepared statements depending on your application's needs. The general method is to set a threshold for a PreparedStatement. I assume that by default server side prepared statements are *not* enabled, although it seems to be the case, with a threshold of 5 as a simple test shows when using driver 8.3-604.jdbc3 (on PG 8.3.6). I think that either they should not be enabled by default (really, it could be better with, but it could be so much worse that is it really a good idea to make a "dropin" use of the driver use it?), or the documentation should clearly state they are, and add even more warnings about potential drawbacks. WDYT? http://zarb.org/~gc/t/jdbc-more-cautious-preparedstatements2.diff Btw, how can the doc be built? "ant doc" failed on missing docbook.stylesheet but I was unable to find how to set that value. > Do you have a suggestion for a better way to decide when to use a named > statement? Oh, I feel I don't have the qualifications to answer that question, sorry! The only thing I could think of, was what I talked about in a previous mail, e.g. save all plans of the first xx queries before reaching the threshold, and then when the threshold is reached, compare the global cost estimates of the saved plans, and do not activate server side prepare if they are too different, as caching the plan for that query would probably yield too slow results sometimes. Ideally, I guess a new PG-specific method should be added to activate that feature (and set the value for "are the plans too different?"). But bear in mind that it may be a stupid idea :) -- Guillaume Cottenceau
On Tue, 10 Mar 2009, Guillaume Cottenceau wrote: > Btw, how can the doc be built? "ant doc" failed on missing > docbook.stylesheet but I was unable to find how to set that > value. > Create a file named build.local.properties and put something like the following in it: docbook.stylesheet=/usr/share/sgml/docbook/stylesheet/xsl/nwalsh/xhtml/chunk.xsl docbook.dtd=/usr/share/sgml/docbook/dtd/xml/4.2 Kris Jurka