Обсуждение: Prepared Statement Query Planning
Hi, I'm trying to use a PreparedStatement to query a small number of rows from a table containing over 500 million rows. I'm encountering bad query plans due to the query plan being constructed without access to the bind variables. I've read this link which suggests disabling server side prepared statements but it doesn't appear to be helping. I am happy to disable all server side prepared statements because I am effectively performing batch queries where the cost of a query plan is relatively small compared to the execution of the query. http://jdbc.postgresql.org/documentation/83/server-prepare.html I've added "prepareThreshold=0", and "prepareThreshold=3" to the connection URL but neither appear to have any effect. I've checked the prepare threshold on the statement objects and found that they are being set to the values I specify. Is there any way of verifying what type of statement is being issued to the server? This is the query: SELECT e.id, e.version, e.timestamp, e.visible, u.data_public, u.id AS user_id, u.display_name, e.changeset_id, e.latitude, e.longitude FROM nodes e INNER JOIN ( SELECT id, version FROM nodes WHERE timestamp > ? AND timestamp <= ? ) t ON e.id = t.id AND e.version = t.version INNER JOIN changesets c ON e.changeset_id = c.id INNER JOIN users u ON c.user_id = u.id The node table contains approx 500 million rows, the user table approx 100 thousand rows, and the changeset table somewhere in between but closer to the lower end. The server has 32 GB of RAM so can fit smaller tables in RAM, but not the node table. The query retrieves all rows within a timestamp range. I realise the query could be re-written without the sub-select, but it is implemented in this way as a result of the way the query is dynamically constructed and allows the sub-select portion can be switched out for other row selection criteria. The timestamp range varies anywhere from 1 minute to 1 day, but no larger than that. The table contains data over a period of several years. The full JDBC trace with loglevel=2 is below. Is it possible to tell whether server side prepared statements are being used from this? Note that this has been obtained from a windows laptop running PostgreSQL 8.3.5, but the real problem is occurring on an Ubuntu Linux server running PostgreSQL 8.3.7. The same JDBC driver is being used across the board. 17:48:46.077 (1) PostgreSQL 8.3 JDBC4 with SSL (build 603) 17:48:46.091 (1) Trying to establish a protocol version 3 connection to localhost:5432 17:48:46.204 (1) FE=> StartupPacket(user=osm, database=api06_test, client_encoding=UNICODE, DateStyle=ISO, extra_float_digits=2) 17:48:46.382 (1) <=BE AuthenticationReqMD5(salt=xxxxxxxx) 17:48:46.391 (1) FE=> Password(md5digest=md5xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx) 17:48:46.397 (1) <=BE AuthenticationOk 17:48:46.408 (1) <=BE ParameterStatus(client_encoding = UNICODE) 17:48:46.408 (1) <=BE ParameterStatus(DateStyle = ISO, DMY) 17:48:46.409 (1) <=BE ParameterStatus(integer_datetimes = off) 17:48:46.409 (1) <=BE ParameterStatus(is_superuser = on) 17:48:46.409 (1) <=BE ParameterStatus(server_encoding = UTF8) 17:48:46.409 (1) <=BE ParameterStatus(server_version = 8.3.5) 17:48:46.409 (1) <=BE ParameterStatus(session_authorization = osm) 17:48:46.409 (1) <=BE ParameterStatus(standard_conforming_strings = off) 17:48:46.409 (1) <=BE ParameterStatus(TimeZone = Australia/Canberra) 17:48:46.409 (1) <=BE BackendKeyData(pid=8292,ckey=224285055) 17:48:46.409 (1) <=BE ReadyForQuery(I) 17:48:46.409 (1) compatible = 8.3 17:48:46.409 (1) loglevel = 2 17:48:46.409 (1) prepare threshold = 0 17:48:46.442 (1) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@1b8e059, maxRows=0, fetchSize=10000, flags=9 17:48:46.442 (1) FE=> Parse(stmt=S_1,query="BEGIN",oids={}) 17:48:46.443 (1) FE=> Bind(stmt=S_1,portal=null) 17:48:46.443 (1) FE=> Execute(portal=null,limit=0) 17:48:46.443 (1) FE=> Parse(stmt=S_2,query="SELECT e.id, e.version, e.timestamp, e.visible, u.data_public, u.id AS user_id, u.display_name, e.changeset_id, e.latitude, e.longitude FROM nodes e LEFT OUTER JOIN changesets c ON e.changeset_id = c.id LEFT OUTER JOIN users u ON c.user_id = u.id WHERE e.timestamp > $1 AND e.timestamp <= $2 ORDER BY e.id, e.version",oids={0,0}) 17:48:46.444 (1) FE=> Describe(statement=S_2) 17:48:46.444 (1) FE=> Bind(stmt=S_2,portal=C_3,$1=<1970-01-01 10:00:00.000000 +10:00:00>,$2=<2009-08-28 17:48:45.932000 +10:00:00>) 17:48:46.445 (1) FE=> Execute(portal=C_3,limit=10000) 17:48:46.445 (1) FE=> Sync 17:48:46.636 (1) <=BE ParseComplete [S_1] 17:48:46.636 (1) <=BE BindComplete [null] 17:48:46.636 (1) <=BE CommandStatus(BEGIN) 17:48:46.636 (1) <=BE ParseComplete [S_2] 17:48:46.636 (1) <=BE ParameterDescription 17:48:46.637 (1) <=BE RowDescription(10) 17:48:46.638 (1) <=BE BindComplete [C_3] 17:48:46.638 (1) <=BE CommandStatus(SELECT) 17:48:46.650 (1) <=BE ReadyForQuery(T) 17:48:46.651 (1) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Connection$TransactionCommandHandler@7b6889, maxRows=0, fetchSize=0, flags=22 17:48:46.651 (1) FE=> CloseStatement(S_2) 17:48:46.651 (1) FE=> ClosePortal(C_3) 17:48:46.651 (1) FE=> Parse(stmt=S_4,query="COMMIT",oids={}) 17:48:46.651 (1) FE=> Bind(stmt=S_4,portal=null) 17:48:46.651 (1) FE=> Execute(portal=null,limit=1) 17:48:46.651 (1) FE=> Sync 17:48:46.652 (1) <=BE CloseComplete 17:48:46.652 (1) <=BE CloseComplete 17:48:46.652 (1) <=BE ParseComplete [S_4] 17:48:46.652 (1) <=BE BindComplete [null] 17:48:46.652 (1) <=BE CommandStatus(COMMIT) 17:48:46.652 (1) <=BE ReadyForQuery(I) 17:48:46.652 (1) FE=> Terminate If it helps, the following blog entry includes two query plan diagrams with and without bind variables. http://www.odecee.com.au/blogs/?p=134 Any suggestions on what I'm doing wrong? Brett
2009/8/29 Brett Henderson <brett@bretth.com>
This is the query:
SELECT e.id, e.version, e.timestamp, e.visible, u.data_public,
u.id AS user_id, u.display_name, e.changeset_id, e.latitude, e.longitude
FROM nodes e
INNER JOIN (
SELECT id, version FROM
nodes WHERE timestamp > ? AND timestamp <= ?
) t ON e.id = t.id AND e.version = t.version
INNER JOIN changesets c ON e.changeset_id = c.id
INNER JOIN users u ON c.user_id = u.id
The node table contains approx 500 million rows, the user table approx 100 thousand rows, and the changeset table somewhere in between but closer to the lower end. The server has 32 GB of RAM so can fit smaller tables in RAM, but not the node table.
The query retrieves all rows within a timestamp range. I realise the query could be re-written without the sub-select, but it is implemented in this way as a result of the way the query is dynamically constructed and allows the sub-select portion can be switched out for other row selection criteria.
It the subselect is only for switching out, it can be rewritten to:
SELECT e.id, e.version, e.timestamp, e.visible, u.data_public,
u.id AS user_id, u.display_name, e.changeset_id, e.latitude, e.longitude
FROM nodes e
INNER JOIN (
values (?,?)
) t(from, to) ON timestamp > from AND timestamp <= to
INNER JOIN changesets c ON e.changeset_id = c.id
INNER JOIN users u ON c.user_id = u.id
Still don't know if this will make things better or worse.
Another (better) option could be not to add a join, but replace original "from nodes e" in this case:
SELECT e.id, e.version, e.timestamp, e.visible, u.data_public,
u.id AS user_id, u.display_name, e.changeset_id, e.latitude, e.longitude
FROM (select * from nodes where timestamp > ? AND timestamp <= ?) e
INNER JOIN changesets c ON e.changeset_id = c.id
INNER JOIN users u ON c.user_id = u.id
Best regards, Vitalii Tymchyshyn
Віталій Тимчишин wrote:
However I don't think it will work in my case. I currently have three different ways of selecting records, 1. by timestamp range (as in my initial example), 2. by records in a temp table, and 3. unrestricted. The unrestricted example isn't an issue because a full table scan is appropriate in that case. However the temp table one is a different matter. In that case the query looks like this:
SELECT e.id, e.version, e.timestamp, e.visible, u.data_public,
u.id AS user_id, u.display_name, e.changeset_id, e.latitude, e.longitude
FROM nodes e
INNER JOIN
tmp_nodes
t ON e.id = t.id AND e.version = t.version
INNER JOIN changesets c ON e.changeset_id = c.id
INNER JOIN users u ON c.user_id = u.id
The tmp_nodes table contains only two columns id and version which have been built up by previous queries. Most of the query remains identical, but instead of doing a "SELECT id, version FROM nodes WHERE timestamp > ? AND timestamp <= ?", I join to tmp_nodes which contains only the records I'm interested in. Originally I was creating a temp table in the timestamp range case as well but moved away in an attempt to get better performance, I suspect I was encountering bad query plans in that case as well.
It the subselect is only for switching out, it can be rewritten to:Thanks for the tips. I haven't see the first style before, and hadn't considered the second.
SELECT e.id, e.version, e.timestamp, e.visible, u.data_public,
u.id AS user_id, u.display_name, e.changeset_id, e.latitude, e.longitude
FROM nodes e
INNER JOIN (
values (?,?)
) t(from, to) ON timestamp > from AND timestamp <= to
INNER JOIN changesets c ON e.changeset_id = c.id
INNER JOIN users u ON c.user_id = u.id
Still don't know if this will make things better or worse.
Another (better) option could be not to add a join, but replace original "from nodes e" in this case:
SELECT e.id, e.version, e.timestamp, e.visible, u.data_public,
u.id AS user_id, u.display_name, e.changeset_id, e.latitude, e.longitude
FROM (select * from nodes where timestamp > ? AND timestamp <= ?) e
INNER JOIN changesets c ON e.changeset_id = c.id
INNER JOIN users u ON c.user_id = u.id
However I don't think it will work in my case. I currently have three different ways of selecting records, 1. by timestamp range (as in my initial example), 2. by records in a temp table, and 3. unrestricted. The unrestricted example isn't an issue because a full table scan is appropriate in that case. However the temp table one is a different matter. In that case the query looks like this:
SELECT e.id, e.version, e.timestamp, e.visible, u.data_public,
u.id AS user_id, u.display_name, e.changeset_id, e.latitude, e.longitude
FROM nodes e
INNER JOIN
tmp_nodes
t ON e.id = t.id AND e.version = t.version
INNER JOIN changesets c ON e.changeset_id = c.id
INNER JOIN users u ON c.user_id = u.id
The tmp_nodes table contains only two columns id and version which have been built up by previous queries. Most of the query remains identical, but instead of doing a "SELECT id, version FROM nodes WHERE timestamp > ? AND timestamp <= ?", I join to tmp_nodes which contains only the records I'm interested in. Originally I was creating a temp table in the timestamp range case as well but moved away in an attempt to get better performance, I suspect I was encountering bad query plans in that case as well.
Brett Henderson wrote: > The full JDBC trace with loglevel=2 is below. Is it possible to tell > whether server side prepared statements are being used from this? Note > that this has been obtained from a windows laptop running PostgreSQL > 8.3.5, but the real problem is occurring on an Ubuntu Linux server > running PostgreSQL 8.3.7. The same JDBC driver is being used across the > board. > 17:48:46.443 (1) FE=> Parse(stmt=S_2,query="SELECT e.id, e.version, > e.timestamp, e.visible, u.data_public, u.id AS user_id, u.display_name, > e.changeset_id, e.latitude, e.longitude FROM nodes e LEFT OUTER JOIN > changesets c ON e.changeset_id = c.id LEFT OUTER JOIN users u ON > c.user_id = u.id WHERE e.timestamp > $1 AND e.timestamp <= $2 ORDER BY > e.id, e.version",oids={0,0}) > 17:48:46.444 (1) FE=> Describe(statement=S_2) > 17:48:46.444 (1) FE=> Bind(stmt=S_2,portal=C_3,$1=<1970-01-01 > 10:00:00.000000 +10:00:00>,$2=<2009-08-28 17:48:45.932000 +10:00:00>) > 17:48:46.445 (1) FE=> Execute(portal=C_3,limit=10000) > 17:48:46.445 (1) FE=> Sync This is using a named statement (S_2 is the name) It looks like you're using setFetchSize(). That forces use of a named statement regardless of prepareThreshold (we have to keep the statement and corresponding portal alive so we can do subsequent fetches, even if there are some other intervening queries, so we can't use the unnamed statement) -O
Oliver Jowett wrote: > Brett Henderson wrote: > >> The full JDBC trace with loglevel=2 is below. Is it possible to tell >> whether server side prepared statements are being used from this? >> Note that this has been obtained from a windows laptop running >> PostgreSQL 8.3.5, but the real problem is occurring on an Ubuntu >> Linux server running PostgreSQL 8.3.7. The same JDBC driver is being >> used across the board. > >> 17:48:46.443 (1) FE=> Parse(stmt=S_2,query="SELECT e.id, e.version, >> e.timestamp, e.visible, u.data_public, u.id AS user_id, >> u.display_name, e.changeset_id, e.latitude, e.longitude FROM nodes e >> LEFT OUTER JOIN changesets c ON e.changeset_id = c.id LEFT OUTER JOIN >> users u ON c.user_id = u.id WHERE e.timestamp > $1 AND e.timestamp <= >> $2 ORDER BY e.id, e.version",oids={0,0}) >> 17:48:46.444 (1) FE=> Describe(statement=S_2) >> 17:48:46.444 (1) FE=> Bind(stmt=S_2,portal=C_3,$1=<1970-01-01 >> 10:00:00.000000 +10:00:00>,$2=<2009-08-28 17:48:45.932000 +10:00:00>) >> 17:48:46.445 (1) FE=> Execute(portal=C_3,limit=10000) >> 17:48:46.445 (1) FE=> Sync > > This is using a named statement (S_2 is the name) > > It looks like you're using setFetchSize(). That forces use of a named > statement regardless of prepareThreshold (we have to keep the > statement and corresponding portal alive so we can do subsequent > fetches, even if there are some other intervening queries, so we can't > use the unnamed statement) Yes, I'm currently using a fetch size of 10000. I can't allow all results to be read at once because there can potentially be a huge number of results in the queries. I've just tested it out, and sure enough leaving the fetch size at 0 prevents the use of named statements. It sounds like I'm stuck with forcing the query planner via set statements. I'm currently using "set enable_seqscan=false;set enable_hashjoin=false;set enable_mergejoin=false" and getting a good query plan. I'm not familiar with PostgreSQL internals, but I assume a portal is basically a cursor? So is there no way of creating a cursor and fetching results in batches using an unnamed statement? Brett
Brett Henderson <brett@bretth.com> writes: > However I don't think it will work in my case. I currently have three > different ways of selecting records, 1. by timestamp range (as in my > initial example), 2. by records in a temp table, and 3. unrestricted. > The unrestricted example isn't an issue because a full table scan is > appropriate in that case. However the temp table one is a different > matter. In the case of a temp table you have to ANALYZE the temp table after filling it; otherwise the planner is working completely blind as to what is in it. (autovacuum will not do this for you...) regards, tom lane
2009/8/29 Brett Henderson <brett@bretth.com>
Yes, I'm currently using a fetch size of 10000. I can't allow all results to be read at once because there can potentially be a huge number of results in the queries. I've just tested it out, and sure enough leaving the fetch size at 0 prevents the use of named statements.Oliver Jowett wrote:Brett Henderson wrote:
It looks like you're using setFetchSize(). That forces use of a named statement regardless of prepareThreshold (we have to keep the statement and corresponding portal alive so we can do subsequent fetches, even if there are some other intervening queries, so we can't use the unnamed statement)
How about "create or replace temp view tmp_v as <query>" without setFetchSize and then "select * from tmp_v" with setFetchSize? Not sure still if DDL can use prepared query parameters.
Another options would be either to use protocolLevel=2 (writing from memory, may misspell parameter name) - this will force client-side parameter embedding or to embed your timestamps into query text by yourself and not to use PreparedStatement at all. Fortunatelly this are not strings - so no quoting needed.
Best regards, Vitalii Tymchyshyn
Віталій Тимчишин wrote: > > > 2009/8/29 Brett Henderson <brett@bretth.com <mailto:brett@bretth.com>> > > Oliver Jowett wrote: > > Brett Henderson wrote: > > It looks like you're using setFetchSize(). That forces use of > a named statement regardless of prepareThreshold (we have to > keep the statement and corresponding portal alive so we can do > subsequent fetches, even if there are some other intervening > queries, so we can't use the unnamed statement) > > Yes, I'm currently using a fetch size of 10000. I can't allow all > results to be read at once because there can potentially be a huge > number of results in the queries. I've just tested it out, and > sure enough leaving the fetch size at 0 prevents the use of named > statements. > > > How about "create or replace temp view tmp_v as <query>" without > setFetchSize and then "select * from tmp_v" with setFetchSize? Not > sure still if DDL can use prepared query parameters. I don't think I can do that. I couldn't use the syntax "PREPARE STATEMENT mystatement (timestamp, timestamp) AS CREATE TEMP VIEW tmp_nodes AS SELECT id, version FROM nodes WHERE timestamp > ? AND timestamp <= ?" > Another options would be either to use protocolLevel=2 (writing from > memory, may misspell parameter name) - this will force client-side > parameter embedding or to embed your timestamps into query text by > yourself and not to use PreparedStatement at all. Fortunatelly this > are not strings - so no quoting needed. Will this force me to load all results in memory (ie. prevent me from reading resultsets in batches)? I'm hesitant to force the older protocol, I've read elsewhere that it should be avoided if possible (something about exception handling??). At this point it sounds like the lesser evil is to to specify some "set local enable_seqscan = false" type statements. It seems to be working well enough. Brett
Tom Lane wrote:
Thanks for the tip. I haven't tried the temp table queries yet. They're next.Brett Henderson <brett@bretth.com> writes:However I don't think it will work in my case. I currently have three different ways of selecting records, 1. by timestamp range (as in my initial example), 2. by records in a temp table, and 3. unrestricted. The unrestricted example isn't an issue because a full table scan is appropriate in that case. However the temp table one is a different matter.In the case of a temp table you have to ANALYZE the temp table after filling it; otherwise the planner is working completely blind as to what is in it. (autovacuum will not do this for you...)