Discussion: How to prevent JDBC from sending any results back to the client?
Hi,
I'm working on an academic project and I need to benchmark PostgreSQL.
I'm interested only in the performance of the DBMS itself, and I'm trying to keep things simple in my measurements.
Preferably I'd like to ignore the query results on the client side, but JDBC seems to return results even if I don't call next() on the ResultSet (is that true?).
As a consequence, I can't accurately measure a per-query execution time, since the time I get also depends on the time spent sending the answer (or part of it) to the client.
setFetchSize(1) doesn't seem to help much.
Can I hack the driver and diminish the overhead explained above ?
Cheers,
Dimitris
The fetch size only comes into play if you are in a transaction. You have to disable autocommit and set the fetch size before executing your query. Otherwise the entire query result will be read and buffered in memory.
An alternative is to run the command as an EXPLAIN ANALYZE [1]. The server will then execute the entire operation, but instead of sending back the data it will send the query plan and runtime statistics.
[1]: http://www.postgresql.org/docs/9.3/static/sql-explain.html
Regards,
Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | http://www.jackdb.com/
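A minimal sketch of the setup described above (connection URL, credentials, and table name are placeholders; adjust them to your environment). With autocommit disabled and a fetch size set before execution, the driver uses a server-side cursor and streams rows in batches instead of buffering the whole result:

```java
import java.sql.*;

public class FetchSizeDemo {
    public static void main(String[] args) throws SQLException {
        // Placeholder URL and credentials: adjust to your setup.
        String url = "jdbc:postgresql://localhost:5432/bench";
        try (Connection conn = DriverManager.getConnection(url, "postgres", "")) {
            conn.setAutoCommit(false);        // required: cursors only work inside a transaction
            try (Statement st = conn.createStatement()) {
                st.setFetchSize(100);         // must be set before executing the query
                try (ResultSet rs = st.executeQuery("SELECT * FROM pgbench_accounts")) {
                    long count = 0;
                    while (rs.next()) {
                        count++;              // rows arrive from the server in batches of 100
                    }
                    System.out.println("rows: " + count);
                }
            }
            conn.commit();
        }
    }
}
```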
Dimitris,
You would be better off running queries such as EXPLAIN ANALYZE, which do not return results but do time the query. Every PostgreSQL client library will have to wait for the results; that is essentially the way the protocol works.
Dave
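A sketch of this approach from JDBC (the connection URL and query are placeholders): the server runs the query to completion but returns the plan text instead of the rows, so each row of the ResultSet is one line of the plan.

```java
import java.sql.*;

public class ExplainAnalyzeDemo {
    // Wraps a query so the server executes it but returns the plan instead of the rows.
    static String wrapExplain(String sql) {
        return "EXPLAIN ANALYZE " + sql;
    }

    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://localhost:5432/bench"; // placeholder
        try (Connection conn = DriverManager.getConnection(url, "postgres", "");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(wrapExplain("SELECT * FROM pgbench_accounts"))) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // one line of plan text per row
            }
        }
    }
}
```

The last line of the plan output ("Total runtime" / "Execution time") gives the server-side execution time without the cost of shipping the rows to the client.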
On 19 April 2014 15:02, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
Thanks, EXPLAIN ANALYSE might be an option for me.
One more question, because in certain cases I get strange results.
Does JDBC do any kind of result or query caching by default?
-dk
On Sun, Apr 20, 2014 at 2:22 AM, Dave Cramer <pg@fastcrypt.com> wrote:
Dimitris,
No, it does not do query caching. The backend does cache, so typically the first query will take a while, and subsequent queries will take much less time because the data is in RAM.
On 20 April 2014 14:46, Dimitris Karampinas <dkarampin@gmail.com> wrote:
I know that the backend process caches pages in shared buffers, so I set up a warm-up phase before I start benchmarking.
The thing is that whenever I use psql to execute my query, it takes some seconds and then returns with the results.
With the JDBC driver, it returns immediately. Shouldn't it also block for some seconds and then return?
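One way to see where the time actually goes is to time executeQuery() separately from draining the ResultSet (a sketch; connection URL and query are placeholders). With autocommit on, the driver buffers the entire result inside executeQuery(), so nearly all the time should show up in the first interval; with autocommit off and a fetch size set, it shifts to the drain loop:

```java
import java.sql.*;

public class TimingDemo {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:postgresql://localhost:5432/bench"; // placeholder
        try (Connection conn = DriverManager.getConnection(url, "postgres", "");
             Statement st = conn.createStatement()) {
            long t0 = System.nanoTime();
            ResultSet rs = st.executeQuery("SELECT * FROM pgbench_accounts"); // placeholder query
            long t1 = System.nanoTime();
            while (rs.next()) { /* drain all rows */ }
            long t2 = System.nanoTime();
            rs.close();
            System.out.printf("executeQuery: %.1f ms, drain: %.1f ms%n",
                    (t1 - t0) / 1e6, (t2 - t1) / 1e6);
        }
    }
}
```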
On Sun, Apr 20, 2014 at 8:53 PM, Dave Cramer <pg@fastcrypt.com> wrote:
There should be no difference. Can you post an example? Also, there is more caching than just shared buffers. What you are attempting is difficult.
On 2014-04-20 3:55 PM, "Dimitris Karampinas" <dkarampin@gmail.com> wrote:
One possible problem with using EXPLAIN ANALYZE is that the cost of timing each step can artificially inflate the query time... however, you can avoid this by using the variant:
EXPLAIN (ANALYZE, TIMING FALSE) statement
which still runs the query but skips timing each step (which I think is probably what you want). It still reports how long the entire statement took.
regards
Mark
FWIW: the difference is noticeable, even on modern CPU types (this is an i7 4770):

bench=# EXPLAIN (ANALYZE,TIMING FALSE) SELECT aid,bid FROM pgbench_accounts;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on pgbench_accounts  (cost=0.00..26394.00 rows=1000000 width=8) (actual rows=1000000 loops=1)
 Planning time: 0.066 ms
 Total runtime: 80.172 ms
(3 rows)

bench=# EXPLAIN (ANALYZE,TIMING) SELECT aid,bid FROM pgbench_accounts;
                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Seq Scan on pgbench_accounts  (cost=0.00..26394.00 rows=1000000 width=8) (actual time=0.010..98.167 rows=1000000 loops=1)
 Planning time: 0.063 ms
 Total runtime: 124.818 ms
(3 rows)
Thanks for your answers.
I need to understand the semantics of the JDBC driver.
My question is: what is the behaviour of executeQuery()?
At the beginning I thought it was a blocking call, meaning control comes back to the application right after the query has been fully executed on the server side.
But my experiments show that once the server starts producing the first tuples (assume a big join between two tables), executeQuery() returns and I can start iterating over the ResultSet while the long query continues running on the backend.
Can someone verify this?
If this is the case, my next question is: what happens if I call close() on the ResultSet while the query is still running? Does it get aborted?
FYI: I use a modified version of PostgreSQL. EXPLAIN ANALYSE and a couple of other tricks I tried don't always work properly.
Thank you
-dk
On Mon, Apr 21, 2014 at 8:01 AM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
On 21 April 2014 11:30, Dimitris Karampinas <dkarampin@gmail.com> wrote:
> My question is, what is the behaviour of executeQuery() ? [...] my experiments show that once the server starts producing the first tuples (assume a big join between two tables) executeQuery returns and I can start iterating to the Resultset while the long query continues running at the backend.
The driver can do both. If it uses a cursor, i.e. you have called setFetchSize(), then you may get results back from a query before they are all processed.
> If this is the case, my next question is, what happens if I call .close() on the ResultSet and the query is still running. Does it get aborted ?
The query will not get aborted.
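If you do want to stop a long-running query from the client, Statement.cancel() is the standard JDBC mechanism; it must be called from a different thread than the one blocked in executeQuery(). A sketch, with the same placeholder connection details as above and pg_sleep(60) standing in for a long query:

```java
import java.sql.*;

public class CancelDemo {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/bench"; // placeholder
        try (Connection conn = DriverManager.getConnection(url, "postgres", "");
             Statement st = conn.createStatement()) {
            // Cancel the statement from a second thread after one second.
            Thread canceller = new Thread(() -> {
                try {
                    Thread.sleep(1000);
                    st.cancel(); // sends a cancel request for the running query
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
            canceller.start();
            try {
                st.executeQuery("SELECT pg_sleep(60)"); // stands in for a long query
            } catch (SQLException e) {
                System.out.println("query cancelled: " + e.getMessage());
            }
            canceller.join();
        }
    }
}
```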
> FYI: I use a modified version of PostgreSQL. EXPLAIN ANALYSE and a couple of other tricks I tried don't always work properly.
Then all bets are off. Candidly, this isn't a good way to profile PostgreSQL. Running queries and analyzing the logs is a better way.
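For the log-analysis approach, a common starting point (a sketch, assuming you can edit postgresql.conf and reload the server) is to log every statement with its duration and then aggregate the log entries:

```
# postgresql.conf fragment (reload with: SELECT pg_reload_conf();)
log_min_duration_statement = 0     # log every statement with its duration in ms
log_line_prefix = '%t [%p] '       # timestamp and backend PID for correlating entries
```

The logged durations are measured entirely on the server, so they exclude client-side result handling.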