Обсуждение: Performance comparison between Pgsql 10.5 and Pgsql 11.2
Hello,
is there any reason why I am getting worse results using pgsql11.2 in writing comparing it with pgsql 10.6?
I have two Instances, both just restored, so no bloats.
Running read queries I have pretty much same results, a little bit better on pg11- Running writes the difference is in favour of 10.
I am expecting pg11 to be better.
Running pgbench :
PG11
[root@STAGING-CMD1 ~]# /usr/local/pgsql11.2/bin/pgbench -t 1000 -c 20 -C -f stress_service_order.sql cmdstaging -U admin
transaction type: stress_service_order.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 20000/20000
latency average = 45.322 ms
tps = 441.283336 (including connections establishing)
tps = 463.731537 (excluding connections establishing)
PG10
[root@STAGING-CMD1 ~]# pgbench -t 1000 -c 20 -C -f stress_service_order.sql cmdstaging -U admin
transaction type: stress_service_order.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
number of transactions per client: 1000
number of transactions actually processed: 20000/20000
latency average = 44.686 ms
tps = 447.565403 (including connections establishing)
tps = 470.285561 (excluding connections establishing)
This is making a really big difference with longer queries.
Here I am updating a field in a random record.
With more transactions the difference is bigger
WITH POSTGRES 10
[root@STAGING-CMD1 ~]# pgbench -t 100000 -c 20 -C -f stress_service_order_read.sql cmdstaging -U postgres
transaction type: stress_service_order_read.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
number of transactions per client: 100000
number of transactions actually processed: 2000000/2000000
latency average = 55.291 ms
tps = 442.1490778 (including connections establishing)
tps = 454.846844 (excluding connections establishing)
WITH POSTGRES 11
[root@STAGING-CMD1 ~]# pgbench -t 100000 -c 20 -C -f stress_service_order_read.sql cmdstaging -U postgres
transaction type: stress_service_order_read.sql
scaling factor: 1
query mode: simple
number of clients: 20
number of threads: 1
number of transactions per client: 100000
number of transactions actually processed: 2000000/2000000
latency average = 53.291 ms
tps = 375.297748 (including connections establishing)
tps = 392.316057 (excluding connections establishing)
The postgres.conf file are the same.
max_connections = 220
shared_buffers = 10GB
effective_cache_size = 120GB
work_mem = 600MB
maintenance_work_mem = 2GB
min_wal_size = 1GB
max_wal_size = 2GB
checkpoint_completion_target = 0.7
wal_buffers = 16MB
#default_statistics_target = 100
Using data_sync_retry=on doesn't make any difference.
Is there anything else changed in the default values?
Any trick?
I don't want to go live and loose performances.
Thanks a lot,
Nicola
On 01/03/2019 15:01, Nicola Contu wrote: > Hello, > is there any reason why I am getting worse results using pgsql11.2 in > writing comparing it with pgsql 10.6? > > I have two Instances, both just restored, so no bloats. > Running read queries I have pretty much same results, a little bit > better on pg11- Running writes the difference is in favour of 10. Did you run ANALYZE on the databases after restoring? Ray. -- Raymond O'Donnell // Galway // Ireland ray@rodonnell.ie
On Sat, Mar 2, 2019 at 5:02 AM Ray O'Donnell <ray@rodonnell.ie> wrote: > On 01/03/2019 15:01, Nicola Contu wrote: > > Hello, > > is there any reason why I am getting worse results using pgsql11.2 in > > writing comparing it with pgsql 10.6? > > > > I have two Instances, both just restored, so no bloats. > > Running read queries I have pretty much same results, a little bit > > better on pg11- Running writes the difference is in favour of 10. > > Did you run ANALYZE on the databases after restoring? If you can rule out different query plans, and if you compiled them both with the same compiler and optimisation levels and without cassert enabled (it's a long shot but I mentioned that because you showed a path in /usr/local so perhaps you're hand-compiling 11, but 10 came from a package?), then the next step might be to use a profiler like "perf" (or something equivalent on your OS) to figure out where 11 is spending more time in the write test? -- Thomas Munro https://enterprisedb.com
I did a analyze in stages on both.
And Yes both are compiled.
This is the configure command (change 10.6 for PG10)
./configure --prefix=/usr/local/pgsql11.2
See attached perf report. The difference seems to be all in this line, but not sure :
+ 26.80% 0.00% 222 postmaster [kernel.kallsyms] [k] system_call_fastpath
I am using CentOS 7
With Centos I am using this profile for tuned-adm
[root@STAGING-CMD1 ~]# tuned-adm active
Current active profile: latency-performance
Il giorno sab 2 mar 2019 alle ore 20:41 Thomas Munro <thomas.munro@gmail.com> ha scritto:
On Sat, Mar 2, 2019 at 5:02 AM Ray O'Donnell <ray@rodonnell.ie> wrote:
> On 01/03/2019 15:01, Nicola Contu wrote:
> > Hello,
> > is there any reason why I am getting worse results using pgsql11.2 in
> > writing comparing it with pgsql 10.6?
> >
> > I have two Instances, both just restored, so no bloats.
> > Running read queries I have pretty much same results, a little bit
> > better on pg11- Running writes the difference is in favour of 10.
>
> Did you run ANALYZE on the databases after restoring?
If you can rule out different query plans, and if you compiled them
both with the same compiler and optimisation levels and without
cassert enabled (it's a long shot but I mentioned that because you
showed a path in /usr/local so perhaps you're hand-compiling 11, but
10 came from a package?), then the next step might be to use a
profiler like "perf" (or something equivalent on your OS) to figure
out where 11 is spending more time in the write test?
--
Thomas Munro
https://enterprisedb.com
Вложения
> is there any reason why I am getting worse results using pgsql11.2 in writing comparing it with pgsql 10.6?
>... And Yes both are compiled.
Why 10.6?
>... And Yes both are compiled.
Why 10.6?
according to release notes
"14th February 2019: PostgreSQL 11.2, 10.7, 9.6.12, 9.5.16, and 9.4.21 Released!" https://www.postgresql.org/about/news/1920/
"14th February 2019: PostgreSQL 11.2, 10.7, 9.6.12, 9.5.16, and 9.4.21 Released!" https://www.postgresql.org/about/news/1920/
imho: it would be better to compare PG11.2 with PG10.7 ( similar bug Fixes and Improvements + same fsync() behavior )
"This release changes the behavior in how PostgreSQL interfaces with fsync() and includes fixes for partitioning and over 70 other bugs that were reported over the past three months"
Imre
Nicola Contu <nicola.contu@gmail.com> ezt írta (időpont: 2019. márc. 4., H, 13:14):
I did a analyze in stages on both.And Yes both are compiled.This is the configure command (change 10.6 for PG10)./configure --prefix=/usr/local/pgsql11.2See attached perf report. The difference seems to be all in this line, but not sure :+ 26.80% 0.00% 222 postmaster [kernel.kallsyms] [k] system_call_fastpathI am using CentOS 7With Centos I am using this profile for tuned-adm[root@STAGING-CMD1 ~]# tuned-adm activeCurrent active profile: latency-performanceIl giorno sab 2 mar 2019 alle ore 20:41 Thomas Munro <thomas.munro@gmail.com> ha scritto:On Sat, Mar 2, 2019 at 5:02 AM Ray O'Donnell <ray@rodonnell.ie> wrote:
> On 01/03/2019 15:01, Nicola Contu wrote:
> > Hello,
> > is there any reason why I am getting worse results using pgsql11.2 in
> > writing comparing it with pgsql 10.6?
> >
> > I have two Instances, both just restored, so no bloats.
> > Running read queries I have pretty much same results, a little bit
> > better on pg11- Running writes the difference is in favour of 10.
>
> Did you run ANALYZE on the databases after restoring?
If you can rule out different query plans, and if you compiled them
both with the same compiler and optimisation levels and without
cassert enabled (it's a long shot but I mentioned that because you
showed a path in /usr/local so perhaps you're hand-compiling 11, but
10 came from a package?), then the next step might be to use a
profiler like "perf" (or something equivalent on your OS) to figure
out where 11 is spending more time in the write test?
--
Thomas Munro
https://enterprisedb.com
Because I have 10.6 in production :) and I am comparing with what I will be loosing.
And I read that in the release notes but as said in my first email, even with data_sync_retry=on (going back to previous behavior) doesn't make any difference.
So I am looking for something that will keep my performances but still allows me to upgrade to 11 in production.
Also, trying with 11.1, the problem seems still there.
Il giorno lun 4 mar 2019 alle ore 14:45 Imre Samu <pella.samu@gmail.com> ha scritto:
> is there any reason why I am getting worse results using pgsql11.2 in writing comparing it with pgsql 10.6?
>... And Yes both are compiled.
Why 10.6?according to release notes
"14th February 2019: PostgreSQL 11.2, 10.7, 9.6.12, 9.5.16, and 9.4.21 Released!" https://www.postgresql.org/about/news/1920/imho: it would be better to compare PG11.2 with PG10.7 ( similar bug Fixes and Improvements + same fsync() behavior )"This release changes the behavior in how PostgreSQL interfaces with fsync() and includes fixes for partitioning and over 70 other bugs that were reported over the past three months"ImreNicola Contu <nicola.contu@gmail.com> ezt írta (időpont: 2019. márc. 4., H, 13:14):I did a analyze in stages on both.And Yes both are compiled.This is the configure command (change 10.6 for PG10)./configure --prefix=/usr/local/pgsql11.2See attached perf report. The difference seems to be all in this line, but not sure :+ 26.80% 0.00% 222 postmaster [kernel.kallsyms] [k] system_call_fastpathI am using CentOS 7With Centos I am using this profile for tuned-adm[root@STAGING-CMD1 ~]# tuned-adm activeCurrent active profile: latency-performanceIl giorno sab 2 mar 2019 alle ore 20:41 Thomas Munro <thomas.munro@gmail.com> ha scritto:On Sat, Mar 2, 2019 at 5:02 AM Ray O'Donnell <ray@rodonnell.ie> wrote:
> On 01/03/2019 15:01, Nicola Contu wrote:
> > Hello,
> > is there any reason why I am getting worse results using pgsql11.2 in
> > writing comparing it with pgsql 10.6?
> >
> > I have two Instances, both just restored, so no bloats.
> > Running read queries I have pretty much same results, a little bit
> > better on pg11- Running writes the difference is in favour of 10.
>
> Did you run ANALYZE on the databases after restoring?
If you can rule out different query plans, and if you compiled them
both with the same compiler and optimisation levels and without
cassert enabled (it's a long shot but I mentioned that because you
showed a path in /usr/local so perhaps you're hand-compiling 11, but
10 came from a package?), then the next step might be to use a
profiler like "perf" (or something equivalent on your OS) to figure
out where 11 is spending more time in the write test?
--
Thomas Munro
https://enterprisedb.com
Nicola Contu <nicola.contu@gmail.com> writes: > See attached perf report. The difference seems to be all in this line, but > not sure : > + 26.80% 0.00% 222 postmaster [kernel.kallsyms] > [k] system_call_fastpath That would suggest that many more kernel calls are happening, which is something you could usefully investigate with strace, perhaps. regards, tom lane
Attached a part of the strace running the pgbench command for pg11
Also attached strace_10 for pg10.6.
Thanks
Il giorno lun 4 mar 2019 alle ore 15:22 Tom Lane <tgl@sss.pgh.pa.us> ha scritto:
Nicola Contu <nicola.contu@gmail.com> writes:
> See attached perf report. The difference seems to be all in this line, but
> not sure :
> + 26.80% 0.00% 222 postmaster [kernel.kallsyms]
> [k] system_call_fastpath
That would suggest that many more kernel calls are happening, which is
something you could usefully investigate with strace, perhaps.
regards, tom lane
Вложения
On Tue, Mar 5, 2019 at 5:01 AM Nicola Contu <nicola.contu@gmail.com> wrote: > Attached a part of the strace running the pgbench command for pg11 > Also attached strace_10 for pg10.6. That looks like strace output from pgbench, and I don't see any interesting differences between v10 and v11 (though I'm surprised to see it using poll() instead of ppoll(), and opening a new connection for every transaction). How about syscalls on the server side? You could start it with something like "strace -f path/to/postgres -D path/to/pgdata" (-f for follow children), and perhaps also use -c so that it shows aggregated data (up until you ^C it) instead of every syscall? -- Thomas Munro https://enterprisedb.com
Not sure what you are requesting exactly but here is the strace for the start of the pg_ctl
Il giorno lun 4 mar 2019 alle ore 21:55 Thomas Munro <thomas.munro@gmail.com> ha scritto:
On Tue, Mar 5, 2019 at 5:01 AM Nicola Contu <nicola.contu@gmail.com> wrote:
> Attached a part of the strace running the pgbench command for pg11
> Also attached strace_10 for pg10.6.
That looks like strace output from pgbench, and I don't see any
interesting differences between v10 and v11 (though I'm surprised to
see it using poll() instead of ppoll(), and opening a new connection
for every transaction).
How about syscalls on the server side? You could start it with
something like "strace -f path/to/postgres -D path/to/pgdata" (-f for
follow children), and perhaps also use -c so that it shows aggregated
data (up until you ^C it) instead of every syscall?
--
Thomas Munro
https://enterprisedb.com
Вложения
On Wed, Mar 6, 2019 at 4:22 AM Nicola Contu <nicola.contu@gmail.com> wrote: > > Not sure what you are requesting exactly but here is the strace for the start of the pg_ctl I meant that you could run the server itself in the foreground under strace, like so: $ strace -f -c /usr/local/pgsql11.2/bin/postgres -D /db/pgsql11/data/ Then perform your testing, and finally stop it with pg_ctl from another window (or hit ^C in this window) and strace should spit out a table of system calls with some counters. We might be able to see why v11 is spending so much more time executing system calls than v10 for your workload, or at least which systems calls they are, assuming you run the same transactions against both versions. -- Thomas Munro https://enterprisedb.com
Here is the strace as requested for pg11
Thanks
Il giorno mar 5 mar 2019 alle ore 17:47 Thomas Munro <thomas.munro@gmail.com> ha scritto:
On Wed, Mar 6, 2019 at 4:22 AM Nicola Contu <nicola.contu@gmail.com> wrote:
>
> Not sure what you are requesting exactly but here is the strace for the start of the pg_ctl
I meant that you could run the server itself in the foreground under
strace, like so:
$ strace -f -c /usr/local/pgsql11.2/bin/postgres -D /db/pgsql11/data/
Then perform your testing, and finally stop it with pg_ctl from
another window (or hit ^C in this window) and strace should spit out a
table of system calls with some counters. We might be able to see why
v11 is spending so much more time executing system calls than v10 for
your workload, or at least which systems calls they are, assuming you
run the same transactions against both versions.
--
Thomas Munro
https://enterprisedb.com
Вложения
This is instead the strace of another server running the same version compiled but that is even slower.
Il giorno mer 6 mar 2019 alle ore 11:14 Nicola Contu <nicola.contu@gmail.com> ha scritto:
Here is the strace as requested for pg11ThanksIl giorno mar 5 mar 2019 alle ore 17:47 Thomas Munro <thomas.munro@gmail.com> ha scritto:On Wed, Mar 6, 2019 at 4:22 AM Nicola Contu <nicola.contu@gmail.com> wrote:
>
> Not sure what you are requesting exactly but here is the strace for the start of the pg_ctl
I meant that you could run the server itself in the foreground under
strace, like so:
$ strace -f -c /usr/local/pgsql11.2/bin/postgres -D /db/pgsql11/data/
Then perform your testing, and finally stop it with pg_ctl from
another window (or hit ^C in this window) and strace should spit out a
table of system calls with some counters. We might be able to see why
v11 is spending so much more time executing system calls than v10 for
your workload, or at least which systems calls they are, assuming you
run the same transactions against both versions.
--
Thomas Munro
https://enterprisedb.com
Вложения
On Wed, Mar 6, 2019 at 11:14 PM Nicola Contu <nicola.contu@gmail.com> wrote: > Here is the strace as requested for pg11 How does it compare to v10 running the same test? -- Thomas Munro https://enterprisedb.com
On Thu, Mar 7, 2019 at 12:12 AM Nicola Contu <nicola.contu@gmail.com> wrote: > This is instead the strace of another server running the same version compiled but that is even slower. Huh. That's a lot of lseek(). Some of these will be for random reads/writes and will go way in v12, and some will be for probing the size of relations while planning, and some while executing scans. I bet you could make some of them go away by using prepared statements. Does the query in your test involve many partitions/tables? % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 32.50 143.010306 7 21044095 lseek 26.21 115.354045 14 8144577 read 6.18 27.185578 16 1669889 10 sendto 5.29 23.300584 57 407528 fdatasync 4.93 21.709522 9 2313529 824174 recvfrom 3.31 14.547568 19 765897 write 2.73 12.007486 14 867088 14494 epoll_wait 2.18 9.597460 15 659871 84097 futex 1.85 8.147759 14 567414 close 1.77 7.767832 18 437656 11319 open The other results had 1 usec lseek(), and much fewer of them relative to the number of reads and writes. BTW, are you comparing v10 and v11 on the same hardware, kernel, filesystem? Just wondering if there could be some change in syscall overhead on different kernel patch levels or something like that: we see 7 usec vs 1 usec in those two files (though I have no idea how reliable these times are) and if we're going to call it 21 million times at some point it might matter... -- Thomas Munro https://enterprisedb.com
So the first file is on Postgres11.2 on a test server (and where I compare 10 vs 11)
The second file, is our preprod machine running Postgres 11.2 (different hardware etc, it is a VM). I know that could be confusing, but I just wanted to compare that too because if you see the two files there's a lot of difference between the two machines.
And they are both running CentOS 7.
So at this point I have two problems. One inside the machine between Postgres 10 and 11 and another problem on the preprod (similar to prod) with a lot of lseek.
Sorry if this is confusing, hope it is clear now.
Regarding partitions/tables. The first file involves just one table. The second file (with a huge lseek) was running the test on a single table, but meanwhile it was accessible by the preprod web application. So it was maybe hit by some user and some other table.
Question:
1) Is it possible that pgbench could not be really a good tool for testing the performances? If I use a sql script of thousands of insert records and compare on the same server between pg10 and pg11 I get pretty much the same result (maybe better on pg11)
2) regarding preprod, is there any way to reduce those lseek()? Just to let you know, comparing the same insert script between the first server, the first server takes 2m the second one takes 5-7m.
Thanks a lot,
Il giorno gio 7 mar 2019 alle ore 04:47 Thomas Munro <thomas.munro@gmail.com> ha scritto:
On Thu, Mar 7, 2019 at 12:12 AM Nicola Contu <nicola.contu@gmail.com> wrote:
> This is instead the strace of another server running the same version compiled but that is even slower.
Huh. That's a lot of lseek(). Some of these will be for random
reads/writes and will go way in v12, and some will be for probing the
size of relations while planning, and some while executing scans. I
bet you could make some of them go away by using prepared statements.
Does the query in your test involve many partitions/tables?
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
32.50 143.010306 7 21044095 lseek
26.21 115.354045 14 8144577 read
6.18 27.185578 16 1669889 10 sendto
5.29 23.300584 57 407528 fdatasync
4.93 21.709522 9 2313529 824174 recvfrom
3.31 14.547568 19 765897 write
2.73 12.007486 14 867088 14494 epoll_wait
2.18 9.597460 15 659871 84097 futex
1.85 8.147759 14 567414 close
1.77 7.767832 18 437656 11319 open
The other results had 1 usec lseek(), and much fewer of them relative
to the number of reads and writes. BTW, are you comparing v10 and v11
on the same hardware, kernel, filesystem? Just wondering if there
could be some change in syscall overhead on different kernel patch
levels or something like that: we see 7 usec vs 1 usec in those two
files (though I have no idea how reliable these times are) and if
we're going to call it 21 million times at some point it might
matter...
--
Thomas Munro
https://enterprisedb.com
Hello,
do you have any advice on this?
Thanks a lot in advance
Il giorno gio 7 mar 2019 alle ore 09:39 Nicola Contu <nicola.contu@gmail.com> ha scritto:
So the first file is on Postgres11.2 on a test server (and where I compare 10 vs 11)The second file, is our preprod machine running Postgres 11.2 (different hardware etc, it is a VM). I know that could be confusing, but I just wanted to compare that too because if you see the two files there's a lot of difference between the two machines.And they are both running CentOS 7.So at this point I have two problems. One inside the machine between Postgres 10 and 11 and another problem on the preprod (similar to prod) with a lot of lseek.Sorry if this is confusing, hope it is clear now.Regarding partitions/tables. The first file involves just one table. The second file (with a huge lseek) was running the test on a single table, but meanwhile it was accessible by the preprod web application. So it was maybe hit by some user and some other table.Question:1) Is it possible that pgbench could not be really a good tool for testing the performances? If I use a sql script of thousands of insert records and compare on the same server between pg10 and pg11 I get pretty much the same result (maybe better on pg11)2) regarding preprod, is there any way to reduce those lseek()? Just to let you know, comparing the same insert script between the first server, the first server takes 2m the second one takes 5-7m.Thanks a lot,Il giorno gio 7 mar 2019 alle ore 04:47 Thomas Munro <thomas.munro@gmail.com> ha scritto:On Thu, Mar 7, 2019 at 12:12 AM Nicola Contu <nicola.contu@gmail.com> wrote:
> This is instead the strace of another server running the same version compiled but that is even slower.
Huh. That's a lot of lseek(). Some of these will be for random
reads/writes and will go way in v12, and some will be for probing the
size of relations while planning, and some while executing scans. I
bet you could make some of them go away by using prepared statements.
Does the query in your test involve many partitions/tables?
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
32.50 143.010306 7 21044095 lseek
26.21 115.354045 14 8144577 read
6.18 27.185578 16 1669889 10 sendto
5.29 23.300584 57 407528 fdatasync
4.93 21.709522 9 2313529 824174 recvfrom
3.31 14.547568 19 765897 write
2.73 12.007486 14 867088 14494 epoll_wait
2.18 9.597460 15 659871 84097 futex
1.85 8.147759 14 567414 close
1.77 7.767832 18 437656 11319 open
The other results had 1 usec lseek(), and much fewer of them relative
to the number of reads and writes. BTW, are you comparing v10 and v11
on the same hardware, kernel, filesystem? Just wondering if there
could be some change in syscall overhead on different kernel patch
levels or something like that: we see 7 usec vs 1 usec in those two
files (though I have no idea how reliable these times are) and if
we're going to call it 21 million times at some point it might
matter...
--
Thomas Munro
https://enterprisedb.com
On Tue, Mar 12, 2019 at 10:49 PM Nicola Contu <nicola.contu@gmail.com> wrote: > Il giorno gio 7 mar 2019 alle ore 09:39 Nicola Contu <nicola.contu@gmail.com> ha scritto: >> So the first file is on Postgres11.2 on a test server (and where I compare 10 vs 11) >> The second file, is our preprod machine running Postgres 11.2 (different hardware etc, it is a VM). I know that couldbe confusing, but I just wanted to compare that too because if you see the two files there's a lot of difference betweenthe two machines. >> And they are both running CentOS 7. >> >> So at this point I have two problems. One inside the machine between Postgres 10 and 11 and another problem on the preprod(similar to prod) with a lot of lseek. So we still don't know what is different between 10 and 11. We'll need some more evidence to understand that. We know that perf said 11 was spending more time making syscalls, but we haven't seen any syscall measurement from the 10 system so we don't know what those supposed extra syscalls are. Also, backing up a bit, have you compared the query plan to see if it's the same, and can we please see it? As for the preprod/lots-of-lseek system, I don't know, it seems there are even more unknown variables there... if you can say more about the virtualisation technology you're using, perhaps someone who knows more about that would have some ideas. Besides apparently slower syscalls, one factor that is unexplained is why that system is calling lseek() more times per query (but you said there may be other work happening on the server, so who knows). -- Thomas Munro https://enterprisedb.com