Обсуждение: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
This question was posted originally on http://dba.stackexchange.com/questions/96444/cant-get-dell-pe-t420-perc-h710-perform-better-than-a-macmini-with-postgresql and they suggested to post it on this mailing list.
It's months that I'm trying to solve a performance issue with PostgreSQL. I’m able to give you all the technical details needed.
SYSTEM CONFIGURATION
Our deployment machine is a Dell PowerEdge T420 with a Perc H710 RAID controller configured in this way:
- VD0: two 15k SAS disks (ext4, OS partition, WAL partition, RAID1)
- VD1: ten 10k SAS disks (XFS, Postgres data partition, RAID5)
This system has the following configuration:
- Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-48-generic x86_64)
- 128GB RAM (DDR3, 8x16GB @1600Mhz)
- two Intel Xeon E5-2640 v2 @2Ghz
- Dell Perc H710 with 512MB RAM (Write cache: "WriteBack", Read cache: "ReadAhead", Disk cache: "disabled"):
- VD0 (OS and WAL partition): two 15k SAS disks (ext4, RAID1)
- VD1 (Postgres data partition): ten 10k SAS disks (XFS, RAID5)
- PostgreSQL 9.4 (updated to the latest available version)
- moved pg_stat_tmp to RAM disk
My personal low cost and low profile development machine is a MacMini configured in this way:
- OS X Server 10.7.5
- 8GB RAM (DDR3, 2x4GB @1333Mhz)
- one Intel i7 @2.2Ghz
- two Internal 500GB 7.2k SAS HDD (non RAID) for OS partition
- external Promise Pegasus R1 connected with Thunderbolt v1 (512MB RAM, four 1TB 7.2k SAS HDD 32MB cache, RAID5, Write cache: "WriteBack", Read cache: "ReadAhead", Disk cache: "enabled", NCQ: "enabled")
- PostgreSQL 9.0.13 (the original built-in shipped with OS X Server)
- moved pg_stat_tmp to RAM disk
So far I've made a lot of tuning adjustments to both machines, including kernel reccomended ones on the official Postgres doc site.
APPLICATION
The deployment machine runs a web platform which instructs Postgres to make big transactions over billion of records. It's a platform designed for one user because system resources have to be dedicated as much as possible to one single job due to data size (I don't like to call it big data because big data are in the order ob ten of billion).
ISSUEs
I've found the deployment machine to be a lot slower than the development machine. This is paradoxal because the two machine really differs in many aspects. I've run many queries to investigate this strange behaviour and have done a lot of tuning adjustments.
During the last two months I've prepared and executed two type of query sets:
- A: these sets make use of
SELECT ... INTO
,CREATE INDEX
,CLUSTER
andVACUUM ANALYZE
. - B: these sets are from our application generated transactions and make use of
SELECT
over the tables created with set A.
A and B were always slower on T420. The only type of operation that was faster is the VACUUM ANALYZE
.
RESULTS
A type set:
- T420: went from 311seconds (default
postgresql.conf
) to 195seconds doing tuning adjustments over RAID, kernel andpostgresql.conf
; - MacMini: 40seconds.
B type set:
- T420: 141seconds;
- MacMini: 101seconds.
I've to mention that we have also adjusted the BIOS on T420 setting all possible parameters to "performance" and disabling low energy profiles. This lowered time execution over a type A set from 240seconds to 211seconds.
We have also upgrade all firmware and BIOS to the latest available versions.
Here are two benchmarks generated using pg_test_fsync
:
T420 pg_test_fsync
60 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default) open_datasync 23358.758 ops/sec 43 usecs/op fdatasync 21417.018 ops/sec 47 usecs/op fsync 21112.662 ops/sec 47 usecs/op fsync_writethrough n/a open_sync 23082.764 ops/sec 43 usecs/op
Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default) open_datasync 11737.746 ops/sec 85 usecs/op fdatasync 19222.074 ops/sec 52 usecs/op fsync 18608.405 ops/sec 54 usecs/op fsync_writethrough n/a open_sync 11510.074 ops/sec 87 usecs/op
Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.) 1 * 16kB open_sync write 21484.546 ops/sec 47 usecs/op 2 * 8kB open_sync writes 11478.119 ops/sec 87 usecs/op 4 * 4kB open_sync writes 5885.149 ops/sec 170 usecs/op 8 * 2kB open_sync writes 3027.676 ops/sec 330 usecs/op 16 * 1kB open_sync writes 1512.922 ops/sec 661 usecs/op
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.) write, fsync, close 17946.690 ops/sec 56 usecs/op write, close, fsync 17976.202 ops/sec 56 usecs/op
Non-Sync'ed 8kB writes: write 343202.937 ops/sec 3 usecs/op
MacMini pg_test_fsync
60 seconds per test
Direct I/O is not supported on this platform.
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default) open_datasync 3780.341 ops/sec 265 usecs/op fdatasync 3117.094 ops/sec 321 usecs/op fsync 3156.298 ops/sec 317 usecs/op fsync_writethrough 110.300 ops/sec 9066 usecs/op open_sync 3077.932 ops/sec 325 usecs/op
Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default) open_datasync 1522.400 ops/sec 657 usecs/op fdatasync 2700.055 ops/sec 370 usecs/op fsync 2670.652 ops/sec 374 usecs/op fsync_writethrough 98.462 ops/sec 10156 usecs/op open_sync 1532.235 ops/sec 653 usecs/op
Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.) 1 * 16kB open_sync write 2634.754 ops/sec 380 usecs/op 2 * 8kB open_sync writes 1547.801 ops/sec 646 usecs/op 4 * 4kB open_sync writes 801.542 ops/sec 1248 usecs/op 8 * 2kB open_sync writes 405.515 ops/sec 2466 usecs/op 16 * 1kB open_sync writes 204.095 ops/sec 4900 usecs/op
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.) write, fsync, close 2747.345 ops/sec 364 usecs/op write, close, fsync 3070.877 ops/sec 326 usecs/op
Non-Sync'ed 8kB writes: write 3275.716 ops/sec 305 usecs/op
This confirms the hardware IO capabilities of T420 but doesn't explain why MacMini is MUCH MORE FAST.
Now let’s propose some query profiling times.
B type set are transactions, so it's impossible for me to post EXPLAIN ANALYZE
results. I've extracted two querys from a single transactions and executed the twos on both system. Here are the results:
T420
Query B_1 [55999.649 ms + 0.639 ms] http://explain.depesz.com/s/LbM
Query B_2 [95664.832 ms + 0.523 ms] http://explain.depesz.com/s/v06
MacMini
Query B_1 [56315.614 ms] http://explain.depesz.com/s/uZTx
Query B_2 [44890.813 ms] http://explain.depesz.com/s/y7Dk
COMPILING PGSQL
-flto
option due to several errors returned by make.
After two days of testing I went down from 195 to 189 seconds on T420 where MacMini still is 40 seconds (A set); and from 141 to 129 seconds where MacMini is 101 seconds (B set). On MacMini I’ve used the built-in pgsql 9.0.13 version while on T420 I've used the following optimal compiling options:./configure CFLAGS="-O3 -fno-inline-functions -march=native" --with-openssl --with-libxml --with-libxslt --with-wal-blocksize=64 --with-blocksize=32 --with-wal-segsize=64 --with-segsize=1
I've also tried to disable Hyper-Threading with echo 0 > /sys/devices/system/cpu/cpuN/online
where cpuN
is the N-th logical CPU but nothing changed over B set queries. We have 2 CPU with 8 cores for a total of 16 physical cores and 16 logical cores.
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Hi Pietro, On Wed, Apr 1, 2015 at 3:56 PM, Pietro Pugni <pietro.pugni@gmail.com> wrote: > T420: went from 311seconds (default postgresql.conf) to 195seconds doing > tuning adjustments over RAID, kernel and postgresql.conf; > MacMini: 40seconds. I'am afraid, the matter is, that PostgreSQL is not configured properly (and so do operating system and probably controller, however pg_test_fsync shows that things are not so bad there as with postgresql.conf). It is pretty useless to benchmark a database using out-of-the-box configuration. You need at least configure shared memory related, checkpoints-related and autovacuum-related settings. And as a first step, please compare postgresql.conf on Mac and on the server: sometimes (with some mac installers) default postgresql.conf can be not the same as on server. Best regards, Ilya -- Ilya Kosmodemiansky, PostgreSQL-Consulting.com tel. +14084142500 cell. +4915144336040 ik@postgresql-consulting.com
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Now let’s propose some query profiling times.
B type set are transactions, so it's impossible for me to post
EXPLAIN ANALYZE
results. I've extracted two querys from a single transactions and executed the twos on both system. Here are the results:T420
Query B_1 [55999.649 ms + 0.639 ms] http://explain.depesz.com/s/LbM
Query B_2 [95664.832 ms + 0.523 ms] http://explain.depesz.com/s/v06
MacMini
Query B_1 [56315.614 ms] http://explain.depesz.com/s/uZTx
Query B_2 [44890.813 ms] http://explain.depesz.com/s/y7Dk
On Wed, Apr 1, 2015 at 9:56 AM, Pietro Pugni <pietro.pugni@gmail.com> wrote:Now let’s propose some query profiling times.
B type set are transactions, so it's impossible for me to post
EXPLAIN ANALYZE
results. I've extracted two querys from a single transactions and executed the twos on both system. Here are the results:T420
Query B_1 [55999.649 ms + 0.639 ms] http://explain.depesz.com/s/LbM
Query B_2 [95664.832 ms + 0.523 ms] http://explain.depesz.com/s/v06
MacMini
Query B_1 [56315.614 ms] http://explain.depesz.com/s/uZTx
Query B_2 [44890.813 ms] http://explain.depesz.com/s/y7Dk
Looking at the 2 B_2 queries (since they are so drastically different), the in-memory quicksorts stand out on the Dell as being *drastically* slower than the disk-based sorts on your mac-mini....
Ang Wei Shan
This question was posted originally on http://dba.stackexchange.com/questions/96444/cant-get-dell-pe-t420-perc-h710-perform-better-than-a-macmini-with-postgresql and they suggested to post it on this mailing list.
It's months that I'm trying to solve a performance issue with PostgreSQL. I’m able to give you all the technical details needed.
SYSTEM CONFIGURATION
Our deployment machine is a Dell PowerEdge T420 with a Perc H710 RAID controller configured in this way:
- two Intel Xeon E5-2640 v2 @2Ghz
- PostgreSQL 9.4 (updated to the latest available version)
My personal low cost and low profile development machine is a MacMini configured in this way:
- one Intel i7 @2.2Ghz
- PostgreSQL 9.0.13 (the original built-in shipped with OS X Server)
Here are two benchmarks generated using
pg_test_fsync
:
T420
Query B_2 [95664.832 ms + 0.523 ms] http://explain.depesz.com/s/v06
MacMini
Query B_2 [44890.813 ms] http://explain.depesz.com/s/y7Dk
How can you sort 2,951,191 but then materialize 4,458,971 rows out of that? I've never seen that before. (Or, in the other plan, put 2,951,191 rows into the sort from the CTE but get 4,458,971 out of the sort?
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Ok, a quick view on the system, and some things that may be important to note: > Our deployment machine is a Dell PowerEdge T420 with a Perc H710 RAID > controller configured in this way: > > * VD0: two 15k SAS disks (ext4, OS partition, WAL partition, > RAID1) > * VD1: ten 10k SAS disks (XFS, Postgres data partition, RAID5) > Well...usually RAID5 have the worst performance in writing...EVER!!! Have you tested this in another raid configuration?RAID10 is usually the best bet. > > > This system has the following configuration: > > * Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-48-generic x86_64) > * 128GB RAM (DDR3, 8x16GB @1600Mhz) > * two Intel Xeon E5-2640 v2 @2Ghz > * Dell Perc H710 with 512MB RAM (Write cache: "WriteBack", Read > cache: "ReadAhead", Disk cache: "disabled"): > * VD0 (OS and WAL partition): two 15k SAS disks (ext4, RAID1) > * VD1 (Postgres data partition): ten 10k SAS disks (XFS, > RAID5) > * PostgreSQL 9.4 (updated to the latest available version) > * moved pg_stat_tmp to RAM disk > > [...]> versions. > You did not mention any "postgres" configuration at all. If you let the default checkpoint_segments=3, that would be an IOhell for your disk controler...and the RAID5 making things worst...Can you show us the values of: checkpoint_segments shared_buffers work_mem maintenance_work_mem effective_io_concurrency I would start from there, few changes, and check again. I would change the RAID first of all things, and try those testsagain. Cheers. Gerardo
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Ok, a quick view on the system, and some things that may be important to note:Our deployment machine is a Dell PowerEdge T420 with a Perc H710 RAID
controller configured in this way:
* VD0: two 15k SAS disks (ext4, OS partition, WAL partition,
RAID1)
* VD1: ten 10k SAS disks (XFS, Postgres data partition, RAID5)
Well...usually RAID5 have the worst performance in writing...EVER!!! Have you tested this in another raid configuration? RAID10 is usually the best bet.[...]> versions.
This system has the following configuration:
* Ubuntu 14.04.2 LTS (GNU/Linux 3.13.0-48-generic x86_64)
* 128GB RAM (DDR3, 8x16GB @1600Mhz)
* two Intel Xeon E5-2640 v2 @2Ghz
* Dell Perc H710 with 512MB RAM (Write cache: "WriteBack", Read
cache: "ReadAhead", Disk cache: "disabled"):
* VD0 (OS and WAL partition): two 15k SAS disks (ext4, RAID1)
* VD1 (Postgres data partition): ten 10k SAS disks (XFS,
RAID5)
* PostgreSQL 9.4 (updated to the latest available version)
* moved pg_stat_tmp to RAM diskYou did not mention any "postgres" configuration at all. If you let the default checkpoint_segments=3, that would be an IO hell for your disk controler...and the RAID5 making things worst...Can you show us the values of:
checkpoint_segments
shared_buffers
work_mem
maintenance_work_mem
effective_io_concurrency
I would start from there, few changes, and check again. I would change the RAID first of all things, and try those tests again.
Cheers.
Gerardo
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
On Wed, Apr 1, 2015 at 6:56 AM, Pietro Pugni <pietro.pugni@gmail.com> wrote:This question was posted originally on http://dba.stackexchange.com/questions/96444/cant-get-dell-pe-t420-perc-h710-perform-better-than-a-macmini-with-postgresql and they suggested to post it on this mailing list.
It's months that I'm trying to solve a performance issue with PostgreSQL. I’m able to give you all the technical details needed.
SYSTEM CONFIGURATION
Our deployment machine is a Dell PowerEdge T420 with a Perc H710 RAID controller configured in this way:
- two Intel Xeon E5-2640 v2 @2Ghz
- PostgreSQL 9.4 (updated to the latest available version)
My personal low cost and low profile development machine is a MacMini configured in this way:
- one Intel i7 @2.2Ghz
- PostgreSQL 9.0.13 (the original built-in shipped with OS X Server)
Using such different versions of PostgreSQL seems like a recipe for frustration.Here are two benchmarks generated using
pg_test_fsync
:This is unlikely to be important for the type of workload you describe. Fsyncs are the bottleneck for many short transactions, but not often the bottleneck for very large transactions.
T420
Query B_2 [95664.832 ms + 0.523 ms] http://explain.depesz.com/s/v06
MacMini
Query B_2 [44890.813 ms] http://explain.depesz.com/s/y7Dk
What collation is used for both databases? Perhaps the T420 is using a much slower collation.
How can you sort 2,951,191 but then materialize 4,458,971 rows out of that? I've never seen that before. (Or, in the other plan, put 2,951,191 rows into the sort from the CTE but get 4,458,971 out of the sort?Cheers,Jeff
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Just looking at the 2 B_2 queries, I'm curious as to why is the execution plan different between the 2 machines. Is the optimiser stats updated on both databases?Regards,Wei ShanOn 1 April 2015 at 22:32, Aidan Van Dyk <aidan@highrise.ca> wrote:On Wed, Apr 1, 2015 at 9:56 AM, Pietro Pugni <pietro.pugni@gmail.com> wrote:Now let’s propose some query profiling times.
B type set are transactions, so it's impossible for me to post
EXPLAIN ANALYZE
results. I've extracted two querys from a single transactions and executed the twos on both system. Here are the results:T420
Query B_1 [55999.649 ms + 0.639 ms] http://explain.depesz.com/s/LbM
Query B_2 [95664.832 ms + 0.523 ms] http://explain.depesz.com/s/v06
MacMini
Query B_1 [56315.614 ms] http://explain.depesz.com/s/uZTx
Query B_2 [44890.813 ms] http://explain.depesz.com/s/y7Dk
Looking at the 2 B_2 queries (since they are so drastically different), the in-memory quicksorts stand out on the Dell as being *drastically* slower than the disk-based sorts on your mac-mini....--Regards,
Ang Wei Shan
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Hi Pietro,
On Wed, Apr 1, 2015 at 3:56 PM, Pietro Pugni <pietro.pugni@gmail.com> wrote:T420: went from 311seconds (default postgresql.conf) to 195seconds doing
tuning adjustments over RAID, kernel and postgresql.conf;
MacMini: 40seconds.
I'am afraid, the matter is, that PostgreSQL is not configured properly
(and so do operating system and probably controller, however
pg_test_fsync shows that things are not so bad there as with
postgresql.conf).
It is pretty useless to benchmark a database using out-of-the-box
configuration. You need at least configure shared memory related,
checkpoints-related and autovacuum-related settings. And as a first
step, please compare postgresql.conf on Mac and on the server:
sometimes (with some mac installers) default postgresql.conf can be
not the same as on server.
Best regards,
Ilya
--
Ilya Kosmodemiansky,
PostgreSQL-Consulting.com
tel. +14084142500
cell. +4915144336040
ik@postgresql-consulting.com
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Hi Pietro, The modern CPUs trying to be too smart. try to run this code to disable CPUs c-states: ----> setcpulatency.c #include <stdio.h> #include <fcntl.h> #include <stdint.h> int main(int argc, char **argv) { int32_t l; int fd; if (argc != 2) { fprintf(stderr, "Usage: %s <latency in us>\n", argv[0]); return 2; } l = atoi(argv[1]); printf("setting latency to %d us\n", l); fd = open("/dev/cpu_dma_latency", O_WRONLY); if (fd < 0) { perror("open /dev/cpu_dma_latency"); return 1; } if (write(fd, &l, sizeof(l)) != sizeof(l)) { perror("write to /dev/cpu_dma_latency"); return 1; } while (1) pause(); } ----> you can use i7z (https://code.google.com/p/i7z/) to see the percentage of CPU power to be used. Changing CPU from C1 to C0 takes quite some time and for DB workload not optimal (if you need a high throughout and any given moment). I see ~65% boost when run './setcpulatency 0'. Tigran. ----- Original Message ----- > From: "Pietro Pugni" <pietro.pugni@gmail.com> > To: ik@postgresql-consulting.com > Cc: "pgsql-performance" <pgsql-performance@postgresql.org> > Sent: Thursday, April 2, 2015 12:57:22 PM > Subject: Re: [PERFORM] Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL > Hi Ilya, > thank your for your response. > Both system were configured for each test I’ve done. On T420 I’ve optimized the > kernel following the official Postgres documentation ( > http://www.postgresql.org/docs/9.4/static/kernel-resources.html ): > kernel.shmmax=68719476736 > kernel.shmall=16777216 > vm.overcommit_memory=2 > vm.overcommit_ratio=90 > > > RAID controllers were configured as following: > - Write cache: WriteBack > - Read cache: ReadAhead > - Disk cache (only T420): disabled to take full advantage of WriteBack cache > (BBU is charged and working) > - NCQ (only MacMini because it’s a SATA option): enabled (this affects a lot the > overall performance) > > For postgresql.conf: > > T420 > Normal operations > autovacuum = on > maintenance_work_mem = 512MB > work_mem = 512MB > wal_buffers = 64MB > effective_cache_size = 64GB # this helps A LOT in disk write speed when creating > indexes > shared_buffers = 32GB > checkpoint_segments = 2000 > checkpoint_completion_target = 1.0 > effective_io_concurrency = 0 # 1 doesn’t make any substantial difference > max_connections = 10 # 20 doesn’t make any difference > > Data loading (same as above with the following changes): > autovacuum = off > maintenance_work_mem = 64GB > > > MacMini > Normal operations > autovacuum = on > maintenance_work_mem = 128MB > work_mem = 32MB > wal_buffers = 32MB > effective_cache_size = 800MB > shared_buffers = 512MB > checkpoint_segments = 32 > checkpoint_completion_target = 1.0 > effective_io_concurrency = 1 > max_connections = 20 > > Data loading (same as above with the following changes): > autovacuum = off > maintenance_work_mem = 6GB > > > Best regards, > Pietro > > > > Il giorno 01/apr/2015, alle ore 16:27, Ilya Kosmodemiansky > <ilya.kosmodemiansky@postgresql-consulting.com> ha scritto: > >> Hi Pietro, >> >> On Wed, Apr 1, 2015 at 3:56 PM, Pietro Pugni <pietro.pugni@gmail.com> wrote: >>> T420: went from 311seconds (default postgresql.conf) to 195seconds doing >>> tuning adjustments over RAID, kernel and postgresql.conf; >>> MacMini: 40seconds. >> >> I'am afraid, the matter is, that PostgreSQL is not configured properly >> (and so do operating system and probably controller, however >> pg_test_fsync shows that things are not so bad there as with >> postgresql.conf). >> >> It is pretty useless to benchmark a database using out-of-the-box >> configuration. You need at least configure shared memory related, >> checkpoints-related and autovacuum-related settings. And as a first >> step, please compare postgresql.conf on Mac and on the server: >> sometimes (with some mac installers) default postgresql.conf can be >> not the same as on server. >> >> Best regards, >> Ilya >> >> >> -- >> Ilya Kosmodemiansky, >> >> PostgreSQL-Consulting.com >> tel. +14084142500 >> cell. +4915144336040 > > ik@postgresql-consulting.com
Hi, On Thu, Apr 2, 2015 at 12:47 PM, Pietro Pugni <pietro.pugni@gmail.com> wrote: > Hi Jeff, > thank you for your response. > I’m using Postgres 9.0 on MacMini because I’ve noticed that it’s quite fast > compared to different Ubuntu machines on which I’ve worked with different > (and more performant) hardware. > The built-in Postgres version on OS X Server is impossible to update. I > should stop it and install a parallel and independent distribution which has > not been optimized by Apple. On opensource.appel.com they have different > Postgres versions but the latest one is 9.2.x. They stopped updating it in > 2012. If you want you can compile 9.0 on OSX and double check. I don't remember well but ITSM that a fsync used by psql was a noop on OSX. > pg_test_fsync tells me that T420 disk iops are ~7 times faster than MacMini, > which is ok, but queries run ~2-5 times slower (for brevity I didn’t report > all test results in my first mail). > > I’ve searched just now what a collation is because I’ve never explicitly > used one before, so I think it uses the default one. What's the output of free and sysctl -a | grep vm.zone_reclaim_mode Search the mailing list for zone_reclaim_mode there's some tips. For testing you can also use the mac mini config with the dell, at least it should give you the same plan. With your example disks don't seem to matter, it's all in memory. Keep in mind that a psql query is still single thread so the mac and the dell should get more or less the same speed for in memory queries. > > B_2 query is of the form: > WITH soggetti AS ( > SELECT ... FROM ... GROUP BY ...) > SELECT ... INTO ... FROM soggetti, ... WHERE ... > > (I omit the … part because they’re not relevant) > > Best regards, > Pietro
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
T420work_mem = 512MB
MacMiniwork_mem = 32MB
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
T420work_mem = 512MBMacMiniwork_mem = 32MBSo that is why the T420 does memory sorts and the mini does disk sorts.
I'd start looking at why memory sorts on the T420 is so slow. Check your numa settings, etc (as already mentioned).For a drastic test, disable the 2nd socket on the dell, and just use one (eliminate any numa/QPI costs) and see how it compares to the no-numa MacMini.
If you want to see how bad the NUMA/QPI is, play with stream to benchmark memory performance.
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Hi,
On Thu, Apr 2, 2015 at 12:47 PM, Pietro Pugni <pietro.pugni@gmail.com> wrote:Hi Jeff,If you want you can compile 9.0 on OSX and double check.
thank you for your response.
I’m using Postgres 9.0 on MacMini because I’ve noticed that it’s quite fast
compared to different Ubuntu machines on which I’ve worked with different
(and more performant) hardware.
The built-in Postgres version on OS X Server is impossible to update. I
should stop it and install a parallel and independent distribution which has
not been optimized by Apple. On opensource.appel.com they have different
Postgres versions but the latest one is 9.2.x. They stopped updating it in
2012.
I don't remember well but ITSM that a fsync used by psql was a noop on OSX.
pg_test_fsync tells me that T420 disk iops are ~7 times faster than MacMini,
which is ok, but queries run ~2-5 times slower (for brevity I didn’t report
all test results in my first mail).
I’ve searched just now what a collation is because I’ve never explicitly
used one before, so I think it uses the default one.
What's the output of free and sysctl -a | grep vm.zone_reclaim_mode
Search the mailing list for zone_reclaim_mode there's some tips.
For testing you can also use the mac mini config with the dell, at
least it should give you the same plan.
With your example disks don't seem to matter, it's all in memory.
Keep in mind that a psql query is still single thread so the mac andYes I know ;) With 128GB I try to maximize RAM usage, but it’s difficult to fully understand how to achieve this.
the dell should get more or less the same speed for in memory queries.
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
the commanddmesg | grep -i numadoesn’t display me anything. I think T420 hasn’t NUMA on it. Is there a way to enable it from Ubuntu? I don’t have immediate access to BIOS (server is in another location).
For QPI I don’t know what to do. Please, can you give me more details?
If you want to see how bad the NUMA/QPI is, play with stream to benchmark memory performance.With stream you refer to this: https://sites.utexas.edu/jdm4372/tag/stream-benchmark/ ? Do you suggest me some way to do this kind of tests?
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
The modern CPUs trying to be too smart.
try to run this code to disable CPUs c-states:
----> setcpulatency.c
#include <stdio.h>
#include <fcntl.h>
#include <stdint.h>
int main(int argc, char **argv) {
int32_t l;
int fd;
if (argc != 2) {
fprintf(stderr, "Usage: %s <latency in us>\n", argv[0]);
return 2;
}
l = atoi(argv[1]);
printf("setting latency to %d us\n", l);
fd = open("/dev/cpu_dma_latency", O_WRONLY);
if (fd < 0) {
perror("open /dev/cpu_dma_latency");
return 1;
}
if (write(fd, &l, sizeof(l)) != sizeof(l)) {
perror("write to /dev/cpu_dma_latency");
return 1;
}
while (1) pause();
}
——>
you can use i7z (https://code.google.com/p/i7z/) to see the percentage of CPU power to be used.
Changing CPU from C1 to C0 takes quite some time and for DB workload not optimal (if you need a
high throughout and any given moment).
I see ~65% boost when run './setcpulatency 0'.
Tigran.
Hi On Thu, Apr 2, 2015 at 3:52 PM, Pietro Pugni <pietro.pugni@gmail.com> wrote: > > I’ve searched just now what a collation is because I’ve never explicitly > used one before, so I think it uses the default one. > > > What's the output of free and sysctl -a | grep vm.zone_reclaim_mode > > Search the mailing list for zone_reclaim_mode there's some tips. > > vm.zone_reclaim_mode = 0 In my understanding it's the rigth value there's also huge page /sys/kernel/mm/transparent_hugepage/enabled can you try to disable it? Also test on the dell: select tmp.cf, tmp.dt from grep_studi.tmp; and select tmp.cf, tmp.dt from grep_studi.tmp order by tmp.cf; in Query B_2 the sort is 9 time slower on the dell, you have to find why... > > For testing you can also use the mac mini config with the dell, at > least it should give you the same plan. > With your example disks don't seem to matter, it's all in memory. > T420 with optimal postgresql.conf > Query B_1 [55999.649 ms + 0.639 ms] http://explain.depesz.com/s/LbM > Query B_2 [95664.832 ms + 0.523 ms] http://explain.depesz.com/s/v06 > > > T420 with MacMini postgresql.conf > Query B_1 [51280.208ms + 0.699ms] http://explain.depesz.com/s/wlb > Query B_2 [177278.205ms + 0.428ms] http://explain.depesz.com/s/rzr > 32 GB for buffers is too high for the queries in your test but it doesn't matter. > MacMini > Query B_1 [56315.614 ms] http://explain.depesz.com/s/uZTx > Query B_2 [44890.813 ms] http://explain.depesz.com/s/y7Dk >
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
there's also huge page
/sys/kernel/mm/transparent_hugepage/enabled
can you try to disable it?
Also test on the dell:
select tmp.cf, tmp.dt from grep_studi.tmp;
and
select tmp.cf, tmp.dt from grep_studi.tmp order by tmp.cf;
in Query B_2
the sort is 9 time slower on the dell, you have to find why…
select tmp.cf, tmp.dt from grep_studi.tmp;
select tmp.cf, tmp.dt from grep_studi.tmp;
32 GB for buffers is too high for the queries in your test but it
doesn't matter.
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
NUMA stands for "Non-Uniform-Memory-Access" . It's basically the "label" for systems which have memory attached to different cpu sockets, such that accessing all of the memory from a paritciular cpu thread has different costs based on where the actual memory is located (i.e. on some other socket, or the local socket).
QPI is the the intel "QuickPath Interconnect". It's the communication path between CPU sockets. Memory ready by one cpu thread that has to come from another cpu socket's memory controller goes through QPI.
Google has lots of info on these, and how they impact performance, etc.
I’ve done some tests with sysbench on Dell T420 (via apt-get install) and MacMini (I’ve compiled the latest available sources at https://github.com/akopytov/sysbench ).If you want to see how bad the NUMA/QPI is, play with stream to benchmark memory performance.With stream you refer to this: https://sites.utexas.edu/jdm4372/tag/stream-benchmark/ ? Do you suggest me some way to do this kind of tests?Ya, that's the one. I don't have specific tests in mind.
Here are some results with 16GB RAM read and written at 1MB block size (I don’t know if this makes sense, but I’ve no problem in changing these parameters).
It returns the following output:A more simple "overview" might be "numactl —hardware”
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
It returns the following output:sh-4.3# numactl --hardwareavailable: 2 nodes (0-1)node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30node 0 size: 64385 MBnode 0 free: 56487 MBnode 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31node 1 size: 64508 MBnode 1 free: 62201 MBnode distances:node 0 10: 10 201: 20 10
cat /proc/sys/vm/zone_reclaim_mode
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Did you already post the results of:cat /proc/sys/vm/zone_reclaim_mode
Also, how big did you say your dataset is? Based on the output of free, you're certainly not using all the memory you have. That could be just because you haven't accessed that much of your dataset, or it could be because zone reclaim is preventing you from using your entire amount of RAM as file system cache.
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Hi Josh, at the moment the server is unreachable so I can’t calculate sizes. I run all of my test both with all data loaded into Postgresand with no data loaded (except from the single 20mln rows table with relative indexes). To give you an idea, with all data loaded into Postgres with indexes the space occupied is approximately 1.2-1.5TB and freespace on is about 800GB. Many thanks. Best regads, Pietro Il giorno 03/apr/2015, alle ore 17:21, Josh Krupka <jkrupka@gmail.com> ha scritto: > Sorry, how much disk space is actually used by the tables, indexes, etc involved in your queries? Or it that's a bit muchto get, how much disk space is occupied by your database in total?
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
On Tue, Apr 7, 2015 at 6:27 AM, Pietro Pugni <pietro.pugni@gmail.com> wrote:Hi Jeff,sorry for the latency but server was down due to a error I made in the sysctl.conf file.Yes, but are the defaults for those two systems? on psql, use \l to see.
\l returns the following:T420 (Postgres 9.4.1)List of databasesName | Owner | Encoding | Collate | Ctype | Access privileges-----------+----------+----------+-------------+-------------+-----------------------grep | grep | UTF8 | en_US.UTF-8 | en_US.UTF-8 |postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +| | | | | postgres=CTc/postgrestemplate1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +| | | | | postgres=CTc/postgres(4 rows)MacMini (Postgres 9.0.13)List of databasesName | Owner | Encoding | Collation | Ctype | Access privileges-------------------+------------+----------+-----------+-------+-------------------------caldav | caldav | UTF8 | C | C |collab | collab | UTF8 | C | C |device_management | _devicemgr | UTF8 | C | C |pen | pen | UTF8 | C | C |postgres | _postgres | UTF8 | C | C |roundcubemail | roundcube | UTF8 | C | C |template0 | _postgres | UTF8 | C | C | =c/_postgres +| | | | | _postgres=CTc/_postgrestemplate1 | _postgres | UTF8 | C | C | =c/_postgres +| | | | | _postgres=CTc/_postgres(8 rows)The difference between the "C" and the "en_US" collation is entirely sufficient to explain the difference in performance. "C" is the fastest possible collation as it never needs to look ahead or consult tables, it just compares raw bytes.Cheers,Jeff
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
On Tue, Apr 7, 2015 at 6:27 AM, Pietro Pugni <pietro.pugni@gmail.com> wrote:Hi Jeff,sorry for the latency but server was down due to a error I made in the sysctl.conf file.Yes, but are the defaults for those two systems? on psql, use \l to see.
\l returns the following:T420 (Postgres 9.4.1)List of databasesName | Owner | Encoding | Collate | Ctype | Access privileges-----------+----------+----------+-------------+-------------+-----------------------grep | grep | UTF8 | en_US.UTF-8 | en_US.UTF-8 |postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +| | | | | postgres=CTc/postgrestemplate1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +| | | | | postgres=CTc/postgres(4 rows)MacMini (Postgres 9.0.13)List of databasesName | Owner | Encoding | Collation | Ctype | Access privileges-------------------+------------+----------+-----------+-------+-------------------------caldav | caldav | UTF8 | C | C |collab | collab | UTF8 | C | C |device_management | _devicemgr | UTF8 | C | C |pen | pen | UTF8 | C | C |postgres | _postgres | UTF8 | C | C |roundcubemail | roundcube | UTF8 | C | C |template0 | _postgres | UTF8 | C | C | =c/_postgres +| | | | | _postgres=CTc/_postgrestemplate1 | _postgres | UTF8 | C | C | =c/_postgres +| | | | | _postgres=CTc/_postgres(8 rows)The difference between the "C" and the "en_US" collation is entirely sufficient to explain the difference in performance. "C" is the fastest possible collation as it never needs to look ahead or consult tables, it just compares raw bytes.Cheers,JeffHi Jeff,is there a way to set a default collection during compiling or in the configuration file? I have never specified one, so I suppose that somewhere on MacMini “C” collation type is set as the default one.Thank you a lot.Best regards,Pietro
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
The default collation for the database cluster is set when you create the cluster with initdb (the package you used to install postgresql might provide scripts that wrap initdb and call it something else, sorry I can't be much use with those).You can set it with --lc-collate flag to initdb, otherwise it is set based on the environment variables (LANG or LC_* variables) set in the shell you use to run initdb.Note that you can create a new database in the cluster which has its own default which is different from the cluster's default.
Re: Can't get Dell PE T420 (Perc H710) perform better than a MacMini with PostgreSQL
Ciao Pietro,stavo seguendo thread sulla mailing list Postgresql.Puoi farmi un piccolo riassunto delle conclusioni perchè non sono sicuro di aver capito tutto?
Alla fine la differenza di performance tra il DELL R420 e il Mac MINI è dovuta al tipo di "collate" utilizzato nell'inizializzazione del DB?