Discussion: Are we in the ballpark?
We're building a new database box. With the help of Gregory Smith's book, we're benchmarking the box: We want to know that we've set it up right, we want numbers to go back to if we have trouble later, and we want something to compare our _next_ box against.

What I'd like to know is: are the performance numbers we're getting in the ballpark for the class of hardware we've picked?

First, the setup:

CPU: Two AMD Opteron 6128 (Magny-Cours), 2000 MHz, eight cores each
RAM: 64 GB DDR3-1333 (ECC)
RAID: 3ware 9750 SAS2/SATA-II PCIe, 512 MB battery-backed cache, write-back caching enabled
Drives: 16 Seagate ST3500414SS 500 GB 7200 RPM SAS, 16 MB cache:
    2 RAID1 ($PGDATA/xlog)
    12 RAID10 ($PGDATA)
    2 hot spares
PostgreSQL 8.4.1 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.4-2) 4.3.4, 64-bit
File system: XFS (nobarrier, noatime)
I/O scheduler: noop

Database config (differences from stock that might affect performance):

shared_buffers = 8192MB
temp_buffers = 16MB
work_mem = 192MB
maintenance_work_mem = 5GB
wal_buffers = 8MB
checkpoint_segments = 64
checkpoint_completion_target = 0.9
random_page_cost = 1.0
constraint_exclusion = on

Now, the test results:

Memtest86+ says our memory bandwidth is:
    L1: 32,788 MB/s
    L2: 10,050 MB/s
    L3: 6,826 MB/s

Stream v5.9 says (MB/s):
    1 core:   4,320
    2 cores:  8,387
    4 cores: 15,840
    8 cores: 23,088
    16 cores: 24,286

Bonnie++ (-f -n 0 -c 4):

$PGDATA/xlog (RAID1)
    random seek: 369/sec
    block out: 87 MB/sec
    block in: 180 MB/sec
$PGDATA (RAID10, 12 drives)
    random seek: 452/sec
    block out: 439 MB/sec
    block in: 881 MB/sec

sysbench test of fsync (commit) rate:

$PGDATA/xlog (RAID1)
    cache off: 29 req/sec
    cache on: 9,342 req/sec
$PGDATA (RAID10, 12 drives)
    cache off: 61 req/sec
    cache on: 8,191 req/sec

pgbench-tools:

Averages for test set 1 by scale:

set  scale    tps  avg_latency   90%<  max_latency
  1      1  29141        0.248  0.342        5.453
  1     10  31467        0.263  0.361        7.148
  1    100  31081        0.265  0.363        7.843
  1   1000  29499        0.278  0.365       11.264

Averages for test set 1 by clients:

set  clients    tps  avg_latency   90%<  max_latency
  1        1   9527        0.102  0.105        1.500
  1        2  13850        0.140  0.195        5.316
  1        4  19148        0.190  0.251        2.228
  1        8  44101        0.179  0.248        2.557
  1       16  50311        0.315  0.381       11.057
  1       32  47765        0.666  0.989       24.076

We've used Brad Fitzpatrick's diskchecker script to show that the I/O stack is telling the truth when it comes to fsync.

Are there any nails sticking up that we need to pound on before we start more extensive (real-world-ish) testing with this box?
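For anyone wanting to cross-check the sysbench commit-rate numbers with a second tool, here is a minimal sketch that times write+fsync cycles directly. The file name and duration are illustrative, not from the post; point the test file at the array you want to measure.

```python
import os
import time

def fsync_rate(path="fsync_test.dat", seconds=2.0):
    """Count how many small write+fsync cycles complete per second.

    With write-back cache on, expect thousands/sec; with it off,
    roughly the drive's rotation rate (~120/sec for 7200 RPM).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        count = 0
        start = time.time()
        while time.time() - start < seconds:
            os.write(fd, b"x" * 512)  # small write, like a WAL commit record
            os.fsync(fd)              # force it to stable storage
            count += 1
        return count / (time.time() - start)
    finally:
        os.close(fd)
        os.remove(path)

if __name__ == "__main__":
    print("fsync rate: %.0f req/sec" % fsync_rate())
```

This measures the same thing diskchecker is verifying: that fsync really waits for stable storage rather than returning early.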
Wayne Conrad wrote:
> We're building a new database box. With the help of Gregory Smith's
> book, we're benchmarking the box: We want to know that we've set it up
> right, we want numbers to go back to if we have trouble later, and we
> want something to compare our _next_ box against.

Do you not want any excitement in your life?

> PostgreSQL 8.4.1 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.3.real
> (Debian 4.3.4-2) 4.3.4, 64-bit

8.4.7 is current; there are a lot of useful fixes to be had. See if you can get a newer Debian package installed before you go live with this.

> File system: XFS (nobarrier, noatime)

Should probably add "logbufs=8" in there too.

> shared_buffers = 8192MB
> temp_buffers = 16MB
> work_mem = 192MB
> maintenance_work_mem = 5GB
> wal_buffers = 8MB
> checkpoint_segments = 64
> checkpoint_completion_target = 0.9
> random_page_cost = 1.0
> constraint_exclusion = on

That work_mem is a bit on the scary side of things, given how much memory is allocated to other things. Just be careful you don't get a lot of connections and run out of server RAM.

Might as well bump wal_buffers up to 16MB and be done with it.

Setting random_page_cost to 1.0 is essentially telling the server the entire database is cached in RAM. If that's not true, you don't want to go quite that far in reducing it.

With 8.4, you should be able to keep constraint_exclusion at its default of 'partition' and have that work as expected; any particular reason you forced it to always be 'on'?
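Pulling these suggestions together, the revised settings would look something like the fragment below. This is a sketch, not text from the thread: the lower work_mem value is illustrative (Greg names no specific figure), and random_page_cost = 2.0 assumes the database does not fit entirely in RAM.

```
# postgresql.conf -- revised per the advice above
shared_buffers = 8192MB
work_mem = 64MB               # reduced from 192MB; size per expected connection count
maintenance_work_mem = 5GB
wal_buffers = 16MB            # bumped from 8MB
checkpoint_segments = 64
checkpoint_completion_target = 0.9
random_page_cost = 2.0        # 1.0 only if the whole database is cached in RAM
#constraint_exclusion         # leave at the 8.4 default of 'partition'

# XFS mount options (e.g. in /etc/fstab):
#   noatime,nobarrier,logbufs=8
```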
> Bonnie++ (-f -n 0 -c 4)
> $PGDATA/xlog (RAID1)
> random seek: 369/sec
> block out: 87 MB/sec
> block in: 180 MB/sec
> $PGDATA (RAID10, 12 drives)
> random seek: 452
> block out: 439 MB/sec
> block in: 881 MB/sec
>
> sysbench test of fsync (commit) rate:
>
> $PGDATA/xlog (RAID1)
> cache off: 29 req/sec
> cache on: 9,342 req/sec
> $PGDATA (RAID10, 12 drives)
> cache off: 61 req/sec
> cache on: 8,191 req/sec

That random seek rate is a little low for 12 drives, but that's probably the limitations of the 3ware controller kicking in there. Your "cache off" figures are really weird, though; I'd expect those both to be around 100. Makes me wonder if something weird is happening in the controller, or if there was a problem with your config when testing that. Not a big deal, really--the cached numbers are normally going to be the important ones--but it is odd.

Your pgbench SELECT numbers look fine, but particularly given that commit oddity here I'd recommend running some of the standard TPC-B-like tests, too, just to be completely sure there's no problem here. You should get results that look like "Set 3: Longer ext3 tests" in the set I've published to http://www.2ndquadrant.us/pgbench-results/index.htm presuming you let those run for 10 minutes or so. The server those came off of has less RAM and fewer disks than yours, so you'll fit larger database scales into memory before performance falls off, but that gives you something to compare against.

-- 
Greg Smith   2ndQuadrant US   greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Greg, it's so nice to get a reply from the author of *the book*. Thank you for taking the time to help us out.

On 02/01/11 18:30, Greg Smith wrote:
> Do you not want any excitement in your life?

I've had database excitement enough to last a lifetime. That's why I'm mending my ways. Your book is the first step of our 12-step program.

> 8.4.7 is current; there are a lot of useful fixes to be had. See if you
> can get a newer Debian package installed before you go live with this.

I'll look for 8.4.7, but we'll be switching to 9 before too long.

>> File system: XFS (nobarrier, noatime)
>
> Should probably add "logbufs=8" in there too.

Will do.

>> work_mem = 192MB
>> wal_buffers = 8MB
>> random_page_cost = 1.0
>
> That work_mem is a bit on the scary side of things, given how much
> memory is allocated to other things. Just be careful you don't get a lot
> of connections and run out of server RAM.

That's a leftover from the days when we *really* didn't know what we were doing (now we only *mostly* don't know what we're doing). I'll set work_mem down to something less scary.

> Might as well bump wal_buffers up to 16MB and be done with it.

Will do.

> Setting random_page_cost to 1.0 is essentially telling the server the
> entire database is cached in RAM. If that's not true, you don't want to
> go quite that far in reducing it.

Oops, that was a typo. We've set random_page_cost to 2, not 1.

> With 8.4, you should be able to keep constraint_exclusion at its default
> of 'partition' and have that work as expected; any particular reason you
> forced it to always be 'on'?

See "we really didn't know what we were doing." We'll leave constraint_exclusion at its default.
>> Bonnie++ (-f -n 0 -c 4)
>> $PGDATA/xlog (RAID1)
>> random seek: 369/sec
>> block out: 87 MB/sec
>> block in: 180 MB/sec
>> $PGDATA (RAID10, 12 drives)
>> random seek: 452
>> block out: 439 MB/sec
>> block in: 881 MB/sec
>>
>> sysbench test of fsync (commit) rate:
>>
>> $PGDATA/xlog (RAID1)
>> cache off: 29 req/sec
>> cache on: 9,342 req/sec
>> $PGDATA (RAID10, 12 drives)
>> cache off: 61 req/sec
>> cache on: 8,191 req/sec
>
> That random seek rate is a little low for 12 drives, but that's probably
> the limitations of the 3ware controller kicking in there. Your "cache
> off" figures are really weird though; I'd expect those both to be around
> 100. Makes me wonder if something weird is happening in the controller,
> or if there was a problem with your config when testing that. Not a big
> deal, really--the cached numbers are normally going to be the important
> ones--but it is odd.

I also thought the "cache off" figures were odd. I expected something much closer to 120 req/sec (7200 RPM drives). I probably won't investigate that with any vigor, since the cache-on numbers are OK.

> Your pgbench SELECT numbers look fine, but particularly given that
> commit oddity here I'd recommend running some of the standard TPC-B-like
> tests, too, just to be completely sure there's no problem here. You
> should get results that look like "Set 3: Longer ext3 tests" in the set
> I've published to http://www.2ndquadrant.us/pgbench-results/index.htm
> presuming you let those run for 10 minutes or so. The server those came
> off of has less RAM and disks than yours, so you'll fit larger database
> scales into memory before performance falls off, but that gives you
> something to compare against.

TPC-B-like tests, will do.

Greg, thanks a million.

Wayne Conrad
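The 120 req/sec expectation here is just the drive's rotation rate: with the write cache off, each fsync'd commit has to wait, on average, for the platter to come back around, so the best case is roughly one commit per revolution. A back-of-the-envelope sketch:

```python
# Rough ceiling on uncached commit rate for a single rotating drive:
# at best one fsync'd commit completes per platter revolution.
rpm = 7200
commits_per_sec = rpm / 60.0
print(commits_per_sec)  # 120.0 -- which is why 29 and 61 req/sec look low
```

By this model, the measured 29 and 61 req/sec are only a quarter to a half of the theoretical per-drive ceiling, which is what makes the numbers suspicious.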
On Wed, Feb 02, 2011 at 10:06:53AM -0700, Wayne Conrad wrote:
> On 02/01/11 18:30, Greg Smith wrote:
> >> sysbench test of fsync (commit) rate:
> >>
> >> $PGDATA/xlog (RAID1)
> >> cache off: 29 req/sec
> >> cache on: 9,342 req/sec
> >> $PGDATA (RAID10, 12 drives)
> >> cache off: 61 req/sec
> >> cache on: 8,191 req/sec
> >
> > That random seek rate is a little low for 12 drives, but that's probably
> > the limitations of the 3ware controller kicking in there. Your "cache
> > off" figures are really weird though; I'd expect those both to be around
> > 100. Makes me wonder if something weird is happening in the controller,
> > or if there was a problem with your config when testing that. Not a big
> > deal, really--the cached numbers are normally going to be the important
> > ones--but it is odd.
>
> I also thought the "cache off" figures were odd. I expected
> something much closer to 120 req/sec (7200 rpm drives). I probably
> won't investigate that with any vigor, since the cache-on numbers
> are OK.

You may want to look into the "cache off" figures a little more. We run a number of battery-backed RAID controllers, and we test the batteries every 6 months or so. When we test the batteries, the cache goes offline (as it should) to help keep the data valid.

If you need to test your RAID card batteries (nothing like discovering a battery with only a 6-hour runtime when it takes you a couple of days MTTR), can your database app survive with that low a commit rate? As you said, you were expecting something almost 4-5x faster with 7200 RPM disks.

-- 
-- rouilj
John Rouillard   System Administrator
Renesys Corporation
603-244-9084 (cell)   603-643-9300 x 111