Thread: Noticed something odd with pgbench
Hey guys,

So, we have a pretty beefy system that runs dual X5675's with 72GB of RAM. After our recent upgrade to 9.1, things have been... odd. I managed to track it down to one setting:

shared_buffers = 8GB

The thing is, we currently have 850 clients connected to our database (I know, that's bad, but the platform is not compatible with pgpool or pgbouncer right now). So I did a pgbench test (scale = 3000) with 850 clients. The 3000 scale was enough to cross the NUMA barrier, because that should force zone flushing if there's a problem. To cheat a little, I preloaded all pgbench tables into memory with dd.

I'm not running pgbench to see performance. In this case, it's a load test. At first, the load test starts normally, and everything looks fine. But if I check /proc/meminfo, I see this within 2-5 minutes:

    MemFree:          461660 kB
    Active(file):   23252240 kB
    Inactive(file): 21272440 kB

However, if I change shared_buffers to 4GB, it ends up converging to this:

    MemFree:        11024696 kB
    Active(file):   46009064 kB
    Inactive(file):   239672 kB

If you watch the contents of /proc/meminfo during the pgbench run, it's pretty clear when the transition starts. It only takes a couple of minutes of pgbench before it spirals out of control, marking tons of file cache inactive; the OS then pages that memory out because it believes it has run out of genuinely free memory. But how does a 4GB bump in shared_buffers wipe out 12GB?

It does the same thing at 6GB. 4GB is safe for hours on end, but 6GB and 8GB implode within minutes. During this, kswapd goes crazy paging out the cache at the same time it's reading from disk to put pages back in. It's like I fed the kernel poison or something.

Has anybody else noticed something like this? I got this behavior with 9.1.6 on a 3.2 kernel. No amount of tweaks in /proc/sys/vm changed anything either, so I'm not convinced it's a NUMA problem.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________
See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
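[For reference, the test setup above can be sketched roughly like this. The pgbench and dd invocations are hypothetical reconstructions, not the exact commands Shaun used; the database name "pgbench", the thread count, and the sampling interval are all assumptions.]

```shell
# Rough sketch of the load test described above -- NOT the exact commands used.
# Assumes a database named "pgbench" exists and PGDATA points at the cluster.

# Initialize pgbench at scale 3000 (~45 GB of data). Commented out here,
# since it takes a long time and needs a running server:
# pgbench -i -s 3000 pgbench

# Preload the cluster's data files into the OS page cache with dd (the "cheat"):
# for f in "$PGDATA"/base/*/*; do dd if="$f" of=/dev/null bs=8M 2>/dev/null; done

# Run the load test with 850 clients, e.g.:
# pgbench -c 850 -j 8 -T 3600 pgbench

# Meanwhile, sample the interesting /proc/meminfo counters a few times
# to watch the active/inactive file-cache transition happen:
for i in 1 2 3; do
    grep -E '^(MemFree|Active\(file\)|Inactive\(file\)):' /proc/meminfo
    echo '---'
    sleep 1
done
```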
On 16/11/12 19:35, Shaun Thomas wrote:
> Hey guys,
>
> So, we have a pretty beefy system that runs dual X5675's with 72GB of
> RAM. After our recent upgrade to 9.1, things have been... odd. I
> managed to track it down to one setting:
>
> shared_buffers = 8GB
>
> It does the same thing at 6GB. 4GB is safe for hours on end, but 6GB
> and 8GB implode within minutes. During this, kswapd goes crazy
> paging out the cache, at the same time it's reading from disk to put
> them back in. It's like I fed the kernel poison or something.
>
> Has anybody else noticed something like this? I got this behavior with
> 9.1.6 on a 3.2 kernel. No amount of tweaks in /proc/sys/vm changed
> anything either, so I'm not convinced it's a NUMA problem.

Does this match what you're seeing?

http://frosty-postgres.blogspot.co.uk/2012/08/postgresql-numa-and-zone-reclaim-mode.html
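[For anyone landing on this thread later: the check the linked post suggests is a one-liner. The sysctl commands for disabling the knob are shown commented out since they require root; they are the standard mechanism, not something specific to this thread.]

```shell
# 1 means the kernel prefers reclaiming pages from the local NUMA zone
# over allocating from a remote node; 0 means zone reclaim is off.
cat /proc/sys/vm/zone_reclaim_mode 2>/dev/null \
    || echo "zone_reclaim_mode knob not present on this kernel"

# If it reports 1, the usual fix from the linked post is (as root):
# sysctl -w vm.zone_reclaim_mode=0
# echo 'vm.zone_reclaim_mode = 0' >> /etc/sysctl.conf   # persist across reboots
```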
On 11/16/2012 01:59 PM, Richard Huxton wrote:
> http://frosty-postgres.blogspot.co.uk/2012/08/postgresql-numa-and-zone-reclaim-mode.html

I actually considered zone_reclaim_mode. But the article you linked to misses a point: at boot, the kernel enables zone_reclaim_mode only if the inter-node distance is greater than 20; otherwise it stays disabled. And in our case:

    #> cat /proc/sys/vm/zone_reclaim_mode
    0

    #> numactl --hardware
    available: 2 nodes (0-1)
    node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22
    node 0 size: 36853 MB
    node 0 free: 6456 MB
    node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23
    node 1 size: 36863 MB
    node 1 free: 6921 MB
    node distances:
    node   0   1
      0:  10  20
      1:  20  10

I actually hoped that was the problem, but no such luck. Now, there is the possibility that the 3.2 kernel variant we're using has some bug where it's not honoring this setting, but the evidence suggests it's something else.

What's annoying about the numactl output above is that the OS is ignoring 12GB of RAM while still marking 15GB of cache as inactive. So we're really getting 27GB less cache than usual. It's pretty obvious when watching system load.

I'm getting ready to start grabbing mainline kernels and compiling them to try to track this down.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com
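[The "27GB less cache" arithmetic above (roughly 12GB of idle free RAM plus 15GB of inactive cache) can be read straight out of /proc/meminfo; a minimal sketch, relying only on the kB units meminfo always reports:]

```shell
# Sum free-but-idle RAM and reclaim-eligible (inactive) file cache --
# together these approximate how much memory is not doing useful caching work.
awk '/^MemFree:/          { free  = $2 }
     /^Inactive\(file\):/ { inact = $2 }
     END { printf "unused + inactive cache: %.1f GB\n", (free + inact) / 1048576 }' \
    /proc/meminfo
```

[As an aside: a commonly cited mitigation for lopsided NUMA placement of shared_buffers is starting the postmaster under `numactl --interleave=all`, though as the output above shows, zone reclaim was already off on this box, so that may not be the culprit here.]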
On Nov 16, 2012, at 11:59 AM, Richard Huxton <dev@archonet.com> wrote:
> On 16/11/12 19:35, Shaun Thomas wrote:
>> Hey guys,
>>
>> So, we have a pretty beefy system that runs dual X5675's with 72GB of RAM. After our recent upgrade to 9.1, things have been... odd. I managed to track it down to one setting:
>>
>> shared_buffers = 8GB
>>
> Does this match what you're seeing?
>
> http://frosty-postgres.blogspot.co.uk/2012/08/postgresql-numa-and-zone-reclaim-mode.html

(Slightly OT from the OP's question, sorry)

Would this be worth referencing in the PostgreSQL documentation? I feel like I've read a lot of the documentation on Postgres tuning, but this is news to me. And surprise, surprise: I'm running an affected system too! Maybe it deserves a more prominent warning, perhaps at http://www.postgresql.org/docs/9.2/interactive/performance-tips.html