Обсуждение: random slow query
I have a table about 50 million rows. There are a few writers to pump data into the table at the rate of 40000 row/hours. Most the time, the SELECT is less than 100 ms. However sometime it is very slow, from 30 seconds to 500 seconds. The database is vacuum analyze regularly. One months ago, this type of slow query happened about a few time per day. But recently, the slow query happens more frequent at the rate of once every 10 minutes or less. There seesm not relation to the database loading or the type of query. If I manually execute these query, it is returns in less than 1 seconds. I just wonder where should I start to look? Thanks Shawn.
On 06/29/2009 03:33 PM, Sean Ma wrote: > I have a table about 50 million rows. There are a few writers to pump > data into the table at the rate of 40000 row/hours. Most the time, the > SELECT is less than 100 ms. However sometime it is very slow, from 30 > seconds to 500 seconds. The database is vacuum analyze regularly. > > One months ago, this type of slow query happened about a few time per > day. But recently, the slow query happens more frequent at the rate of > once every 10 minutes or less. There seesm not relation to the > database loading or the type of query. If I manually execute these > query, it is returns in less than 1 seconds. > > I just wonder where should I start to look? The slow queries could be waiting for locks - so you could enable log_lock_waits to see if that is the issue. Andres
Sean Ma <seanxma@gmail.com> wrote: > I have a table about 50 million rows. There are a few writers to > pump data into the table at the rate of 40000 row/hours. Most the > time, the SELECT is less than 100 ms. However sometime it is very > slow, from 30 seconds to 500 seconds. The database is vacuum analyze > regularly. What version of PostgreSQL is this? On what OS? What hardware? We had similar problems on some of our servers under 8.2 and earlier due to the tendency of PostgreSQL to build up a very large set of dirty pages and then throw them all at the drives with an immediate fsync. The RAID controller queued up the requests, and fast reads got stuck in the queue behind all those writes. You may want to look at this excellent coverage of the topic by Greg Smith: http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm We got around the problem by keeping the checkpoint interval and shared buffer size fairly small, and making the background writer fairly aggressive. What works for you, if this is your problem, may be different. I've heard that some have had to tune their OS caching configuration. -Kevin
Hi Sean, Sean Ma wrote: > One months ago, this type of slow query happened about a few time per > day. But recently, the slow query happens more frequent at the rate of > once every 10 minutes or less. There seesm not relation to th What is your hardware (memory, CPU type and such)? This seems like a cache issue to me, but I can't tell for sure without some additional information on your system: 1) What is the amount of a) available memory b) free memory and c) memory available to i/o buffers? 2) What is the swap usage if any? 3) What is the CPU load? Any noticeable patterns in CPU load? You can use /usr/bin/top to obtain most of this information. Mike
top - 10:18:58 up 224 days, 15:10, 2 users, load average: 6.27, 7.33, 6 Tasks: 239 total, 1 running, 238 sleeping, 0 stopped, 0 zombie Cpu(s): 5.0%us, 0.7%sy, 0.0%ni, 61.5%id, 32.7%wa, 0.0%hi, 0.1%si, 0 Mem: 32962804k total, 32802612k used, 160192k free, 325360k buffers Swap: 8193140k total, 224916k used, 7968224k free, 30829456k cached Didn't really see the pattern, typical the cpu load is only about 40% On Mon, Jun 29, 2009 at 7:26 PM, Mike Ivanov<mikei@activestate.com> wrote: > Hi Sean, > > Sean Ma wrote: >> >> One months ago, this type of slow query happened about a few time per >> day. But recently, the slow query happens more frequent at the rate of >> once every 10 minutes or less. There seesm not relation to th > > What is your hardware (memory, CPU type and such)? > > This seems like a cache issue to me, but I can't tell for sure without some > additional information on your system: > > 1) What is the amount of a) available memory b) free memory and c) memory > available to i/o buffers? > > 2) What is the swap usage if any? > > 3) What is the CPU load? Any noticeable patterns in CPU load? > > You can use /usr/bin/top to obtain most of this information. > > Mike > > >
Hi Sean, Well, the overall impression is your machine is badly overloaded. Look: > top - 10:18:58 up 224 days, 15:10, 2 users, load average: 6.27, 7.33, 6 > The load average of 6.5 means there are six and a half processes competing for the same CPU (and this system apparently has only one). This approximately equals to 500% overload. Recommendation: either add more CPU's or eliminate process competition by moving them to other boxes. > Tasks: 239 total, 1 running, 238 sleeping, 0 stopped, 0 zombie > This supports what I said above. There are only 92 processes running on my laptop and I think it is too much. Do you have Apache running on the same machine? > Cpu(s): 5.0%us, 0.7%sy, 0.0%ni, 61.5%id, 32.7%wa, 0.0%hi, 0.1%si, 0 > Waiting time (wa) is rather high, which means processes wait on locks or for IO, another clue for concurrency issues on this machine. > Mem: 32962804k total, 32802612k used, 160192k free, 325360k buffers > Buffers are about 10% of all the memory which is OK, but I tend to give buffers some more room. Recommendation: eliminate unneeded processes, decrease (yes, decrease) the Postgres cache buffers if they are set too high. > Swap: 8193140k total, 224916k used, 7968224k free, 30829456k cached > 200M paged out. It should be zero except of an emergency. 3G of cached swap is a sign of some crazy paging activity in thepast. Those unexplainable slowdowns are very likely caused by that. > Didn't really see the pattern, typical the cpu load is only about 40% > 40% is too much, really. I start worrying when it is above 10%. Conclusion: - the system bears more load than it can handle - the machine needs an upgrade - Postges is competing with something (presumably Apache) - separate them. That should help. Cheers, Mike
On Tue, Jun 30, 2009 at 11:23 AM, Mike Ivanov<mikei@activestate.com> wrote: > Hi Sean, > > Well, the overall impression is your machine is badly overloaded. Look: > >> top - 10:18:58 up 224 days, 15:10, 2 users, load average: 6.27, 7.33, 6 >> > > The load average of 6.5 means there are six and a half processes competing > for the same CPU (and this system apparently has only one). This > approximately equals to 500% overload. > > Recommendation: either add more CPU's or eliminate process competition by > moving them to other boxes. Well, we can't be sure OP's only got one core. However, given that the OPs posting shows mostly idle and wait state, the real issue isn't the number of cores, it's the IO subsystem is too slow for the load. More cores wouldn't fix that. >> Tasks: 239 total, 1 running, 238 sleeping, 0 stopped, 0 zombie >> > > This supports what I said above. There are only 92 processes running on my > laptop and I think it is too much. Do you have Apache running on the same > machine? My production PG server that runs ONLY pg has 222 processes on it. It's no big deal. Unless they're all trying to get cpu time, which generally isn't the case. >> Cpu(s): 5.0%us, 0.7%sy, 0.0%ni, 61.5%id, 32.7%wa, 0.0%hi, 0.1%si, 0 >> > > Waiting time (wa) is rather high, which means processes wait on locks or for > IO, another clue for concurrency issues on this machine. More likely just a slow IO subsystem. Like a single drive or something. adding drives in a RAID-1 or RAID-10 etc usually helps. >> Mem: 32962804k total, 32802612k used, 160192k free, 325360k buffers >> > > Buffers are about 10% of all the memory which is OK, but I tend to give > buffers some more room. This is kernel buffers, not pg buffers. It's set by the OS semi-automagically. In this case it's 325M out of 32 Gig, so it's well under 10%, which is typical. >> Swap: 8193140k total, 224916k used, 7968224k free, 30829456k cached >> > > 200M paged out. It should be zero except of an emergency. Not true. Linux will happily swap out seldom used processes to make room in memory for more kernel cache etc. You can adjust this tendency by setting swappiness. > 3G of cached swap > is a sign of some crazy paging activity in thepast. Those unexplainable > slowdowns are very likely caused by that. No, they're not. It's 30G btw, and it's not swap that's cached, it's the kernel using extra memory to cache data to / from the hard drives. It's normal, and shouldn't worry anybody. In fact it's a good sign that you're not using way too much memory for any one process. >> Didn't really see the pattern, typical the cpu load is only about 40% >> > > 40% is too much, really. I start worrying when it is above 10%. Really? I have eight cores on my production servers and many batch jobs I run put all 8 cores at 90% for extended periods. Since that machine is normally doing a lot of smaller cached queries, it hardly even notices. > Conclusion: > > - the system bears more load than it can handle Yes, too much IO load. I agree on that. > - the machine needs an upgrade Yes, more hard drives / better caching RAID controller.
Hi Mike, Thanks for the details. Yes, besides another mysql server running on the same server, there is also an homegrown application that frequent read/write the file system. The postgres shared cache is at 4G, is that too big? Thanks Sean On Tue, Jun 30, 2009 at 1:23 PM, Mike Ivanov<mikei@activestate.com> wrote: > Hi Sean, > > Well, the overall impression is your machine is badly overloaded. Look: > >> top - 10:18:58 up 224 days, 15:10, 2 users, load average: 6.27, 7.33, 6 >> > > The load average of 6.5 means there are six and a half processes competing > for the same CPU (and this system apparently has only one). This > approximately equals to 500% overload. > > Recommendation: either add more CPU's or eliminate process competition by > moving them to other boxes. > >> Tasks: 239 total, 1 running, 238 sleeping, 0 stopped, 0 zombie >> > > This supports what I said above. There are only 92 processes running on my > laptop and I think it is too much. Do you have Apache running on the same > machine? > >> Cpu(s): 5.0%us, 0.7%sy, 0.0%ni, 61.5%id, 32.7%wa, 0.0%hi, 0.1%si, 0 >> > > Waiting time (wa) is rather high, which means processes wait on locks or for > IO, another clue for concurrency issues on this machine. > >> Mem: 32962804k total, 32802612k used, 160192k free, 325360k buffers >> > > Buffers are about 10% of all the memory which is OK, but I tend to give > buffers some more room. > > Recommendation: eliminate unneeded processes, decrease (yes, decrease) the > Postgres cache buffers if they are set too high. > >> Swap: 8193140k total, 224916k used, 7968224k free, 30829456k cached >> > > 200M paged out. It should be zero except of an emergency. 3G of cached swap > is a sign of some crazy paging activity in thepast. Those unexplainable > slowdowns are very likely caused by that. > >> Didn't really see the pattern, typical the cpu load is only about 40% >> > > 40% is too much, really. I start worrying when it is above 10%. > > Conclusion: > > - the system bears more load than it can handle > - the machine needs an upgrade > - Postges is competing with something (presumably Apache) - separate them. > > That should help. > > Cheers, > Mike > >
On Tue, Jun 30, 2009 at 11:49 AM, Sean Ma<seanxma@gmail.com> wrote: > Hi Mike, > > Thanks for the details. Yes, besides another mysql server running on > the same server, there is also an homegrown application that frequent > read/write the file system. > > The postgres shared cache is at 4G, is that too big? Not for a machine with 32Gig of ram.
Sean, > Yes, besides another mysql server running on > the same server, Which is a really bad idea :-) > The postgres shared cache is at 4G, is that too big? > OK, I have misread the total memory amount which was 32G, and I thought it was 3G. Thanks to Scott Marlow who pointed that out. In this case 4G for shared buffers is good. Actually, I take back my words on swap, too. 200M swapped is less important when you have a plenty of memory. Regards, Mike
Scott Marlowe wrote: >> >> The postgres shared cache is at 4G, is that too big? >> > > Not for a machine with 32Gig of ram. > > He could even add some more. Mike
On Tue, Jun 30, 2009 at 12:01 PM, Mike Ivanov<mikei@activestate.com> wrote: > Scott Marlowe wrote: >>> >>> The postgres shared cache is at 4G, is that too big? >>> >> >> Not for a machine with 32Gig of ram. >> >> > > He could even add some more. Definitely. Really depends on how big his data set is, and how well pgsql is at caching it versus the kernel. I've found that with a really big dataset, like 250G to 1T range, the kernel is almost always better at caching a lot of it, and if you're operating on a few hundred meg at a time anyway, then smaller shared_buffers helps. OTOH, if you're working on a 5G data set, it's often helpful to turn up shared_buffers enough to cover that. OTOH, if you're running a busy transaction oriented db (lots of small updates) larger shared_buffers will slow you down quite a bit.
Hi Scott, > Well, we can't be sure OP's only got one core. In fact, we can, Sean posted what top -b -n 1 says. There was only one CPU line. > the number of cores, it's the IO subsystem is too slow for the load. > More cores wouldn't fix that. > While I agree on the IO, more cores would definitely help to improve ~6.5 load average. > My production PG server that runs ONLY pg has 222 processes on it. > It's no big deal. Unless they're all trying to get cpu time, which > generally isn't the case. > 222 / 8 cores = ridiculous 27 processes per core, while the OP has 239. > More likely just a slow IO subsystem. Like a single drive or > something. adding drives in a RAID-1 or RAID-10 etc usually helps. > Absolutely. > This is kernel buffers, not pg buffers. It's set by the OS > semi-automagically. In this case it's 325M out of 32 Gig, so it's > well under 10%, which is typical. > You can control the FS buffers indirectly by not allowing running processes to take too much memory. If you have like 40% free, there are good chances the system will use that memory for buffers. If you let them eat up 90% and swap out some more, there is no room for buffers and the system will have to swap out something when it really needs it. > Not true. Linux will happily swap out seldom used processes to make > room in memory for more kernel cache etc. You can adjust this > tendency by setting swappiness. > This is fine until one of those processes wakes up. Then your FS cache is dumped. > It's 30G btw, Yeah, I couldn't believe my eyes :-) > > 3G of cached swap > and it's not swap that's cached, it's > the kernel using extra memory to cache data to / from the hard drives. > Oh please.. it *is*: http://www.linux-tutorial.info/modules.php?name=MContent&pageid=314 > It's normal, and shouldn't worry anybody. In fact it's a good sign > that you're not using way too much memory for any one process. > It says exactly the opposite. > Really? I have eight cores on my production servers and many batch > jobs I run put all 8 cores at 90% for extended periods. Since that > machine is normally doing a lot of smaller cached queries, it hardly > even notices. > The OP's machine is doing a lot of write ops, which is different. > Yes, more hard drives / better caching RAID controller. > +1 BTW, nearly full file system can be another source of problems. Cheers, Mike
On Tue, Jun 30, 2009 at 12:22 PM, Mike Ivanov<mikei@activestate.com> wrote: > Hi Scott, > >> Well, we can't be sure OP's only got one core. > > In fact, we can, Sean posted what top -b -n 1 says. There was only one CPU > line. Missed that. > >> the number of cores, it's the IO subsystem is too slow for the load. >> More cores wouldn't fix that. >> > > While I agree on the IO, more cores would definitely help to improve ~6.5 > load average. No, it won't. You can have 1000 cores, and if they're all waiting on IO, you'll have the same load. >> My production PG server that runs ONLY pg has 222 processes on it. >> It's no big deal. Unless they're all trying to get cpu time, which >> generally isn't the case. >> > > 222 / 8 cores = ridiculous 27 processes per core, while the OP has 239. But most of those processes are asleep and doing nothing. My production machine is an RHEL 5.2 machine doing only one thing really, and it's got that many processes on it. It's fine. >> More likely just a slow IO subsystem. Like a single drive or >> something. adding drives in a RAID-1 or RAID-10 etc usually helps. >> > > Absolutely. > >> This is kernel buffers, not pg buffers. It's set by the OS >> semi-automagically. In this case it's 325M out of 32 Gig, so it's >> well under 10%, which is typical. >> > > You can control the FS buffers indirectly by not allowing running processes > to take too much memory. If you have like 40% free, there are good chances > the system will use that memory for buffers. If you let them eat up 90% and > swap out some more, there is no room for buffers and the system will have to > swap out something when it really needs it. Close, but it'll use that memory for cache. Large buffers are not typical in linux, large kernel caches are. >> Not true. Linux will happily swap out seldom used processes to make >> room in memory for more kernel cache etc. You can adjust this >> tendency by setting swappiness. >> > > This is fine until one of those processes wakes up. Then your FS cache is > dumped. Yep. >> > 3G of cached swap >> and it's not swap that's cached, it's >> the kernel using extra memory to cache data to / from the hard drives. > > Oh please.. it *is*: > http://www.linux-tutorial.info/modules.php?name=MContent&pageid=314 If that tutorial says that, then that tutorial is wrong. I'm guessing what that tutorial is talking about, and what top is saying are two very different things though. >> It's normal, and shouldn't worry anybody. In fact it's a good sign >> that you're not using way too much memory for any one process. >> > > It says exactly the opposite. Sorry, but you are wrong here. Look up a better tutorial on what the cache entry for top means. It's most assuredly not about swap cache, it's kernel cache. >> Yes, more hard drives / better caching RAID controller. >> > > +1 > > BTW, nearly full file system can be another source of problems. Yeah, ran into that a while back, causes lots of fragmentation.
On Tue, Jun 30, 2009 at 12:22 PM, Mike Ivanov<mikei@activestate.com> wrote: >> > 3G of cached swap >> and it's not swap that's cached, it's >> the kernel using extra memory to cache data to / from the hard drives. >> > > Oh please.. it *is*: > http://www.linux-tutorial.info/modules.php?name=MContent&pageid=314 Also think about it, the OP has 8G of swap and 30Gig cached. How / why would you be caching 30Gigs worth of data when there's only 8G to cache anyway?
On Tuesday 30 June 2009, Mike Ivanov <mikei@activestate.com> wrote: > Hi Scott, > > > Well, we can't be sure OP's only got one core. > > In fact, we can, Sean posted what top -b -n 1 says. There was only one > CPU line. > Recent versions of top on Linux (on RedHat 5 anyway) may show only one combined CPU line unless you break them out with an option. > > the number of cores, it's the IO subsystem is too slow for the load. > > More cores wouldn't fix that. > > While I agree on the IO, more cores would definitely help to improve > ~6.5 load average. No, I agree with the previous poster. His load is entirely due to IO wait. Only one of those processes was trying to do anything. IO wait shows up as high load averages.
Scott Marlowe wrote: > Also think about it, the OP has 8G of swap and 30Gig cached. How / > why would you be caching 30Gigs worth of data when there's only 8G to > cache anyway? > You're right, I have misread it again :-) Cheers, Mike
Alan Hodgson wrote: > On Tuesday 30 June 2009, Mike Ivanov <mikei@activestate.com> wrote: >> Hi Scott, >> >>> Well, we can't be sure OP's only got one core. >> In fact, we can, Sean posted what top -b -n 1 says. There was only one >> CPU line. >> > > Recent versions of top on Linux (on RedHat 5 anyway) may show only one > combined CPU line unless you break them out with an option. I have not noticed that to be the case. I ran RHEL3 from early 2004 until a little after RHEL5 came out. I now run that (updated whenever updates come out), and I do not recall ever setting any flag to get it to split the CPU into 4 pieces. I know the flag is there, but I do not recall ever setting it. > >>> the number of cores, it's the IO subsystem is too slow for the load. >>> More cores wouldn't fix that. >> While I agree on the IO, more cores would definitely help to improve >> ~6.5 load average. > > No, I agree with the previous poster. His load is entirely due to IO wait. > Only one of those processes was trying to do anything. IO wait shows up as > high load averages. > If you run xosview, you can see all that stuff broken out, in my case at one-second intervals. It shows user, nice, system, idle, wait, hardware interrupt, software interrupt. It also shows disk read, write, and idle time. Lots of other stuff too. -- .~. Jean-David Beyer Registered Linux User 85642. /V\ PGP-Key: 9A2FC99A Registered Machine 241939. /( )\ Shrewsbury, New Jersey http://counter.li.org ^^-^^ 14:55:01 up 12 days, 1:44, 3 users, load average: 4.34, 4.36, 4.41
Scott Marlowe wrote: > Close, but it'll use that memory for cache. Large buffers are not > typical in linux, large kernel caches are. > OK, we're talking about different things. You're right. > If that tutorial says that, then that tutorial is wrong. I'm guessing > what that tutorial is talking about, and what top is saying are two > very different things though. > Then it is an amazingly common misconception. I guess it first appeared in some book and then reproduced by zillion blogs. Essentially this is what Goolgle brings you on 'swap cache' query. Thanks for clearing that out. >>> It's normal, and shouldn't worry anybody. In fact it's a good sign >>> that you're not using way too much memory for any one process >> It says exactly the opposite. >> This time I agree :-) Cheers, Mike
Well, this is going to be a bit redundant but: On 6/30/09 11:22 AM, "Mike Ivanov" <mikei@activestate.com> wrote: > Hi Scott, > >> Well, we can't be sure OP's only got one core. > > In fact, we can, Sean posted what top -b -n 1 says. There was only one > CPU line. I do not believe that setting means what you think it means. Here is the same output for a machine with two quad-core cpus. $ top -b -n 1 top - 12:43:06 up 264 days, 1:47, 5 users, load average: 0.24, 0.25, 0.71 Tasks: 253 total, 1 running, 252 sleeping, 0 stopped, 0 zombie Cpu(s): 5.1%us, 0.5%sy, 0.0%ni, 93.9%id, 0.5%wa, 0.0%hi, 0.1%si, 0.0%st Mem: 16432232k total, 13212684k used, 3219548k free, 5992k buffers Swap: 2040244k total, 180k used, 2040064k free, 7775732k cached From the man page: When you see ¹Cpu(s):¹ in the summary area, the ¹1¹ toggle is On and all cpu information is gathered in a single line. Otherwise, each cpu is displayed separately as: ¹Cpu0, Cpu1, ...¹ > >> the number of cores, it's the IO subsystem is too slow for the load. >> More cores wouldn't fix that. >> > > While I agree on the IO, more cores would definitely help to improve > ~6.5 load average. > Load average is one of the more useless values to look at on a system unless you are looking at a DELTA of the load average from one condition to another. All alone, it doesn't say much. The CPU was 60% idle, and ~35% in io wait. If those processes were waiting on CPU resources to be available, the idle % would be very low. Or, the OS scheduler is broken. >> My production PG server that runs ONLY pg has 222 processes on it. >> It's no big deal. Unless they're all trying to get cpu time, which >> generally isn't the case. >> > 222 / 8 cores = ridiculous 27 processes per core, while the OP has 239. > That's not rediculous at all. Modern OS's handle thousands of idle processes just fine. >> This is kernel buffers, not pg buffers. It's set by the OS >> semi-automagically. In this case it's 325M out of 32 Gig, so it's >> well under 10%, which is typical. >> > > You can control the FS buffers indirectly by not allowing running > processes to take too much memory. If you have like 40% free, there are > good chances the system will use that memory for buffers. If you let > them eat up 90% and swap out some more, there is no room for buffers and > the system will have to swap out something when it really needs it. > Or you can control the behavior with the following kenrnel params: vm.swappiness vm.dirty_ratio vm.dirty_background ratio >> Not true. Linux will happily swap out seldom used processes to make >> room in memory for more kernel cache etc. You can adjust this >> tendency by setting swappiness. >> > > This is fine until one of those processes wakes up. Then your FS cache > is dumped. Actually, no. When a process wakes up only the pages that are needed are accessed. For most idle processes that wake up from time to time, a small bit of work is done, then they go back to sleep. This initial allocation does NOT come from the page cache, but from the "buffers" line in top. The os tries to keep some ammount of free buffers not allocated to processes or pages available, so that allocation demands can be met without having to synchronously decide which buffers from page cache to eject. >>> 3G of cached swap >> and it's not swap that's cached, it's >> the kernel using extra memory to cache data to / from the hard drives. >> > > Oh please.. it *is*: > http://www.linux-tutorial.info/modules.php?name=MContent&pageid=314 > There is no such thing as "cached swap". What would there be to cache? A process' page is either in RAM or swap, and a file is either in buffer cache or not. That line entry is the size of the file page cache. Read about 'free' and compare the values to top. >> It's normal, and shouldn't worry anybody. In fact it's a good sign >> that you're not using way too much memory for any one process. >> > > It says exactly the opposite. It says a ton of space is used caching files. > >> Really? I have eight cores on my production servers and many batch >> jobs I run put all 8 cores at 90% for extended periods. Since that >> machine is normally doing a lot of smaller cached queries, it hardly >> even notices. >> > > The OP's machine is doing a lot of write ops, which is different. > Not when it comes to CPU use percentage. The overlap with disk I/O and CPU on linux shows up in time spent by the kernel (system time), and often kswapd processor time (shows up as system time). Everything else is i/o wait. The OP has a I/O bottleneck. Suggestions other than new hardware: * Put the xlogs on a separate partition and if ext3 mount with data=writeback or use ext2. * Use the deadline scheduler. If queries are intermittently causing problems, it might be due to checkpoints. Make sure that the kernel parameters for dirty_background_ratio is 5 or less, and dirty_ratio is 10 or less. Search this group for information about tuning postgres checkpoints. If using a hardware RAID card with a battery back-up, make sure its cache mode is set to write-back. A larger shared_buffers size can help if sequential scans are infrequent and kick out pages from the OS page cache. Postgres does not let sequential scans kick out index pages or pages accessed randomly from its buffer cache, but the OS (Linux) is more prone to that. Whether larger or smaller shared_buffers will help is HIGHLY load and use case dependant.
On 6/30/09 12:06 PM, "Jean-David Beyer" <jeandavid8@verizon.net> wrote: > Alan Hodgson wrote: >> On Tuesday 30 June 2009, Mike Ivanov <mikei@activestate.com> wrote: >>> Hi Scott, >>> >>>> Well, we can't be sure OP's only got one core. >>> In fact, we can, Sean posted what top -b -n 1 says. There was only one >>> CPU line. >>> >> >> Recent versions of top on Linux (on RedHat 5 anyway) may show only one >> combined CPU line unless you break them out with an option. > > I have not noticed that to be the case. I ran RHEL3 from early 2004 until a > little after RHEL5 came out. I now run that (updated whenever updates come > out), and I do not recall ever setting any flag to get it to split the CPU > into 4 pieces. > > I know the flag is there, but I do not recall ever setting it. Top now has storable defaults so how it behaves depends on what the user has stored for their defaults. For example: go to interactive mode by just typing top. Now, hit "1". Or hit "c", or try "M". Now, toggle the '1' flag until it shows one (and reports Cpu(s) not Cpu0, Cpu1, etc) and hit shift-w. Now your defaults are changed and it will spit out one line for all cpus unless you tell it not to. The output can be highly customized and your preferences stored. Hit 'h' for more info. Another way to put it, is that Linux' top has mostly caught up to the proprietary commmand line interactive tools on Solaris and AIX that used to be light-years ahead.
On 6/30/09 1:08 PM, "Scott Carey" <scott@richrelevance.com> wrote: > > A larger shared_buffers size can help if sequential scans are infrequent and > kick out pages from the OS page cache. > Postgres does not let sequential scans kick out index pages or pages > accessed randomly from its buffer cache, but the OS (Linux) is more prone to > that. Let me qualify the above: Postgres 8.3+ doesn't let full page scans push out pages from its shared_buffers. It uses a ring buffer for full page scans and vacuums. > > Whether larger or smaller shared_buffers will help is HIGHLY load and use > case dependant. > > > -- > Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance >
Scott Carey wrote: >> 222 / 8 cores = ridiculous 27 processes per core, while the OP has 239 > That's not rediculous at all. Modern OS's handle thousands of idle > processes just fine. > > I meant that 27 was a ridiculously small number. > Or you can control the behavior with the following kenrnel params: > vm.swappiness > vm.dirty_ratio > vm.dirty_background ratio > Thanks for pointing that out! > Actually, no. When a process wakes up only the pages that are needed are > accessed. For most idle processes that wake up from time to time, a small > bit of work is done, then they go back to sleep. This initial allocation > does NOT come from the page cache, but from the "buffers" line in top. The > os tries to keep some ammount of free buffers not allocated to processes or > pages available, so that allocation demands can be met without having to > synchronously decide which buffers from page cache to eject. > Wait a second, I'm trying to understand that :-) Did you mean that FS cache pages are first allocated from the buffer pages or that process memory being paged out to swap is first written to buffers? Could you clarify please? > If queries are intermittently causing problems, it might be due to > checkpoints. Make sure that the kernel parameters for > dirty_background_ratio is 5 or less, and dirty_ratio is 10 or less. > Scott, isn't dirty_ratio supposed to be less than dirty_background_ratio? I've heard that system would automatically set dirty_ratio = dirty_background_ratio / 2 if that's not the case. Also, how dirty_ratio could be less than 5 if 5 is the minimal value? Regards, Mike
On 6/30/09 2:39 PM, "Mike Ivanov" <mikei@activestate.com> wrote: > Scott Carey wrote: >>> 222 / 8 cores = ridiculous 27 processes per core, while the OP has 239 >> That's not rediculous at all. Modern OS's handle thousands of idle >> processes just fine. >> >> > I meant that 27 was a ridiculously small number. > >> Or you can control the behavior with the following kenrnel params: >> vm.swappiness >> vm.dirty_ratio >> vm.dirty_background ratio >> > Thanks for pointing that out! > >> Actually, no. When a process wakes up only the pages that are needed are >> accessed. For most idle processes that wake up from time to time, a small >> bit of work is done, then they go back to sleep. This initial allocation >> does NOT come from the page cache, but from the "buffers" line in top. The >> os tries to keep some ammount of free buffers not allocated to processes or >> pages available, so that allocation demands can be met without having to >> synchronously decide which buffers from page cache to eject. >> > Wait a second, I'm trying to understand that :-) > Did you mean that FS cache pages are first allocated from the buffer > pages or that process memory being paged out to swap is first written to > buffers? Could you clarify please? > There are some kernel parameters that control how much RAM the OS tries to keep in a state that is not allocated to page cache or processes. I've forgotten what these are exactly. But the purpose is to prevent the virtual memory system from having to make the decision on what memory to kick out of the page cache, or what pages to swap to disk, when memory is allocated. Rather, it can do this in the background most of the time. So, the first use of this is when a process allocates memory. Pulling a swapped page off disk probably uses this too but I'm not sure. It would make sense. Pages being written to swap go directly to swap and deallocated. File pages are either on disk or in the page cache. Process pages are either in memory or swap. But when either of these is first put in memory (process allocation, page-in, file read), the OS can either quickly allocate to the process or the page cache from the free buffers, or more slowly take from the page cache, or even more slowly page out a process page. >> If queries are intermittently causing problems, it might be due to >> checkpoints. Make sure that the kernel parameters for >> dirty_background_ratio is 5 or less, and dirty_ratio is 10 or less. >> > Scott, isn't dirty_ratio supposed to be less than > dirty_background_ratio? I've heard that system would automatically set > dirty_ratio = dirty_background_ratio / 2 if that's not the case. Also, > how dirty_ratio could be less than 5 if 5 is the minimal value? > dirty_ratio is the percentage of RAM that can be in the page cache and not yet written to disk before all writes in the system block. dirty_background_ratio is the percentage of RAM that can be filled with dirty file pages before a background thread is started by the OS to start flushing to disk. Flushing to disk also occurs on timed intervals or other triggers. By default, Linux 2.6.18 (RHEL5/Centos5, etc) has the former at 40 and the latter at 10, which on a 32GB system means over 13GB can be in memory and not yet on disk! Sometime near 2.6.22 or so the default became 10 and 5, respectively. For some systems, this is still too much. I like to use the '5 second rule'. dirty_background_ratio should be sized so that it takes about 5 seconds to flush to disk in optimal conditions. dirty_ratio should be 2x to 5x this depending on your application's needs -- for a system with well tuned postgres checkpoints, smaller tends to be better to limit stalls while waiting for the checkpoint fsync to finish. > Regards, > Mike > >
Scott Carey wrote: > the OS can either quickly allocate to the process or > the page cache from the free buffers, or more slowly take from the page > cache, or even more slowly page out a process page. > Aha, now it all makes sense. > I like to use the '5 second rule'. dirty_background_ratio should be sized > so that it takes about 5 seconds to flush to disk in optimal conditions. > dirty_ratio should be 2x to 5x this depending on your application's needs -- > for a system with well tuned postgres checkpoints, smaller tends to be > better to limit stalls while waiting for the checkpoint fsync to finish. > Thanks a lot, this is invaluable information. Regards, Mike