Re: Scaling shared buffer eviction
From | Robert Haas |
---|---|
Subject | Re: Scaling shared buffer eviction |
Date | |
Msg-id | CA+TgmoY58dQi8Z=FDAu4ggxHV-HYV03-R9on1LSP9OJU_fy_zA@mail.gmail.com |
In reply to | Re: Scaling shared buffer eviction (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Scaling shared buffer eviction (Robert Haas <robertmhaas@gmail.com>), Re: Scaling shared buffer eviction (Andres Freund <andres@2ndquadrant.com>) |
List | pgsql-hackers |
On Tue, Sep 23, 2014 at 10:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> But this gets at another point: the way we're benchmarking this right
> now, we're really conflating the effects of three different things:
>
> 1. Changing the locking regimen around the freelist and clocksweep.
> 2. Adding a bgreclaimer process.
> 3. Raising the number of buffer locking partitions.

I did some more experimentation on this. Attached is a patch that JUST
does #1, and, as previously suggested, it uses a single spinlock instead
of using two of them that are probably in the same cacheline. Without
this patch, on a 32-client, read-only pgbench at scale factor 1000 and
shared_buffers=8GB, perf -e cs -g -a has this to say about
LWLock-inspired preemption:

- LWLockAcquire
  + 68.41% ReadBuffer_common
  + 31.59% StrategyGetBuffer

With the patch, unsurprisingly, StrategyGetBuffer disappears and the
only lwlocks that are causing context-switches are the individual
buffer locks. But are we suffering spinlock contention instead as a
result? Yes, but not much. s_lock is at 0.41% in the corresponding
profile, and only 6.83% of those calls are from the patched
StrategyGetBuffer. In a similar profile of master, s_lock is at 0.31%.
So there's a bit of additional s_lock contention, but I think it's
considerably less than the contention over the lwlock it's replacing,
because the spinlock is only held for the minimal amount of time
needed, whereas the lwlock could be held across taking and releasing
many individual buffer locks.
TPS results are a little higher with the patch - these are alternating
5 minute runs:

master tps = 176010.647944 (including connections establishing)
master tps = 176615.291149 (including connections establishing)
master tps = 175648.370487 (including connections establishing)
reduce-replacement-locking tps = 177888.734320 (including connections establishing)
reduce-replacement-locking tps = 177797.842410 (including connections establishing)
reduce-replacement-locking tps = 177894.822656 (including connections establishing)

The picture is similar at 64 clients, but the benefit is a little more:

master tps = 179037.231597 (including connections establishing)
master tps = 180500.937068 (including connections establishing)
master tps = 181565.706514 (including connections establishing)
reduce-replacement-locking tps = 185741.503425 (including connections establishing)
reduce-replacement-locking tps = 188598.728062 (including connections establishing)
reduce-replacement-locking tps = 187340.977277 (including connections establishing)

What's interesting is that I can't see in the perf output any real sign
that the buffer mapping locks are slowing things down, but they clearly
are, because when I take this patch and also boost
NUM_BUFFER_PARTITIONS to 128, the performance goes up:

reduce-replacement-locking-128 tps = 251001.812843 (including connections establishing)
reduce-replacement-locking-128 tps = 247368.925927 (including connections establishing)
reduce-replacement-locking-128 tps = 250775.304177 (including connections establishing)

The performance also goes up if I do that on master, but the effect is
substantially less:

master-128 tps = 219301.492902 (including connections establishing)
master-128 tps = 219786.249076 (including connections establishing)
master-128 tps = 219821.220271 (including connections establishing)

I think this shows pretty clearly that, even without the bgreclaimer,
there's a lot of merit in getting rid of BufFreelistLock and using a
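For readers skimming the numbers, the relative gains can be computed directly from the TPS figures quoted in this message (a quick back-of-the-envelope check; no values beyond those above are assumed):

```python
# Mean TPS gain of the patched builds over master, from the quoted runs.
from statistics import mean

master_32  = [176010.647944, 176615.291149, 175648.370487]
patched_32 = [177888.734320, 177797.842410, 177894.822656]
master_64  = [179037.231597, 180500.937068, 181565.706514]
patched_64 = [185741.503425, 188598.728062, 187340.977277]
master_128  = [219301.492902, 219786.249076, 219821.220271]
patched_128 = [251001.812843, 247368.925927, 250775.304177]

def pct_gain(base, test):
    """Percent improvement of mean(test) over mean(base)."""
    return (mean(test) / mean(base) - 1) * 100

print(f"32 clients:  {pct_gain(master_32, patched_32):.1f}% gain")
print(f"64 clients:  {pct_gain(master_64, patched_64):.1f}% gain")
print(f"128 partitions: {pct_gain(master_128, patched_128):.1f}% gain")
# -> roughly 1.0%, 3.8%, and 13.7% respectively
```

So the locking change alone is worth about 1% at 32 clients and about 4% at 64, while the combination with more buffer mapping partitions is worth nearly 14% over master at the same partition count.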
spinlock held for the absolutely minimal number of instructions
instead. There's already some benefit without doing anything about the
buffer mapping locks, and we'll get a lot more benefit once that issue
is addressed. I think we need to do some serious study to see whether
bgreclaimer is even necessary, because it looks like this change alone,
which is much simpler, takes a huge bite out of the scalability
problem.

I welcome further testing and comments, but my current inclination is
to go ahead and push the attached patch. To my knowledge, nobody has at
any point objected to this aspect of what's being proposed, and it
looks safe to me and seems to be a clear win. We can then deal with the
other issues on their own merits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments