Re: our buffer replacement strategy is kind of lame

From: Robert Haas
Subject: Re: our buffer replacement strategy is kind of lame
Date:
Msg-id: CA+TgmobTUhTC3EJa-QLY9i=V5PCa4MM8JX2FE32DZC3rMhrsKg@mail.gmail.com
In reply to: Re: our buffer replacement strategy is kind of lame  (Simon Riggs <simon@2ndQuadrant.com>)
Responses: Re: our buffer replacement strategy is kind of lame  (Jim Nasby <jim@nasby.net>)
List: pgsql-hackers
On Tue, Jan 3, 2012 at 10:56 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, Jan 3, 2012 at 3:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> The clock sweep is where all the time goes, in its current form.
>>
>> ...but I agree with this.  In its current form, the clock sweep has to
>> acquire a spinlock for every buffer it touches.  That's really
>> expensive, and I think we need to either get rid of it altogether or
>> at least find some way to move it into the background.  Ideally, in
>> the common case, a backend that wants to allocate a buffer shouldn't
>> need to do more than acquire a spinlock, pop a buffer off some sort of
>> linked list of allocation targets, and release the spinlock.  Whatever
>> other computation is needed should get pushed out of foreground
>> processing.
>
> So you don't think a freelist is worth having, but you want a list of
> allocation targets.
> What is the practical difference?

I think that our current freelist is practically useless, because it
is almost always empty, and the cases where it's not empty (startup,
and after a table or database drop) are so narrow that we don't really
get any benefit out of having it.  However, I'm not opposed to the
idea of a freelist in general: I think that if we actually put in some
effort to keep the freelist in a non-empty state it would help a lot,
because backends would then have much less work to do at buffer
allocation time.
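[Editor's note: to illustrate the shape of the idea, here is a toy sketch in Python, not PostgreSQL code. All names are invented for illustration; the real implementation would be C inside the buffer manager, with a spinlock rather than a Python lock. The point is that the foreground path is a single lock acquire and a pop, while refilling happens off the critical path.]

```python
from collections import deque
from threading import Lock

class FreeList:
    """Hypothetical background-replenished freelist (illustration only)."""

    def __init__(self, low_watermark=4):
        self.lock = Lock()           # stands in for a spinlock
        self.buffers = deque()       # buffer ids ready for reuse
        self.low_watermark = low_watermark

    def pop(self):
        """Foreground path: one lock acquire, one pop, done."""
        with self.lock:
            if self.buffers:
                return self.buffers.popleft()
        return None                  # empty: caller falls back to the clock sweep

    def refill(self, candidates):
        """Background path (e.g. the bgwriter): push eviction candidates
        found off the critical path, so backends rarely see an empty list."""
        with self.lock:
            need = self.low_watermark - len(self.buffers)
            for buf in candidates[:max(need, 0)]:
                self.buffers.append(buf)

fl = FreeList()
fl.refill([10, 11, 12, 13])
print(fl.pop())  # -> 10
```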

>> I have trouble accepting that this is really the problem we should be
>> solving.  There's only one background writer process, and that process
>> is waking up every 200ms by default.  Many people probably tune that
>> down quite a bit, but unless I'm quite mistaken, it should still be a
>> drop in the bucket next to what the backends are doing.
>
> That doesn't follow. Forcing the bgwriter to wait makes no sense. Yes,
> most of the contention is caused by the other backends, but the
> bgwriter experiences that contention currently when it has no need to
> do so.

Sure, but if that contention is a negligible fraction of the total and
doesn't cost anything measurable, then it doesn't make sense to add
code to eliminate it.  There are all sorts of places in the system
where we have contention at a level that doesn't affect performance.
Many of those could probably be fixed by adding more complicated
locking, but that would just complicate the code to no purpose.  If
there's a demonstrable performance benefit to this change then it's
worth a harder look, but my gut feeling is that there won't be.

>>> 2. When a backend can't find a free buffer, it spins for a long time
>>> while holding the lock. This makes the buffer strategy O(N) in its
>>> worst case, which slows everything down. Notably, while this is
>>> happening the bgwriter sits doing nothing at all, right at the moment
>>> when it is most needed. The Clock algorithm is an approximation of an
>>> LRU, so is already suboptimal as a "perfect cache". Tweaking to avoid
>>> worst case behaviour makes sense. How much to tweak? Well,...
>>
>> I generally agree with this analysis, but I don't think the proposed
>> patch is going to solve the problem.  It may have some merit as a way
>> of limiting the worst case behavior.  For example, if every shared
>> buffer has a reference count of 5, the first buffer allocation that
>> misses is going to have a lot of work to do before it can actually
>> come up with a victim.  But I don't think it's going to provide good
>> scaling in general.  Even if the background writer only spins through,
>> on average, ten or fifteen buffers before finding one to evict, that
>> still means we're acquiring ten or fifteen spinlocks while holding
>> BufFreelistLock. I don't currently have the measurements to prove
>> that's too expensive, but I bet it is.
>
> I think it's worth reducing the cost of scanning, but that has little
> to do with solving the O(N) problem. I think we need both.

Maybe.  I would really like a unified solution to both problems, but
I'd be willing to accept a solution for one of them if we're confident
that we've actually solved it.
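[Editor's note: a toy model of the worst case discussed above, in Python for brevity. The function name and counters are invented; the real sweep is C, and each buffer it touches costs a spinlock acquisition while BufFreelistLock is held. With every usage count at the cap of 5, the sweep must circle the whole pool several times before any victim emerges.]

```python
def clock_sweep(usage_counts, hand=0):
    """Clock approximation of LRU: decrement usage counts until one
    reaches zero; return (victim_index, buffers_visited)."""
    n = len(usage_counts)
    visited = 0
    while True:
        i = hand % n
        visited += 1
        if usage_counts[i] == 0:
            return i, visited        # found a victim
        usage_counts[i] -= 1         # each touch is a spinlock in reality
        hand += 1

# 8 buffers, all at usage count 5: five full passes (40 touches) plus the
# final check before buffer 0 becomes evictable.
victim, visits = clock_sweep([5] * 8)
print(victim, visits)  # -> 0 41
```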

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

