Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

From: Alvaro Herrera
Subject: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
Msg-id: 202402221843.ibzvpndbacbi@alvherre.pgsql
In reply to: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock (Dilip Kumar <dilipbalaut@gmail.com>)
Responses: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
List: pgsql-hackers
On 2024-Feb-07, Dilip Kumar wrote:

> On Wed, Feb 7, 2024 at 3:49 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > Sure, but is that really what we want?
>
> So your question is do we want these buffers to be in multiple of
> SLRU_BANK_SIZE?  Maybe we can have the last bank to be partial, I
> don't think it should create any problem logically.  I mean we can
> look again in the patch to see if we have made any such assumptions
> but that should be fairly easy to fix, then maybe if we are going in
> this way we should get rid of the check_slru_buffers() function as
> well.

Not really, I just don't think the macro should be in slru.h.

Another thing I've been thinking is that perhaps it would be useful to
make the banks smaller, when the total number of buffers is small.  For
example, if you have 16 or 32 buffers, it's not really clear to me that
it makes sense to have just 1 bank or 2 banks.  It might be more
sensible to have 4 banks with 4 or 8 buffers instead.  That should make
the algorithm scale down as well as up ...

I haven't done either of those things in the attached v19 version.  I
did go over the comments once again and rewrote the parts I was unhappy
with, including some existing ones.  I think it's OK now from that point
of view ... at some point I thought about creating a separate README,
but in the end I thought it not necessary.

I did add a bunch of Assert()s to make sure the locks that are supposed
to be held are actually held.  This led me to testing that the page
status is not EMPTY during SimpleLruWriteAll() before calling
SlruInternalWritePage(), because the assert was firing.  The previous
code is not really *buggy*, but to me it's weird to call WritePage() on
a slot with no contents.

Another change was in TransactionGroupUpdateXidStatus: the original code
had the leader doing pg_atomic_read_u32(&procglobal->clogGroupFirst) to
know which bank to lock.
I changed it to simply be the page used by the leader process; this
doesn't need an atomic read, and should be the same page anyway.  (If it
isn't, it's no big deal.)  But what's more: even if we do read
->clogGroupFirst at that point, there's no guarantee that this is going
to be exactly for the same process that ends up being the first in the
list, because since we have not set it to INVALID by the time we grab
the bank lock, it is quite possible for more processes to add themselves
to the list.

I realized all this while rewriting the comments in a way that would let
me understand what was going on ... so IMO the effort was worthwhile.

Anyway, what I send now should be pretty much final, modulo the change
to the check_slru_buffers routines and documentation additions to match
pg_stat_slru to the new GUC names.

> > Another painful point is that pg_stat_slru uses internal names in the
> > data it outputs, which obviously do not match the new GUCs.
>
> Yeah, that's true, but I think this could be explained somewhere, not
> sure what is the right place for this.

Or we can change those names in the view.

> FYI, I have also repeated all the performance tests I performed in my
> first email[1], and I am seeing a similar gain.

Excellent, thanks for doing that.

> Some comments on v18 in my first pass of the review.
>
> 1.
> @@ -665,7 +765,7 @@ TransactionIdGetStatus(TransactionId xid, XLogRecPtr *lsn)
>   lsnindex = GetLSNIndex(slotno, xid);
>   *lsn = XactCtl->shared->group_lsn[lsnindex];
>
> - LWLockRelease(XactSLRULock);
> + LWLockRelease(SimpleLruGetBankLock(XactCtl, pageno));
>
> Maybe here we can add an assert before releasing the lock for a safety check
>
> Assert(LWLockHeldByMe(SimpleLruGetBankLock(XactCtl, pageno)));

Hmm, I think this would just throw a warning or error "you don't hold
such lwlock", so it doesn't seem necessary.

> 2.
> + *
> + * XXX could we make the LSNs to be bank-based?
> */
>   XLogRecPtr *group_lsn;
>
> IMHO, the flush still happens at the page level so up to which LSN
> should be flush before flushing the particular clog page should also
> be maintained at the page level.

Yeah, this was a misguided thought, nevermind.

--
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
<Schwern> It does it in a really, really complicated way
<crab> why does it need to be complicated?
<Schwern> Because it's MakeMaker.