Re: Add bump memory context type and use it for tuplesorts

From: Tomas Vondra
Subject: Re: Add bump memory context type and use it for tuplesorts
Date:
Msg-id: af658fa4-6dae-413c-9c63-fffc4aeeb7cb@enterprisedb.com
In response to: Re: Add bump memory context type and use it for tuplesorts  (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-hackers
On 3/15/24 03:21, David Rowley wrote:
> On Tue, 12 Mar 2024 at 23:57, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> Attached is an updated version of the mempool patch, modifying all the
>> memory contexts (not just AllocSet), including the bump context. And
>> then also PDF with results from the two machines, comparing results
>> without and with the mempool. There's very little impact on small reset
>> values (128kB, 1MB), but pretty massive improvements on the 8MB test
>> (where it's a 2x improvement).
> 
> I think it would be good to have something like this.  I've done some
> experiments before with something like this [1]. However, mine was
> much more trivial.
> 

Interesting. My thing is a bit more complex because it was not meant to
be a cache initially, but more a way to limit the memory allocated by a
backend (discussed in [1]), or perhaps even by a smaller part of a plan.

I only added the caching after I ran into some bottlenecks [2], and
malloc turned out to be a scalability issue.

> One thing my version did was get rid of the *context* freelist stuff
> in aset.  I wondered if we'd need that anymore as, if I understand
> correctly, it's just there to stop malloc/free thrashing, which is
> what the patch aims to do anyway. Aside from that, it's now a little
> weird that aset.c has that but generation.c and slab.c do not.
> 

True. I think the "memory pool" shared by all memory contexts would be a
more principled way to do this - not only does it work for all memory
context types, but it's also part of the "regular" cache eviction like
everything else (which the context freelist is not).
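
Roughly, the idea is a backend-wide pool of free blocks, bucketed by
power-of-2 size, that aset.c, generation.c and slab.c all draw from before
falling back to malloc(). A very simplified sketch of that shape (not the
actual patch code - the struct layout and function names here are made up,
the minimum block size is assumed, and MEMPOOL_SIZES is from your review
below):

#include <assert.h>
#include <stdlib.h>

#define MEMPOOL_MIN_BLOCK   1024    /* assumed smallest cached block size */
#define MEMPOOL_SIZES       14      /* number of power-of-2 size classes */

typedef struct PoolBlock
{
    struct PoolBlock *next;     /* freelist link, stored in the free block */
} PoolBlock;

typedef struct MemoryPool
{
    PoolBlock  *freelist[MEMPOOL_SIZES];    /* cached free blocks per class */
    int         nfree[MEMPOOL_SIZES];       /* blocks currently cached */
    int         maxfree[MEMPOOL_SIZES];     /* cap, adjusted by rebalancing */
} MemoryPool;

static MemoryPool pool;         /* one pool per backend */

/* map a block size to its size class (kept as a loop for clarity here) */
static int
pool_index(size_t size)
{
    int         idx = 0;
    size_t      blksize = MEMPOOL_MIN_BLOCK;

    while (blksize < size)
    {
        blksize *= 2;
        idx++;
    }
    assert(idx < MEMPOOL_SIZES);
    return idx;
}

/* every context type asks the pool first, and only then calls malloc() */
static void *
pool_alloc_block(size_t size)
{
    int         idx = pool_index(size);

    if (pool.freelist[idx] != NULL)
    {
        PoolBlock  *block = pool.freelist[idx];

        pool.freelist[idx] = block->next;
        pool.nfree[idx]--;
        return block;
    }
    return malloc((size_t) MEMPOOL_MIN_BLOCK << idx);
}

/* freed blocks go back to the pool unless the size class is already full */
static void
pool_release_block(void *ptr, size_t size)
{
    int         idx = pool_index(size);

    if (pool.nfree[idx] < pool.maxfree[idx])
    {
        PoolBlock  *block = (PoolBlock *) ptr;

        block->next = pool.freelist[idx];
        pool.freelist[idx] = block;
        pool.nfree[idx]++;
    }
    else
        free(ptr);
}

The important bit is that the eviction (maxfree, driven by the rebalancing
discussed below) applies to cached blocks the same way no matter which
context type they came from.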

> One thing I found was that in btbeginscan(), we have "so =
> (BTScanOpaque) palloc(sizeof(BTScanOpaqueData));", which on this
> machine is 27344 bytes and results in a call to AllocSetAllocLarge()
> and therefore a malloc().  Ideally, there'd be no malloc() calls in a
> standard pgbench run, at least once the rel and cat caches have been
> warmed up.
> 

Right. That's exactly what I found in [2], where it's a massive problem
with many partitions and many concurrent connections.

> I think there are a few things in your patch that could be improved,
> here's a quick review.
> 
> 1. MemoryPoolEntryIndex() could follow the lead of
> AllocSetFreeIndex(), which is quite well-tuned and has no looping. I
> think you can get rid of MemoryPoolEntrySize() and just have
> MemoryPoolEntryIndex() round up to the next power of 2.
> 
> 2. The following could use  "result = Min(MEMPOOL_MIN_BLOCK,
> pg_nextpower2_size_t(size));"
> 
> + * should be very low, though (less than MEMPOOL_SIZES, i.e. 14).
> + */
> + result = MEMPOOL_MIN_BLOCK;
> + while (size > result)
> + result *= 2;
> 
> 3. "MemoryPoolFree": I wonder if this is a good name for such a
> function.  Really you want to return it to the pool. "Free" sounds
> like you're going to free() it.  I went for "Fetch" and "Release"
> which I thought was less confusing.
> 
> 4. MemoryPoolRealloc(), could this just do nothing if the old and new
> indexes are the same?
> 
> 5. It might be good to put a likely() around this:
> 
> + /* only do this once every MEMPOOL_REBALANCE_DISTANCE allocations */
> + if (pool->num_requests < MEMPOOL_REBALANCE_DISTANCE)
> + return;
> 
> Otherwise, if that function is inlined then you'll bloat the functions
> that inline it for not much good reason.  Another approach would be to
> have a static inline function which checks and calls a noinline
> function that does the work so that the rebalance stuff is never
> inlined.
> 

Yes, I agree with all of that. I was a bit lazy when doing the PoC, so I
ignored these things.
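
For points 1 and 2, I think it can be as simple as this (just a sketch
using the helpers in pg_bitutils.h, with the minimum block size assumed -
and note the loop is equivalent to Max(), not Min()):

#include "postgres.h"
#include "port/pg_bitutils.h"

#define MEMPOOL_MIN_BLOCK   1024    /* assumed smallest cached block size */

/*
 * Size class index for a block request, without looping: round up to a
 * power of 2 (but to at least MEMPOOL_MIN_BLOCK) and take the difference
 * of the bit positions.  Assumes size > 0, as pg_nextpower2_size_t()
 * requires.
 */
static inline int
MemoryPoolEntryIndex(Size size)
{
    Size        blksize = Max(MEMPOOL_MIN_BLOCK, pg_nextpower2_size_t(size));

    return pg_leftmost_one_pos64(blksize) -
        pg_leftmost_one_pos64(MEMPOOL_MIN_BLOCK);
}

That would also let me drop MemoryPoolEntrySize(), as you suggest.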

> Overall, I wonder if the rebalance stuff might make performance
> testing quite tricky.  I see:
> 
> +/*
> + * How often to rebalance the memory pool buckets (number of allocations).
> + * This is a tradeoff between the pool being adaptive and more overhead.
> + */
> +#define MEMPOOL_REBALANCE_DISTANCE 25000
> 
> Will TPS take a sudden jump after 25k transactions doing the same
> thing?  I'm not saying this shouldn't happen, but... benchmarking is
> pretty hard already. I wonder if there's something more fine-grained
> that can be done which makes the pool adapt faster but not all at
> once.  (I've not studied your algorithm for the rebalance.)
> 

I don't think so, or at least I haven't observed anything like that. My
intent was to make the rebalancing fairly frequent but incremental, with
each increment doing only a tiny amount of work.

It does not do any more malloc/free calls than without the cache - it
may just delay them a bit, and the assumption is the rest of the
rebalancing (walking the size slots and adjusting counters based on
activity since the last run) is super cheap.

So it shouldn't be the case that the rebalancing is so expensive as to
cause a measurable drop in throughput, or something like that. I can
imagine spreading it out even more (doing it in smaller steps), but on the
other hand the interval must not be too short - we need to see enough
allocations to provide good "heuristics" for how to adjust the cache.

BTW it's not directly tied to transactions - it's triggered by block
allocations, and each transaction likely needs many of those.
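
To make the earlier point 5 concrete, the trigger on the allocation path
could look roughly like this (the struct fields and function names are
placeholders, not the actual patch):

#include "postgres.h"

#define MEMPOOL_REBALANCE_DISTANCE  25000

typedef struct MemoryPool
{
    uint64      num_requests;   /* block allocations since the last rebalance */
    /* ... per size-class counters and limits would live here ... */
} MemoryPool;

/* the actual rebalancing, deliberately kept out of line */
static pg_noinline void
MemoryPoolRebalance(MemoryPool *pool)
{
    /*
     * Walk the size classes and adjust how many free blocks each class may
     * keep, based on the activity since the last run (placeholder for the
     * real logic).
     */
    pool->num_requests = 0;
}

/* called on every block allocation - a single, well-predicted branch */
static inline void
MemoryPoolMaybeRebalance(MemoryPool *pool)
{
    pool->num_requests++;

    /* only do the real work once per MEMPOOL_REBALANCE_DISTANCE allocations */
    if (likely(pool->num_requests < MEMPOOL_REBALANCE_DISTANCE))
        return;

    MemoryPoolRebalance(pool);
}

That way the hot path stays tiny even if the check gets inlined everywhere.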



regards


[1]
https://www.postgresql.org/message-id/bd57d9a4c219cc1392665fd5fba61dde8027b3da.camel%40crunchydata.com

[2]
https://www.postgresql.org/message-id/510b887e-c0ce-4a0c-a17a-2c6abb8d9a5c@enterprisedb.com


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


