Re: PGC_SIGHUP shared_buffers?

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: PGC_SIGHUP shared_buffers?
Дата
Msg-id CA+TgmoYh-UD79y=og8YsfkCCqXux9rDcWOVTDkrd8BVcrnTwkw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PGC_SIGHUP shared_buffers?  (Andres Freund <andres@anarazel.de>)
Ответы Re: PGC_SIGHUP shared_buffers?  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Sat, Feb 17, 2024 at 12:38 AM Andres Freund <andres@anarazel.de> wrote:
> IMO the ability to *shrink* shared_buffers dynamically and cheaply is more
> important than growing it in a way, except that they are related of
> course. Idling hardware is expensive, thus overcommitting hardware is very
> attractive (I count "serverless" as part of that). To be able to overcommit
> effectively, unused long-lived memory has to be released. I.e. shared buffers
> needs to be shrinkable.

I see your point, but people want to scale up, too. Of course, those
people will have to live with what we can practically implement.

> Perhaps worth noting that there are two things limiting the size of shared
> buffers: 1) the available buffer space 2) the available buffer *mapping*
> space. I think making the buffer mapping resizable is considerably harder than
> the buffers themselves. Of course pre-reserving memory for a buffer mapping
> suitable for a huge shared_buffers is more feasible than pre-allocating all
> that memory for the buffers themselves. But it' still mean youd have a maximum
> set at server start.

We size the fsync queue based on shared_buffers too. That's a lot less
important, though, and could be worked around in other ways.

> Such a scheme still leaves you with a dependend memory read for a quite
> frequent operation. It could turn out to nto matter hugely if the mapping
> array is cache resident, but I don't know if we can realistically bank on
> that.

I don't know, either. I was hoping you did. :-)

But we can rig up a test pretty easily, I think. We can just create a
fake mapping that gives the same answers as the current calculation
and then beat on it. Of course, if testing shows no difference, there
is the small problem of knowing whether the test scenario was right;
and it's also possible that an initial impact could be mitigated by
removing some gratuitously repeated buffer # -> buffer address
mappings. Still, I think it could provide us with a useful baseline.
I'll throw something together when I have time, unless someone beats
me to it.

> I'm also somewhat concerned about the coarse granularity being problematic. It
> seems like it'd lead to a desire to make the granule small, causing slowness.

How many people set shared_buffers to something that's not a whole
number of GB these days? I mean I bet it happens, but in practice if
you rounded to the nearest GB, or even the nearest 2GB, I bet almost
nobody would really care. I think it's fine to be opinionated here and
hold the line at a relatively large granule, even though in theory
people could want something else.

Alternatively, maybe there could be a provision for the last granule
to be partial, and if you extend further, you throw away the partial
granule and replace it with a whole one. But I'm not even sure that's
worth doing.

> One big advantage of a scheme like this is that it'd be a step towards a NUMA
> aware buffer mapping and replacement. Practically everything beyond the size
> of a small consumer device these days has NUMA characteristics, even if not
> "officially visible". We could make clock sweeps (or a better victim buffer
> selection algorithm) happen within each "chunk", with some additional
> infrastructure to choose which of the chunks to search a buffer in. Using a
> chunk on the current numa node, except when there is a lot of imbalance
> between buffer usage or replacement rate between chunks.

I also wondered whether this might be a useful step toward allowing
different-sized buffers in the same buffer pool (ducks, runs away
quickly). I don't have any particular use for that myself, but it's a
thing some people probably want for some reason or other.

> > 2. Make a Buffer just a disguised pointer. Imagine something like
> > typedef struct { Page bp; } *buffer. WIth this approach,
> > BufferGetBlock() becomes trivial.
>
> You also additionally need something that allows for efficient iteration over
> all shared buffers. Making buffer replacement and checkpointing more expensive
> isn't great.

True, but I don't really see what the problem with this would be in
this approach.

> > 3. Reserve lots of address space and then only use some of it. I hear
> > rumors that some forks of PG have implemented something like this. The
> > idea is that you convince the OS to give you a whole bunch of address
> > space, but you try to avoid having all of it be backed by physical
> > memory. If you later want to increase shared_buffers, you then get the
> > OS to back more of it by physical memory, and if you later want to
> > decrease shared_buffers, you hopefully have some way of giving the OS
> > the memory back. As compared with the previous two approaches, this
> > seems less likely to be noticeable to most PG code.
>
> Another advantage is that you can shrink shared buffers fairly granularly and
> cheaply with that approach, compared to having to move buffes entirely out of
> a larger mapping to be able to unmap it.

Don't you have to still move buffers entirely out of the region you
want to unmap?

> > Problems include (1) you have to somehow figure out how much address space
> > to reserve, and that forms an upper bound on how big shared_buffers can grow
> > at runtime and
>
> Presumably you'd normally not want to reserve more than the physical amount of
> memory on the system. Sure, memory can be hot added, but IME that's quite
> rare.

I would think that might not be so rare in a virtualized environment,
which would seem to be one of the most important use cases for this
kind of thing.

Plus, this would mean we'd need to auto-detect system RAM. I'd rather
not go there, and just fix the upper limit via a GUC.

> > (2) you have to figure out ways to reserve address space and
> > back more or less of it with physical memory that will work on all of the
> > platforms that we currently support or might want to support in the future.
>
> We also could decide to only implement 2) on platforms with suitable APIs.

Yep, fair.

> A third issue is that it can confuse administrators inspecting the system with
> OS tools. "Postgres uses many terabytes of memory on my system!" due to VIRT
> being huge etc.

Mmph. That's disagreeable but probably not a reason to entirely
abandon any particular approach.

--
Robert Haas
EDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Luzanov
Дата:
Сообщение: Re: Things I don't like about \du's "Attributes" column
Следующее
От: Robert Haas
Дата:
Сообщение: Re: PGC_SIGHUP shared_buffers?