Обсуждение: [multithreading] extension compatibility

Поиск
Список
Период
Сортировка

[multithreading] extension compatibility

От
Robert Haas
Дата:
Hi,

At 2024.pgconf.dev, Heikki did a session on multithreading PostgreSQL
which I was unfortunately unable to attend due my involvement with
another session, and then we had an unconference discussion which I
was able to attend and at which I volunteered to have a look at a
couple of tasks, including "Extension Marking System (marking
extensions as thread-safe)". So in this email I'd like to (1) say a
few things about multithreading for PostgreSQL in general, (2) spell
out my understanding of the extension compatibility problem
specifically, and then (3) discuss possible solutions to that problem.
See also https://wiki.postgresql.org/wiki/Multithreading

== Multithreading Generally ==

I believe there is a consensus in the PostgreSQL developer community,
or at least among committers, that a multi-threaded programming model
would be superior to a multi-process programming model as we have now.
I won't be surprised if a few people disagree with that as a general
statement, and others may view it as better in theory but so difficult
in practice as to be not worth doing, but I believe that the consensus
is otherwise. I do understand that switching to threads introduces
some new stability risks, which are not to be taken lightly, but it
also opens the door to various performance improvements, and even
functionality, that are not feasible today. I do not believe that it
would be necessary, as has been alleged previously, to halt all other
development for a lengthy period of time while such a conversion is
undertaken, nor do I believe that the community would or should accept
such a solution were someone to propose it. I do believe that there
are some difficult problems to be solved in order to make it work at
all, and I believe even more strongly that a good deal of follow-up
work will be necessary to reap the potential benefits of such a
change. I also believe that it's absolutely necessary that both models
coexist side by side for a period of time. I think we will eventually
want to abandon the multi-process model, because I think over time the
benefits of using threads will accumulate until they are overwhelming
and the process model will end up appearing to be an obstacle to
progress. However, I don't think we'll be able to do that particularly
soon, because I think it's going to take a while to fully stabilize
the thread model even as far as the core code is concerned, and
extensions will take even longer to catch up. I realize Heikki in
particular is hoping for a quick transition; I don't see that as
feasible, but like everything else about this, opinions are going to
vary.

Obligatory disclaimer: Everything above (and below) is just a
statement of what I believe, and everyone is free to dispute it. As
always, I cannot speak to objective truth, but I can tell you what I
think.

== The Extension Compatibility Problem ==

I don't know yet whether we're going to end up with a system where the
same build of PostgreSQL can produce processes or threads depending on
configuration or whether it's going to be a build option, but I'm
guessing the latter is more likely. Certainly, if an extension is
assuming that its global variables are session-local and they suddenly
become global to the cluster, chaos will ensue. The same is true for
the core code, and will need to be solved by annotating global
variables so that the appropriate ones can be made thread-local and
the others can be given whatever treatment is appropriate considering
how they are used. The details of how this annotation system will work
are TBD, but the point for this email is that extension global
variables, including file-level globals, will need the same kinds of
annotations that we use in the core code in order to work. Other
adjustments may also be needed.

I think there are two severable problems here. One is that, if an
extension is built for use with a non-threaded PostgreSQL, we
shouldn't permit it to be used with a threaded PostgreSQL, even if the
major version and other details are compatible. Hence, threading or
the lack of it must become part of the data set up by PG_MODULE_MAGIC.
Maybe this problem goes away if we decide that threads-vs-processes is
a configuration option rather than a build-time option, but even then,
we might still end up with a build-time option indicating whether
threads are even a possibility, so I think it's pretty likely we need
this in some form. If or when the process model eventually dies, then
we can take this out again.

The other problem is that we probably want a way for extensions to
signal that they are believed to work with threading. It's a little
bit debatable whether this is a good idea, because (1) some people are
going to blindly state that their extension works fine with threading
even if they haven't actually made the necessary changes and (2) one
could simply declare that making an extension thread-ready is part of
supporting whatever PostgreSQL release adds threading as an option and
(3) one could also declare that extension authors should just document
what they do or don't support rather than doing anything in code.
However, I think it makes sense to try to make extensions fail to
compile against a threaded PostgreSQL unless the extension declares
that it supports such builds of PostgreSQL. I think that by doing
this, we'll make it a LOT easier for packagers to find out what
extensions still need updating. A packager could possibly do light
testing of an extension and fail to miss the fact that the extension
doesn't actually work properly against a threaded PostgreSQL, but you
can't fail to notice a compile failure. There's still going to be some
chaos because of (1), but I think we can mitigate that with good
messaging: documentation, wiki pages, and blog posts explaining that
this is coming and how to adapt to it can help a lot, IMHO.

== Extension Compatibility Solutions ==

The attached patch is a sketch of one possible approach: PostgreSQL
signals whether it is multithreaded by defining or not defining
PG_MULTITHREADING in pg_config_manual.h, and an extension signals
thread-readiness by defining PG_THREADSAFE_EXTENSION before including
any PostgreSQL headers other than postgres.h. If PostgreSQL is built
multithreaded and the extension does not signal thread-safety, you get
something like this:

../pgsql/src/test/modules/dummy_seclabel/dummy_seclabel.c:20:1: error:
static assertion failed due to requirement '1 == 0': must define
PG_THREADSAFE_EXTENSION or use unthreaded PostgreSQL
PG_MODULE_MAGIC;

I'm not entirely happy with this solution because the results are
confusing if PG_THREADSAFE_EXTENSION is declared after including
fmgr.h. Perhaps this can be adequately handled by documenting and
demonstrating the right pattern, or maybe somebody has a better idea.

Another idea I considered was to replace the PG_MODULE_MAGIC;
declaration with something that allows for arguments, like
PG_MODULE_MAGIC(.process_model = false, .thread_model = true). But on
further reflection, that seems like the wrong thing. AFAICS, that's
going to tell you at runtime about something that you really want to
know at compile time. But this kind of idea might need more thought if
we decide that the *same* build of PostgreSQL can either launch
processes or threads per session, because then we'd to know which
extensions were available in whichever mode applied to the current
session.

That's all I've got for today.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Вложения

Re: [multithreading] extension compatibility

От
"Tristan Partin"
Дата:
On Wed Jun 5, 2024 at 3:05 PM CDT, Robert Haas wrote:
> ...
>
> == Extension Compatibility Solutions ==
>
> The attached patch is a sketch of one possible approach: PostgreSQL
> signals whether it is multithreaded by defining or not defining
> PG_MULTITHREADING in pg_config_manual.h, and an extension signals
> thread-readiness by defining PG_THREADSAFE_EXTENSION before including
> any PostgreSQL headers other than postgres.h. If PostgreSQL is built
> multithreaded and the extension does not signal thread-safety, you get
> something like this:
>
> ../pgsql/src/test/modules/dummy_seclabel/dummy_seclabel.c:20:1: error:
> static assertion failed due to requirement '1 == 0': must define
> PG_THREADSAFE_EXTENSION or use unthreaded PostgreSQL
> PG_MODULE_MAGIC;
>
> I'm not entirely happy with this solution because the results are
> confusing if PG_THREADSAFE_EXTENSION is declared after including
> fmgr.h. Perhaps this can be adequately handled by documenting and
> demonstrating the right pattern, or maybe somebody has a better idea.
>
> Another idea I considered was to replace the PG_MODULE_MAGIC;
> declaration with something that allows for arguments, like
> PG_MODULE_MAGIC(.process_model = false, .thread_model = true). But on
> further reflection, that seems like the wrong thing. AFAICS, that's
> going to tell you at runtime about something that you really want to
> know at compile time. But this kind of idea might need more thought if
> we decide that the *same* build of PostgreSQL can either launch
> processes or threads per session, because then we'd to know which
> extensions were available in whichever mode applied to the current
> session.

Not entirely sure how I feel about the approach you've taken, but here
is a patch that Heikki and I put together for extension compatibility.
It's not a build time solution, but a runtime solution. Instead of
PG_MODULE_MAGIC, extensions would use PG_MAGIC_MODULE_REENTRANT. There
is a GUC called `multithreaded` which controls the variable
IsMultithreaded. We operated under the assumption that being able to
toggle multithreading and multi-processing without recompiling has
value.

--
Tristan Partin
https://tristan.partin.io

Вложения

Re: [multithreading] extension compatibility

От
Jelte Fennema-Nio
Дата:
On Wed, 5 Jun 2024 at 22:05, Robert Haas <robertmhaas@gmail.com> wrote:
> The attached patch is a sketch of one possible approach: PostgreSQL
> signals whether it is multithreaded by defining or not defining
> PG_MULTITHREADING in pg_config_manual.h, and an extension signals
> thread-readiness by defining PG_THREADSAFE_EXTENSION before including
> any PostgreSQL headers other than postgres.h.

My first gut-reaction: It seems kinda annoying to have to do this for
every c that you use to build your extension, e.g. citus or postgis
have a ton of those.

PG_MODULE_MAGIC seems like a better fit imho.

If we really want a compile time failure, then I think I'd prefer to
have a new postgres.h file (e.g. postgres-thread.h) that you would
include instead of plain postgres.h. Basically this file could then
contain the below two lines:

#include "postgres.h"
#define PG_THREADSAFE_EXTENSION    1



Re: [multithreading] extension compatibility

От
Robert Haas
Дата:
On Wed, Jun 5, 2024 at 4:32 PM Tristan Partin <tristan@partin.io> wrote:
> Not entirely sure how I feel about the approach you've taken, but here
> is a patch that Heikki and I put together for extension compatibility.
> It's not a build time solution, but a runtime solution. Instead of
> PG_MODULE_MAGIC, extensions would use PG_MAGIC_MODULE_REENTRANT. There
> is a GUC called `multithreaded` which controls the variable
> IsMultithreaded. We operated under the assumption that being able to
> toggle multithreading and multi-processing without recompiling has
> value.

That's interesting, because I thought Heikki was against having a
runtime toggle.

I don't think PG_MODULE_MAGIC_REENTRANT is a good syntax. It all looks
great as long as we only ever need the PG_MODULE_MAGIC line to signal
one bit of information, but as soon as you get to two bits it's pretty
questionable, and anything more than two bits is insane. If we want to
do something with the PG_MODULE_MAGIC line, I think it should involve
options-passing of some form rather than just having an alternate
macro name.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: [multithreading] extension compatibility

От
"Tristan Partin"
Дата:
On Wed Jun 5, 2024 at 3:56 PM CDT, Robert Haas wrote:
> On Wed, Jun 5, 2024 at 4:32 PM Tristan Partin <tristan@partin.io> wrote:
> > Not entirely sure how I feel about the approach you've taken, but here
> > is a patch that Heikki and I put together for extension compatibility.
> > It's not a build time solution, but a runtime solution. Instead of
> > PG_MODULE_MAGIC, extensions would use PG_MAGIC_MODULE_REENTRANT. There
> > is a GUC called `multithreaded` which controls the variable
> > IsMultithreaded. We operated under the assumption that being able to
> > toggle multithreading and multi-processing without recompiling has
> > value.
>
> That's interesting, because I thought Heikki was against having a
> runtime toggle.
>
> I don't think PG_MODULE_MAGIC_REENTRANT is a good syntax. It all looks
> great as long as we only ever need the PG_MODULE_MAGIC line to signal
> one bit of information, but as soon as you get to two bits it's pretty
> questionable, and anything more than two bits is insane. If we want to
> do something with the PG_MODULE_MAGIC line, I think it should involve
> options-passing of some form rather than just having an alternate
> macro name.

I agree that this method doesn't lend itself to future extensibility.

--
Tristan Partin
https://tristan.partin.io



Re: [multithreading] extension compatibility

От
Heikki Linnakangas
Дата:
On 05/06/2024 23:55, Robert Haas wrote:
> On Wed, Jun 5, 2024 at 4:32 PM Tristan Partin <tristan@partin.io> wrote:
>> Not entirely sure how I feel about the approach you've taken, but here
>> is a patch that Heikki and I put together for extension compatibility.
>> It's not a build time solution, but a runtime solution. Instead of
>> PG_MODULE_MAGIC, extensions would use PG_MAGIC_MODULE_REENTRANT. There
>> is a GUC called `multithreaded` which controls the variable
>> IsMultithreaded. We operated under the assumption that being able to
>> toggle multithreading and multi-processing without recompiling has
>> value.
> 
> That's interesting, because I thought Heikki was against having a
> runtime toggle.

I'm very much in favor of a runtime toggle. To be precise, a 
PGC_POSTMASTER setting. We'll get a lot more testing if you can easily 
turn it on/off, and so far I haven't seen anything that would require it 
to be a compile time option.

> I don't think PG_MODULE_MAGIC_REENTRANT is a good syntax. It all looks
> great as long as we only ever need the PG_MODULE_MAGIC line to signal
> one bit of information, but as soon as you get to two bits it's pretty
> questionable, and anything more than two bits is insane. If we want to
> do something with the PG_MODULE_MAGIC line, I think it should involve
> options-passing of some form rather than just having an alternate
> macro name.

+1, that would be nicer.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: [multithreading] extension compatibility

От
Robert Haas
Дата:
On Wed, Jun 5, 2024 at 8:01 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I'm very much in favor of a runtime toggle. To be precise, a
> PGC_POSTMASTER setting. We'll get a lot more testing if you can easily
> turn it on/off, and so far I haven't seen anything that would require it
> to be a compile time option.

I was thinking about global variable annotations. If someone wants to
build without multithreading, I think that they won't want to still
end up with a ton of variables being changed to thread-local. So I
think there has to be a build-time option controlling whether this
build supports threading. I suspect there will be other people who
want to just shut all of this experimental code off, which is probably
going to be a second driver for a build-time toggle. But even with
that, we can still have a GUC controlling whether threading is
actually used. Does that make sense to you?

Supposing it does, then how does the extension-marking system need to
work? I suppose in this world we don't want any build failures: you're
allowed to build a non-thread-aware extension against a
threading-capable PostgreSQL; you're just not allowed to load the
resulting extension when the server is in threading mode.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: [multithreading] extension compatibility

От
Andres Freund
Дата:
Hi,

On 2024-06-05 21:10:01 -0400, Robert Haas wrote:
> On Wed, Jun 5, 2024 at 8:01 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > I'm very much in favor of a runtime toggle. To be precise, a
> > PGC_POSTMASTER setting. We'll get a lot more testing if you can easily
> > turn it on/off, and so far I haven't seen anything that would require it
> > to be a compile time option.
> 
> I was thinking about global variable annotations. If someone wants to
> build without multithreading, I think that they won't want to still
> end up with a ton of variables being changed to thread-local.

Depending on the architecture / ABI / compiler options it's often not
meaningfully more expensive to access a thread local variable than a "normal"
variable.

I think these days it's e.g. more expensive on x86-64 windows, but not
linux. On arm the overhead of TLS is more noticeable, across platforms,
afaict.

Example compiler output for x86-64 and armv8:
https://godbolt.org/z/K369eG5MM

Cycle analysis or linux x86-64 output:
https://godbolt.org/z/KK57vM1of

This shows that for the linux x86-64 case there's no difference in efficiency
between the tls/non-tls case.

The reason it's so fast on x86-64 linux is that they reused one of the "old"
segment registers to serve as the index register differing between each
thread.  For x86-64 code, most code is compiled position independent, and
*also* uses an indexed mode (but relative to the instruction pointer).


I think we might be able to gain some small performance benefits via the
annotations, which actualy might make it viable to just apply the annotations
regardless of using threads or not.


Greetings,

Andres Freund



Re: [multithreading] extension compatibility

От
Robert Haas
Дата:
On Wed, Jun 5, 2024 at 9:50 PM Andres Freund <andres@anarazel.de> wrote:
> Depending on the architecture / ABI / compiler options it's often not
> meaningfully more expensive to access a thread local variable than a "normal"
> variable.
>
> I think these days it's e.g. more expensive on x86-64 windows, but not
> linux. On arm the overhead of TLS is more noticeable, across platforms,
> afaict.

I mean, to me, this still sounds like we want multithreading to be a
build-time option.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: [multithreading] extension compatibility

От
Andres Freund
Дата:
Hi,

On 2024-06-05 21:59:42 -0400, Robert Haas wrote:
> On Wed, Jun 5, 2024 at 9:50 PM Andres Freund <andres@anarazel.de> wrote:
> > Depending on the architecture / ABI / compiler options it's often not
> > meaningfully more expensive to access a thread local variable than a "normal"
> > variable.
> >
> > I think these days it's e.g. more expensive on x86-64 windows, but not
> > linux. On arm the overhead of TLS is more noticeable, across platforms,
> > afaict.
> 
> I mean, to me, this still sounds like we want multithreading to be a
> build-time option.

Maybe. I think shipping a mode where users can fairly simply toggle between
threaded and process mode will allow us to get this stable a *lot* quicker
than if we distribute two builds. Most users don't build from source, distros
will have to pick the mode. If they don't choose threaded mode, we'll not find
problems. If they do choose threaded mode, we can't ask users to switch to a
process based mode to check if the problem is related.

We have been talking in a bunch of threads about having a mode where the main
postgres binary chooses a build optimized for the current architecture, to be
able to use SIMD instructions without a runtime check/dispatch. I guess we
could add threadedness to such a matrix...

Greetings,

Andres Freund



Re: [multithreading] extension compatibility

От
Robert Haas
Дата:
On Wed, Jun 5, 2024 at 10:09 PM Andres Freund <andres@anarazel.de> wrote:
> Maybe. I think shipping a mode where users can fairly simply toggle between
> threaded and process mode will allow us to get this stable a *lot* quicker
> than if we distribute two builds. Most users don't build from source, distros
> will have to pick the mode. If they don't choose threaded mode, we'll not find
> problems. If they do choose threaded mode, we can't ask users to switch to a
> process based mode to check if the problem is related.

I don't believe that being coercive here is the right approach. I
think distros see the value in compiling with as many things turned on
as possible; when they ship with something turned off, it's because
it's unstable or introduces weird dependencies or has some other
disadvantage.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: [multithreading] extension compatibility

От
Heikki Linnakangas
Дата:
On 06/06/2024 05:47, Robert Haas wrote:
> On Wed, Jun 5, 2024 at 10:09 PM Andres Freund <andres@anarazel.de> wrote:
>> Maybe. I think shipping a mode where users can fairly simply toggle between
>> threaded and process mode will allow us to get this stable a *lot* quicker
>> than if we distribute two builds. Most users don't build from source, distros
>> will have to pick the mode. If they don't choose threaded mode, we'll not find
>> problems. If they do choose threaded mode, we can't ask users to switch to a
>> process based mode to check if the problem is related.
> 
> I don't believe that being coercive here is the right approach. I
> think distros see the value in compiling with as many things turned on
> as possible; when they ship with something turned off, it's because
> it's unstable or introduces weird dependencies or has some other
> disadvantage.

I presume there's no harm in building with multithreading support. If 
you don't want to use it, put "multithreading=off" in your config file 
(which will presumably be the default for a while).

If we're worried about the performance impact of thread-local variables 
in particular, we can try to measure that. I don't think it's material 
though.

If there is some material harm from compiling with multithreading 
support even if you're not using it, we should try to fix that. I'm not 
dead set against having a compile-time option, but I don't see the need 
for it at the moment.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: [multithreading] extension compatibility

От
Robert Haas
Дата:
On Thu, Jun 6, 2024 at 5:00 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> If there is some material harm from compiling with multithreading
> support even if you're not using it, we should try to fix that. I'm not
> dead set against having a compile-time option, but I don't see the need
> for it at the moment.

Well, OK, so it sounds like I'm outvoted, at least at the moment.
Maybe that will change as more people vote, but for now, that's where
we are. Given that, I suppose we want something more like Tristan's
patch, but with a more extensible syntax. Does that sound right?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: [multithreading] extension compatibility

От
Heikki Linnakangas
Дата:
On 06/06/2024 17:23, Robert Haas wrote:
> On Thu, Jun 6, 2024 at 5:00 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> If there is some material harm from compiling with multithreading
>> support even if you're not using it, we should try to fix that. I'm not
>> dead set against having a compile-time option, but I don't see the need
>> for it at the moment.
> 
> Well, OK, so it sounds like I'm outvoted, at least at the moment.
> Maybe that will change as more people vote, but for now, that's where
> we are. Given that, I suppose we want something more like Tristan's
> patch, but with a more extensible syntax. Does that sound right?

+1

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: [multithreading] extension compatibility

От
Robert Haas
Дата:
On Thu, Jun 6, 2024 at 10:32 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > Well, OK, so it sounds like I'm outvoted, at least at the moment.
> > Maybe that will change as more people vote, but for now, that's where
> > we are. Given that, I suppose we want something more like Tristan's
> > patch, but with a more extensible syntax. Does that sound right?
>
> +1

So, how shall we proceed here? Tristan, do you want to update your
patch based on this feedback?

--
Robert Haas
EDB: http://www.enterprisedb.com