Thread: Switch to unnamed POSIX semaphores as our preferred sema code?

Switch to unnamed POSIX semaphores as our preferred sema code?

From
Tom Lane
Date:
I've gotten a bit tired of seeing "could not create semaphores: No space
left on device" failures in the buildfarm, so I looked into whether we
should consider preferring unnamed POSIX semaphores over SysV semaphores.
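
For context, a minimal sketch of where that error message comes from, assuming a Linux-like SysV IPC implementation.  The helper name try_sysv_semaphore_set and the SEMA_FLAGS macro are hypothetical and not taken from the PostgreSQL tree.  semget() reports ENOSPC, whose message text is "No space left on device", when the kernel limit on semaphore sets or on total semaphores would be exceeded.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Create-exclusively flags with owner read/write permission (assumed). */
#define SEMA_FLAGS (IPC_CREAT | IPC_EXCL | 0600)

/*
 * Hypothetical helper: try to allocate a private set of nsems SysV
 * semaphores and release it again right away.
 * Returns 0 on success, or the errno value of the failing call.
 */
static int
try_sysv_semaphore_set(int nsems)
{
    int sem_id = semget(IPC_PRIVATE, nsems, SEMA_FLAGS);

    if (sem_id < 0)
        return errno;       /* ENOSPC here means a kernel limit was hit */

    /* Remove the set so the sketch does not leak kernel resources. */
    if (semctl(sem_id, 0, IPC_RMID) < 0)
        return errno;

    return 0;
}
```

On Linux these flags evaluate to 03600, the same value that appears in the failed semget(1, 17, 03600) call quoted later in this thread.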

We've had code for named and unnamed POSIX semaphores in our tree for
a long time, but it's not actually used on any current platform AFAIK.
There are good reasons to avoid the named-semaphore variant: typically
that eats a file descriptor per sema per backend.  However that
complaint doesn't necessarily apply to unnamed semaphores.  Indeed,
it seems that on Linux an unnamed POSIX semaphore is basically a futex,
which eats zero kernel resources; all the state is in userspace.
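
As an illustration of the unnamed variant, here is a minimal sketch that places a sem_t in anonymous shared memory and uses it across fork().  It assumes a platform where sem_init() with pshared = 1 works (Linux, for instance); the helper name unnamed_sem_roundtrip is hypothetical and this is not how the PostgreSQL code is structured.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo helper: the child posts the semaphore, the parent
 * waits on it.  Returns 0 on success, or an errno value on failure.
 */
static int
unnamed_sem_roundtrip(void)
{
    sem_t  *sem;
    pid_t   child;
    int     status = 0;
    int     result = 0;
    int     rc;

    /* The semaphore must live in memory that both processes can see. */
    sem = mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sem == MAP_FAILED)
        return errno;

    /* pshared = 1 (cross-process), initial count 0 */
    if (sem_init(sem, 1, 0) < 0)
    {
        result = errno;
        munmap(sem, sizeof(sem_t));
        return result;
    }

    child = fork();
    if (child < 0)
        result = errno;
    else if (child == 0)
        _exit(sem_post(sem) == 0 ? 0 : 1);  /* child: wake the parent */
    else
    {
        /* parent: block until the child posts, retrying on EINTR */
        do
        {
            rc = sem_wait(sem);
        } while (rc < 0 && errno == EINTR);
        if (rc < 0)
            result = errno;

        if (waitpid(child, &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        {
            if (result == 0)
                result = ECHILD;    /* child did not exit cleanly */
        }
    }

    sem_destroy(sem);
    munmap(sem, sizeof(sem_t));
    return result;
}
```

No file descriptor or named kernel object is involved here; the only resource held is the shared memory itself.  On older glibc versions the program may need to be linked with -pthread.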

Although in normal cases the semaphore code paths aren't very heavily
exercised in our code, I was able to get a measurable performance
difference by building with --disable-spinlocks, so that spinlocks are
emulated with semaphores.  On an 8-core RHEL6 machine, "pgbench -S -c 20
-j 20" seems to be about 4% faster with unnamed semaphores than SysV
semaphores.  It'd be good to replicate that test on some higher-end
hardware, but provisionally I'd say unnamed semaphores are faster.

The data structure is bigger: Linux's type sem_t is 32 bytes on 64-bit
machines (16 bytes on 32-bit) whereas we use 8 bytes for SysV semaphores.
But there aren't normally a huge number of semaphores in a cluster, and
anyway this comparison is cheating because it ignores the space taken for
the kernel data structures backing the SysV semaphores.
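
The size comparison can be checked with a sketch like the following.  The two-int struct is an assumption about what a SysV semaphore reference looks like (a set ID plus an index into the set), and the names are hypothetical; the actual sizeof(sem_t) depends on the platform's libc.

```c
#include <assert.h>
#include <semaphore.h>
#include <stddef.h>

/* Assumed shape of a SysV semaphore reference: set ID plus index in set. */
typedef struct
{
    int sem_id;
    int sem_num;
} sysv_sema_ref;

/* User-space bytes needed per unnamed POSIX semaphore. */
static size_t
posix_sema_size(void)
{
    return sizeof(sem_t);
}

/* User-space bytes needed per SysV reference (kernel-side state excluded). */
static size_t
sysv_sema_ref_size(void)
{
    return sizeof(sysv_sema_ref);
}
```

Printing both values with printf("%zu %zu\n", ...) on a given machine shows the user-space side of the trade-off; the kernel memory behind each SysV semaphore is not visible this way.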

There was some previous discussion about this in
https://www.postgresql.org/message-id/flat/20160621193412.5792.65085%40wrigleys.postgresql.org
but that thread tailed off without a resolution, partly because it wasn't
the kind of change we'd consider making in late beta.  One thing
I expressed concern about there was whether there are any hidden kernel
resources underlying an unnamed semaphore.  So far as I can tell by
strace'ing sem_init and sem_destroy, there are not, at least on Linux.

Another issue is raised in today's discussion
https://www.postgresql.org/message-id/flat/14947.1475690465%40sss.pgh.pa.us
where it appears that we might need to be more careful about putting
memory barriers into the unnamed-semaphore code (probably because it
might not enter the kernel).  But if that's a bug, we'd want to fix it
anyway, IMO.

So for Linux, I think probably we should switch.

macOS seems not to have unnamed POSIX semaphores, only named ones (the
functions exist, but they always fail with ENOSYS).  However, some
googling suggests that other BSD derivatives do have these primitives, so
somebody ought to do a similar comparison on them to see if switching is a
win.  (The first thread above asserts that it is for FreeBSD, but someone
should recheck using a test case that stresses semaphores more.)

Dunno about other platforms.  sem_init is nominally required by SUS v2,
but it doesn't seem to actually exist everywhere, so I doubt we can drop
SysV altogether.  I'd be inclined to change the default on a platform-
by-platform basis not whole hog.

If anyone wants to test, the main thing you have to do to try this in
the existing code is to add "USE_UNNAMED_POSIX_SEMAPHORES=1" and
"--disable-spinlocks" to your configure arguments.  On Linux you may need
to add -lrt to the backend LIBS list, though on my machine configure is
putting that in already.
        regards, tom lane



Re: Switch to unnamed POSIX semaphores as our preferred sema code?

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
> I've gotten a bit tired of seeing "could not create semaphores: No space
> left on device" failures in the buildfarm, so I looked into whether we should
> consider preferring unnamed POSIX semaphores over SysV semaphores.

+100
Wonderful decision and cautious analysis.  This will make PostgreSQL more friendly to users, especially newcomers, by
eliminating the need to tune kernel resources.  I wish other kernel resources (files, procs) needed no tuning, as on
Windows, but that's just a daydream.
 

Regards
Takayuki Tsunakawa





Re: Switch to unnamed POSIX semaphores as our preferred sema code?

From
Tom Lane
Date:
I wrote:
> Although in normal cases the semaphore code paths aren't very heavily
> exercised in our code, I was able to get a measurable performance
> difference by building with --disable-spinlocks, so that spinlocks are
> emulated with semaphores.  On an 8-core RHEL6 machine, "pgbench -S -c 20
> -j 20" seems to be about 4% faster with unnamed semaphores than SysV
> semaphores.  It'd be good to replicate that test on some higher-end
> hardware, but provisionally I'd say unnamed semaphores are faster.

I realized that the above test is probably bogus, or at least not very
relevant to real-world Postgres performance.  A key performance aspect of
Linux futexes is that uncontended lock acquisitions, as well as releases
that don't need to wake anyone, don't enter the kernel at all.  However,
in PG's normal use of semaphores, neither scenario occurs very often;
processes lock their semaphores only after determining that they need to
wait, and release semaphores only when it's known they'll waken a sleeper.
The futex fast-path cases can occur only in the race condition that
someone else awakens a would-be waiter before it actually reaches its
semop call.  However, uncontended locks and releases *are* very common
for spinlocks.  This means that testing with --disable-spinlocks will show
a futex performance benefit that's totally irrelevant for real cases.
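
To make the fast-path argument concrete, here is a toy futex-based semaphore, a sketch only: it is Linux-specific, the toy_sem type and function names are hypothetical, and it is not glibc's actual sem_t implementation.  Each function returns 1 when it finished without entering the kernel and 0 when it had to make a futex system call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical toy semaphore: all state lives in user space. */
typedef struct
{
    _Atomic uint32_t count;     /* semaphore value, also the futex word */
    _Atomic uint32_t waiters;   /* number of processes that may be asleep */
} toy_sem;

static long
toy_futex(_Atomic uint32_t *addr, int op, uint32_t val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* Returns 1 if acquired purely in user space, 0 if we had to sleep. */
static int
toy_sem_wait(toy_sem *s)
{
    int fast = 1;

    for (;;)
    {
        uint32_t c = atomic_load(&s->count);

        /* Fast path: decrement a nonzero count with compare-and-swap. */
        while (c > 0)
        {
            if (atomic_compare_exchange_weak(&s->count, &c, c - 1))
                return fast;
        }

        /* Slow path: sleep in the kernel while the count is still 0. */
        fast = 0;
        atomic_fetch_add(&s->waiters, 1);
        toy_futex(&s->count, FUTEX_WAIT, 0);
        atomic_fetch_sub(&s->waiters, 1);
    }
}

/* Returns 1 if no wakeup system call was needed, 0 otherwise. */
static int
toy_sem_post(toy_sem *s)
{
    atomic_fetch_add(&s->count, 1);
    if (atomic_load(&s->waiters) == 0)
        return 1;               /* nobody to wake: stay in user space */
    toy_futex(&s->count, FUTEX_WAKE, 1);
    return 0;
}
```

A spinlock emulated on top of such a semaphore takes the fast path almost every time.  A process that only calls toy_sem_wait() after deciding to sleep, and only calls toy_sem_post() to wake a known sleeper, almost always takes the slow path, which is the situation described above.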

Based on that analysis, I abandoned testing with --disable-spinlocks
and instead tried to measure the actual speed of contended heavyweight
lock acquisition/release.  I used
    pgbench -f lockscript.sql -c 20 -j 20 -T 60 bench
with the script being
    begin; lock table pgbench_accounts; commit;
I got speeds between 10500 and 10800 TPS with either semaphore API;
if there's any difference at all, it's below the noise level for
this test scenario.

So I'm now thinking there's basically no performance consideration
here, and the point of switching would just be to get out from
under SysV kernel resource limits.  (Again though, this applies
only to Linux --- the other thread I cited suggests things might
be quite different on FreeBSD for instance.)

Can anyone think of a test case that would stress semaphore operations
more heavily, without being unrealistic?
        regards, tom lane



Re: Switch to unnamed POSIX semaphores as our preferred sema code?

From
Robert Haas
Date:
On Thu, Oct 6, 2016 at 9:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can anyone think of a test case that would stress semaphore operations
> more heavily, without being unrealistic?

I think it's going to be pretty hard to come up with a non-artificial
test case that exhibits meaningful lwlock contention on an 8-core
system.  If you go back to 9.1, before we had fast-path locking, you
can do it, because the relation locks and vxid locks do cause
noticeable contention on the lock manager locks in that version.
Alternatively, you might try something like "pgbench -n -S -c $N -j
$N" with a scale factor that doesn't fit in shared buffers.  This
probably won't produce significant contention because there are 128
LWLocks and only 8 cores, but you could reduce the number of buffer
mapping LWLocks to, say, 4 and then you'd probably hit it fairly hard.

Alternatively, get a bigger box.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Switch to unnamed POSIX semaphores as our preferred sema code?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Alternatively, get a bigger box.  :-)

So what's it take to get access to hydra?
        regards, tom lane



Re: Switch to unnamed POSIX semaphores as our preferred sema code?

From
Robert Haas
Date:
On Thu, Oct 6, 2016 at 5:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Alternatively, get a bigger box.  :-)
>
> So what's it take to get access to hydra?

Send me a private email with your .ssh key.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Switch to unnamed POSIX semaphores as our preferred sema code?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Oct 6, 2016 at 9:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Can anyone think of a test case that would stress semaphore operations
>> more heavily, without being unrealistic?

> I think it's going to be pretty hard to come up with a non-artificial
> test case that exhibits meaningful lwlock contention on an 8-core
> system.  If you go back to 9.1, before we had fast-path locking, you
> can do it, because the relation locks and vxid locks do cause
> noticeable contention on the lock manager locks in that version.
> ...
> Alternatively, get a bigger box.  :-)

Well, I did both of the above.  I tried 9.1 on "hydra", that 60-processor
POWER7 box, and cranked the parallelism up to ridiculous levels:
pgbench -S -j 250 -c 250 -M prepared -T 60 bench

Median of 3 runs with sysv semaphores:

number of transactions actually processed: 1554570
tps = 25875.432836 (including connections establishing)
tps = 25894.938187 (excluding connections establishing)

Ditto, for unnamed POSIX semaphores:

number of transactions actually processed: 1726696
tps = 28742.486104 (including connections establishing)
tps = 28765.963071 (excluding connections establishing)

That's about a 10% win for POSIX semaphores.  Now, at saner loads,
I couldn't see much of any difference between the two semaphore APIs.
So I'm still of the opinion that there's not likely to be any meaningful
performance difference in practice, at least not on reasonably recent
Linux machines.  But this does indicate that if there is any difference,
it will probably favor switching.
        regards, tom lane



Re: Switch to unnamed POSIX semaphores as our preferred sema code?

From
Christoph Berg
Date:
Re: Tom Lane 2016-10-08 <29244.1475959928@sss.pgh.pa.us>
> So I'm still of the opinion that there's not likely to be any meaningful
> performance difference in practice, at least not on reasonably recent
> Linux machines.  But this does indicate that if there is any difference,
> it will probably favor switching.

Another data point that's admittedly much more of a footnote than
serious input to the original question is the following: Debian has a
(so far mostly toy) port "hurd-i386" which is using the GNU hurd
kernel along with the usual GNU userland that's also in use on Linux.

This OS doesn't implement any semaphores yet (PG compiles, but initdb
dies with ENOSYS immediately). On talking to the porters, they advised
that POSIX semaphores would have the best chances to get implemented
first, so I added USE_UNNAMED_POSIX_SEMAPHORES=1 to the architecture
template to be prepared for that.

Christoph


(The patch quoted below is obviously Debian-specific and not meant for
inclusion upstream.)


hurd doesn't support sysv semaphores (semget), and needs -pthread to find
sem_init. POSIX semaphores shared between processes (sem_init(pshared = 1))
aren't supported yet either, but have the best chance to get implemented, so be
prepared.

FATAL:  could not create semaphores: Function not implemented
DETAIL:  Failed system call was semget(1, 17, 03600).

undefined reference to symbol 'sem_init@@GLIBC_2.12'

--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -109,6 +109,10 @@ endif
 endif # aix
+ifeq ($(shell dpkg-architecture -qDEB_HOST_ARCH_OS), hurd)
+LIBS += -pthread
+endif # hurd
+
 # Update the commonly used headers before building the subdirectories
 $(SUBDIRS:%=%-recursive): | generated-headers
--- a/src/template/linux
+++ b/src/template/linux
@@ -28,3 +28,10 @@ if test "$SUN_STUDIO_CC" = "yes" ; then
     ;;
   esac
 fi
+
+# force use of POSIX instead of SysV semaphores on hurd-i386
+case $(dpkg-architecture -qDEB_HOST_ARCH) in
+    hurd*)
+        USE_UNNAMED_POSIX_SEMAPHORES=1
+        ;;
+esac


Re: Switch to unnamed POSIX semaphores as our preferred sema code?

From
Tom Lane
Date:
Christoph Berg <myon@debian.org> writes:
> Another data point that's admittedly much more of a footnote than
> serious input to the original question is the following: Debian has a
> (so far mostly toy) port "hurd-i386" which is using the GNU hurd
> kernel along with the usual GNU userland that's also in use on Linux.

> This OS doesn't implement any semaphores yet (PG compiles, but initdb
> dies with ENOSYS immediately). On talking to the porters, they advised
> that POSIX semaphores would have the best chances to get implemented
> first, so I added USE_UNNAMED_POSIX_SEMAPHORES=1 to the architecture
> template to be prepared for that.

As of HEAD, that should happen automatically for anything using the
"linux" template.


I did some googling (but no actual testing) to try to find out the state
of POSIX sema support for the other platform templates:

aix

AIX doesn't seem to have support (reportedly, the functions exist but
always fail).

cygwin

Not clear whether unnamed semas work on this; I found conflicting reports.

darwin

Unnamed semas are known not to work here.

hpux

Reportedly, unnamed POSIX sema support exists on HPUX 11.x, but on 10.x
sem_init fails with ENOSYS.  We'd need a run-time test in configure to
see whether to use it.  Doubt it's worth the trouble.

netbsd

No support for cross-process unnamed semas.

openbsd

No support for cross-process unnamed semas.

sco

Doubt anyone cares.

solaris

Apparently supported in newer versions of Solaris; as with HPUX,
we might need a run-time configure probe to tell.  Again, without
specific evidence that it might be worth switching, I doubt it's
worth taking any trouble over.

unixware

Doubt anyone cares.

win32

No support.


So at this point it seems likely that stopping with Linux and FreeBSD
is the thing to do, and as far as I can tell the code we have now is
working with all variants of those that we have in the buildfarm.
(I'm a little suspicious that older variants of FreeBSD might not
have working sem_init, like the other *BSD variants, necessitating
a run-time test there.  But we'll cross that bridge when we come
to it.)

So, barring further input, this project is done.  I'll go update
the user docs to explain the new state of affairs.
        regards, tom lane



Re: Switch to unnamed POSIX semaphores as our preferred sema code?

From
Thomas Munro
Date:
On Tue, Oct 11, 2016 at 5:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So at this point it seems likely that stopping with Linux and FreeBSD
> is the thing to do, and as far as I can tell the code we have now is
> working with all variants of those that we have in the buildfarm.
> (I'm a little suspicious that older variants of FreeBSD might not
> have working sem_init, like the other *BSD variants, necessitating
> a run-time test there.  But we'll cross that bridge when we come
> to it.)

The sem_init man page from FreeBSD 8.4[1] (EOL August 2015) and earlier said:
    This implementation does not support shared semaphores, and reports this
    fact by setting errno to EPERM.

FreeBSD 9.0 (released January 2012) reimplemented semaphores and
removed those words from that man page[2].  All current releases[3]
support it, though I guess there may be 8.4 machines out there a year
and a bit after EOL.

[1] https://www.freebsd.org/cgi/man.cgi?query=sem_init&apropos=0&sektion=0&manpath=FreeBSD+8.4-RELEASE&arch=default&format=html
[2] https://www.freebsd.org/cgi/man.cgi?query=sem_init&apropos=0&sektion=0&manpath=FreeBSD+9.0-RELEASE&arch=default&format=html
[3] https://www.freebsd.org/releases/

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Switch to unnamed POSIX semaphores as our preferred sema code?

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Tue, Oct 11, 2016 at 5:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (I'm a little suspicious that older variants of FreeBSD might not
>> have working sem_init, like the other *BSD variants, necessitating
>> a run-time test there.  But we'll cross that bridge when we come
>> to it.)

> The sem_init man page from FreeBSD 8.4[1] (EOL August 2015) and earlier said:
>      This implementation does not support shared semaphores, and reports this
>      fact by setting errno to EPERM.
> FreeBSD 9.0 (released January 2012) reimplemented semaphores and
> removed those words from that man page[2].

Yeah, in subsequent googling I found other mentions of this having been
added in FreeBSD 9.0.  But that will be more than 5 years old by the
time PG 10 gets out.

> All current releases[3] support it, though I guess there may be 8.4
> machines out there a year and a bit after EOL.

We don't have anything older than 9.0 in the buildfarm, which I take
to indicate that nobody particularly cares about older versions anymore.
I would just as soon not add a run-time test in configure (it breaks
cross-compiles), so I'd rather wait and see if anyone complains.
        regards, tom lane