Re: Inconsistent use of "volatile" when accessing shared memory?

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Inconsistent use of "volatile" when accessing shared memory?
Дата
Msg-id 20231104013639.dhjroch6jybdhlv6@awork3.anarazel.de
обсуждение исходный текст
Ответ на Re: Inconsistent use of "volatile" when accessing shared memory?  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: Inconsistent use of "volatile" when accessing shared memory?  (Jeff Davis <pgsql@j-davis.com>)
Список pgsql-hackers
Hi,

On 2023-11-03 17:44:44 -0700, Jeff Davis wrote:
> On Fri, 2023-11-03 at 15:59 -0700, Andres Freund wrote:
> > I don't think so. We used to use volatile for most shared memory
> > accesses, but
> > volatile doesn't provide particularly useful semantics - and
> > generates
> > *vastly* slower code in a lot of circumstances. Most of that usage
> > predates
> > spinlocks being proper compiler barriers,
>
> A compiler barrier doesn't always force the compiler to generate loads
> and stores, though.

I don't think volatile does so entirely either, but I don't think it's
relevant anyway.


> For instance (example code I placed at the bottom of xlog.c):
>
>   typedef struct DummyStruct {
>       XLogRecPtr recptr;
>   } DummyStruct;
>   extern void DummyFunction(void);
>   static DummyStruct Dummy = { 5 };
>   static DummyStruct *pDummy = &Dummy;
>   void
>   DummyFunction(void)
>   {
>       while(true)
>       {
>           pg_compiler_barrier();
>           pg_memory_barrier();
>           if (pDummy->recptr == 0)
>               break;
>           pg_compiler_barrier();
>           pg_memory_barrier();
>       }
>   }
>
>
> Generates the following code (clang -O2):
>
>   000000000016ed10 <DummyFunction>:
>     16ed10:       f0 83 04 24 00          lock addl $0x0,(%rsp)
>     16ed15:       f0 83 04 24 00          lock addl $0x0,(%rsp)
>     16ed1a:       eb f4                   jmp    16ed10 <DummyFunction>
>     16ed1c:       0f 1f 40 00             nopl   0x0(%rax)
>
> Obviously this is an oversimplified example and if I complicate it in
> any number of ways then it will start generating actual loads and
> stores, and then the compiler and memory barriers should do their job.

All that is happening here is that clang can prove that nothing ever could
change the variable (making this actually undefined behaviour, IIRC). If you
add another function like:

extern void
SetDummy(uint64_t lsn)
{
    pDummy->recptr = lsn;
}

clang does generate the code you'd expect - there's a possibility that it
could be true.

See https://godbolt.org/z/EaM77E8jK

We don't gain anything from preventing the compiler from making such
conclusions afaict.


> > Note that use of volatile does *NOT* guarantee anything about memory
> > ordering!
>
> Right, but it does force loads/stores to be emitted by the compiler;
> and without loads/stores a memory barrier is useless.

There would not have been the point in generating that load...


> I understand that my example is too simple and I'm not claiming that
> there's a problem. I'd just like to understand the key difference
> between my example and what we do with XLogCtl.

The key difference is that there's code changing relevant variables :)


> Another way to phrase my question: under what specific circumstances
> must we use something like UINT32_ACCESS_ONCE()? That seems to be used
> for local pointers

I guess I don't really know what you mean with global or local pointers?
There's no point in ever using something like it if the memory is local to a
function.

The procarray uses all are for shared memory. Sure, we save a pointer to a
subset of shared memory in a local variable, but that doesn't change very much
(it does tell the compiler that it doesnt' need to reload ProcGlobal->xid from
memory after calling an external function (which could change it), so it's a
minor efficiency improvement).

The reason for
        /* Fetch xid just once - see GetNewTransactionId */
        pxid = UINT32_ACCESS_ONCE(other_xids[pgxactoff]);
is explained in a comment in GetNewTransactionId():
     *
     * Note that readers of ProcGlobal->xids/PGPROC->xid should be careful to
     * fetch the value for each proc only once, rather than assume they can
     * read a value multiple times and get the same answer each time.  Note we
     * are assuming that TransactionId and int fetch/store are atomic.

without the READ_ONCE(), the compiler could decide to not actually load the
memory contents into a register through the loop body, but to just refetch it
every time.

We could also implement this with a compiler barrier between fetching pxid and
using it - but it'd potentially lead to worse code, because instead of just
forcing one load to come from memory, it'd also force reloading *other*
variables from memory.


> but it's not clear to me exactly why that matters.  Intuitively, access
> through a local pointer seems much more likely to be optimized and therefore
> more dangerous, but that doesn't imply that access through global variables
> is not dangerous.

I really don't think there's a meaningful difference between the two. What is
interesting is where the memory points to and whether the compiler can reason
about other code potentially having had access to the memory.

E.g. with code like this (overly simple example)

    int foo = 0;
    int *foop = &0;

    pg_memory_barrier();

    if (*foop != 0)
       return false;

the compiler can easily prove that the variable could not have changed - there
is no legal way for foo too have changed, regardless of the memory barrier.

However, if the code looked like this:

    int foo = 0;
    int *foop = &0;

    SomethingTheCompilerCantSee(foop);

    if (*foop != 0)
       return false;

or

extern int *extern_foop;

    int foo = 0;
    int *foop = &0;

    extern_foop = foop;

    if (*foop != 0)
       return false;


the compiler couldn't prove that anymore.


Of course the case of a pointer to a stack variable is on the very trivial
side. But this holds true for other cases, e.g. the one you noticed above, a
static variable (guaranteeing it's not accessed in other translation units)
only accessed in one function - there couldn't have been anything modifying
it.

This even holds true for memory allocated by malloc() - the language
guarantees that there shall be no other pointers to that memory, so the
compiler knows that just after returning there can't be nobody but the caller
accessing the memory. Until the pointer escapes in some form.


Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Explicitly skip TAP tests under Meson if disabled
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Add new option 'all' to pg_stat_reset_shared()