Hello Robert,
01.09.2023 23:21, Robert Haas wrote:
> On Fri, Sep 1, 2023 at 6:13 AM Alexander Lakhin<exclusion@gmail.com> wrote:
>> (Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the
>> issue for us.)
> Maybe it'd be worth trying something stronger, like
> pg_memory_barrier(). A compiler barrier doesn't prevent the CPU from
> reordering loads and stores as it goes, and ARM64 has weak memory
> ordering.
Indeed, thank you for the tip!
So maybe what we are dealing with here is not a compiler optimization,
but a CPU one.
The wider code fragment is:
805c48: 52800028 mov w8, #1 // true
805c4c: 52800319 mov w25, #24
805c50: 5280073a mov w26, #57
805c54: fd446128 ldr d8, [x9, #2240]
805c58: 90000d7b adrp x27, 0x9b1000 <ModifyWaitEvent+0xb0>
805c5c: fd415949 ldr d9, [x10, #688]
805c60: f9071d68 str x8, [x11, #3640] // waiting = true (x8 = w8)
805c64: f90003f3 str x19, [sp]
805c68: 14000010 b 0x805ca8 <WaitEventSetWait+0x108>
805ca8: f9400a88 ldr x8, [x20, #16] // if (set->latch && set->latch->is_set)
805cac: b4000068 cbz x8, 0x805cb8 <WaitEventSetWait+0x118>
805cb0: f9400108 ldr x8, [x8]
805cb4: b5001248 cbnz x8, 0x805efc <WaitEventSetWait+0x35c>
805cb8: f9401280 ldr x0, [x20, #32]
If that CPU can delay the write to the variable waiting
(str x8, [x11, #3640]), held in its internal form as something like
"store 1 to [address]", until 805cb0 or a later instruction, then the load
of latch->is_set can complete before the store becomes visible, and we can
get the behavior discussed. Something like that is shown in the ARM
documentation:
https://developer.arm.com/documentation/102336/0100/Memory-ordering?lang=en
I'll try to test this guess on the target machine...
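
To make the guess concrete, here is a minimal sketch in portable C11 of the
ordering I have in mind. It is not the latch.c code: the names sketch_waiting
and sketch_latch_is_set are hypothetical stand-ins for "waiting" and
"latch->is_set", and atomic_thread_fence(memory_order_seq_cst) only plays the
role that pg_memory_barrier() would play. Each side stores its own flag and
then loads the other's; without a full barrier between the store and the load,
a weakly ordered CPU may let both sides read the old value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for "waiting" and "latch->is_set". */
static atomic_bool sketch_waiting;
static atomic_bool sketch_latch_is_set;

/*
 * Waiter side: advertise that we are waiting, then check the latch.
 * Returns true if it would be safe to go to sleep.
 */
static bool
sketch_waiter_may_sleep(void)
{
    atomic_store_explicit(&sketch_waiting, true, memory_order_relaxed);
    /* Full barrier: the store above may not be ordered after the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return !atomic_load_explicit(&sketch_latch_is_set, memory_order_relaxed);
}

/*
 * Setter side: set the latch, then check whether a waiter needs a wakeup.
 * Returns true if a wakeup has to be sent.
 */
static bool
sketch_setter_must_wake(void)
{
    atomic_store_explicit(&sketch_latch_is_set, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&sketch_waiting, memory_order_relaxed);
}

/* Single-threaded check of the protocol; it cannot exercise reordering. */
static void
sketch_self_check(void)
{
    assert(sketch_waiter_may_sleep());   /* latch not set yet */
    assert(sketch_setter_must_wake());   /* waiter has advertised itself */
    assert(!sketch_waiter_may_sleep());  /* latch is set: must not sleep */
}
```

With both fences in place, at least one side must observe the other's store,
so "waiter sleeps and setter sends no wakeup" cannot happen. With only a
compiler barrier, the instructions stay in program order but the hardware is
still free to make the store visible late.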
Best regards,
Alexander