We are trying to speed up real cases, not just benchmarks.
So +1 for the concept; the patch is going in the right direction, though let's do the full press-up.
I have mentioned above the reason for not doing it for subtransactions. If
you think it is viable to reserve space in shared memory for this purpose, then
I can include the optimization for subtransactions as well.
The number of subxids is unbounded, so as you say, reserving shmem isn't viable.
I'm interested in real-world cases, so allocating 65 xids per process isn't needed, but what we can say is that the optimization shouldn't break down abruptly in the presence of a small/reasonable number of subtransactions.
I think in that case what we can do is: if the total number of
subtransactions is less than or equal to 64 (we can find that from the
overflowed flag in PGXact), then apply this optimisation, else use
the existing flow to update the transaction status. I think for that we
don't even need to reserve any additional memory. Does that sound