Re: Race condition in TransactionIdIsInProgress

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Race condition in TransactionIdIsInProgress
Дата
Msg-id 20220212215035.ge2elo73ykxhofuo@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Race condition in TransactionIdIsInProgress  (Simon Riggs <simon.riggs@enterprisedb.com>)
Список pgsql-hackers
Hi,

On 2022-02-12 13:25:58 +0000, Simon Riggs wrote:
> On Fri, 11 Feb 2022 at 19:08, Andres Freund <andres@anarazel.de> wrote:
> 
> > On 2022-02-11 13:48:59 +0000, Simon Riggs wrote:
> > > On Fri, 11 Feb 2022 at 08:48, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> > > >
> > > > On Fri, 11 Feb 2022 at 06:11, Andres Freund <andres@anarazel.de> wrote:
> > > >
> > > > > Looks lik syncrep will make this a lot worse, because it can drastically
> > > > > increase the window between the TransactionIdCommitTree() and
> > > > > ProcArrayEndTransaction() due to the SyncRepWaitForLSN() inbetween.  But at
> > > > > least it might make it easier to write tests exercising this scenario...
> > > >
> > > > Agreed
> > > >
> > > > TransactionIdIsKnownCompleted(xid) is only broken because the single
> > > > item cache is set too early in some cases. The single item cache is
> > > > important for performance, so we just need to be more careful about
> > > > setting the cache.
> > >
> > > Something like this... fix_cachedFetchXid.v1.patch prevents the cache
> > > being set, but this fails! Not worked out why, yet.
> >
> > I don't think it's safe to check !TransactionIdKnownNotInProgress() in
> > TransactionLogFetch(), given that TransactionIdKnownNotInProgress() needs to
> > do TransactionLogFetch() internally.
> 
> That's not correct because you're confusing
> TransactionIdKnownNotInProgress(), which is a new function introduced
> by the patch, with the existing function
> TransactionIdIsKnownCompleted().

I don't think so. This call of the new TransactionIdKnownNotInProgress()

> --- a/src/backend/access/transam/transam.c
> +++ b/src/backend/access/transam/transam.c
> @@ -73,6 +73,14 @@ TransactionLogFetch(TransactionId transactionId)
>          return TRANSACTION_STATUS_ABORTED;
>      }
>  
> +    /*
> +     * Safeguard that we have called TransactionIsIdInProgress() before
> +     * checking commit log manager, to ensure that we do not cache the
> +     * result until the xid is no longer in the procarray at eoxact.
> +     */
> +    if (!TransactionIdKnownNotInProgress(transactionId))
> +        return TRANSACTION_STATUS_IN_PROGRESS;
> +
>      /*
>       * Get the transaction status.
>       */

isn't right. Consider what happens to TransactionIdIsInProgress(x)'s call to
TransactionIdDidAbort(x) at "step 4". TransactionIdDidAbort(x) will call
TransactionLogFetch(x). cachedXidIsNotInProgress isn't yet set to x, so
TransactionLogFetch() returns TRANSACTION_STATUS_IN_PROGRESS. Even if the
(sub)transaction aborted.

Which explains why your first patch doesn't work.


> > > just_remove_TransactionIdIsKnownCompleted_call.v1.patch
> >
> > I think this might contain a different diff than what the name suggests?
> 
> Not at all, please don't be confused by the name. The patch removes
> the call to TransactionIdIsKnownCompleted() from
> TransactionIdIsInProgress().

I guess for me "just remove" doesn't include adding a new cache...


> I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
> in backbranches.

I think it'd be fine if we needed to. Or we could just make it always return
false or such.


> > > just removes the known offending call, passes make check, but IMHO
> > > leaves the same error just as likely by other callers.
> >
> > Which other callers are you referring to?
> 
> I don't know of any, but we can't just remove a function in a
> backbranch, can we?

We have in the past, if it's a "sufficiently internal" function.


Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: buildfarm warnings
Следующее
От: Andres Freund
Дата:
Сообщение: Re: buildfarm warnings