Обсуждение: [HACKERS] error-handling in ReorderBufferCommit() seems somewhat broken

Поиск
Список
Период
Сортировка

[HACKERS] error-handling in ReorderBufferCommit() seems somewhat broken

От
Tomas Vondra
Дата:
Hi,

I've been investigating some failures in test_decoding regression tests,
and it seems to me the error-handling in ReorderBufferCommit() is
somewhat broken, leading to segfault crashes.

The problematic part is this:

PG_CATCH()
{   /*    * Force cache invalidation to happen outside of a valid transaction    * to prevent catalog access as we just
caughtan error.    */   AbortCurrentTransaction();
 
   /* make sure there's no cache pollution */   ReorderBufferExecuteInvalidations(rb, txn);
   ...

}

Triggering it trivial - just add elog(ERROR,...) at the beginning of the
PG_TRY() block.

The problem is that AbortCurrentTransaction() apparently releases the
memory where txn is allocated, making it entirely bogus. So in assert
builds txn->ivalidations are 0x7f7f7f7f7f7f7f7f,  triggering a segfault.

Similar issues apply to subsequent calls in the catch block, that also
use txn in some way (e.g. through snapshot_now).

I suppose this is not quite intentional, but rather an example that
error-handling code is an order of magnitude more complicated to write
and test. I've only noticed as I'm investigating some regression
failures on Postgres-XL 10, which does not support subtransactions and
so the BeginInternalSubTransaction() call in the try branch always
fails, triggering the issue.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] error-handling in ReorderBufferCommit() seems somewhatbroken

От
Andres Freund
Дата:
Hi,

On 2017-08-30 01:27:34 +0200, Tomas Vondra wrote:
> I've been investigating some failures in test_decoding regression tests,
> and it seems to me the error-handling in ReorderBufferCommit() is
> somewhat broken, leading to segfault crashes.
> 
> The problematic part is this:
> 
> PG_CATCH()
> {
>     /*
>      * Force cache invalidation to happen outside of a valid transaction
>      * to prevent catalog access as we just caught an error.
>      */
>     AbortCurrentTransaction();
> 
>     /* make sure there's no cache pollution */
>     ReorderBufferExecuteInvalidations(rb, txn);
> 
>     ...
> 
> }
> 
> Triggering it trivial - just add elog(ERROR,...) at the beginning of the
> PG_TRY() block.

That's not really a valid thing to do, you should put it after the
BeginInternalSubTransaction()/StartTransactionCommand(). It's basically
assumed that those won't fail - arguably they should be outside of a
PG_TRY then, but that's a different matter.  If you start decoding
outside of SQL failing before those will lead to rolling back the parent
tx...


> I suppose this is not quite intentional, but rather an example that
> error-handling code is an order of magnitude more complicated to write
> and test. I've only noticed as I'm investigating some regression
> failures on Postgres-XL 10, which does not support subtransactions and
> so the BeginInternalSubTransaction() call in the try branch always
> fails, triggering the issue.

So, IIUC, there's no live problem in postgres core, besides some ugly &
undocumented assumptions?

Greetings,

Andres Freund



Re: [HACKERS] error-handling in ReorderBufferCommit() seems somewhatbroken

От
Tomas Vondra
Дата:

On 08/30/2017 01:34 AM, Andres Freund wrote:
> Hi,
> 
> On 2017-08-30 01:27:34 +0200, Tomas Vondra wrote:
>> I've been investigating some failures in test_decoding regression tests,
>> and it seems to me the error-handling in ReorderBufferCommit() is
>> somewhat broken, leading to segfault crashes.
>>
>> The problematic part is this:
>>
>> PG_CATCH()
>> {
>>     /*
>>      * Force cache invalidation to happen outside of a valid transaction
>>      * to prevent catalog access as we just caught an error.
>>      */
>>     AbortCurrentTransaction();
>>
>>     /* make sure there's no cache pollution */
>>     ReorderBufferExecuteInvalidations(rb, txn);
>>
>>     ...
>>
>> }
>>
>> Triggering it trivial - just add elog(ERROR,...) at the beginning of the
>> PG_TRY() block.
> 
> That's not really a valid thing to do, you should put it after the
> BeginInternalSubTransaction()/StartTransactionCommand(). It's basically
> assumed that those won't fail - arguably they should be outside of a
> PG_TRY then, but that's a different matter.  If you start decoding
> outside of SQL failing before those will lead to rolling back the parent
> tx...
> 
> 
>> I suppose this is not quite intentional, but rather an example that
>> error-handling code is an order of magnitude more complicated to write
>> and test. I've only noticed as I'm investigating some regression
>> failures on Postgres-XL 10, which does not support subtransactions and
>> so the BeginInternalSubTransaction() call in the try branch always
>> fails, triggering the issue.
> 
> So, IIUC, there's no live problem in postgres core, besides some ugly &
> undocumented assumptions?
> 

I'm not really following your reasoning. You may very well be right that
the BeginInternalSubTransaction() example is not quite valid on postgres
core, but I don't see how that implies there can be no other errors in
the PG_TRY block. It was merely an explanation how I noticed this issue.

To make it absolutely clear, I claim that the PG_CATCH() block is
demonstrably broken as it calls AbortCurrentTransaction() and then
accesses already freed memory.

The switch in the PG_TRY() blocks includes multiple elog(ERROR) calls in
the switch, and AFAICS hitting any of them will have exactly the same
effect as failure in BeginInternalSubTransaction(). And I suppose there
many other ways to trigger an error and get into the catch block. In
other words, why would we have the block at all?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] error-handling in ReorderBufferCommit() seems somewhatbroken

От
Andres Freund
Дата:
On 2017-08-30 02:11:10 +0200, Tomas Vondra wrote:
> 
> 
> On 08/30/2017 01:34 AM, Andres Freund wrote:
> > Hi,
> > 
> > On 2017-08-30 01:27:34 +0200, Tomas Vondra wrote:
> >> I've been investigating some failures in test_decoding regression tests,
> >> and it seems to me the error-handling in ReorderBufferCommit() is
> >> somewhat broken, leading to segfault crashes.
> >>
> >> The problematic part is this:
> >>
> >> PG_CATCH()
> >> {
> >>     /*
> >>      * Force cache invalidation to happen outside of a valid transaction
> >>      * to prevent catalog access as we just caught an error.
> >>      */
> >>     AbortCurrentTransaction();
> >>
> >>     /* make sure there's no cache pollution */
> >>     ReorderBufferExecuteInvalidations(rb, txn);
> >>
> >>     ...
> >>
> >> }
> >>
> >> Triggering it trivial - just add elog(ERROR,...) at the beginning of the
> >> PG_TRY() block.
> > 
> > That's not really a valid thing to do, you should put it after the
> > BeginInternalSubTransaction()/StartTransactionCommand(). It's basically
> > assumed that those won't fail - arguably they should be outside of a
> > PG_TRY then, but that's a different matter.  If you start decoding
> > outside of SQL failing before those will lead to rolling back the parent
> > tx...
> > 
> > 
> >> I suppose this is not quite intentional, but rather an example that
> >> error-handling code is an order of magnitude more complicated to write
> >> and test. I've only noticed as I'm investigating some regression
> >> failures on Postgres-XL 10, which does not support subtransactions and
> >> so the BeginInternalSubTransaction() call in the try branch always
> >> fails, triggering the issue.
> > 
> > So, IIUC, there's no live problem in postgres core, besides some ugly &
> > undocumented assumptions?
> > 
> 
> I'm not really following your reasoning. You may very well be right that
> the BeginInternalSubTransaction() example is not quite valid on postgres
> core, but I don't see how that implies there can be no other errors in
> the PG_TRY block. It was merely an explanation how I noticed this issue.
> 
> To make it absolutely clear, I claim that the PG_CATCH() block is
> demonstrably broken as it calls AbortCurrentTransaction() and then
> accesses already freed memory.

Yea, but that's not what happens normally. The txn memory isn't
allocated in the context created by the started [sub]transaction. I
think you're just seeing what you're seeing because an error thrown
before the BeginInternalSubTransaction() succeeds, will roll back the
*containing* transaction (the one that did the SQL SELECT * FROM
pg_*logical*() call), and free all the entire reorderbuffer stuff.


> The switch in the PG_TRY() blocks includes multiple elog(ERROR) calls in
> the switch, and AFAICS hitting any of them will have exactly the same
> effect as failure in BeginInternalSubTransaction().

No, absolutely not. Once the subtransaction starts, the
AbortCurrentTransaction() will abort that subtransaction, rather than
the containing one.

- Andres



Re: [HACKERS] error-handling in ReorderBufferCommit() seems somewhatbroken

От
Tomas Vondra
Дата:

On 08/30/2017 02:19 AM, Andres Freund wrote:
> On 2017-08-30 02:11:10 +0200, Tomas Vondra wrote:
>>
>> I'm not really following your reasoning. You may very well be right that
>> the BeginInternalSubTransaction() example is not quite valid on postgres
>> core, but I don't see how that implies there can be no other errors in
>> the PG_TRY block. It was merely an explanation how I noticed this issue.
>>
>> To make it absolutely clear, I claim that the PG_CATCH() block is
>> demonstrably broken as it calls AbortCurrentTransaction() and then
>> accesses already freed memory.
> 
> Yea, but that's not what happens normally. The txn memory isn't
> allocated in the context created by the started [sub]transaction. I
> think you're just seeing what you're seeing because an error thrown
> before the BeginInternalSubTransaction() succeeds, will roll back the
> *containing* transaction (the one that did the SQL SELECT * FROM
> pg_*logical*() call), and free all the entire reorderbuffer stuff.
> 
> 
>> The switch in the PG_TRY() blocks includes multiple elog(ERROR) calls in
>> the switch, and AFAICS hitting any of them will have exactly the same
>> effect as failure in BeginInternalSubTransaction().
> 
> No, absolutely not. Once the subtransaction starts, the
> AbortCurrentTransaction() will abort that subtransaction, rather than
> the containing one.
> 

Ah, I see. You're right elog(ERROR) calls after the subtransaction start
are handled correctly and don't trigger a segfault. Sorry for the noise
and thanks for the explanation.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services