Thread: Problem with tupdesc in jsonb_to_recordset


Problem with tupdesc in jsonb_to_recordset

From: Dmitry Dolgov
Date:
Hi,

I've found that in some situations jsonb_to_recordset can currently lead to
a crash. The minimal example I've managed to create looks like this:

    CREATE OR REPLACE FUNCTION test(data JSONB)
      RETURNS INTEGER AS $$
    DECLARE
      test_var int;
    BEGIN
      WITH jsonb_values AS (
          SELECT
            (SELECT SUM(value)
             FROM jsonb_to_recordset(element #> '{values}')
             AS q(value INTEGER)) AS value_sum
          FROM jsonb_array_elements(data) AS element
      ) SELECT SUM(value_sum) FROM jsonb_values INTO test_var;
      RETURN test_var;
    END;
    $$ LANGUAGE plpgsql;


And then:

    =# SELECT test('[
        {
            "values": [
                {
                    "value": 1
                },
                {
                    "value": 3
                }
            ]
        },
        {
            "values": [
                {
                    "value": 1
                },
                {
                    "value": 3
                }
            ]
        }
    ]' :: JSONB);
    server closed the connection unexpectedly
            This probably means the server terminated abnormally
            before or while processing the request.
    The connection to the server was lost. Attempting reset: Failed.

After a brief investigation, it looks like an issue with the tupdesc from the
function cache:

    if (!cache)
    {
        fcinfo->flinfo->fn_extra = cache =
            MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt, sizeof(*cache));
    //...

    rsi->setDesc = cache->c.io.composite.tupdesc;

Then, after the first call of populate_recordset_worker, rsi->setDesc gets
reset, since we never increased tdrefcount, and the next call will use stale
cache data. Apparently, this can be fixed by incrementing tdrefcount (see the
attached patch).
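
For reference, the executor-side cleanup that makes this dangerous looks
roughly like the sketch below; it paraphrases the non-refcounted setDesc
handling in ExecMakeTableFunctionResult (src/backend/executor/execSRF.c)
and is not a verbatim excerpt:

    /*
     * Rough paraphrase of the executor's SRF cleanup: a returned setDesc
     * with tdrefcount == -1 is assumed to live in per-query memory, so it
     * is freed once the function's result set has been collected.
     */
    if (rsinfo.setDesc)
    {
        if (rsinfo.setDesc->tdrefcount == -1)
            FreeTupleDesc(rsinfo.setDesc);  /* also destroys the tupdesc
                                             * still cached in fn_extra */
    }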

Attachments

Re: Problem with tupdesc in jsonb_to_recordset

From: Michael Paquier
Date:
On Tue, Jul 10, 2018 at 10:39:28PM +0200, Dmitry Dolgov wrote:
> I've found that in some situations jsonb_to_recordset can currently lead to
> a crash. The minimal example I've managed to create looks like this:

It would be better not to send the same email across multiple mailing
lists.

> After a brief investigation, it looks like an issue with the tupdesc from the
> function cache:
>
>     if (!cache)
>     {
>         fcinfo->flinfo->fn_extra = cache =
>             MemoryContextAllocZero(fcinfo->flinfo->fn_mcxt, sizeof(*cache));
>     //...
>
>     rsi->setDesc = cache->c.io.composite.tupdesc;
>
> Then, after the first call of populate_recordset_worker, rsi->setDesc gets
> reset, since we never increased tdrefcount, and the next call will use stale
> cache data. Apparently, this can be fixed by incrementing tdrefcount (see the
> attached patch).

This is new as of REL_11_STABLE, so I am adding an open item.  Bisecting
the error points to the following commit as the culprit:
commit: 37a795a60b4f4b1def11c615525ec5e0e9449e05
committer: Tom Lane <tgl@sss.pgh.pa.us>
date: Thu, 26 Oct 2017 13:47:45 -0400
Support domains over composite types.

> diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
> index e358b5a..9250646 100644
> --- a/src/backend/utils/adt/jsonfuncs.c
> +++ b/src/backend/utils/adt/jsonfuncs.c
> @@ -3728,6 +3728,7 @@ populate_recordset_worker(FunctionCallInfo fcinfo, const char *funcname,
>
>      rsi->setResult = state->tuple_store;
>      rsi->setDesc = cache->c.io.composite.tupdesc;
> +    rsi->setDesc->tdrefcount = 1;
>
>      PG_RETURN_NULL();
>  }

I don't think that your solution is correct.  From my read of 37a795a6,
the tuple descriptor is moved from the query-lifespan memory context
(ecxt_per_query_memory) to a function-level context, which could cause
the descriptor to become busted as your test is pointing out.  Tom?
--
Michael

Attachments

Re: Problem with tupdesc in jsonb_to_recordset

From: Michael Paquier
Date:
On Wed, Jul 11, 2018 at 03:27:13PM +0900, Michael Paquier wrote:
> I don't think that your solution is correct.  From my read of 37a795a6,
> the tuple descriptor is moved from the query-lifespan memory context
> (ecxt_per_query_memory) to a function-level context, which could cause
> the descriptor to become busted as your test is pointing out.  Tom?

Hacking my way out, I have finished with the attached, which fixes the
problem and passes all regression tests.  I am not completely sure that
this is the right approach though; I would need to think a bit more about
it.
--
Michael

Attachments

Re: Problem with tupdesc in jsonb_to_recordset

From: Dmitry Dolgov
Date:
> On Wed, 11 Jul 2018 at 08:27, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jul 10, 2018 at 10:39:28PM +0200, Dmitry Dolgov wrote:
> > I've found that in some situations jsonb_to_recordset can currently lead to
> > a crash. The minimal example I've managed to create looks like this:
>
> It would be better not to send the same email across multiple mailing
> lists.

Ok, sorry - next time I'll avoid this.

> On Wed, Jul 11, 2018 at 03:27:13PM +0900, Michael Paquier wrote:
> > I don't think that your solution is correct.  From my read of 37a795a6,
> > the tuple descriptor is moved from the query-lifespan memory context
> > (ecxt_per_query_memory) to a function-level context, which could cause
> > the descriptor to become busted as your test is pointing out.  Tom?
>
> Hacking my way out, I have finished with the attached, which fixes the
> problem and passes all regression tests.  I am not completely sure that
> this is the right approach though; I would need to think a bit more about
> it.

So, you removed the FreeTupleDesc call - does that mean that if io->tupdesc
is not reset in some other situation, it is going to be a memory leak?


Re: Problem with tupdesc in jsonb_to_recordset

From: Andrew Gierth
Date:
>>>>> "Michael" == Michael Paquier <michael@paquier.xyz> writes:

 >> I don't think that your solution is correct. From my read of
 >> 37a795a6, the tuple descriptor is moved from the query-lifespan
 >> memory context (ecxt_per_query_memory) to a function-level context,
 >> which could cause the descriptor to become busted as your test is
 >> pointing out. Tom?

 Michael> Hacking my way out, I have finished with the attached, which
 Michael> fixes the problem and passes all regression tests. I am not
 Michael> completely sure that this is the right approach though; I would
 Michael> need to think a bit more about it.

What's happening in the original case is this: the SRF call protocol
says that it's the executor's responsibility to free rsi.setDesc if it's
not refcounted, under the assumption that in such a case it's in
per-query memory (and not in either a shorter-lived or longer-lived
context).

The change to update_cached_tupdesc broke the protocol by causing
populate_recordset_worker to set rsi.setDesc to a non-refcounted tupdesc
allocated in a context that's not the per-query context.

What's not clear here is why populate_recordset_record thinks it needs
to update the tupdesc for every record in a single result set. The
entire result set will ultimately be treated as having the same tupdesc
regardless, so any per-record changes will break things later anyway.

Your latest proposed fix isn't OK because it puts the wrong context in
cache->fn_mcxt - data that's dangled off flinfo->fn_extra must NOT be in
any context that has a different lifetime than flinfo->fn_mcxt (which
might not be query lifetime), unless you're taking specific steps to
free or invalidate it at the correct point.

My first approach - assuming that update_cached_tupdesc has good reason
to make a copy, which I'm not convinced is the case - would be to simply
make a per-query-context copy of the tupdesc to assign to rsi.setDesc in
order to conform to the call protocol.
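
Concretely, a minimal sketch of that first approach inside
populate_recordset_worker might look like this (illustrative only, assuming
the cached descriptor lives in cache->c.io.composite.tupdesc as in the
snippets quoted earlier):

    /*
     * Sketch: hand the executor a per-query-lifetime copy that it may
     * legitimately free, while the descriptor cached under
     * flinfo->fn_extra stays intact for later calls.
     */
    MemoryContext oldcxt;

    oldcxt = MemoryContextSwitchTo(rsi->econtext->ecxt_per_query_memory);
    rsi->setDesc = CreateTupleDescCopy(cache->c.io.composite.tupdesc);
    MemoryContextSwitchTo(oldcxt);

With a change along these lines the reproducer at the top of the thread
should return 8 (1 + 3 per element, summed over both elements) instead of
crashing.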

-- 
Andrew (irc:RhodiumToad)


Re: Problem with tupdesc in jsonb_to_recordset

From: Michael Paquier
Date:
On Wed, Jul 11, 2018 at 11:38:46AM +0100, Andrew Gierth wrote:
> My first approach - assuming that update_cached_tupdesc has good reason
> to make a copy, which I'm not convinced is the case - would be to simply
> make a per-query-context copy of the tupdesc to assign to rsi.setDesc in
> order to conform to the call protocol.

I see what you are getting at here, thanks for the input.  I am not
really convinced that update_cached_tupdesc needs to make a new copy
either, but let's see what Tom has to say on the matter.  That's his
feature and his code.
--
Michael

Attachments

Re: Problem with tupdesc in jsonb_to_recordset

From: Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> What's happening in the original case is this: the SRF call protocol
> says that it's the executor's responsibility to free rsi.setDesc if it's
> not refcounted, under the assumption that in such a case it's in
> per-query memory (and not in either a shorter-lived or longer-lived
> context).
> The change to update_cached_tupdesc broke the protocol by causing
> populate_recordset_worker to set rsi.setDesc to a non-refcounted tupdesc
> allocated in a context that's not the per-query context.

Right, and more to the point, a non-refcounted tupdesc that the function
will use again if it's called again.

> What's not clear here is why populate_recordset_record thinks it needs
> to update the tupdesc for every record in a single result set. The
> entire result set will ultimately be treated as having the same tupdesc
> regardless, so any per-record changes will break things later anyway.

It's just convenient to check it there; it's not really expecting the
tupdesc to change intra-query.

The core of the problem here is that populate_record() is recursive,
in cases with nested composite types.  So we might be dealing either
with the outer composite type that is also going to be the returned
tuple type, or with some inner composite type that won't be.  It's
the desire to avoid repeated tupdesc lookups for the latter case that
motivates wanting to keep a tupdesc inside the fn_extra cache structure.
Otherwise we could do one lookup_rowtype_tupdesc per SRF call cycle and
pass back the result as rsi.setDesc, without caching it.

> My first approach - assuming that update_cached_tupdesc has good reason
> to make a copy, which I'm not convinced is the case - would be to simply
> make a per-query-context copy of the tupdesc to assign to rsi.setDesc in
> order to conform to the call protocol.

The alternative to making a copy would be finding a way to guarantee that
we decrement the refcount of the result of lookup_rowtype_tupdesc at end
of query (or whenever the fn_extra cache is discarded), rather than
immediately.  That seems like a mess.

I agree that making a copy to return as rsi.setDesc is the simplest and
safest fix.  If somebody produces a test case showing that that adds
unacceptable overhead, we could think about ways to improve it later;
but it's going to be more complicated and probably more fragile.

            regards, tom lane