Обсуждение: [MASSMAIL]Is it safe to cache data by GiST consistent function

Поиск
Список
Период
Сортировка

[MASSMAIL]Is it safe to cache data by GiST consistent function

От
Michał Kłeczek
Дата:
Hello,

When implementing a GiST consistent function I found the need to cache pre-processed query across invocations.
I am not sure if it is safe to do (or I need to perform some steps to make sure cached info is not leaked between
rescans).

The comment in gistrescan says:

        /*
         * If this isn't the first time through, preserve the fn_extra
         * pointers, so that if the consistentFns are using them to cache
         * data, that data is not leaked across a rescan.
         */

which seems to me self-contradictory as fn_extra is preserved between rescans (so leaks are indeed possible).

Am I missing something?

Thanks,
Michal


Re: Is it safe to cache data by GiST consistent function

От
Tom Lane
Дата:
=?utf-8?Q?Micha=C5=82_K=C5=82eczek?= <michal@kleczek.org> writes:
> When implementing a GiST consistent function I found the need to cache pre-processed query across invocations.
> I am not sure if it is safe to do (or I need to perform some steps to make sure cached info is not leaked between
rescans).

AFAIK it works.  I don't see any of the in-core ones doing so,
but at least range_gist_consistent and multirange_gist_consistent
are missing a bet by repeating their cache search every time.

> The comment in gistrescan says:

>         /*
>          * If this isn't the first time through, preserve the fn_extra
>          * pointers, so that if the consistentFns are using them to cache
>          * data, that data is not leaked across a rescan.
>          */

> which seems to me self-contradictory as fn_extra is preserved between rescans (so leaks are indeed possible).

I think you're reading it wrong.  If we cleared fn_extra during
rescan, access to the old extra value would be lost so a new one
would have to be created, leaking the old value for the rest of
the query.

            regards, tom lane



Re: Is it safe to cache data by GiST consistent function

От
Michał Kłeczek
Дата:
Thanks for taking your time to answer. Not sure if I understand though.

> On 3 Apr 2024, at 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> =?utf-8?Q?Micha=C5=82_K=C5=82eczek?= <michal@kleczek.org> writes:
>> When implementing a GiST consistent function I found the need to cache pre-processed query across invocations.
>> I am not sure if it is safe to do (or I need to perform some steps to make sure cached info is not leaked between
rescans).
>
> AFAIK it works.  I don't see any of the in-core ones doing so,
> but at least range_gist_consistent and multirange_gist_consistent
> are missing a bet by repeating their cache search every time.

pg_trgm consistent caches tigrams but it has some logic to make sure cached values are recalculated:

    cache = (gtrgm_consistent_cache *) fcinfo->flinfo->fn_extra;
    if (cache == NULL ||
        cache->strategy != strategy ||
        VARSIZE(cache->query) != querysize ||
        memcmp((char *) cache->query, (char *) query, querysize) != 0)

What I don’t understand is if it is necessary or it is enough to check fn_extra==NULL.

>
>> The comment in gistrescan says:
>
>>         /*
>>          * If this isn't the first time through, preserve the fn_extra
>>          * pointers, so that if the consistentFns are using them to cache
>>          * data, that data is not leaked across a rescan.
>>          */
>
>> which seems to me self-contradictory as fn_extra is preserved between rescans (so leaks are indeed possible).
>
> I think you're reading it wrong.  If we cleared fn_extra during
> rescan, access to the old extra value would be lost so a new one
> would have to be created, leaking the old value for the rest of
> the query.

I understand that but not sure what “that data is not leaked across a rescan” means.

—
Michal


Re: Is it safe to cache data by GiST consistent function

От
Tom Lane
Дата:
=?utf-8?Q?Micha=C5=82_K=C5=82eczek?= <michal@kleczek.org> writes:
> On 3 Apr 2024, at 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> AFAIK it works.  I don't see any of the in-core ones doing so,
>> but at least range_gist_consistent and multirange_gist_consistent
>> are missing a bet by repeating their cache search every time.

> pg_trgm consistent caches tigrams but it has some logic to make sure cached values are recalculated:

>     cache = (gtrgm_consistent_cache *) fcinfo->flinfo->fn_extra;
>     if (cache == NULL ||
>         cache->strategy != strategy ||
>         VARSIZE(cache->query) != querysize ||
>         memcmp((char *) cache->query, (char *) query, querysize) != 0)

> What I don’t understand is if it is necessary or it is enough to check fn_extra==NULL.

Ah, I didn't think to search contrib.  Yes, you need to validate the
cache entry.  In this example, a rescan could insert a new query
value.  In general, an opclass support function could get called using
a pretty long-lived FunctionCallInfo (e.g. one in the index's relcache
entry), so it's unwise to assume that cached data is relevant to the
current call without checking.

            regards, tom lane



Re: Is it safe to cache data by GiST consistent function

От
Michał Kłeczek
Дата:
> On 3 Apr 2024, at 19:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> =?utf-8?Q?Micha=C5=82_K=C5=82eczek?= <michal@kleczek.org> writes:
>
>> pg_trgm consistent caches tigrams but it has some logic to make sure cached values are recalculated:
>
>>     cache = (gtrgm_consistent_cache *) fcinfo->flinfo->fn_extra;
>>     if (cache == NULL ||
>>         cache->strategy != strategy ||
>>         VARSIZE(cache->query) != querysize ||
>>         memcmp((char *) cache->query, (char *) query, querysize) != 0)
>
>> What I don’t understand is if it is necessary or it is enough to check fn_extra==NULL.
>
> Ah, I didn't think to search contrib.  Yes, you need to validate the
> cache entry.  In this example, a rescan could insert a new query
> value.  In general, an opclass support function could get called using
> a pretty long-lived FunctionCallInfo (e.g. one in the index's relcache
> entry), so it's unwise to assume that cached data is relevant to the
> current call without checking.

This actually sounds scary - looks like there is no way to perform cache clean-up after rescan then?

Do you think it might be useful to introduce a way for per-rescan caching (ie. setting up a dedicated memory context in
gistrescanand passing it to support functions)? 

—
Michal