Обсуждение: [MASSMAIL]Is it safe to cache data by GiST consistent function
Hello, When implementing a GiST consistent function I found the need to cache pre-processed query across invocations. I am not sure if it is safe to do (or I need to perform some steps to make sure cached info is not leaked between rescans). The comment in gistrescan says: /* * If this isn't the first time through, preserve the fn_extra * pointers, so that if the consistentFns are using them to cache * data, that data is not leaked across a rescan. */ which seems to me self-contradictory as fn_extra is preserved between rescans (so leaks are indeed possible). Am I missing something? Thanks, Michal
=?utf-8?Q?Micha=C5=82_K=C5=82eczek?= <michal@kleczek.org> writes: > When implementing a GiST consistent function I found the need to cache pre-processed query across invocations. > I am not sure if it is safe to do (or I need to perform some steps to make sure cached info is not leaked between rescans). AFAIK it works. I don't see any of the in-core ones doing so, but at least range_gist_consistent and multirange_gist_consistent are missing a bet by repeating their cache search every time. > The comment in gistrescan says: > /* > * If this isn't the first time through, preserve the fn_extra > * pointers, so that if the consistentFns are using them to cache > * data, that data is not leaked across a rescan. > */ > which seems to me self-contradictory as fn_extra is preserved between rescans (so leaks are indeed possible). I think you're reading it wrong. If we cleared fn_extra during rescan, access to the old extra value would be lost so a new one would have to be created, leaking the old value for the rest of the query. regards, tom lane
Thanks for taking your time to answer. Not sure if I understand though. > On 3 Apr 2024, at 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > =?utf-8?Q?Micha=C5=82_K=C5=82eczek?= <michal@kleczek.org> writes: >> When implementing a GiST consistent function I found the need to cache pre-processed query across invocations. >> I am not sure if it is safe to do (or I need to perform some steps to make sure cached info is not leaked between rescans). > > AFAIK it works. I don't see any of the in-core ones doing so, > but at least range_gist_consistent and multirange_gist_consistent > are missing a bet by repeating their cache search every time. pg_trgm consistent caches tigrams but it has some logic to make sure cached values are recalculated: cache = (gtrgm_consistent_cache *) fcinfo->flinfo->fn_extra; if (cache == NULL || cache->strategy != strategy || VARSIZE(cache->query) != querysize || memcmp((char *) cache->query, (char *) query, querysize) != 0) What I don’t understand is if it is necessary or it is enough to check fn_extra==NULL. > >> The comment in gistrescan says: > >> /* >> * If this isn't the first time through, preserve the fn_extra >> * pointers, so that if the consistentFns are using them to cache >> * data, that data is not leaked across a rescan. >> */ > >> which seems to me self-contradictory as fn_extra is preserved between rescans (so leaks are indeed possible). > > I think you're reading it wrong. If we cleared fn_extra during > rescan, access to the old extra value would be lost so a new one > would have to be created, leaking the old value for the rest of > the query. I understand that but not sure what “that data is not leaked across a rescan” means. — Michal
=?utf-8?Q?Micha=C5=82_K=C5=82eczek?= <michal@kleczek.org> writes: > On 3 Apr 2024, at 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> AFAIK it works. I don't see any of the in-core ones doing so, >> but at least range_gist_consistent and multirange_gist_consistent >> are missing a bet by repeating their cache search every time. > pg_trgm consistent caches tigrams but it has some logic to make sure cached values are recalculated: > cache = (gtrgm_consistent_cache *) fcinfo->flinfo->fn_extra; > if (cache == NULL || > cache->strategy != strategy || > VARSIZE(cache->query) != querysize || > memcmp((char *) cache->query, (char *) query, querysize) != 0) > What I don’t understand is if it is necessary or it is enough to check fn_extra==NULL. Ah, I didn't think to search contrib. Yes, you need to validate the cache entry. In this example, a rescan could insert a new query value. In general, an opclass support function could get called using a pretty long-lived FunctionCallInfo (e.g. one in the index's relcache entry), so it's unwise to assume that cached data is relevant to the current call without checking. regards, tom lane
> On 3 Apr 2024, at 19:02, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > =?utf-8?Q?Micha=C5=82_K=C5=82eczek?= <michal@kleczek.org> writes: > >> pg_trgm consistent caches tigrams but it has some logic to make sure cached values are recalculated: > >> cache = (gtrgm_consistent_cache *) fcinfo->flinfo->fn_extra; >> if (cache == NULL || >> cache->strategy != strategy || >> VARSIZE(cache->query) != querysize || >> memcmp((char *) cache->query, (char *) query, querysize) != 0) > >> What I don’t understand is if it is necessary or it is enough to check fn_extra==NULL. > > Ah, I didn't think to search contrib. Yes, you need to validate the > cache entry. In this example, a rescan could insert a new query > value. In general, an opclass support function could get called using > a pretty long-lived FunctionCallInfo (e.g. one in the index's relcache > entry), so it's unwise to assume that cached data is relevant to the > current call without checking. This actually sounds scary - looks like there is no way to perform cache clean-up after rescan then? Do you think it might be useful to introduce a way for per-rescan caching (ie. setting up a dedicated memory context in gistrescanand passing it to support functions)? — Michal