Обсуждение: pgstatindex vs. !indisready

Поиск
Список
Период
Сортировка

pgstatindex vs. !indisready

От
Noah Misch
Дата:
Running pgstatindex on some !indisready indexes fails with "ERROR:  XX001:
could not read block 0 in file".  This reproduces it:

begin;
drop table if exists not_indisready_t;
create table not_indisready_t (c int);
insert into not_indisready_t values (1),(1);
commit;
create unique index concurrently not_indisready_i on not_indisready_t(c);
begin;
create extension pgstattuple;
\set VERBOSITY verbose
select * from pgstatindex('not_indisready_i');
\set VERBOSITY default
rollback;

Since XX001 = ERRCODE_DATA_CORRUPTED appears in the "can't-happen" class, it's
not a good fit for this scenario.  I propose to have pgstatindex fail early on
!indisready indexes.  We could go a step further and also fail on
indisready&&!indisvalid indexes, which are complete enough to accept inserts
but not complete enough to answer queries.  I don't see a reason to do that,
but maybe someone else will.

This made me wonder about functions handling unlogged rels during recovery.  I
used the attached hack to test most regclass-arg functions.  While some of
these errors from src/test/recovery/tmp_check/log/001_stream_rep_standby_2.log
may deserve improvement, there were no class-XX messages:

2023-10-01 12:24:05.992 PDT [646914:11] 001_stream_rep.pl ERROR:  58P01: could not open file "base/5/16862": No such
fileor directory
 
2023-10-01 12:24:05.996 PDT [646914:118] 001_stream_rep.pl ERROR:  22023: fork "main" does not exist for this relation

Thanks,
nm

Вложения

Re: pgstatindex vs. !indisready

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> Running pgstatindex on some !indisready indexes fails with "ERROR:  XX001:
> could not read block 0 in file".  This reproduces it:
> ...
> Since XX001 = ERRCODE_DATA_CORRUPTED appears in the "can't-happen" class, it's
> not a good fit for this scenario.  I propose to have pgstatindex fail early on
> !indisready indexes.

+1

> We could go a step further and also fail on
> indisready&&!indisvalid indexes, which are complete enough to accept inserts
> but not complete enough to answer queries.  I don't see a reason to do that,
> but maybe someone else will.

Hmm.  Seems like the numbers pgstatindex would produce for a
not-yet-complete index would be rather irrelevant, even if the case
doesn't risk any outright problems.  I'd be inclined to be
conservative and insist on indisvalid being true too.

            regards, tom lane



Re: pgstatindex vs. !indisready

От
Noah Misch
Дата:
On Sun, Oct 01, 2023 at 04:37:25PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > Running pgstatindex on some !indisready indexes fails with "ERROR:  XX001:
> > could not read block 0 in file".  This reproduces it:
> > ...
> > Since XX001 = ERRCODE_DATA_CORRUPTED appears in the "can't-happen" class, it's
> > not a good fit for this scenario.  I propose to have pgstatindex fail early on
> > !indisready indexes.
> 
> +1
> 
> > We could go a step further and also fail on
> > indisready&&!indisvalid indexes, which are complete enough to accept inserts
> > but not complete enough to answer queries.  I don't see a reason to do that,
> > but maybe someone else will.
> 
> Hmm.  Seems like the numbers pgstatindex would produce for a
> not-yet-complete index would be rather irrelevant, even if the case
> doesn't risk any outright problems.  I'd be inclined to be
> conservative and insist on indisvalid being true too.

Okay.  If no other preferences appear, I'll do pgstatindex that way.

> > This made me wonder about functions handling unlogged rels during recovery.  I
> > used the attached hack to test most regclass-arg functions.

I forgot to test the same battery of functions on !indisready indexes.  I've
now done that, using the attached script.  While I didn't get additional
class-XX errors, more should change:

[pgstatginindex pgstathashindex pgstattuple] currently succeed.  Like
pgstatindex, they should ERROR, including in the back branches.

[brin_desummarize_range brin_summarize_new_values brin_summarize_range
gin_clean_pending_list] currently succeed.  I propose to make them emit a
DEBUG1 message and return early, like amcheck does, except on !indisready.
This would allow users to continue running them on all indexes of the
applicable access method.  Doing these operations on an
indisready&&!indisvalid index is entirely reasonable, since they relate to
INSERT/UPDATE/DELETE operations.

[pg_freespace pg_indexes_size pg_prewarm] currently succeed, and I propose to
leave them that way.  No undefined behavior arises.  pg_freespace needs to be
resilient to garbage data anyway, given the lack of WAL for the FSM.  One
could argue for a relkind check in pg_indexes_size.  One could argue for
treating pg_prewarm like amcheck (DEBUG1 and return).

Вложения

Re: pgstatindex vs. !indisready

От
Peter Geoghegan
Дата:
On Sun, Oct 1, 2023 at 2:00 PM Noah Misch <noah@leadboat.com> wrote:
> Okay.  If no other preferences appear, I'll do pgstatindex that way.

Sounds reasonable.

> [pgstatginindex pgstathashindex pgstattuple] currently succeed.  Like
> pgstatindex, they should ERROR, including in the back branches.

Makes sense.

> [brin_desummarize_range brin_summarize_new_values brin_summarize_range
> gin_clean_pending_list] currently succeed.  I propose to make them emit a
> DEBUG1 message and return early, like amcheck does, except on !indisready.
> This would allow users to continue running them on all indexes of the
> applicable access method.  Doing these operations on an
> indisready&&!indisvalid index is entirely reasonable, since they relate to
> INSERT/UPDATE/DELETE operations.

+1 to all that (including the part about these operations being a
little different to the amcheck functions in one particular respect).

FWIW, amcheck's handling of unlogged relations when in hot standby
mode is based on the following theory: if recovery were to end, the
index would become an empty index, so just treat it as if it was
already empty during recovery. Not everybody thought that this
behavior was ideal, but ISTM that it has fewer problems than any
alternative approach you can think of. The same argument works just as
well with any function that accepts a regclass argument IMV.

--
Peter Geoghegan



Re: pgstatindex vs. !indisready

От
Michael Paquier
Дата:
On Sun, Oct 01, 2023 at 04:20:42PM -0700, Peter Geoghegan wrote:
> On Sun, Oct 1, 2023 at 2:00 PM Noah Misch <noah@leadboat.com> wrote:
> > Okay.  If no other preferences appear, I'll do pgstatindex that way.
>
> Sounds reasonable.
>
>> [pgstatginindex pgstathashindex pgstattuple] currently succeed.  Like
>> pgstatindex, they should ERROR, including in the back branches.
>
> Makes sense.

These are already restrictive, makes sense.

>> [brin_desummarize_range brin_summarize_new_values brin_summarize_range
>> gin_clean_pending_list] currently succeed.  I propose to make them emit a
>> DEBUG1 message and return early, like amcheck does, except on !indisready.
>> This would allow users to continue running them on all indexes of the
>> applicable access method.  Doing these operations on an
>> indisready&&!indisvalid index is entirely reasonable, since they relate to
>> INSERT/UPDATE/DELETE operations.

Hmm.  Still slightly incorrect in some cases?  Before being switched
to indisvalid, an index built concurrently may miss some tuples
deleted before the reference snapshot used to build the index was
taken.

> +1 to all that (including the part about these operations being a
> little different to the amcheck functions in one particular respect).

Making them return early sounds sensible here.

> FWIW, amcheck's handling of unlogged relations when in hot standby
> mode is based on the following theory: if recovery were to end, the
> index would become an empty index, so just treat it as if it was
> already empty during recovery. Not everybody thought that this
> behavior was ideal, but ISTM that it has fewer problems than any
> alternative approach you can think of. The same argument works just as
> well with any function that accepts a regclass argument IMV.

It depends, I guess, on how "user-friendly" all that should be.  I
have seen in the past as argument that it may be sometimes better to
have a function do nothing rather than ERROR when these are used
across a scan of pg_class, for example, particularly for non-SRFs.
Even if sometimes errors can be bypassed with more quals on the
relkind or such (aka less complicated queries with less JOINs to
write).
--
Michael

Вложения

Re: pgstatindex vs. !indisready

От
Peter Geoghegan
Дата:
On Sun, Oct 1, 2023 at 5:24 PM Michael Paquier <michael@paquier.xyz> wrote:
> > FWIW, amcheck's handling of unlogged relations when in hot standby
> > mode is based on the following theory: if recovery were to end, the
> > index would become an empty index, so just treat it as if it was
> > already empty during recovery. Not everybody thought that this
> > behavior was ideal, but ISTM that it has fewer problems than any
> > alternative approach you can think of. The same argument works just as
> > well with any function that accepts a regclass argument IMV.
>
> It depends, I guess, on how "user-friendly" all that should be.  I
> have seen in the past as argument that it may be sometimes better to
> have a function do nothing rather than ERROR when these are used
> across a scan of pg_class, for example, particularly for non-SRFs.
> Even if sometimes errors can be bypassed with more quals on the
> relkind or such (aka less complicated queries with less JOINs to
> write).

I think of recovery as a property of the whole system. So throwing an
error about one particular unlogged index that the user (say) checked
during recovery doesn't seem sensible. After all, the answer that
RecoveryInProgress() gives can change in a way that's observable
within individual transactions.

Again, I wouldn't claim that this is very elegant. Just that it seems
to have the fewest problems.

--
Peter Geoghegan



Re: pgstatindex vs. !indisready

От
Noah Misch
Дата:
On Mon, Oct 02, 2023 at 09:24:33AM +0900, Michael Paquier wrote:
> On Sun, Oct 01, 2023 at 04:20:42PM -0700, Peter Geoghegan wrote:
> > On Sun, Oct 1, 2023 at 2:00 PM Noah Misch <noah@leadboat.com> wrote:
> >> [brin_desummarize_range brin_summarize_new_values brin_summarize_range
> >> gin_clean_pending_list] currently succeed.  I propose to make them emit a
> >> DEBUG1 message and return early, like amcheck does, except on !indisready.
> >> This would allow users to continue running them on all indexes of the
> >> applicable access method.  Doing these operations on an
> >> indisready&&!indisvalid index is entirely reasonable, since they relate to
> >> INSERT/UPDATE/DELETE operations.
> 
> Hmm.  Still slightly incorrect in some cases?  Before being switched
> to indisvalid, an index built concurrently may miss some tuples
> deleted before the reference snapshot used to build the index was
> taken.

The !indisvalid index may be missing tuples, yes.  In what way does that make
one of those four operations incorrect?



Re: pgstatindex vs. !indisready

От
Michael Paquier
Дата:
On Sun, Oct 01, 2023 at 06:31:26PM -0700, Noah Misch wrote:
> The !indisvalid index may be missing tuples, yes.  In what way does that make
> one of those four operations incorrect?

Reminding myself of what these four do, it looks that you're right and
that the state of indisvalid is not going to change what they report.

Still, I'd like to agree with Tom's point to be more conservative and
check also for indisvalid which is what the planner does.  These
functions will be used in SELECTs, and one thing that worries me is
that checks based on indisready may get copy-pasted somewhere else,
leading to incorrect results where they get copied.  (indisready &&
!indisvalid) is a "short"-term combination in a concurrent build
process, as well (depends on how long one waits for the old snapshots
before switching indisvalid, but that's way shorter than the period of
time where the built indexes remain valid).
--
Michael

Вложения

Re: pgstatindex vs. !indisready

От
Noah Misch
Дата:
On Wed, Oct 04, 2023 at 09:00:23AM +0900, Michael Paquier wrote:
> On Sun, Oct 01, 2023 at 06:31:26PM -0700, Noah Misch wrote:
> > The !indisvalid index may be missing tuples, yes.  In what way does that make
> > one of those four operations incorrect?
> 
> Reminding myself of what these four do, it looks that you're right and
> that the state of indisvalid is not going to change what they report.
> 
> Still, I'd like to agree with Tom's point to be more conservative and
> check also for indisvalid which is what the planner does.  These
> functions will be used in SELECTs, and one thing that worries me is
> that checks based on indisready may get copy-pasted somewhere else,
> leading to incorrect results where they get copied.  (indisready &&
> !indisvalid) is a "short"-term combination in a concurrent build
> process, as well (depends on how long one waits for the old snapshots
> before switching indisvalid, but that's way shorter than the period of
> time where the built indexes remain valid).

Neither choice would harm the user experience in an important way, so I've
followed your preference in the attached patch.

Вложения

Re: pgstatindex vs. !indisready

От
Michael Paquier
Дата:
On Sun, Oct 22, 2023 at 02:02:45PM -0700, Noah Misch wrote:
> -    /* OK, do it */
> -    brinsummarize(indexRel, heapRel, heapBlk, true, &numSummarized, NULL);
> +    /* see gin_clean_pending_list() */
> +    if (indexRel->rd_index->indisvalid)
> +        brinsummarize(indexRel, heapRel, heapBlk, true, &numSummarized, NULL);
> +    else
> +        ereport(DEBUG1,
> +                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                 errmsg("index \"%s\" is not valid",
> +                        RelationGetRelationName(indexRel))));

brinsummarize() could return 0 even for a valid index, and we would
not be able to make the difference with an invalid index.  Perhaps you
are right and this is not a big deal in practice to do as you are
suggesting.
--
Michael

Вложения