Обсуждение: create_index test fails when synchronous_commit = off @ master

Поиск
Список
Период
Сортировка

create_index test fails when synchronous_commit = off @ master

От
Aleksander Alekseev
Дата:
Hi hackers,

I noticed that create_index test (make installcheck) fails on my laptop because different query plans are chosen:

-                      QUERY PLAN
--------------------------------------------------------
- Index Only Scan using tenk1_unique1 on tenk1
-   Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
-(2 rows)
+                            QUERY PLAN
+-------------------------------------------------------------------
+ Sort
+   Sort Key: unique1
+   ->  Bitmap Heap Scan on tenk1
+         Recheck Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+         ->  Bitmap Index Scan on tenk1_unique1
+               Index Cond: (unique1 = ANY ('{1,42,7}'::integer[]))
+(6 rows)

...

-                      QUERY PLAN
--------------------------------------------------------
- Index Only Scan using tenk1_thous_tenthous on tenk1
-   Index Cond: (thousand < 2)
-   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
-(3 rows)
+                                      QUERY PLAN
+--------------------------------------------------------------------------------------
+ Sort
+   Sort Key: thousand
+   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
+         Index Cond: ((thousand < 2) AND (tenthous = ANY ('{1001,3000}'::integer[])))
+(4 rows)

Investigation showed that it happens only with `synchronous_commit = off`. Only the master branch is affected, starting from cc50080a82. This is a debug build. I'm running MacOS Monterey 12.2.1 @ x64.

I didn't investigate further. Do we assume that `make installcheck` suppose to pass with a different postgresql.conf options?

--
Best regards,
Aleksander Alekseev

Re: create_index test fails when synchronous_commit = off @ master

От
Andres Freund
Дата:
Hi,

On 2022-02-24 16:47:25 +0300, Aleksander Alekseev wrote:
> -                      QUERY PLAN
> --------------------------------------------------------
> - Index Only Scan using tenk1_thous_tenthous on tenk1
> -   Index Cond: (thousand < 2)
> -   Filter: (tenthous = ANY ('{1001,3000}'::integer[]))
> -(3 rows)
> +                                      QUERY PLAN
> +--------------------------------------------------------------------------------------
> + Sort
> +   Sort Key: thousand
> +   ->  Index Only Scan using tenk1_thous_tenthous on tenk1
> +         Index Cond: ((thousand < 2) AND (tenthous = ANY
> ('{1001,3000}'::integer[])))
> +(4 rows)

Heh. We've been having a lot of fights with exactly this plan change in the
AIO branch, before cc50080a82, and without synchronous_commit =
off. Interestingly near-exclusively with the regression run within
pg_upgrade's tests.

For aio we (David did a lot of that IIRC) finally hunted it down to be due
vacuum skipping pages due to inability to get a cleanup lock. If that happens
enough, pg_class.relallvisible changes enough to lead to the different plan.


I first was going to suggest that we should just use VACUUM FREEZE to prevent
the issue.

But in this instance the cause isn't cleanup locks, probably that we can't yet
set hint bits due to synchronous_commit=off? But I don't *fully* understand
how it leads to this.

I added the SELECT relpages, reltuples, relallvisible FROM pg_class WHERE oid = 'tenk1'::regclass;
just after the
VACUUM ANALYZE tenk1;

synchronous_commit=on
+ relpages | reltuples | relallvisible
+----------+-----------+---------------
+      345 |     10000 |           345
+(1 row)

synchronous_commit=off
+ relpages | reltuples | relallvisible
+----------+-----------+---------------
+      345 |     10000 |             0
+(1 row)

So it clearly is the explanation for the issue.


Obviously we can locally work around it by adding a
SET LOCAL synchronous_commit = local;
to the COPY. But I'd like to fully understand what's going on.


> I didn't investigate further. Do we assume that `make installcheck` suppose
> to pass with a different postgresql.conf options?

Depends on the option, I think... There's some where it's interesting to run
tests with different options and where the effort required is reasonable. And
some cases where it's not... synchronous_commit=off worked until recently, and
I think we should keep it working.

Greetings,

Andres Freund



Re: create_index test fails when synchronous_commit = off @ master

От
Andres Freund
Дата:
Hi,

On 2022-02-24 07:33:39 -0800, Andres Freund wrote:
> I added the SELECT relpages, reltuples, relallvisible FROM pg_class WHERE oid = 'tenk1'::regclass;
> just after the
> VACUUM ANALYZE tenk1;
>
> synchronous_commit=on
> + relpages | reltuples | relallvisible
> +----------+-----------+---------------
> +      345 |     10000 |           345
> +(1 row)
>
> synchronous_commit=off
> + relpages | reltuples | relallvisible
> +----------+-----------+---------------
> +      345 |     10000 |             0
> +(1 row)
>
> So it clearly is the explanation for the issue.
>
>
> Obviously we can locally work around it by adding a
> SET LOCAL synchronous_commit = local;
> to the COPY. But I'd like to fully understand what's going on.

It is the hint bit sets delayed by asynchronous commit. I traced execution and
we do end up not setting all visible due to reaching the
!HeapTupleHeaderXminCommitted() path in lazy_scan_prune()

            case HEAPTUPLE_LIVE:
...
                /*
                 * Is the tuple definitely visible to all transactions?
                 *
                 * NB: Like with per-tuple hint bits, we can't set the
                 * PD_ALL_VISIBLE flag if the inserter committed
                 * asynchronously. See SetHintBits for more info. Check that
                 * the tuple is hinted xmin-committed because of that.
                 */
                if (prunestate->all_visible)
                {
                    TransactionId xmin;

                    if (!HeapTupleHeaderXminCommitted(tuple.t_data))


So it might be reasonable to use synchronous_commit=on in test_setup.sql?

It's not super satisfying, but i don't immediately see what else could prevent
all-visible to be set in this case? There's no dead rows, so concurrent
snapshots shouldn't prevent cleanup.

I guess we could alternatively try doing something like flushing pending async
commits at the start of vacuum. But that probably isn't warranted.

Greetings,

Andres Freund



Re: create_index test fails when synchronous_commit = off @ master

От
Aleksander Alekseev
Дата:
Hi Andres,

> So it might be reasonable to use synchronous_commit=on in test_setup.sql?

> It's not super satisfying, but I don't immediately see what else could prevent
> all-visible to be set in this case?

I don't see what else can be done either. Here is the corresponding patch.

-- 
Best regards,
Aleksander Alekseev

Вложения

Re: create_index test fails when synchronous_commit = off @ master

От
Aleksander Alekseev
Дата:
Hi hackers,

> I don't see what else can be done either. Here is the corresponding patch.

Here is an updated patch that includes the commit message.

-- 
Best regards,
Aleksander Alekseev

Вложения

Re: create_index test fails when synchronous_commit = off @ master

От
Aleksander Alekseev
Дата:
Hi hackers,

> Here is an updated patch that includes the commit message.

Since this is a trivial change and no one seems to object to it so far, I'm going to re-check the patch against the recent `master` and change its status to "Ready for Committer" somewhere next week.

--
Best regards,
Aleksander Alekseev

Re: create_index test fails when synchronous_commit = off @ master

От
Andres Freund
Дата:
On 2022-03-11 10:38:22 +0300, Aleksander Alekseev wrote:
> > Here is an updated patch that includes the commit message.
> 
> Since this is a trivial change and no one seems to object to it so far, I'm
> going to re-check the patch against the recent `master` and change its
> status to "Ready for Committer" somewhere next week.

Pushed it now.

- Andres