Re: In-placre persistance change of a relation

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: In-placre persistance change of a relation
Дата
Msg-id 20240604.160032.391318120410028809.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: In-placre persistance change of a relation  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: In-placre persistance change of a relation
Список pgsql-hackers
Thank you for the comments.

# The most significant feedback I received was that this approach is
# not misdirected..

At Tue, 4 Jun 2024 09:09:12 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Tue, May 28, 2024 at 04:49:45PM -0700, Jelte Fennema-Nio wrote:
> > Two notes after looking at this quickly during the advanced patch
> > feedback session:
> > 
> > 1. I would maybe split 0003 into two separate patches. One to make SET
> > UNLOGGED fast, which seems quite easy to do because no WAL is needed.
> > And then a follow up to make SET LOGGED fast, which does all the
> > XLOG_FPI stuff.
> 
> Yeah, that would make sense.  The LOGGED->UNLOGGED part is
> straight-forward because we only care about the init fork.  The
> UNLOGGED->LOGGED case bugs me, though, a lot.

I indeed agree with that. Will do that in the next version.

> > 2. When wal_level = minitam, still some WAL logging is needed. The
> > pages that were changed since the last still need to be made available
> > for crash recovery.

I don't quite understand this.  It seems that you are reffering to the
LOGGED to UNLOGGED case. UNLOGGED tables are emptied after a crash,
and the newly created INIT fork does that trick. Maybe I'm
misunderstanding something, though.

> More notes from me, as I was part of this session.
> 
> +         * XXXX: Some access methods don't support in-place persistence
> +         * changes. GiST uses page LSNs to figure out whether a block has been
> [...]
> +        if (r->rd_rel->relkind == RELKIND_INDEX &&
> +        /* GiST is excluded */
> +            r->rd_rel->relam != BTREE_AM_OID &&
> +            r->rd_rel->relam != HASH_AM_OID &&
> +            r->rd_rel->relam != GIN_AM_OID &&
> +            r->rd_rel->relam != SPGIST_AM_OID &&
> +            r->rd_rel->relam != BRIN_AM_OID)
> 
> This knowledge should not be encapsulated in the backend code.  The
> index AMs should be able to tell, instead, if they are able to support
> this code path so as any out-of-core index AM can decide things on its
> own.  This ought to be split in its own patch, simple enough as of a
> boolean or a routine telling how this backend path should behave.

Right. I was hesitant to expand the scope before being certain that I
can proceed in this direction without significant objections. Now I
can include that in the next version.

> +            for (fork = 0; fork < INIT_FORKNUM; fork++)
> +            {
> +                if (smgrexists(RelationGetSmgr(r), fork))
> +                    log_newpage_range(r, fork, 0,
> +                                      smgrnblocks(RelationGetSmgr(r), fork),
> +                                      false);
> +            }
> 
> A simple copy of the blocks means that we keep anything bloated in
> them, while a rewrite in ALTER TABLE means that we would start afresh
> by deforming the tuples from the origin before giving them to the
> target, without any bloat.  The compression of the FPWs and the
> removal of the holes in the pages would surely limit the impact, but
> this has not been discussed on this thread, and this is a nice
> property of the existing implementation that would get silently
> removed by this patch set.

Sure. That bloat can be removed beforehand by explicitly running
VACUUM on the table if needed, but it would be ideal if the same
compression occurred automatically. Alternatively, it might be an
option to fall back to the existing path when the target table is
found to have excessive bloat (though I'm not sure how much should be
considered excessive). We could also allow users to decide by adding a
command option.

> Another point that Nathan has made is that it may be more appealling
> to study how this is better than an integration with the multi-INSERT
> APIs into AMs, so as it is possible to group the inserts in batches
> rather than process them one-at-a-time, see [1].  I am ready to accept
> that what this patch does is more efficient as long as everything is
> block-based in some cases.  Still there is a risk-vs-gain argument
> here, and I am not sure whether what we have here is a good tradeoff
> compared to the potential risk of breaking things.  The amount of new
> infrastructure is large for this code path.  Grouping the inserts in
> large batches may finish by being more efficient than a WAL stream
> full of FPWs, as well, even if toast values are deformed?  So perhaps
> there is an argument for making that optional at query level, instead.

I agree about the uncertainties. With the switching feature mentioned
above, it might be sufficient to use the multi-insert stuff in the
existing path. However, the uncertainties regarding performance would
still remain.

> As a hole, I can say that grouping the INSERTs will be always more
> efficient, while what we have here can be less efficient in some
> cases.  I'm OK to be outvoted, but the level of complications created
> by this block-based copy and WAL-logging concerns me when it comes to
> tweaking the relpersistence like that.

Of course, it is a promising option to move away from the
block-logging and fall back to the existing path using the
multi-insert stuff in the UNLOGGED to LOGGED case. Let me consider
that point.


Besides the above, even though this discussion might become
unnecessary, there was a concern that the blockwise logging might
result in unexpected outcomes due to unflushed buffer data. (although
I could be mistaken). I believe that is not the case because all
buffer blocks are flushed out beforehand.


> [1]: https://commitfest.postgresql.org/48/4777/

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



В списке pgsql-hackers по дате отправления:

Предыдущее
От: jian he
Дата:
Сообщение: Re: ON ERROR in json_query and the like
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: meson: Specify -Wformat as a common warning flag for extensions