Обсуждение: GenBKI emits useless open;close for catalogs without rows

Поиск
Список
Период
Сортировка

GenBKI emits useless open;close for catalogs without rows

От
Matthias van de Meent
Дата:
Hi,

Whilst looking at PostgreSQL's bootstrapping process, I noticed that
postgres.bki contains quite a few occurrances of the pattern "open
$catname; close $catname".
I suppose this pattern isn't too expensive, but according to my
limited research a combined open+close cycle doens't do anything
meaningful, so it does waste some CPU cycles in the process.

The attached patch 1 removes the occurances of those combined
open/close statements in postgresql.bki. Locally it passes
check-world, so I assume that opening and closing a table is indeed
not required for initializing a data-less catalog during
bootstrapping.

A potential addition to the patch would to stop manually closing
relations: initdb and check-world succeed without manual 'close'
operations because the 'open' command auto-closes the previous open
relation (in boot_openrel). Testing also suggests that the last opened
relation apparently doesn't need closing - check-world succeeds
without issues (incl. with TAP enabled). That is therefore implemented
in attached patch 2 - it removes the 'close' syntax in its entirety.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Вложения

Re: GenBKI emits useless open;close for catalogs without rows

От
Alvaro Herrera
Дата:
On 2023-Sep-01, Matthias van de Meent wrote:

> A potential addition to the patch would to stop manually closing
> relations: initdb and check-world succeed without manual 'close'
> operations because the 'open' command auto-closes the previous open
> relation (in boot_openrel). Testing also suggests that the last opened
> relation apparently doesn't need closing - check-world succeeds
> without issues (incl. with TAP enabled). That is therefore implemented
> in attached patch 2 - it removes the 'close' syntax in its entirety.

Hmm, what happens with the last relation in the bootstrap process?  Is
closerel() called via some other path for that one?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Essentially, you're proposing Kevlar shoes as a solution for the problem
that you want to walk around carrying a loaded gun aimed at your foot.
(Tom Lane)



Re: GenBKI emits useless open;close for catalogs without rows

От
Matthias van de Meent
Дата:
On Fri, 1 Sept 2023 at 19:43, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2023-Sep-01, Matthias van de Meent wrote:
>
> > A potential addition to the patch would to stop manually closing
> > relations: initdb and check-world succeed without manual 'close'
> > operations because the 'open' command auto-closes the previous open
> > relation (in boot_openrel). Testing also suggests that the last opened
> > relation apparently doesn't need closing - check-world succeeds
> > without issues (incl. with TAP enabled). That is therefore implemented
> > in attached patch 2 - it removes the 'close' syntax in its entirety.
>
> Hmm, what happens with the last relation in the bootstrap process?  Is
> closerel() called via some other path for that one?

There is a final cleanup() call that closes the last open boot_reldesc
relation (if any) at the end of BootstrapModeMain, after boot_yyparse
has completed and its changes have been committed.

- Matthias



Re: GenBKI emits useless open;close for catalogs without rows

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2023-Sep-01, Matthias van de Meent wrote:
>> A potential addition to the patch would to stop manually closing
>> relations: initdb and check-world succeed without manual 'close'
>> operations because the 'open' command auto-closes the previous open
>> relation (in boot_openrel). Testing also suggests that the last opened
>> relation apparently doesn't need closing - check-world succeeds
>> without issues (incl. with TAP enabled). That is therefore implemented
>> in attached patch 2 - it removes the 'close' syntax in its entirety.

> Hmm, what happens with the last relation in the bootstrap process?  Is
> closerel() called via some other path for that one?

Taking a quick census of existing closerel() callers: there is
cleanup() in bootstrap.c, but it's called uncomfortably late
and outside any transaction, so I misdoubt that it works
properly if asked to actually shoulder any responsibility.
(A little code reshuffling could fix that.)
There are also a couple of low-level elog warnings in CREATE
that would likely get triggered, though I suppose we could just
remove those elogs.

I guess my reaction to this patch is "why bother?".  It seems
unlikely to yield any measurable benefit, though of course
that guess could be wrong.

            regards, tom lane



Re: GenBKI emits useless open;close for catalogs without rows

От
Matthias van de Meent
Дата:
On Fri, 1 Sept 2023 at 19:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > On 2023-Sep-01, Matthias van de Meent wrote:
> >> A potential addition to the patch would to stop manually closing
> >> relations: initdb and check-world succeed without manual 'close'
> >> operations because the 'open' command auto-closes the previous open
> >> relation (in boot_openrel). Testing also suggests that the last opened
> >> relation apparently doesn't need closing - check-world succeeds
> >> without issues (incl. with TAP enabled). That is therefore implemented
> >> in attached patch 2 - it removes the 'close' syntax in its entirety.
>
> > Hmm, what happens with the last relation in the bootstrap process?  Is
> > closerel() called via some other path for that one?
>
> Taking a quick census of existing closerel() callers: there is
> cleanup() in bootstrap.c, but it's called uncomfortably late
> and outside any transaction, so I misdoubt that it works
> properly if asked to actually shoulder any responsibility.
> (A little code reshuffling could fix that.)
> There are also a couple of low-level elog warnings in CREATE
> that would likely get triggered, though I suppose we could just
> remove those elogs.

Yes, that should be easy to fix.

> I guess my reaction to this patch is "why bother?".  It seems
> unlikely to yield any measurable benefit, though of course
> that guess could be wrong.

There is a small but measurable decrease in size of the generated bki
(2kb with both patches, on an initial 945kB), and there is some
related code that can be eliminated. If that's not worth bothering,
then I can drop the patch. Otherwise, I can update the patch to do the
cleanup that was within the transaction boundaries at the end of
boot_yyparse.

If decreasing the size of postgres.bki is not worth the effort, I'll
drop any effort on doing so, but considering that it is about 1MB of
our uncompressed distributables, I'd say decreases in size are worth
the effort, most of the time.

Kind regards,

Matthias van de Meent



Re: GenBKI emits useless open;close for catalogs without rows

От
Matthias van de Meent
Дата:
On Tue, 12 Sept 2023 at 17:51, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Fri, 1 Sept 2023 at 19:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > > On 2023-Sep-01, Matthias van de Meent wrote:
> > >> A potential addition to the patch would to stop manually closing
> > >> relations: initdb and check-world succeed without manual 'close'
> > >> operations because the 'open' command auto-closes the previous open
> > >> relation (in boot_openrel). Testing also suggests that the last opened
> > >> relation apparently doesn't need closing - check-world succeeds
> > >> without issues (incl. with TAP enabled). That is therefore implemented
> > >> in attached patch 2 - it removes the 'close' syntax in its entirety.
> >
> > > Hmm, what happens with the last relation in the bootstrap process?  Is
> > > closerel() called via some other path for that one?
> >
> > Taking a quick census of existing closerel() callers: there is
> > cleanup() in bootstrap.c, but it's called uncomfortably late
> > and outside any transaction, so I misdoubt that it works
> > properly if asked to actually shoulder any responsibility.
> > (A little code reshuffling could fix that.)
> > There are also a couple of low-level elog warnings in CREATE
> > that would likely get triggered, though I suppose we could just
> > remove those elogs.
>
> Yes, that should be easy to fix.
>
> > I guess my reaction to this patch is "why bother?".  It seems
> > unlikely to yield any measurable benefit, though of course
> > that guess could be wrong.
>
> There is a small but measurable decrease in size of the generated bki
> (2kb with both patches, on an initial 945kB), and there is some
> related code that can be eliminated. If that's not worth bothering,
> then I can drop the patch. Otherwise, I can update the patch to do the
> cleanup that was within the transaction boundaries at the end of
> boot_yyparse.
>
> If decreasing the size of postgres.bki is not worth the effort, I'll
> drop any effort on doing so, but considering that it is about 1MB of
> our uncompressed distributables, I'd say decreases in size are worth
> the effort, most of the time.

With the attached patch I've see a significant decrease in the size of
postgres.bki of about 25%, and a likely related decrease in wall clock
time spent in the bootstrap transaction: with timestamp logs inserted
around the boot_yyparse() transaction the measured time went from
around 49 ms on master to around 45 ms patched. In the grand scheme of
initdb that might not be a lot of time (initdb takes about 73ms
locally with syncing disabled) but it is a nice gain in performance.

Comparison:

master @ 9c13b681
 $ du -b pg_install/share/postgres.bki
945220
 $ initdb --no-instructions --auth=md5 --pwfile pwfile -N -D ~/test-dbinit/
[...]
2023-09-16 02:22:57.339 CEST [10422] LOG:  Finished bootstrapping:
to_start: 10 ms, transaction: 49 ms, finishing: 1 ms, total: 59 ms
[...]

patched
 $ du -b pg_install/share/postgres.bki
702574
 $ initdb --no-instructions --auth=md5 --pwfile pwfile -N -D ~/test-dbinit/
[...]
2023-09-16 02:25:57.664 CEST [15645] LOG:  Finished bootstrapping:
to_start: 10 ms, transaction: 45 ms, finishing: 1 ms, total: 54 ms
[...]

Various methods of reducing the size of postgres.bki were applied, as
detailed in the patch's commit message. I believe the current output
is still quite human readable.

There are other potential avenues for further reducing the bki size,
e.g. through using smaller generated OIDs (reducing the number of
characters used per OID), applying RLE on sequential NULLs (there are
3k+ occurances of /( __){2,10}/ in the generated bki file remaining),
and other tricks, but several of those are likely to be detrimental to
the readability and manual verifiability of the bki.

Kind regards,

Matthias van de Meent

Вложения

Re: GenBKI emits useless open;close for catalogs without rows

От
Heikki Linnakangas
Дата:
On 18/09/2023 17:50, Matthias van de Meent wrote:
> (initdb takes about 73ms locally with syncing disabled)

That's impressive. It takes about 600 ms on my laptop. Of which about 
140 ms goes into processing the BKI file. And that's with "initdb 
-no-sync" option.

> Various methods of reducing the size of postgres.bki were applied, as
> detailed in the patch's commit message. I believe the current output
> is still quite human readable.

Overall this does not seem very worthwhile to me.

One thing caught my eye though: We currently have an "open" command 
after every "create". Except for bootstrap relations; creating a 
bootstrap relation opens it implicitly. That seems like a weird 
inconsistency. If we make "create" to always open the relation, we can 
both make it more consistent and save a few lines. That's maybe worth 
doing, per the attached. It removes the "open" command altogether, as 
it's not needed anymore.

Looking at "perf" profile of initdb, I also noticed that a small but 
measurable amount of time is spent in the "isatty(0)" call in do_end(). 
Does anyone care about doing bootstrap mode interactively? We could 
probably remove that.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: GenBKI emits useless open;close for catalogs without rows

От
Andres Freund
Дата:
Hi,

On 2023-09-19 21:05:41 +0300, Heikki Linnakangas wrote:
> On 18/09/2023 17:50, Matthias van de Meent wrote:
> > (initdb takes about 73ms locally with syncing disabled)
>
> That's impressive. It takes about 600 ms on my laptop. Of which about 140 ms
> goes into processing the BKI file. And that's with "initdb -no-sync" option.

I think there must be a digit missing in Matthias' numbers.


> > Various methods of reducing the size of postgres.bki were applied, as
> > detailed in the patch's commit message. I believe the current output
> > is still quite human readable.
>
> Overall this does not seem very worthwhile to me.

Because the wins are too small?

FWIW, Making postgres.bki smaller and improving bootstrapping time does seem
worthwhile to me. But it doesn't seem quite right to handle the batching in
the file format, it should be on the server side, no?

We really should stop emitting WAL during initdb...


> Looking at "perf" profile of initdb, I also noticed that a small but
> measurable amount of time is spent in the "isatty(0)" call in do_end(). Does
> anyone care about doing bootstrap mode interactively? We could probably
> remove that.

Heh, yea, that's pretty pointless.

Greetings,

Andres Freund



Re: GenBKI emits useless open;close for catalogs without rows

От
Matthias van de Meent
Дата:
On Tue, 19 Sept 2023 at 20:05, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 18/09/2023 17:50, Matthias van de Meent wrote:
> > (initdb takes about 73ms locally with syncing disabled)
>
> That's impressive. It takes about 600 ms on my laptop. Of which about
> 140 ms goes into processing the BKI file. And that's with "initdb
> -no-sync" option.

Hmm, yes, I misinterpreted my own benchmark setup, the actual value
would be somewhere around 365ms: I thought I was doing 50*50 runs in
one timed run, but really I was doing only 50 runs. TO add insult to
injury, I divided the total time taken by 250 instead of either 50 or
2500... Thanks for correcting me on that.

> > Various methods of reducing the size of postgres.bki were applied, as
> > detailed in the patch's commit message. I believe the current output
> > is still quite human readable.
>
> Overall this does not seem very worthwhile to me.

Reducing the size of redistributables sounds worthwhile to me, but if
none of these changes are worth the effort, then alright, nothing
gained, only time lost.

> Looking at "perf" profile of initdb, I also noticed that a small but
> measurable amount of time is spent in the "isatty(0)" call in do_end().
> Does anyone care about doing bootstrap mode interactively? We could
> probably remove that.

Yeah, that sounds like a good idea.

Kind regards,

Matthias van de Meent



Re: GenBKI emits useless open;close for catalogs without rows

От
Matthias van de Meent
Дата:
On Fri, 22 Sept 2023 at 00:25, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2023-09-19 21:05:41 +0300, Heikki Linnakangas wrote:
> > On 18/09/2023 17:50, Matthias van de Meent wrote:
> > > (initdb takes about 73ms locally with syncing disabled)
> >
> > That's impressive. It takes about 600 ms on my laptop. Of which about 140 ms
> > goes into processing the BKI file. And that's with "initdb -no-sync" option.
>
> I think there must be a digit missing in Matthias' numbers.

Yes, kind of. The run was on 50 iterations, not the assumed 250.
Also note that the improved measurements were recorded inside the
boostrap-mode PostgreSQL instance, not inside the initdb that was
processing the postgres.bki file. So it might well be that I didn't
improve the total timing by much.

> > > Various methods of reducing the size of postgres.bki were applied, as
> > > detailed in the patch's commit message. I believe the current output
> > > is still quite human readable.
> >
> > Overall this does not seem very worthwhile to me.
>
> Because the wins are too small?
>
> FWIW, Making postgres.bki smaller and improving bootstrapping time does seem
> worthwhile to me. But it doesn't seem quite right to handle the batching in
> the file format, it should be on the server side, no?

The main reason I did batching in the file format is to reduce the
storage overhead of the current one "INSERT" per row. Batching
improved that by replacing the token with a different construct, but
it's not neccessarily the only solution. The actual parser still
inserts the tuples one by one in the relation, as I didn't spend time
on making a simple_heap_insert analog for bulk insertions.

> We really should stop emitting WAL during initdb...

I think it's quite elegant that we're able to bootstrap the relation
data of a new PostgreSQL cluster from the WAL generated in another
cluster, even if it is indeed a bit wasteful. I do see your point
though - the WAL shouldn't be needed if we're already fsyncing the
files to disk.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: GenBKI emits useless open;close for catalogs without rows

От
Peter Eisentraut
Дата:
On 19.09.23 20:05, Heikki Linnakangas wrote:
> One thing caught my eye though: We currently have an "open" command 
> after every "create". Except for bootstrap relations; creating a 
> bootstrap relation opens it implicitly. That seems like a weird 
> inconsistency. If we make "create" to always open the relation, we can 
> both make it more consistent and save a few lines. That's maybe worth 
> doing, per the attached. It removes the "open" command altogether, as 
> it's not needed anymore.

This seems like a good improvement to me.

It would restrict the bootstrap language so that you can only manipulate 
a table right after creating it, but I don't see why that wouldn't be 
sufficient.




Re: GenBKI emits useless open;close for catalogs without rows

От
Peter Eisentraut
Дата:
On 08.11.23 08:16, Peter Eisentraut wrote:
> On 19.09.23 20:05, Heikki Linnakangas wrote:
>> One thing caught my eye though: We currently have an "open" command 
>> after every "create". Except for bootstrap relations; creating a 
>> bootstrap relation opens it implicitly. That seems like a weird 
>> inconsistency. If we make "create" to always open the relation, we can 
>> both make it more consistent and save a few lines. That's maybe worth 
>> doing, per the attached. It removes the "open" command altogether, as 
>> it's not needed anymore.
> 
> This seems like a good improvement to me.
> 
> It would restrict the bootstrap language so that you can only manipulate 
> a table right after creating it, but I don't see why that wouldn't be 
> sufficient.

Then again, this sort of achieves the opposite of what Matthias was 
aiming for: You are now forcing some relations to be opened even though 
we will end up closing it right away.

(In any case, documentation in bki.sgml would need to be updated for 
this patch.)




Re: GenBKI emits useless open;close for catalogs without rows

От
vignesh C
Дата:
On Wed, 8 Nov 2023 at 12:50, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 08.11.23 08:16, Peter Eisentraut wrote:
> > On 19.09.23 20:05, Heikki Linnakangas wrote:
> >> One thing caught my eye though: We currently have an "open" command
> >> after every "create". Except for bootstrap relations; creating a
> >> bootstrap relation opens it implicitly. That seems like a weird
> >> inconsistency. If we make "create" to always open the relation, we can
> >> both make it more consistent and save a few lines. That's maybe worth
> >> doing, per the attached. It removes the "open" command altogether, as
> >> it's not needed anymore.
> >
> > This seems like a good improvement to me.
> >
> > It would restrict the bootstrap language so that you can only manipulate
> > a table right after creating it, but I don't see why that wouldn't be
> > sufficient.
>
> Then again, this sort of achieves the opposite of what Matthias was
> aiming for: You are now forcing some relations to be opened even though
> we will end up closing it right away.
>
> (In any case, documentation in bki.sgml would need to be updated for
> this patch.)

I have changed the status of the patch to WOA, feel free to update the
status once Peter's documentation comments are addressed.

Regards,
Vignesh



Re: GenBKI emits useless open;close for catalogs without rows

От
"Andrey M. Borodin"
Дата:

> On 22 Sep 2023, at 18:50, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:

Hi Matthias!

This is kind reminder that this thread is waiting for your response.
CF entry [0] is in "Waiting on Author", I'll move it to July CF.

Thanks!


Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/47/4544/


Re: GenBKI emits useless open;close for catalogs without rows

От
Michael Paquier
Дата:
On Mon, Apr 08, 2024 at 11:11:07AM +0300, Andrey M. Borodin wrote:
> This is kind reminder that this thread is waiting for your response.
> CF entry [0] is in "Waiting on Author", I'll move it to July CF.

Hmm, is that productive?  This patch has been waiting on author since
the 1st of February, and it was already moved from the CF 2024-01 to
2024-03.  It would make more sense to me to mark it as RwF, then
resubmit if there is still interest in working on this topic rather
than move it again.

My personal inner rule is there is enough ground for a patch to be
marked as RwF if it has been waiting on author since the middle of a
commit fest, which would be the 15th of March for CF 2024-03.  This
lets two weeks to authors to react.
--
Michael

Вложения

Re: GenBKI emits useless open;close for catalogs without rows

От
Robert Haas
Дата:
On Tue, Apr 9, 2024 at 12:03 AM Michael Paquier <michael@paquier.xyz> wrote:
> Hmm, is that productive?  This patch has been waiting on author since
> the 1st of February, and it was already moved from the CF 2024-01 to
> 2024-03.  It would make more sense to me to mark it as RwF, then
> resubmit if there is still interest in working on this topic rather
> than move it again.

Done.

--
Robert Haas
EDB: http://www.enterprisedb.com