Thread: Move bki file pre-processing from initdb to bootstrap

Move bki file pre-processing from initdb to bootstrap

From
Krishnakumar R
Date:
Hi All,

This patch moves the pre-processing for tokens in the bki file from
initdb to bootstrap. With these changes the bki file will only be
opened once in bootstrap and parsing will be done by the bootstrap
parser.

The flow of bki file processing will be as follows:
- In initdb gather the values used to replace the tokens in the bki file.
- Pass these values into postgres bootstrap startup using '-i' option
as key-value pairs.
- In bootstrap open the bki file (the bki file name was received as a
parameter).
- During the parsing of the bki file, replace the tokens received as
parameters with their values.
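
The flow above can be sketched roughly as follows. This is only an illustration, not code from the patch: the type and function names (BootParams, parse_boot_params, lookup_boot_param) are hypothetical, malloc() stands in for the backend's memory routines, and values containing ',' or '=' are assumed not to occur.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PARAMS 16

/* One key=value pair taken from the option argument (hypothetical type). */
typedef struct
{
    char *key;
    char *value;
} BootParam;

typedef struct
{
    char     *storage;              /* private, writable copy of the argument */
    BootParam items[MAX_PARAMS];
    int       count;
} BootParams;

/*
 * Split "k1=v1,k2=v2" into pairs. Returns 0 on success, -1 on malformed
 * input or allocation failure. On success the caller frees out->storage.
 */
int
parse_boot_params(const char *arg, BootParams *out)
{
    char *cursor;

    assert(arg != NULL && out != NULL);
    out->count = 0;
    out->storage = malloc(strlen(arg) + 1);
    if (out->storage == NULL)
        return -1;
    strcpy(out->storage, arg);

    cursor = out->storage;
    while (*cursor != '\0')
    {
        char *comma = strchr(cursor, ',');
        char *eq;

        if (comma != NULL)
            *comma = '\0';          /* terminate the current pair */
        eq = strchr(cursor, '=');
        if (eq == NULL || eq == cursor || out->count >= MAX_PARAMS)
        {
            free(out->storage);
            out->storage = NULL;
            out->count = 0;
            return -1;
        }
        *eq = '\0';                 /* split key from value */
        out->items[out->count].key = cursor;
        out->items[out->count].value = eq + 1;
        out->count++;
        if (comma == NULL)
            break;
        cursor = comma + 1;
    }
    return 0;
}

/* Return the replacement for a token, or NULL if the token is unknown. */
const char *
lookup_boot_param(const BootParams *params, const char *token)
{
    int i;

    for (i = 0; i < params->count; i++)
        if (strcmp(params->items[i].key, token) == 0)
            return params->items[i].value;
    return NULL;
}
```

In such a scheme the bootstrap scanner would call something like lookup_boot_param() whenever it meets a replaceable token and emit the returned value in its place.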

Related discussion can be found here:
https://www.postgresql.org/message-id/20220216021219.ygzrtb3hd5bn7olz%40alap3.anarazel.de

Note: Currently the patch breaks on Windows because extra quotes are
added when passing the parameters (thanks to Thomas Munro for helping me
find that). I will follow up with a v2 that fixes the Windows
parameter-passing issue and the formatting.

Please review and provide feedback.

--
Thanks and Regards,
Krishnakumar (KK).
[Microsoft]

Attachments

Re: Move bki file pre-processing from initdb to bootstrap

From
Tom Lane
Date:
Krishnakumar R <kksrcv001@gmail.com> writes:
> This patch moves the pre-processing for tokens in the bki file from
> initdb to bootstrap. With these changes the bki file will only be
> opened once in bootstrap and parsing will be done by the bootstrap
> parser.

You haven't provided any iota of evidence why this would be an
improvement.

            regards, tom lane



Re: Move bki file pre-processing from initdb to bootstrap

From
Peter Eisentraut
Date:
On 01.09.23 14:37, Tom Lane wrote:
> Krishnakumar R <kksrcv001@gmail.com> writes:
>> This patch moves the pre-processing for tokens in the bki file from
>> initdb to bootstrap. With these changes the bki file will only be
>> opened once in bootstrap and parsing will be done by the bootstrap
>> parser.
> 
> You haven't provided any iota of evidence why this would be an
> improvement.

I had played with similar ideas in the past, because it would shave some
time off initdb, which would accumulate noticeably over a full test run.

But now with the initdb caching mechanism, I wonder whether this is 
still needed.




Re: Move bki file pre-processing from initdb to bootstrap

From
Andres Freund
Date:
Hi,

On 2023-09-01 14:59:57 +0200, Peter Eisentraut wrote:
> On 01.09.23 14:37, Tom Lane wrote:
> > Krishnakumar R <kksrcv001@gmail.com> writes:
> > > This patch moves the pre-processing for tokens in the bki file from
> > > initdb to bootstrap. With these changes the bki file will only be
> > > opened once in bootstrap and parsing will be done by the bootstrap
> > > parser.
> > 
> > You haven't provided any iota of evidence why this would be an
> > improvement.
> 
> I had played with similar ideas in the past, because it would shave some
> time off initdb, which would accumulate noticeably over a full test run.
> 
> But now with the initdb caching mechanism, I wonder whether this is still
> needed.

I think it's still relevant - it's not just our own test infrastructure that
runs a lot of initdbs, it's also lots of projects using postgres.


The main reason I'd like to move this infrastructure to the backend is that I
really would like to get rid of single user mode. It adds complications all
over, it's barely tested, pointlessly hard to use. I wrote a rough prototype
of that a while back:
https://postgr.es/m/20220220214439.bhc35hhbaub6dush%40alap3.anarazel.de

Greetings,

Andres Freund



Re: Move bki file pre-processing from initdb to bootstrap

From
Peter Eisentraut
Date:
On 01.09.23 10:01, Krishnakumar R wrote:
> This patch moves the pre-processing for tokens in the bki file from
> initdb to bootstrap. With these changes the bki file will only be
> opened once in bootstrap and parsing will be done by the bootstrap
> parser.

I did some rough performance tests on this.  I get about a 10% 
improvement on initdb run time, so this appears to have merit.

I wonder whether we can reduce the number of symbols that we need this 
treatment for.

For example, for NAMEDATALEN, SIZEOF_POINTER, ALIGNOF_POINTER, 
FLOAT8PASSBYVAL, these are known at build time, so we could have 
genbki.pl substitute them at build time.

The locale-related symbols (ENCODING, LC_COLLATE, etc.), I wonder 
whether we can eliminate the need for them.  Right now, these are only 
used in the bki entry for the template1 database.  How about initdb 
creates template0 first, with hardcoded default encoding, collation, 
etc., and then creates template1 from that, using the normal CREATE 
DATABASE command with the appropriate options.  Or initdb could just run 
an UPDATE on pg_database to put the right settings in place.

I don't like this part so much, because it adds like 4 more places each 
of these variables is mentioned, which increases the mental and testing 
overhead for dealing with these features (which are an area of active 
development).

In general, it would be good if this could be factored a bit more so 
each variable doesn't have to be hardcoded in so many places.


Some more detailed comments on the code:

+                   boot_yylval.str = pstrdup(yytext);
+                   sprintf(boot_yylval.str, "%d", NAMEDATALEN);

This is weird.  You are first assigning the text and then overwriting it 
with the numeric value?

You can also use boot_yylval.ival for storing numbers.
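
As a rough standalone sketch of the alternative (not the patch's code): instead of duplicating the token text and then overwriting it, the string can be allocated at exactly the size the number needs. The helper name int_to_new_string is hypothetical, and malloc() stands in for palloc(); inside the backend, psprintf("%d", NAMEDATALEN) would do the same job in one call.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Return a freshly allocated decimal string for "value", sized to fit.
 * Returns NULL if allocation fails; the caller frees the result.
 */
char *
int_to_new_string(int value)
{
    /* With a NULL buffer and size 0, snprintf only reports the length. */
    int   needed = snprintf(NULL, 0, "%d", value);
    char *buf;

    assert(needed > 0);
    buf = malloc((size_t) needed + 1);
    if (buf == NULL)
        return NULL;
    snprintf(buf, (size_t) needed + 1, "%d", value);
    return buf;
}
```

This keeps the buffer size tied to the value being written rather than to the length of the token that happened to be scanned.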

+                   if (bootp_null(ebootp, ebootp->username)) return NULLVAL;

Add proper line breaks in the code.

+bool bootp_null(extra_bootstrap_params *e, char *s)

Add a comment explaining what this function is supposed to do.

This function could be static.

+   while ((flag = getopt(argc, argv, "B:c:d:D:Fi:kr:X:-:")) != -1)

You should use an option letter that isn't already in use in one of the 
other modes of "postgres".  We try to keep those consistent.

New options should be added to the --help output (usage() in main.c).

+   elog(INFO, "Open bki file %s\n", bki_file);
+   boot_yyin = fopen(bki_file, "r");

Why is this needed?  It already reads the bki file from stdin?

+   printfPQExpBuffer(&cmd, "\"%s\" --boot -X %d %s %s %s %s -i %s=%s,%s=%s,%s=%s,"
+                     "%s=%s,%s=%s,%s=%s,%s=%s,%s=%c",
+                     backend_exec,
+                     wal_segment_size_mb * (1024 * 1024),
+                     boot_options, extra_options,
+                     data_checksums ? "-k" : "",
+                     debug ? "-d 5" : "",

This appears to undo some of the changes done in cccdbc5d95.

+#define BOOT_LC_COLLATE "lc_collate"
+#define BOOT_LC_CTYPE "lc_ctype"
+#define BOOT_ICU_LOCALE "icu_locale"

etc.  This doesn't look particularly useful.  You can just use the 
strings directly.




Re: Move bki file pre-processing from initdb to bootstrap

From
Krishnakumar R
Date:
Thank you, Peter, Andres and Tom for your comments and thoughts.

Hi Peter,

> For example, for NAMEDATALEN, SIZEOF_POINTER, ALIGNOF_POINTER,
> FLOAT8PASSBYVAL, these are known at build time, so we could have
> genbki.pl substitute them at build time.

I have modified the patch so that genbki.pl substitutes these at build
time.

> The locale-related symbols (ENCODING, LC_COLLATE, etc.), I wonder
> whether we can eliminate the need for them.  Right now, these are only
> used in the bki entry for the template1 database.  How about initdb
> creates template0 first, with hardcoded default encoding, collation,
> etc., and then creates template1 from that, using the normal CREATE
> DATABASE command with the appropriate options.  Or initdb could just run
> an UPDATE on pg_database to put the right settings in place.

Using a combination of this suggestion and the discussions Andres pointed
to in this thread, I updated the patch to first add placeholder values
to template1 and then do UPDATEs from initdb itself.

> You should use an option letter that isn't already in use in one of the
> other modes of "postgres".  We try to keep those consistent.
>
> New options should be added to the --help output (usage() in main.c).

I now use a -b option in bootstrap mode and have added it to the help output.

> elog(INFO, "Open bki file %s\n", bki_file);
> +   boot_yyin = fopen(bki_file, "r");
>
> Why is this needed?  It already reads the bki file from stdin?

We no longer open the bki file in initdb and pass it to postgres to parse
from stdin. Instead we open the bki file directly in bootstrap and
pass the file stream to the parser, hence the need to switch yyin.
I have added a comment in the commit log to capture this.

The version comparison has been moved from initdb to bootstrap. This
created some compatibility problems with the Windows tests. For now I
changed the version check so that it does not include the trailing \n,
which works fine and serves the purpose. However, I hope to have
something better in v3, in addition to addressing any other comments.

Please let me know your thoughts and review comments.

--
Thanks and Regards,
Krishnakumar (KK).
[Microsoft]

On Tue, Sep 19, 2023 at 3:18 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 01.09.23 10:01, Krishnakumar R wrote:
> > This patch moves the pre-processing for tokens in the bki file from
> > initdb to bootstrap. With these changes the bki file will only be
> > opened once in bootstrap and parsing will be done by the bootstrap
> > parser.
>
> I did some rough performance tests on this.  I get about a 10%
> improvement on initdb run time, so this appears to have merit.
>
> I wonder whether we can reduce the number of symbols that we need this
> treatment for.
>
> For example, for NAMEDATALEN, SIZEOF_POINTER, ALIGNOF_POINTER,
> FLOAT8PASSBYVAL, these are known at build time, so we could have
> genbki.pl substitute them at build time.
>
> The locale-related symbols (ENCODING, LC_COLLATE, etc.), I wonder
> whether we can eliminate the need for them.  Right now, these are only
> used in the bki entry for the template1 database.  How about initdb
> creates template0 first, with hardcoded default encoding, collation,
> etc., and then creates template1 from that, using the normal CREATE
> DATABASE command with the appropriate options.  Or initdb could just run
> an UPDATE on pg_database to put the right settings in place.
>
> I don't like this part so much, because it adds like 4 more places each
> of these variables is mentioned, which increases the mental and testing
> overhead for dealing with these features (which are an area of active
> development).
>
> In general, it would be good if this could be factored a bit more so
> each variable doesn't have to be hardcoded in so many places.
>
>
> Some more detailed comments on the code:
>
> +                   boot_yylval.str = pstrdup(yytext);
> +                   sprintf(boot_yylval.str, "%d", NAMEDATALEN);
>
> This is weird.  You are first assigning the text and then overwriting it
> with the numeric value?
>
> You can also use boot_yylval.ival for storing numbers.
>
> +                   if (bootp_null(ebootp, ebootp->username)) return NULLVAL;
>
> Add proper line breaks in the code.
>
> +bool bootp_null(extra_bootstrap_params *e, char *s)
>
> Add a comment what this function is supposed to do.
>
> This function could be static.
>
> +   while ((flag = getopt(argc, argv, "B:c:d:D:Fi:kr:X:-:")) != -1)
>
> You should use an option letter that isn't already in use in one of the
> other modes of "postgres".  We try to keep those consistent.
>
> New options should be added to the --help output (usage() in main.c).
>
> +   elog(INFO, "Open bki file %s\n", bki_file);
> +   boot_yyin = fopen(bki_file, "r");
>
> Why is this needed?  It already reads the bki file from stdin?
>
> +   printfPQExpBuffer(&cmd, "\"%s\" --boot -X %d %s %s %s %s -i %s=%s,%s=%s,%s=%s,"
> +                     "%s=%s,%s=%s,%s=%s,%s=%s,%s=%c",
> +                     backend_exec,
> +                     wal_segment_size_mb * (1024 * 1024),
> +                     boot_options, extra_options,
> +                     data_checksums ? "-k" : "",
> +                     debug ? "-d 5" : "",
>
> This appears to undo some of the changes done in cccdbc5d95.
>
> +#define BOOT_LC_COLLATE "lc_collate"
> +#define BOOT_LC_CTYPE "lc_ctype"
> +#define BOOT_ICU_LOCALE "icu_locale"
>
> etc.  This doesn't look particularly useful.  You can just use the
> strings directly.
>

Attachments

Re: Move bki file pre-processing from initdb to bootstrap

From
Krishnakumar R
Date:
> The version comparison has been moved from initdb to bootstrap. This
> created some compatibility problems with windows tests. For now I kept
> the version check to not have \n added, which worked fine and serves
> the purpose. However hoping to have something better in v3 in addition
> to addressing any other comments.

With help from Thomas, I figured out that on Windows fopen uses binary
mode in the backend, which causes issues with EOL handling. Please find
the attached patch, updated with a fix for this.
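
To illustrate the kind of problem described (this is a sketch, and not necessarily how the attached patch fixes it): when a file is read in binary mode on Windows, a line can end in "\r\n" rather than "\n", so an exact comparison against a string ending in "\n" fails. One way around it is to strip the line ending before comparing. The helper names and the header text used in the usage note are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Remove any trailing '\r' and '\n' characters in place.
 * Returns the length of the string after stripping.
 */
size_t
strip_eol(char *line)
{
    size_t len;

    assert(line != NULL);
    len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        line[--len] = '\0';
    return len;
}

/*
 * Compare a line read from the file with the expected header text,
 * ignoring the line ending. Modifies "line" in place.
 */
int
header_matches(char *line, const char *expected)
{
    strip_eol(line);
    return strcmp(line, expected) == 0;
}
```

With this, header_matches() accepts both "# PostgreSQL 17\n" and "# PostgreSQL 17\r\n" when compared against "# PostgreSQL 17".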

--
Thanks and Regards,
Krishnakumar (KK).
[Microsoft]

Attachments

Re: Move bki file pre-processing from initdb to bootstrap

From
Peter Eisentraut
Date:
On 06.10.23 02:24, Krishnakumar R wrote:
>> elog(INFO, "Open bki file %s\n", bki_file);
>> +   boot_yyin = fopen(bki_file, "r");
>>
>> Why is this needed?  It already reads the bki file from stdin?
> We no longer open the bki file in initdb and pass to postgres to parse
> from stdin, instead we open the bki file directly in bootstrap and
> pass the file stream to the parser. Hence the need to switch the yyin.
> Have added a comment in the commit logs to capture this.

Why this change?  I mean, there is nothing wrong with it, but I don't 
follow how changing from reading from stdin to reading from a named file 
is related to moving the parameter substitution from initdb to the backend.

One effect of this is that we would now have two different ways initdb 
interacts with the backend.  In bootstrap mode, it reads from a named 
file, and the second run (the one that loads the system views etc.) 
reads from stdin.  It's already confusing enough, so any further 
divergence should be adequately explained.




Re: Move bki file pre-processing from initdb to bootstrap

From
Peter Eisentraut
Date:
On 17.10.23 03:32, Krishnakumar R wrote:
>> The version comparison has been moved from initdb to bootstrap. This
>> created some compatibility problems with windows tests. For now I kept
>> the version check to not have \n added, which worked fine and serves
>> the purpose. However hoping to have something better in v3 in addition
>> to addressing any other comments.
> 
> With help from Thomas, figured out that on windows fopen uses binary
> mode in the backend which causes issues with EOL. Please find the
> attached patch updated with a fix for this.

I suggest that this patch set be split up into three incremental parts:

1. Move some build-time settings from initdb to postgres.bki.
2. The database locale handling.
3. The bki file handling.

Each of these topics really needs a separate detailed consideration.




Re: Move bki file pre-processing from initdb to bootstrap

From
Krishnakumar R
Date:
Thank you for the review, Peter.

The split makes sense. I was starting to think along the same lines at the end of the last iteration. I will come back shortly.

On Fri, Nov 10, 2023 at 12:48 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> On 17.10.23 03:32, Krishnakumar R wrote:
> >> The version comparison has been moved from initdb to bootstrap. This
> >> created some compatibility problems with windows tests. For now I kept
> >> the version check to not have \n added, which worked fine and serves
> >> the purpose. However hoping to have something better in v3 in addition
> >> to addressing any other comments.
> >
> > With help from Thomas, figured out that on windows fopen uses binary
> > mode in the backend which causes issues with EOL. Please find the
> > attached patch updated with a fix for this.
>
> I suggest that this patch set be split up into three incremental parts:
>
> 1. Move some build-time settings from initdb to postgres.bki.
> 2. The database locale handling.
> 3. The bki file handling.
>
> Each of these topics really needs a separate detailed consideration.

Re: Move bki file pre-processing from initdb to bootstrap

From
vignesh C
Date:
On Sat, 11 Nov 2023 at 00:03, Krishnakumar R <kksrcv001@gmail.com> wrote:
>
> Thank you for review, Peter.
>
> Makes sense on the split part. Was starting to think in same lines, at the end of last iteration. Will come back
> shortly.
>
> On Fri, Nov 10, 2023 at 12:48 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> On 17.10.23 03:32, Krishnakumar R wrote:
>> >> The version comparison has been moved from initdb to bootstrap. This
>> >> created some compatibility problems with windows tests. For now I kept
>> >> the version check to not have \n added, which worked fine and serves
>> >> the purpose. However hoping to have something better in v3 in addition
>> >> to addressing any other comments.
>> >
>> > With help from Thomas, figured out that on windows fopen uses binary
>> > mode in the backend which causes issues with EOL. Please find the
>> > attached patch updated with a fix for this.
>>
>> I suggest that this patch set be split up into three incremental parts:
>>
>> 1. Move some build-time settings from initdb to postgres.bki.
>> 2. The database locale handling.
>> 3. The bki file handling.
>>
>> Each of these topics really needs a separate detailed consideration.

The patch which you submitted has been awaiting your attention for
quite some time now.  As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible.  Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.

Regards,
Vignesh