Обсуждение: GUC names in messages

Поиск
Список
Период
Сортировка

GUC names in messages

От
Peter Smith
Дата:
Hi hackers,

While reviewing another patch I noticed how the GUCs are
inconsistently named within the GUC_check_errdetail messages:

======

below, the GUC name is embedded but not quoted:

src/backend/access/transam/xlogprefetcher.c:
GUC_check_errdetail("recovery_prefetch is not supported on platforms
that lack posix_fadvise().");
src/backend/access/transam/xlogrecovery.c:
GUC_check_errdetail("recovery_target_timeline is not a valid
number.");
src/backend/commands/variable.c:
GUC_check_errdetail("effective_io_concurrency must be set to 0 on
platforms that lack posix_fadvise().");
src/backend/commands/variable.c:
GUC_check_errdetail("maintenance_io_concurrency must be set to 0 on
platforms that lack posix_fadvise().");
src/backend/port/sysv_shmem.c:
GUC_check_errdetail("huge_page_size must be 0 on this platform.");
src/backend/port/win32_shmem.c:
GUC_check_errdetail("huge_page_size must be 0 on this platform.");
src/backend/replication/syncrep.c:
GUC_check_errdetail("synchronous_standby_names parser failed");
src/backend/storage/file/fd.c:
GUC_check_errdetail("debug_io_direct is not supported on this
platform.");
src/backend/storage/file/fd.c:
GUC_check_errdetail("debug_io_direct is not supported for WAL because
XLOG_BLCKSZ is too small");
src/backend/storage/file/fd.c:
GUC_check_errdetail("debug_io_direct is not supported for data because
BLCKSZ is too small");
src/backend/tcop/postgres.c:
GUC_check_errdetail("client_connection_check_interval must be set to 0
on this platform.");

~~~

below, the GUC name is embedded and double-quoted:

src/backend/commands/vacuum.c:
GUC_check_errdetail("\"vacuum_buffer_usage_limit\" must be 0 or
between %d kB and %d kB",
src/backend/commands/variable.c:
GUC_check_errdetail("Conflicting \"datestyle\" specifications.");
src/backend/storage/buffer/localbuf.c:
GUC_check_errdetail("\"temp_buffers\" cannot be changed after any
temporary tables have been accessed in the session.");
src/backend/tcop/postgres.c:
GUC_check_errdetail("\"max_stack_depth\" must not exceed %ldkB.",
src/backend/tcop/postgres.c:        GUC_check_errdetail("Cannot enable
parameter when \"log_statement_stats\" is true.");
src/backend/tcop/postgres.c:        GUC_check_errdetail("Cannot enable
\"log_statement_stats\" when "

~~~

below, the GUC name is substituted but not quoted:

src/backend/access/table/tableamapi.c:      GUC_check_errdetail("%s
cannot be empty.",
src/backend/access/table/tableamapi.c:      GUC_check_errdetail("%s is
too long (maximum %d characters).",

~~~

I had intended to make a patch to address the inconsistency, but
couldn't decide which of those styles was the preferred one.

Then I worried this could be the tip of the iceberg -- GUC names occur
in many other error messages where they are sometimes quoted and
sometimes not quoted:
e.g. Not quoted -- errhint("You might need to run fewer transactions
at a time or increase max_connections.")));
e.g. Quoted -- errmsg("\"max_wal_size\" must be at least twice
\"wal_segment_size\"")));

Ideally, they should all look the same everywhere, shouldn't they?

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: GUC names in messages

От
Peter Smith
Дата:
On Wed, Nov 1, 2023 at 8:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
...
>
> I had intended to make a patch to address the inconsistency, but
> couldn't decide which of those styles was the preferred one.
>
> Then I worried this could be the tip of the iceberg -- GUC names occur
> in many other error messages where they are sometimes quoted and
> sometimes not quoted:
> e.g. Not quoted -- errhint("You might need to run fewer transactions
> at a time or increase max_connections.")));
> e.g. Quoted -- errmsg("\"max_wal_size\" must be at least twice
> \"wal_segment_size\"")));
>
> Ideally, they should all look the same everywhere, shouldn't they?
>

One idea to achieve consistency might be to always substitute GUC
names using a macro.

#define GUC_NAME(s) ("\"" s "\"")

ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
    errmsg("%s must be at least twice %s",
        GUC_NAME("max_wal_size"),
        GUC_NAME("wal_segment_size"))));

Thoughts?

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: GUC names in messages

От
Daniel Gustafsson
Дата:
> On 1 Nov 2023, at 10:02, Peter Smith <smithpb2250@gmail.com> wrote:

> GUC_check_errdetail("effective_io_concurrency must be set to 0 on
> platforms that lack posix_fadvise().");
> src/backend/commands/variable.c:
> GUC_check_errdetail("maintenance_io_concurrency must be set to 0 on
> platforms that lack posix_fadvise().");

These should be substituted to reduce the number of distinct messages that need
to be translated.  I wouldn't be surprised if more like these have slipped
through.

> I had intended to make a patch to address the inconsistency, but
> couldn't decide which of those styles was the preferred one.

Given the variety in the codebase I don't think there is a preferred one.

> Then I worried this could be the tip of the iceberg

All good rabbit-holes uncovered during hacking are.. =)

> Ideally, they should all look the same everywhere, shouldn't they?

Having a policy would be good, having one which is known and enforced is even
better (like how we are consistent around error messages based on our Error
Message Style Guide).

--
Daniel Gustafsson




Re: GUC names in messages

От
Daniel Gustafsson
Дата:
> On 1 Nov 2023, at 10:22, Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Nov 1, 2023 at 8:02 PM Peter Smith <smithpb2250@gmail.com> wrote:
> ...
>>
>> I had intended to make a patch to address the inconsistency, but
>> couldn't decide which of those styles was the preferred one.
>>
>> Then I worried this could be the tip of the iceberg -- GUC names occur
>> in many other error messages where they are sometimes quoted and
>> sometimes not quoted:
>> e.g. Not quoted -- errhint("You might need to run fewer transactions
>> at a time or increase max_connections.")));
>> e.g. Quoted -- errmsg("\"max_wal_size\" must be at least twice
>> \"wal_segment_size\"")));
>>
>> Ideally, they should all look the same everywhere, shouldn't they?
>>
>
> One idea to achieve consistency might be to always substitute GUC
> names using a macro.
>
> #define GUC_NAME(s) ("\"" s "\"")
>
> ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>    errmsg("%s must be at least twice %s",
>        GUC_NAME("max_wal_size"),
>        GUC_NAME("wal_segment_size"))));

Something like this might make translations harder since the remaining string
leaves little context about the message.  We already have that today to some
extent (so it might not be an issue), and it might be doable to automatically
add translator comments, but it's something to consider.

--
Daniel Gustafsson




Re: GUC names in messages

От
Tom Lane
Дата:
Daniel Gustafsson <daniel@yesql.se> writes:
> On 1 Nov 2023, at 10:22, Peter Smith <smithpb2250@gmail.com> wrote:
>> One idea to achieve consistency might be to always substitute GUC
>> names using a macro.
>>
>> #define GUC_NAME(s) ("\"" s "\"")
>>
>> ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> errmsg("%s must be at least twice %s",
>> GUC_NAME("max_wal_size"),
>> GUC_NAME("wal_segment_size"))));

> Something like this might make translations harder since the remaining string
> leaves little context about the message.  We already have that today to some
> extent (so it might not be an issue), and it might be doable to automatically
> add translator comments, but it's something to consider.

Our error message style guidelines say not to assemble messages out
of separate parts, because it makes translation difficult.  Originally
we applied that rule to GUC names mentioned in messages as well.
Awhile ago the translation team decided that that made for too many
duplicative translations, so they'd be willing to compromise on
substituting GUC names.  That's only been changed in a haphazard
fashion though, mostly in cases where there actually were duplicative
messages that could be merged this way.  And there's never been any
real clarity about whether to quote GUC names, though certainly we're
more likely to quote anything injected with %s.  So that's why we have
a mishmash right now.

I'm not enamored of the GUC_NAME idea suggested above.  I don't
think it buys anything, and what it does do is make *every single
one* of our GUC-mentioning messages wrong.  I think if we want to
standardize here, we should standardize on something that's
already pretty common in the code base.

Another problem with the idea as depicted above is that it
mistakenly assumes that "..." is the correct quoting method
in all languages.  You could make GUC_NAME be a pure no-op
macro and continue to put quoting in the translatable string
where it belongs, but then the macro brings even less value.

            regards, tom lane



Re: GUC names in messages

От
Peter Eisentraut
Дата:
On 01.11.23 10:25, Tom Lane wrote:
> And there's never been any
> real clarity about whether to quote GUC names, though certainly we're
> more likely to quote anything injected with %s.  So that's why we have
> a mishmash right now.

I'm leaning toward not quoting GUC names.  The quoting is needed in 
places where the value can be arbitrary, to avoid potential confusion. 
But the GUC names are well-known, and we wouldn't add confusing GUC 
names like "table" or "not found" in the future.



Re: GUC names in messages

От
Laurenz Albe
Дата:
On Wed, 2023-11-01 at 16:12 -0400, Peter Eisentraut wrote:
> On 01.11.23 10:25, Tom Lane wrote:
> > And there's never been any
> > real clarity about whether to quote GUC names, though certainly we're
> > more likely to quote anything injected with %s.  So that's why we have
> > a mishmash right now.
>
> I'm leaning toward not quoting GUC names.  The quoting is needed in
> places where the value can be arbitrary, to avoid potential confusion.
> But the GUC names are well-known, and we wouldn't add confusing GUC
> names like "table" or "not found" in the future.

I agree for names with underscores in them.  But I think that quoting
is necessary for names like "timezone" or "datestyle" that might be
mistaken for normal words.  My personal preference is to always quote
GUC names, but I think it is OK not to quote GOCs whose name are
clearly not natural language words.

Yours,
Laurenz Albe



Re: GUC names in messages

От
Peter Smith
Дата:
On Thu, Nov 2, 2023 at 1:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
> > On 1 Nov 2023, at 10:22, Peter Smith <smithpb2250@gmail.com> wrote:
> >> One idea to achieve consistency might be to always substitute GUC
> >> names using a macro.
> >>
> >> #define GUC_NAME(s) ("\"" s "\"")
> >>
> >> ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> >> errmsg("%s must be at least twice %s",
> >> GUC_NAME("max_wal_size"),
> >> GUC_NAME("wal_segment_size"))));
>
> > Something like this might make translations harder since the remaining string
> > leaves little context about the message.  We already have that today to some
> > extent (so it might not be an issue), and it might be doable to automatically
> > add translator comments, but it's something to consider.
>
> Our error message style guidelines say not to assemble messages out
> of separate parts, because it makes translation difficult.  Originally
> we applied that rule to GUC names mentioned in messages as well.
> Awhile ago the translation team decided that that made for too many
> duplicative translations, so they'd be willing to compromise on
> substituting GUC names.  That's only been changed in a haphazard
> fashion though, mostly in cases where there actually were duplicative
> messages that could be merged this way.  And there's never been any
> real clarity about whether to quote GUC names, though certainly we're
> more likely to quote anything injected with %s.  So that's why we have
> a mishmash right now.
>
> I'm not enamored of the GUC_NAME idea suggested above.  I don't
> think it buys anything, and what it does do is make *every single
> one* of our GUC-mentioning messages wrong.  I think if we want to
> standardize here, we should standardize on something that's
> already pretty common in the code base.
>

Thanks to everybody for the feedback received so far.

Perhaps as a first step, I can try to quantify the GUC name styles
already in the source code. The numbers might help decide how to
proceed

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: GUC names in messages

От
Nathan Bossart
Дата:
On Wed, Nov 01, 2023 at 09:46:52PM +0100, Laurenz Albe wrote:
> I agree for names with underscores in them.  But I think that quoting
> is necessary for names like "timezone" or "datestyle" that might be
> mistaken for normal words.  My personal preference is to always quote
> GUC names, but I think it is OK not to quote GOCs whose name are
> clearly not natural language words.

+1, IMHO quoting GUC names makes it abundantly clear that they are special
identifiers.  In de4d456, we quoted the role names in a bunch of messages.
We didn't quote the attribute/option names, but those are in all-caps, so
they already stand out nicely.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: GUC names in messages

От
Alvaro Herrera
Дата:
On 2023-Nov-01, Nathan Bossart wrote:

> On Wed, Nov 01, 2023 at 09:46:52PM +0100, Laurenz Albe wrote:
> > I agree for names with underscores in them.  But I think that quoting
> > is necessary for names like "timezone" or "datestyle" that might be
> > mistaken for normal words.  My personal preference is to always quote
> > GUC names, but I think it is OK not to quote GOCs whose name are
> > clearly not natural language words.
> 
> +1, IMHO quoting GUC names makes it abundantly clear that they are special
> identifiers.  In de4d456, we quoted the role names in a bunch of messages.
> We didn't quote the attribute/option names, but those are in all-caps, so
> they already stand out nicely.

I like this, and I propose we codify it in the message style guide.  How
about this?  We can start looking at code changes to make once we decide
we agree with this.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La verdad no siempre es bonita, pero el hambre de ella sí"

Вложения

Re: GUC names in messages

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On 2023-Nov-01, Nathan Bossart wrote:
>> +1, IMHO quoting GUC names makes it abundantly clear that they are special
>> identifiers.  In de4d456, we quoted the role names in a bunch of messages.
>> We didn't quote the attribute/option names, but those are in all-caps, so
>> they already stand out nicely.

> I like this, and I propose we codify it in the message style guide.  How
> about this?  We can start looking at code changes to make once we decide
> we agree with this.

WFM.

            regards, tom lane



Re: GUC names in messages

От
Nathan Bossart
Дата:
On Tue, Nov 07, 2023 at 10:33:03AM +0100, Alvaro Herrera wrote:
> On 2023-Nov-01, Nathan Bossart wrote:
>> +1, IMHO quoting GUC names makes it abundantly clear that they are special
>> identifiers.  In de4d456, we quoted the role names in a bunch of messages.
>> We didn't quote the attribute/option names, but those are in all-caps, so
>> they already stand out nicely.
> 
> I like this, and I propose we codify it in the message style guide.  How
> about this?  We can start looking at code changes to make once we decide
> we agree with this.

> +   <para>
> +    In messages containing configuration variable names, quotes are
> +    not necessary when the names are visibly not English natural words, such
> +    as when they have underscores or are all-uppercase.  Otherwise, quotes
> +    must be added.  Do include double-quotes in a message where an arbitrary
> +    variable name is to be expanded.
> +   </para>

І'd vote for quoting all GUC names, if for no other reason than "visibly
not English natural words" feels a bit open to interpretation.  But this
seems like it's on the right track, so I won't argue too strongly if I'm
the only holdout.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: GUC names in messages

От
Peter Smith
Дата:
FWIW, I am halfway through doing regex checking of the PG16 source for
all GUC names in messages to see what current styles are in use today.

Not sure if those numbers will influence the decision.

I hope I can post my findings today or tomorrow.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: GUC names in messages

От
Peter Smith
Дата:
On Wed, Nov 8, 2023 at 7:40 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> FWIW, I am halfway through doing regex checking of the PG16 source for
> all GUC names in messages to see what current styles are in use today.
>
> Not sure if those numbers will influence the decision.
>
> I hope I can post my findings today or tomorrow.
>

Here are my findings from the current PG16 source messages.

I used a regex search:
".*GUCNAME

to find how each GUCNAME is used in the messages in *.c files.

The GUC names are taken from the guc_tables.c code, so they are
grouped accordingly below.

~TOTALS:

messages where GUC names are QUOTED:
- bool = 11
- int = 11
- real = 0
- string = 10
- enum = 7
TOTAL = 39

messages where GUC names are NOT QUOTED:
- bool = 14
- int = 60
- real = 0
- string = 59
- enum = 31
TOTAL = 164

~~~

Details are in the attached file. PSA.

I've categorised them as being currently QUOTED, NOT QUOTED, and NONE
(most are not used in any messages).

Notice that NOT QUOTED is the far more common pattern, so my vote
would be just to standardise on making everything this way. I know
there was some concern raised about ambiguous words like "timezone"
and "datestyle" etc but in practice, those are rare. Also, those GUCs
are different in that they are written as camel-case (e.g.
"DateStyle") in the guc_tables.c, so if they were also written
camel-case in the messages that could remove ambiguities with normal
words. YMMV.

Anyway, I will await a verdict about what to do.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

Re: GUC names in messages

От
Peter Smith
Дата:
On Thu, Nov 2, 2023 at 1:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
...
> Our error message style guidelines say not to assemble messages out
> of separate parts, because it makes translation difficult.  Originally
> we applied that rule to GUC names mentioned in messages as well.
> Awhile ago the translation team decided that that made for too many
> duplicative translations, so they'd be willing to compromise on
> substituting GUC names.  That's only been changed in a haphazard
> fashion though, mostly in cases where there actually were duplicative
> messages that could be merged this way.  And there's never been any
> real clarity about whether to quote GUC names, though certainly we're
> more likely to quote anything injected with %s.  So that's why we have
> a mishmash right now.

Right. While looking at all the messages I observed a number of them
having almost the same (but not quite the same) wording:

For example,

errhint("Consider increasing the configuration parameter \"max_wal_size\".")));
errhint("You might need to increase %s.", "max_locks_per_transaction")));
errhint("You might need to increase %s.", "max_pred_locks_per_transaction")));
errmsg("could not find free replication state, increase
max_replication_slots")));
hint ? errhint("You might need to increase %s.", "max_slot_wal_keep_size") : 0);
errhint("You may need to increase max_worker_processes.")));
errhint("Consider increasing configuration parameter
\"max_worker_processes\".")));
errhint("Consider increasing the configuration parameter
\"max_worker_processes\".")));
errhint("You might need to increase %s.", "max_worker_processes")));
errhint("You may need to increase max_worker_processes.")));
errhint("You might need to increase %s.", "max_logical_replication_workers")));

~

The most common pattern there is "You might need to increase %s.".

Here is a patch to modify those other similar variations so they share
that common wording.

PSA.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

Re: GUC names in messages

От
Kyotaro Horiguchi
Дата:
At Thu, 9 Nov 2023 12:55:44 +1100, Peter Smith <smithpb2250@gmail.com> wrote in 
> The most common pattern there is "You might need to increase %s.".
..
> Here is a patch to modify those other similar variations so they share
> that common wording.
> 
> PSA.

I'm uncertain whether the phrases "Consider doing something" and "You
might need to do something" are precisely interchangeable. However,
(for me..)  it appears that either phrase could be applied for all
messages that the patch touches.

In short, I'm fine with the patch.


By the way, I was left scratching my head after seeing the following
message.

>        ereport(PANIC,
>            (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
> -           errmsg("could not find free replication state, increase max_replication_slots")));

Being told to increase max_replication_slots in a PANIC
message feels somewhat off to me. Just looking at the message, it
seems unconvincing to increase "slots" because there is a lack of
"state". So, I poked around in the code and found the following
comment:

> ReplicationOriginShmemSize(void)
> {
> ...
> /*
>  * XXX: max_replication_slots is arguably the wrong thing to use, as here
>  * we keep the replay state of *remote* transactions. But for now it seems
>  * sufficient to reuse it, rather than introduce a separate GUC.
>  */

I haven't read the related code, but if my understanding based on this
comment is correct, wouldn't it mean that a lack of storage space for
the state at the location outputting the message indicates a bug in
the program, not a user configuration error? In other words, isn't
this message something that at least shouldn't be a user-facing
message, and might it be more appropriate to use an Assert instead?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: GUC names in messages

От
Alvaro Herrera
Дата:
On 2023-Nov-09, Peter Smith wrote:

> Notice that NOT QUOTED is the far more common pattern, so my vote
> would be just to standardise on making everything this way. I know
> there was some concern raised about ambiguous words like "timezone"
> and "datestyle" etc but in practice, those are rare. Also, those GUCs
> are different in that they are written as camel-case (e.g.
> "DateStyle") in the guc_tables.c, so if they were also written
> camel-case in the messages that could remove ambiguities with normal
> words. YMMV.

Well, I think camel-casing is also a sufficient differentiator for these
identifiers not being English words.  We'd need to ensure they're always
written that way, when not quoted.  However, in cases where arbitrary
values are expanded, I don't know that they would be expanded that way,
so I would still go for quoting in that case.

There's also a few that are not camel-cased nor have any underscores --
looking at postgresql.conf.sample, we have "port", "bonjour", "ssl",
"fsync", "geqo", "jit", "autovacuum", "xmlbinary", "xmloption".  (We also
have "include", but I doubt that's ever used in an error message).  But
actually, there's more: every reloption is a candidate, and there we
have "fillfactor", "autosummarize", "fastupdate", "buffering".  So if we
want to make generic advice on how to deal with option names in error
messages, I think the wording on conditional quoting I proposed should
go in (adding CamelCase as a reason not to quote), and then we can fix
the code to match.  Looking at your list, I think the changes to make
are not too numerous.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Nadie está tan esclavizado como el que se cree libre no siéndolo" (Goethe)



Re: GUC names in messages

От
Peter Smith
Дата:
On Thu, Nov 9, 2023 at 10:04 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2023-Nov-09, Peter Smith wrote:
>
> > Notice that NOT QUOTED is the far more common pattern, so my vote
> > would be just to standardise on making everything this way. I know
> > there was some concern raised about ambiguous words like "timezone"
> > and "datestyle" etc but in practice, those are rare. Also, those GUCs
> > are different in that they are written as camel-case (e.g.
> > "DateStyle") in the guc_tables.c, so if they were also written
> > camel-case in the messages that could remove ambiguities with normal
> > words. YMMV.
>
> Well, I think camel-casing is also a sufficient differentiator for these
> identifiers not being English words.  We'd need to ensure they're always
> written that way, when not quoted.  However, in cases where arbitrary
> values are expanded, I don't know that they would be expanded that way,
> so I would still go for quoting in that case.
>
> There's also a few that are not camel-cased nor have any underscores --
> looking at postgresql.conf.sample, we have "port", "bonjour", "ssl",
> "fsync", "geqo", "jit", "autovacuum", "xmlbinary", "xmloption".  (We also
> have "include", but I doubt that's ever used in an error message).  But
> actually, there's more: every reloption is a candidate, and there we
> have "fillfactor", "autosummarize", "fastupdate", "buffering".  So if we
> want to make generic advice on how to deal with option names in error
> messages, I think the wording on conditional quoting I proposed should
> go in (adding CamelCase as a reason not to quote), and then we can fix
> the code to match.  Looking at your list, I think the changes to make
> are not too numerous.
>

Sorry for my delay in getting back to this thread.

PSA a patch for this work.

There may be some changes I've missed, but hopefully, this is a nudge
in the right direction.

======
Kind Regards,
Peter Smith.
Fujitsu Australia.

Вложения

Re: GUC names in messages

От
Michael Paquier
Дата:
On Thu, Nov 23, 2023 at 06:27:04PM +1100, Peter Smith wrote:
> There may be some changes I've missed, but hopefully, this is a nudge
> in the right direction.

Thanks for spending some time on that.

    <para>
+    In messages containing configuration variable names, do not include quotes
+    when the names are visibly not English natural words, such as when they
+    have underscores or are all-uppercase or have mixed case. Otherwise, quotes
+    must be added.  Do include quotes in a message where an arbitrary variable
+    name is to be expanded.
+   </para>

That seems to describe clearly the consensus reached on the thread
(quotes for GUCs that are single terms, no quotes for names that are
obviously parameters).

In terms of messages that have predictible names, 0002 moves in the
needle in the right direction.  There seem to be more:
src/backend/postmaster/bgworker.c:  errhint("Consider increasing the
configuration parameter \"max_worker_processes\".")));
contrib/pg_prewarm/autoprewarm.c:  errhint("Consider increasing
configuration parameter \"max_worker_processes\".")));

Things like parse_and_validate_value() and set_config_option_ext()
include log strings about GUC and these use quotes.  Could these areas
be made smarter with a routine to check if quotes are applied
automatically when we have a "simple" GUC name, aka I guess made of
only lower-case characters?  This could be done with a islower() on
the string name, for instance.
--
Michael

Вложения

Re: GUC names in messages

От
Alvaro Herrera
Дата:
On 2023-Nov-24, Michael Paquier wrote:

> On Thu, Nov 23, 2023 at 06:27:04PM +1100, Peter Smith wrote:
> > There may be some changes I've missed, but hopefully, this is a nudge
> > in the right direction.
> 
> Thanks for spending some time on that.

+1

>     <para>
> +    In messages containing configuration variable names, do not include quotes
> +    when the names are visibly not English natural words, such as when they
> +    have underscores or are all-uppercase or have mixed case. Otherwise, quotes
> +    must be added.  Do include quotes in a message where an arbitrary variable
> +    name is to be expanded.
> +   </para>
> 
> That seems to describe clearly the consensus reached on the thread
> (quotes for GUCs that are single terms, no quotes for names that are
> obviously parameters).

Yeah, this is pretty much the patch I proposed earlier.

> In terms of messages that have predictible names, 0002 moves in the
> needle in the right direction.  There seem to be more:
> src/backend/postmaster/bgworker.c:  errhint("Consider increasing the
> configuration parameter \"max_worker_processes\".")));
> contrib/pg_prewarm/autoprewarm.c:  errhint("Consider increasing
> configuration parameter \"max_worker_processes\".")));

Yeah.  Also, these could be changed to have the GUC name outside the
message proper, which would reduce the total number of messages.  (But
care must be given to the word "the" there.)

> Things like parse_and_validate_value() and set_config_option_ext()
> include log strings about GUC and these use quotes.  Could these areas
> be made smarter with a routine to check if quotes are applied
> automatically when we have a "simple" GUC name, aka I guess made of
> only lower-case characters?  This could be done with a islower() on
> the string name, for instance.

I think we could leave these improvements for a second round.  They
don't need to hold back the improvement we already have.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"You're _really_ hosed if the person doing the hiring doesn't understand
relational systems: you end up with a whole raft of programmers, none of
whom has had a Date with the clue stick."              (Andrew Sullivan)
https://postgr.es/m/20050809113420.GD2768@phlogiston.dyndns.org



Re: GUC names in messages

От
Michael Paquier
Дата:
On Fri, Nov 24, 2023 at 10:53:40AM +0100, Alvaro Herrera wrote:
> I think we could leave these improvements for a second round.  They
> don't need to hold back the improvement we already have.

Of course, no problem here to do things one step at a time.
--
Michael

Вложения

Re: GUC names in messages

От
Peter Smith
Дата:
On Fri, Nov 24, 2023 at 2:11 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Nov 23, 2023 at 06:27:04PM +1100, Peter Smith wrote:
> > There may be some changes I've missed, but hopefully, this is a nudge
> > in the right direction.
>
> Thanks for spending some time on that.
>
>     <para>
> +    In messages containing configuration variable names, do not include quotes
> +    when the names are visibly not English natural words, such as when they
> +    have underscores or are all-uppercase or have mixed case. Otherwise, quotes
> +    must be added.  Do include quotes in a message where an arbitrary variable
> +    name is to be expanded.
> +   </para>
>
> That seems to describe clearly the consensus reached on the thread
> (quotes for GUCs that are single terms, no quotes for names that are
> obviously parameters).
>
> In terms of messages that have predictible names, 0002 moves in the
> needle in the right direction.  There seem to be more:
> src/backend/postmaster/bgworker.c:  errhint("Consider increasing the
> configuration parameter \"max_worker_processes\".")));
> contrib/pg_prewarm/autoprewarm.c:  errhint("Consider increasing
> configuration parameter \"max_worker_processes\".")));

Done in patch 0002

>
> Things like parse_and_validate_value() and set_config_option_ext()
> include log strings about GUC and these use quotes.  Could these areas
> be made smarter with a routine to check if quotes are applied
> automatically when we have a "simple" GUC name, aka I guess made of
> only lower-case characters?  This could be done with a islower() on
> the string name, for instance.

See what you think of patch 0003

~~

PSA v2 patches.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

Re: GUC names in messages

От
Peter Smith
Дата:
On Fri, Nov 24, 2023 at 8:53 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2023-Nov-24, Michael Paquier wrote:
>
> > On Thu, Nov 23, 2023 at 06:27:04PM +1100, Peter Smith wrote:
> > > There may be some changes I've missed, but hopefully, this is a nudge
> > > in the right direction.
> >
> > Thanks for spending some time on that.
>
> +1
>
> >     <para>
> > +    In messages containing configuration variable names, do not include quotes
> > +    when the names are visibly not English natural words, such as when they
> > +    have underscores or are all-uppercase or have mixed case. Otherwise, quotes
> > +    must be added.  Do include quotes in a message where an arbitrary variable
> > +    name is to be expanded.
> > +   </para>
> >
> > That seems to describe clearly the consensus reached on the thread
> > (quotes for GUCs that are single terms, no quotes for names that are
> > obviously parameters).
>
> Yeah, this is pretty much the patch I proposed earlier.
>
> > In terms of messages that have predictible names, 0002 moves in the
> > needle in the right direction.  There seem to be more:
> > src/backend/postmaster/bgworker.c:  errhint("Consider increasing the
> > configuration parameter \"max_worker_processes\".")));
> > contrib/pg_prewarm/autoprewarm.c:  errhint("Consider increasing
> > configuration parameter \"max_worker_processes\".")));
>
> Yeah.  Also, these could be changed to have the GUC name outside the
> message proper, which would reduce the total number of messages.  (But
> care must be given to the word "the" there.)
>

 I had posted something similar a few posts back [1], but it just
caused more questions unrelated to GUC name quotes so I abandoned that
temporarily.

So for now, I hope this thread can be only about quotes on GUC names,
otherwise, I thought it may become stuck debating dozens of individual
messages. Certainly later, or in another thread, we can revisit all
messages again to try to identify/extract any "common" ones.

> > Things like parse_and_validate_value() and set_config_option_ext()
> > include log strings about GUC and these use quotes.  Could these areas
> > be made smarter with a routine to check if quotes are applied
> > automatically when we have a "simple" GUC name, aka I guess made of
> > only lower-case characters?  This could be done with a islower() on
> > the string name, for instance.
>
> I think we could leave these improvements for a second round.  They
> don't need to hold back the improvement we already have.
>

I tried something for this already but kept it in a separate patch. See v2-0003

======
[1] https://www.postgresql.org/message-id/CAHut%2BPv8VG7fvXzg5PNeQuUhJG17xwCWNpZSUUkN11ArV%3D%3DCdg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: GUC names in messages

От
Michael Paquier
Дата:
On Mon, Nov 27, 2023 at 10:04:35AM +1100, Peter Smith wrote:
> On Fri, Nov 24, 2023 at 8:53 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> Yeah.  Also, these could be changed to have the GUC name outside the
>> message proper, which would reduce the total number of messages.  (But
>> care must be given to the word "the" there.)
>
>  I had posted something similar a few posts back [1], but it just
> caused more questions unrelated to GUC name quotes so I abandoned that
> temporarily.

Yes, I kind of agree to let that out of the picture for the moment.
It would be good to reduce the translation chunks.

> So for now, I hope this thread can be only about quotes on GUC names,
> otherwise, I thought it may become stuck debating dozens of individual
> messages. Certainly later, or in another thread, we can revisit all
> messages again to try to identify/extract any "common" ones.

-HINT:  Perhaps you need a different "datestyle" setting.
+HINT:  Perhaps you need a different DateStyle setting.

Is the change for "datestyle" really required?  It does not betray the
GUC quoting policy added by 0001.

>> I think we could leave these improvements for a second round.  They
>> don't need to hold back the improvement we already have.
>
> I tried something for this already but kept it in a separate patch. See v2-0003

+               if (*p == '_')
+                       underscore = true;

Is there a reason why we don't just use islower() or is that just to
get something entirely local independent?  I am not sure that it needs
to be that complicated.  We should just check that all the characters
are lower-case and apply quotes.
--
Michael

Вложения

Re: GUC names in messages

От
Tom Lane
Дата:
Michael Paquier <michael@paquier.xyz> writes:
> Is there a reason why we don't just use islower() or is that just to
> get something entirely local independent?

islower() and related functions are not to be trusted for this
purpose.  They will certainly give locale-dependent results,
and they might give entirely wrong ones if there's any inconsistency
between the database encoding and what libc thinks the locale is.

            regards, tom lane



Re: GUC names in messages

От
Peter Smith
Дата:
On Mon, Nov 27, 2023 at 12:44 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Nov 27, 2023 at 10:04:35AM +1100, Peter Smith wrote:
> > On Fri, Nov 24, 2023 at 8:53 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> >> Yeah.  Also, these could be changed to have the GUC name outside the
> >> message proper, which would reduce the total number of messages.  (But
> >> care must be given to the word "the" there.)
> >
> >  I had posted something similar a few posts back [1], but it just
> > caused more questions unrelated to GUC name quotes so I abandoned that
> > temporarily.
>
> Yes, I kind of agree to let that out of the picture for the moment.
> It would be good to reduce the translation chunks.
>
> > So for now, I hope this thread can be only about quotes on GUC names,
> > otherwise, I thought it may become stuck debating dozens of individual
> > messages. Certainly later, or in another thread, we can revisit all
> > messages again to try to identify/extract any "common" ones.
>
> -HINT:  Perhaps you need a different "datestyle" setting.
> +HINT:  Perhaps you need a different DateStyle setting.
>
> Is the change for "datestyle" really required?  It does not betray the
> GUC quoting policy added by 0001.
>

TBH, I suspect something fishy about these mixed-case GUCs.

In the documentation and in the guc_tables.c they are all described in
MixedCase (e.g. "DateStyle" instead of "datestyle"), so I felt the
messages should use the same case the documentation, which is why I
changed all the ones you are referring to.

I know the code is doing a case-insensitive hashtable lookup but I
suspect some of the string passing still in the code for those
particular GUCs ought to be using the same mixed case string literal
as in the guc_tables.c. Currently, I have seen a few quirks where the
case is inconsistent with the MixedCase docs. It needs some more
investigation to understand the reason. For example,

2023-11-27 11:03:48.565 AEDT [15303] STATEMENT:  set intervalstyle=123;
ERROR:  invalid value for parameter "intervalstyle": "123"

versus

2023-11-27 11:13:56.018 AEDT [15303] STATEMENT:  set datestyle=123;
ERROR:  invalid value for parameter DateStyle: "123"

> >> I think we could leave these improvements for a second round.  They
> >> don't need to hold back the improvement we already have.
> >
> > I tried something for this already but kept it in a separate patch. See v2-0003
>
> +               if (*p == '_')
> +                       underscore = true;
>
> Is there a reason why we don't just use islower() or is that just to
> get something entirely local independent?  I am not sure that it needs
> to be that complicated.  We should just check that all the characters
> are lower-case and apply quotes.

Thanks for the feedback. Probably I have overcomplicated it. I'll revisit it.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: GUC names in messages[

От
Michael Paquier
Дата:
On Mon, Nov 27, 2023 at 01:41:18PM +1100, Peter Smith wrote:
> TBH, I suspect something fishy about these mixed-case GUCs.
>
> In the documentation and in the guc_tables.c they are all described in
> MixedCase (e.g. "DateStyle" instead of "datestyle"), so I felt the
> messages should use the same case the documentation, which is why I
> changed all the ones you are referring to.

FWIW, I've been tempted for a few years to propose that we should keep
the parsers as they behave now, but format the name of these
parameters in the code and the docs to just be lower-case all the
time.

>> Is there a reason why we don't just use islower() or is that just to
>> get something entirely local independent?  I am not sure that it needs
>> to be that complicated.  We should just check that all the characters
>> are lower-case and apply quotes.
>
> Thanks for the feedback. Probably I have overcomplicated it. I'll revisit it.

The use of a static variable with a fixed size was itching me a bit as
well..  I was wondering if it would be cleaner to use %s%s%s in the
strings adding a note that these are GUC names that may be optionally
quoted, then hide what gets assigned in a macro with a result rather
similar to LSN_FORMAT_ARGS (GUC_FORMAT?).  The routine checking if
quotes should be applied would only need to return a boolean to tell
what to do, and could be hidden in the macro.
--
Michael

Вложения

Re: GUC names in messages

От
Laurenz Albe
Дата:
On Mon, 2023-11-27 at 13:41 +1100, Peter Smith wrote:
> TBH, I suspect something fishy about these mixed-case GUCs.
>
> In the documentation and in the guc_tables.c they are all described in
> MixedCase (e.g. "DateStyle" instead of "datestyle"), so I felt the
> messages should use the same case the documentation, which is why I
> changed all the ones you are referring to.

I agree with that decision; we should use mixed case for these parameters.

Otherwise we might get complaints that the following query does not return
any results:

  SELECT * FROM pg_settings WHERE name = 'timezone';

Yours,
Laurenz Albe



Re: GUC names in messages

От
Tom Lane
Дата:
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Mon, 2023-11-27 at 13:41 +1100, Peter Smith wrote:
>> In the documentation and in the guc_tables.c they are all described in
>> MixedCase (e.g. "DateStyle" instead of "datestyle"), so I felt the
>> messages should use the same case the documentation, which is why I
>> changed all the ones you are referring to.

> I agree with that decision; we should use mixed case for these parameters.
> Otherwise we might get complaints that the following query does not return
> any results:
>   SELECT * FROM pg_settings WHERE name = 'timezone';

Yeah.  Like Michael upthread, I've wondered occasionally about changing
these names to all-lower-case.  It'd surely be nicer if we'd done it
like that to begin with.  But I can't convince myself that the ensuing
user pain would be justified.

            regards, tom lane



Re: GUC names in messages

От
Michael Paquier
Дата:
On Mon, Nov 27, 2023 at 01:35:44AM -0500, Tom Lane wrote:
> Laurenz Albe <laurenz.albe@cybertec.at> writes:
> > On Mon, 2023-11-27 at 13:41 +1100, Peter Smith wrote:
>>> In the documentation and in the guc_tables.c they are all described in
>>> MixedCase (e.g. "DateStyle" instead of "datestyle"), so I felt the
>>> messages should use the same case the documentation, which is why I
>>> changed all the ones you are referring to.
>
>> I agree with that decision; we should use mixed case for these parameters.
>> Otherwise we might get complaints that the following query does not return
>> any results:
>>   SELECT * FROM pg_settings WHERE name = 'timezone';

(I'm sure that you mean the opposite.  This query does not return any
results on HEAD, but it would with "TimeZone".)

> Yeah.  Like Michael upthread, I've wondered occasionally about changing
> these names to all-lower-case.  It'd surely be nicer if we'd done it
> like that to begin with.  But I can't convince myself that the ensuing
> user pain would be justified.

Perhaps not.  I'd like to think that a lot of queries on pg_settings
have the wisdom to apply a lower() or upper(), but that's very
unlikely.

-    errhint("Perhaps you need a different \"datestyle\" setting.")));
+    errhint("Perhaps you need a different DateStyle setting.")));

Saying that, I'd let this one be in 0002.  It causes a log of diff
churn in the tests and quoting it based on Alvaro's suggestion would
still be correct because it's fully lower-case.  (Yeah, I'm perhaps
nit-ing here, so feel free to counter-argue if you prefer what the
patch does.)
--
Michael

Вложения

Re: GUC names in messages

От
Peter Smith
Дата:
Here is patch set v3.

Patches 0001 and 0002 are unchanged from v2.

Patch 0003 now uses a "%s%s%s" format specifier with GUC_FORMAT macro
in guc.c, as recently suggested by Michael [1].

~

(Meanwhile, the MixedCase stuff is still an open question, to be
addressed in a later patch version)

======
[1] https://www.postgresql.org/message-id/ZWQVxu8zWIx64V7l%40paquier.xyz

Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

Re: GUC names in messages

От
Laurenz Albe
Дата:
On Tue, 2023-11-28 at 07:53 +0900, Michael Paquier wrote:
> On Mon, Nov 27, 2023 at 01:35:44AM -0500, Tom Lane wrote:
> > Laurenz Albe <laurenz.albe@cybertec.at> writes:
> > > On Mon, 2023-11-27 at 13:41 +1100, Peter Smith wrote:
> > > > In the documentation and in the guc_tables.c they are all described in
> > > > MixedCase (e.g. "DateStyle" instead of "datestyle"), so I felt the
> > > > messages should use the same case the documentation, which is why I
> > > > changed all the ones you are referring to.
> >
> > > I agree with that decision; we should use mixed case for these parameters.
> > > Otherwise we might get complaints that the following query does not return
> > > any results:
> > >   SELECT * FROM pg_settings WHERE name = 'timezone';
>
> (I'm sure that you mean the opposite.  This query does not return any
> results on HEAD, but it would with "TimeZone".)

No, I meant it just like I said.  If all messages suggest that the parameter
is called "timezone", and not "TimeZone" (because we convert the name to lower
case), then it is surprising that the above query does not return results.

It would be better to call the parameter "TimeZone" everywhere.

(It would be best to convert the parameter to lower case, but I am worried
about the compatibility-pain-to-benefit ratio.)

Yours,
Laurenz Albe



Re: GUC names in messages

От
Michael Paquier
Дата:
On Tue, Nov 28, 2023 at 11:54:33AM +1100, Peter Smith wrote:
> Here is patch set v3.
>
> Patches 0001 and 0002 are unchanged from v2.

After some grepping, I've noticed that 0002 had a mistake with
track_commit_timestamp: some alternate output of modules/commit_ts/
was not updated.  meson was able to reproduce the failure as well.

I am not sure regarding what we should do a mixed cases as well, so I
have discarded DateStyle for now, and applied the rest.

Also applied 0001 from Alvaro.

> Patch 0003 now uses a "%s%s%s" format specifier with GUC_FORMAT macro
> in guc.c, as recently suggested by Michael [1].

I cannot think about a better idea as these strings need to be
translated so they need three %s.

+        if (*p == '_')
+            underscore = true;
+        else if ('a' <= *p && *p <= 'z')
+            lowercase = true;

An issue with this code is that it would forget to quote GUCs that use
dots, like the ones from an extension.  I don't really see why we
cannot just make the macro return true only if all the characters of a
GUC name is made of lower-case alpha characters?

With an extra indentation applied, I finish with the attached for
0003.
--
Michael

Вложения

Re: GUC names in messages

От
Kyotaro Horiguchi
Дата:
At Thu, 30 Nov 2023 14:59:21 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> > Patch 0003 now uses a "%s%s%s" format specifier with GUC_FORMAT macro
> > in guc.c, as recently suggested by Michael [1].
> 
> I cannot think about a better idea as these strings need to be
> translated so they need three %s.


In this patch, the quotation marks cannot be changed from double
quotes.

After a brief review of the use of quotation marks in various
languages, it's observed that French uses guillemets (« »), German
uses lower qutation marks („ “), Spanish uses angular quotation marks
(« ») or alternatively, lower quotetaion marks. Japanese commonly uses
corner brackets (「」), but can also adopt double or single quotation
marks in certain contexts. I took a look at the backend's fr.po file
for a trial, and it indeed seems that guillemets are being used.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: GUC names in messages

От
Michael Paquier
Дата:
On Thu, Nov 30, 2023 at 03:29:49PM +0900, Kyotaro Horiguchi wrote:
> In this patch, the quotation marks cannot be changed from double
> quotes.

Indeed, that's a good point.  I completely forgot about that.
--
Michael

Вложения

Re: GUC names in messages

От
Peter Smith
Дата:
On Thu, Nov 30, 2023 at 4:59 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Nov 28, 2023 at 11:54:33AM +1100, Peter Smith wrote:
> > Here is patch set v3.
> >
> > Patches 0001 and 0002 are unchanged from v2.
>
> After some grepping, I've noticed that 0002 had a mistake with
> track_commit_timestamp: some alternate output of modules/commit_ts/
> was not updated.  meson was able to reproduce the failure as well.
>
> I am not sure regarding what we should do a mixed cases as well, so I
> have discarded DateStyle for now, and applied the rest.
>
> Also applied 0001 from Alvaro.
>

Thanks for pushing those parts.

> > Patch 0003 now uses a "%s%s%s" format specifier with GUC_FORMAT macro
> > in guc.c, as recently suggested by Michael [1].
>
> I cannot think about a better idea as these strings need to be
> translated so they need three %s.
>
> +               if (*p == '_')
> +                       underscore = true;
> +               else if ('a' <= *p && *p <= 'z')
> +                       lowercase = true;
>
> An issue with this code is that it would forget to quote GUCs that use
> dots, like the ones from an extension.  I don't really see why we
> cannot just make the macro return true only if all the characters of a
> GUC name is made of lower-case alpha characters?

Not forgotten. I felt the dot separator in such names might be
mistaken for a period in a sentence which is why I left quotes for
those ones. YMMV.

======
Kind Regards,
Peter. Smith.
Fujitsu Australia



Re: GUC names in messages

От
Alvaro Herrera
Дата:
> +/*
> + * Return whether the GUC name should be enclosed in double-quotes.
> + *
> + * Quoting is intended for names which could be mistaken for normal English
> + * words.  Quotes are only applied to GUC names that are written entirely with
> + * lower-case alphabetical characters.
> + */
> +static bool
> +quotes_needed_for_GUC_name(const char *name)
> +{
> +    for (const char *p = name; *p; p++)
> +    {
> +        if ('a' > *p || *p > 'z')
> +            return false;
> +    }
> +
> +    return true;
> +}

I think you need a function that the name possibly quoted, in a way that
lets the translator handle the quoting:

  static char buffer[SOMEMAXLEN];

  quotes_needed = ...;

  if (quotes_needed)
     /* translator: a quoted configuration parameter name */
     snprintf(buffer, _("\"%s\""), name);
     return buffer
  else
     /* no translation needed in this case */
     return name;

then the calling code just does a single %s that prints the string
returned by this function.  (Do note that the function is not reentrant,
like pg_dump's fmtId.  Shouldn't be a problem ...)

> @@ -3621,8 +3673,8 @@ set_config_option_ext(const char *name, const char *value,
>      {
>          if (changeVal && !makeDefault)
>          {
> -            elog(DEBUG3, "\"%s\": setting ignored because previous source is higher priority",
> -                 name);
> +            elog(DEBUG3, "%s%s%s: setting ignored because previous source is higher priority",
> +                 GUC_FORMAT(name));

Note that elog() doesn't do translation, and DEBUG doesn't really need
to worry too much about style anyway.  I'd leave these as-is.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)



Re: GUC names in messages

От
Peter Eisentraut
Дата:
On 30.11.23 06:59, Michael Paquier wrote:
>               ereport(elevel,
>                       (errcode(ERRCODE_UNDEFINED_OBJECT),
> -                     errmsg("unrecognized configuration parameter \"%s\" in file \"%s\" line %d",
> -                            item->name,
> +            /* translator: %s%s%s is for an optionally quoted GUC name */
> +                     errmsg("unrecognized configuration parameter %s%s%s in file \"%s\" line %d",
> +                            GUC_FORMAT(item->name),
>                               item->filename, item->sourceline)));

I think this is completely over-engineered and wrong.  If we start down 
this road, then the next person is going to start engineering some rules 
by which we should quote file names and other things.  Which will lead 
to more confusion, not less.  The whole point of this quoting thing is 
that you do it all the time or not, not dynamically based on what's 
inside of it.

The original version of this string (and similar ones) seems the most 
correct, simple, and useful one to me.




Re: GUC names in messages

От
Peter Smith
Дата:
On Fri, Dec 1, 2023 at 7:38 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 30.11.23 06:59, Michael Paquier wrote:
> >                       ereport(elevel,
> >                                       (errcode(ERRCODE_UNDEFINED_OBJECT),
> > -                                      errmsg("unrecognized configuration parameter \"%s\" in file \"%s\" line %d",
> > -                                                     item->name,
> > +                     /* translator: %s%s%s is for an optionally quoted GUC name */
> > +                                      errmsg("unrecognized configuration parameter %s%s%s in file \"%s\" line %d",
> > +                                                     GUC_FORMAT(item->name),
> >                                                       item->filename, item->sourceline)));
>
> I think this is completely over-engineered and wrong.  If we start down
> this road, then the next person is going to start engineering some rules
> by which we should quote file names and other things.  Which will lead
> to more confusion, not less.  The whole point of this quoting thing is
> that you do it all the time or not, not dynamically based on what's
> inside of it.
>
> The original version of this string (and similar ones) seems the most
> correct, simple, and useful one to me.
>

Yeah, trying to manipulate the quoting dynamically seems like it was
an overreach...

Removing that still leaves some other changes needed to "fix" the
messages using MixedCase GUCs.


PSA v4

======
Details:

Patch 0001 -- "datestyle" becomes DateStyle in messages
Rebased this again, which was part of an earlier patch set
- I think any GUC names documented as MixedCase should keep that same
case in messages; this also obeys the guidelines recently pushed [1].
- Some others agreed, expecting the exact GUC name (in the message)
can be found in pg_settings [2].
- OTOH, Michael didn't like the diff churn [3] caused by this patch.

~~~

Patch 0002 -- use mixed case for intervalstyle error message
I found that the GUC name substituted to the error message was coming
from the statement, not from the original name in the guc_tables, so
there was a case mismatch:

BEFORE Patch 0002 (see the lowercase in the error message)
2023-12-08 13:21:32.897 AEDT [32609] STATEMENT:  set intervalstyle = 1234;
ERROR:  invalid value for parameter "intervalstyle": "1234"
HINT:  Available values: postgres, postgres_verbose, sql_standard, iso_8601.

AFTER Patch 0002
2023-12-08 13:38:48.638 AEDT [29684] STATEMENT:  set intervalstyle = 1234;
ERROR:  invalid value for parameter "IntervalStyle": "1234"
HINT:  Available values: postgres, postgres_verbose, sql_standard, iso_8601.

======
[1] GUC quoting guidelines -
https://github.com/postgres/postgres/commit/a243569bf65c5664436e8f63d870b7ee9c014dcb
[2] The case should match pg_settings -
https://www.postgresql.org/message-id/db3e4290ced77111c17e7a2adfb1d660734f5f78.camel%40cybertec.at
[3] Dislike of diff churn -
https://www.postgresql.org/message-id/ZWUd8dYYA9v83KvI%40paquier.xyz

Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

Re: GUC names in messages

От
Alvaro Herrera
Дата:
On 2023-Dec-08, Peter Smith wrote:

> Patch 0001 -- "datestyle" becomes DateStyle in messages
> Rebased this again, which was part of an earlier patch set
> - I think any GUC names documented as MixedCase should keep that same
> case in messages; this also obeys the guidelines recently pushed [1].

I agree.

> Patch 0002 -- use mixed case for intervalstyle error message
> I found that the GUC name substituted to the error message was coming
> from the statement, not from the original name in the guc_tables, so
> there was a case mismatch:

I agree.  Let's also add a test that shows this difference (my 0002
here).

I'm annoyed that this saga has transiently created a few untranslated
strings by removing unnecessary quotes but failing to move the variable
names outside the translatable part of the string.  I change a few of
those in 0003 -- mostly the ones in strings already touched by commit
8d9978a7176a, but also a few others.  Didn't go out of my way to grep
for other possible messages to fix, though.  (I feel like this is
missing some "translator:" comments.)

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"We’ve narrowed the problem down to the customer’s pants being in a situation
 of vigorous combustion" (Robert Haas, Postgres expert extraordinaire)

Вложения

Re: GUC names in messages

От
Peter Eisentraut
Дата:
On 08.12.23 05:10, Peter Smith wrote:
> Patch 0001 -- "datestyle" becomes DateStyle in messages
> Rebased this again, which was part of an earlier patch set
> - I think any GUC names documented as MixedCase should keep that same
> case in messages; this also obeys the guidelines recently pushed [1].
> - Some others agreed, expecting the exact GUC name (in the message)
> can be found in pg_settings [2].
> - OTOH, Michael didn't like the diff churn [3] caused by this patch.

I'm fine with adjusting the mixed-case stuff, but intuitively, I don't 
think removing the quotes in this is an improvement:

- GUC_check_errdetail("Conflicting \"datestyle\" specifications.");
+ GUC_check_errdetail("Conflicting DateStyle specifications.");




Re: GUC names in messages

От
Peter Smith
Дата:
On Sat, Dec 9, 2023 at 1:48 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 08.12.23 05:10, Peter Smith wrote:
> > Patch 0001 -- "datestyle" becomes DateStyle in messages
> > Rebased this again, which was part of an earlier patch set
> > - I think any GUC names documented as MixedCase should keep that same
> > case in messages; this also obeys the guidelines recently pushed [1].
> > - Some others agreed, expecting the exact GUC name (in the message)
> > can be found in pg_settings [2].
> > - OTOH, Michael didn't like the diff churn [3] caused by this patch.
>
> I'm fine with adjusting the mixed-case stuff, but intuitively, I don't
> think removing the quotes in this is an improvement:
>
> - GUC_check_errdetail("Conflicting \"datestyle\" specifications.");
> + GUC_check_errdetail("Conflicting DateStyle specifications.");
>

My original intention of this thread was only to document the GUC name
quoting guidelines and then apply those consistently in the code.

I'm happy either way for the MixedCase names to be quoted or not
quoted, whatever is the consensus.

If the rule is changed to quote those MixedCase GUCs then the docs
will require minor tweaking

CURRENT
   <para>
    In messages containing configuration variable names, do not include quotes
    when the names are visibly not natural English words, such as when they
    have underscores, are all-uppercase or have mixed case. Otherwise, quotes
    must be added. Do include quotes in a message where an arbitrary variable
    name is to be expanded.
   </para>

"are all-uppercase or have mixed case." --> "or are all-uppercase."

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: GUC names in messages

От
Peter Smith
Дата:
This v5* looks good to me, except it will need some further
modification if PeterE's suggestion [1] to keep quotes for the
MixedCase GUCs is adopted.

======
[1] https://www.postgresql.org/message-id/9e7802b2-2cf2-4c2d-b680-b2ccb9db1d2f%40eisentraut.org

Kind Regards,
Peter Smith.
Futjisu Australia.



Re: GUC names in messages

От
Michael Paquier
Дата:
On Mon, Dec 11, 2023 at 10:14:11AM +1100, Peter Smith wrote:
> This v5* looks good to me, except it will need some further
> modification if PeterE's suggestion [1] to keep quotes for the
> MixedCase GUCs is adopted.

-                errdetail("The database cluster was initialized with CATALOG_VERSION_NO %d,"
-                          " but the server was compiled with CATALOG_VERSION_NO %d.",
-                          ControlFile->catalog_version_no, CATALOG_VERSION_NO),
+                /*- translator: %s is a variable name and %d is its value */
+                errdetail("The database cluster was initialized with %s %d,"
+                          " but the server was compiled with %s %d.",
+                          "CATALOG_VERSION_NO",

Good point.  There are a lot of strings that can be shaved from the
translations here.

src/backend/access/transam/xlog.c:               errdetail("The database cluster was initialized with
PG_CONTROL_VERSION%d (0x%08x),"
 
src/backend/access/transam/xlog.c:               errdetail("The database cluster was initialized with
PG_CONTROL_VERSION%d,"
 
src/backend/access/transam/xlog.c:               errdetail("The database cluster was initialized without
USE_FLOAT8_BYVAL"
src/backend/access/transam/xlog.c:               errdetail("The database cluster was initialized with
USE_FLOAT8_BYVAL"

I think that you should apply the same conversion for these ones.
There is no gain with the 1st and 3rd ones, but the 2nd and 4th one
can be grouped together.

FWIW, if we don't convert MixedCase GUCs to become mixedcase, I don't
think that there is any need to apply quotes to them because they
don't really look like natural English words.  That's as far as my
opinion goes, so feel free to ignore me if the consensus is different.
--
Michael

Вложения

Re: GUC names in messages

От
Peter Eisentraut
Дата:
On 11.12.23 00:07, Peter Smith wrote:
> If the rule is changed to quote those MixedCase GUCs then the docs
> will require minor tweaking
> 
> CURRENT
>     <para>
>      In messages containing configuration variable names, do not include quotes
>      when the names are visibly not natural English words, such as when they
>      have underscores, are all-uppercase or have mixed case. Otherwise, quotes
>      must be added. Do include quotes in a message where an arbitrary variable
>      name is to be expanded.
>     </para>
> 
> "are all-uppercase or have mixed case." --> "or are all-uppercase."

After these discussions, I think this rule change was not a good idea. 
It effectively enforces these kinds of inconsistencies.  For example, if 
you ever refactored

     "DateStyle is wrong"

to

     "%s is wrong"

you'd need to adjust the quotes, and thus user-visible behavior, for 
entirely internal reasons.  This is not good.  And then came the idea to 
determine the quoting dynamically, which I think everyone agreed was too 
much.  So I don't see a way to make this work well.




Re: GUC names in messages

От
Michael Paquier
Дата:
On Thu, Dec 14, 2023 at 09:38:40AM +0100, Peter Eisentraut wrote:
> After these discussions, I think this rule change was not a good idea. It
> effectively enforces these kinds of inconsistencies.  For example, if you
> ever refactored
>
>     "DateStyle is wrong"
>
> to
>
>     "%s is wrong"
>
> you'd need to adjust the quotes, and thus user-visible behavior, for
> entirely internal reasons.  This is not good.

So, what are you suggesting?  Should the encouraged rule be removed
from the docs?  Or do you object to some of the changes done in the
latest patch series v5?

FWIW, I am a bit meh with v5-0001, because I don't see the benefits.
On the contrary v5-0003 is useful, because it reduces a bit the number
of strings to translate.  That's always good to take.  I don't have a
problem with v5-0002, either, where we'd begin using the name of the
GUC as stored in the static tables rather than the name provided in
the SET query, particularly for the reason that it makes the GUC name
a bit more consistent even when using double-quotes around the
parameter name in the query, where the error messages would not force
a lower-case conversion.  The patch would, AFAIU, change HEAD from
that:
=# SET "intervalstylE" to popo;
ERROR:  22023: invalid value for parameter "intervalstylE": "popo"
To that:
=# SET "intervalstylE" to popo;
ERROR:  22023: invalid value for parameter "IntervalStyle": "popo"

> And then came the idea to
> determine the quoting dynamically, which I think everyone agreed was too
> much.  So I don't see a way to make this work well.

Yeah, with the quotes being language-dependent, any idea I can think
of is as good as unreliable and dead.
--
Michael

Вложения

Re: GUC names in messages

От
Peter Smith
Дата:
Hi,

This thread seems to be a bit stuck, so I thought I would try to
summarize my understanding to hopefully get it moving again...

The original intent of the thread was just to introduce some
guidelines for quoting or not quoting GUC names in messages because
previously it seemed quite ad-hoc. Along the way, there was some scope
creep. IIUC, now there are 3 main topics in this thread:

1. GUC name quoting
2. GUC name case
3. Using common messages

======

#1. GUC name quoting.

Some basic guidelines were decided and a patch is already pushed [1].

   <para>
    In messages containing configuration variable names, do not include quotes
    when the names are visibly not natural English words, such as when they
    have underscores, are all-uppercase or have mixed case. Otherwise, quotes
    must be added. Do include quotes in a message where an arbitrary variable
    name is to be expanded.
   </para>

AFAIK there is nothing controversial there, although maybe the
guideline for 'mixed case' needs revisiting depending on objections
about point #2.

~~~

#2. GUC name case.

GUC names defined in guc_tables.c are either lowercase (xxxx),
lowercase with underscores (xxxx_yyyy) or mixed case (XxxxYyyy).

There are only a few examples of mixed case. They are a bit
problematic, but IIUC they are not going to change away so we need to
deal with them:
- TimeZone
- DateStyle
- IntervalStyle

It was proposed (e.g. [2]) that it would be better/intuitive if the
GUC name of the error message would be the same case as in the
guc_table.c. In other words, other words you should be able to find
the same name from the message in pg_settings.

So mesages with "datestyle" should become DateStyle because:
SELECT * FROM pg_settings WHERE name = 'DateStyle'; ==> found
SELECT * FROM pg_settings WHERE name = 'datestlye'; ==> not found

That latest v5 patches make those adjustments
- Patch v5-0001 fixes case for DateStyle. Yeah, there is some diff
churn because there are a lot of DateStyle tests, but IMO that's too
bad.
- Patch v5-0002 fixed case for IntervalStyle.

~~~

#3. Using common messages

Any message with a non-translatable component to them (e.g. the GUC
name) can be restructured in a way so there is a common translatable
errmsg part with the non-translatable parameters substituted.

e.g.
- GUC_check_errdetail("The only allowed value is \"immediate\".");
+ GUC_check_errdetail("The only allowed value is \"%s\".", "immediate");

AFAIK think there is no disagreement that this is a good idea,
although IMO it deserved to be in a separate thread.

I think there will be many messages that qualify to be modified, and
probably there will be some discussion about whether certain common
messages that can be merged -- (e.g. Is "You might need to increase
%s." same as "Consider increasing %s." or not?).

Anyway, this part is a WIP. Currently, patch v5-0003 makes a start for
this task.

//////

I think patches v5-0002, v5-0003 are uncontroversial.

So the sticking point seems to be the MixedCase GUC (e.g. patch
v5-0001). I agree, that the churn is not ideal (it's only because
there are lots of DateStyle messages in test output), but OTOH that's
just what happens if a rule is applied when previously there were no
rules.

Also, PeterE wrote [4]

> On Thu, Dec 14, 2023 at 09:38:40AM +0100, Peter Eisentraut wrote:
> > After these discussions, I think this rule change was not a good idea. It
> > effectively enforces these kinds of inconsistencies.  For example, if you
> > ever refactored
> >
> >     "DateStyle is wrong"
> >
> > to
> >
> >     "%s is wrong"
> >
> > you'd need to adjust the quotes, and thus user-visible behavior, for
> > entirely internal reasons.  This is not good.
>

I didn't understand the problem. By the current guidelines the mixed
case GUC won't quoted in the message (see patch v5-0001)

So whether it is:
errmsg("DateStyle is wrong"), OR
errmsg("%s is wrong", "DateStyle")

where is the "you'd need to adjust the quotes" problem there?

======
[1] GUC quoting guidelines --
https://www.postgresql.org/docs/devel/error-style-guide.html
[2] Case in messages should be same as pg_settings --
https://www.postgresql.org/message-id/db3e4290ced77111c17e7a2adfb1d660734f5f78.camel%40cybertec.at
[3] v5 patches --
https://www.postgresql.org/message-id/202312081255.wlsfmhe2sri7%40alvherre.pgsql
[4] PeterE concerna about DateStyle --
https://www.postgresql.org/message-id/6d66eb1a-290d-4aaa-972a-0a06a1af02af%40eisentraut.org

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: GUC names in messages

От
Peter Eisentraut
Дата:
On 21.12.23 07:24, Peter Smith wrote:
> #1. GUC name quoting.
> 
> Some basic guidelines were decided and a patch is already pushed [1].
> 
>     <para>
>      In messages containing configuration variable names, do not include quotes
>      when the names are visibly not natural English words, such as when they
>      have underscores, are all-uppercase or have mixed case. Otherwise, quotes
>      must be added. Do include quotes in a message where an arbitrary variable
>      name is to be expanded.
>     </para>
> 
> AFAIK there is nothing controversial there, although maybe the
> guideline for 'mixed case' needs revisiting depending on objections
> about point #2.

Now that I read this again, I think this is wrong.

We should decide the quoting for a category, not the actual content. 
Like, quote all file names; do not quote keywords.

This led to the attempted patch to decide the quoting of GUC parameter 
names dynamically based on the actual content, which no one really 
liked.  But then, to preserve consistency, we also need to be uniform in 
quoting GUC parameter names where the name is hardcoded.




Re: GUC names in messages

От
Peter Smith
Дата:
On Fri, Dec 22, 2023 at 12:24 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 21.12.23 07:24, Peter Smith wrote:
> > #1. GUC name quoting.
> >
> > Some basic guidelines were decided and a patch is already pushed [1].
> >
> >     <para>
> >      In messages containing configuration variable names, do not include quotes
> >      when the names are visibly not natural English words, such as when they
> >      have underscores, are all-uppercase or have mixed case. Otherwise, quotes
> >      must be added. Do include quotes in a message where an arbitrary variable
> >      name is to be expanded.
> >     </para>
> >
> > AFAIK there is nothing controversial there, although maybe the
> > guideline for 'mixed case' needs revisiting depending on objections
> > about point #2.
>
> Now that I read this again, I think this is wrong.
>
> We should decide the quoting for a category, not the actual content.
> Like, quote all file names; do not quote keywords.
>
> This led to the attempted patch to decide the quoting of GUC parameter
> names dynamically based on the actual content, which no one really
> liked.  But then, to preserve consistency, we also need to be uniform in
> quoting GUC parameter names where the name is hardcoded.
>

I agree. By attempting to define when to and when not to use quotes it
has become overcomplicated.

Earlier in the thread, I counted how quotes were used in the existing
messages [5]; there were ~39 quoted and 164 not quoted. Based on that
we chose to stay with the majority, and leave all the unquoted ones so
only adding quotes "when necessary". In hindsight, that was probably
the wrong choice because it opened a can of worms about what "when
necessary" even means (e.g. what about underscores, mixed case etc).

Certainly one simple rule "just quote everything" is easiest to follow.

~~~

OPTION#1. DO quote hardcoded GUC names everywhere
- pro: consistent with the dynamic names, which are always quoted
- pro: no risk of mistaking GUC names for normal words in the message
- con: more patch changes than not quoting

Laurenz [2] "My personal preference is to always quote GUC names"
Nathan [3][4] "І'd vote for quoting all GUC names, if for no other
reason than "visibly not English natural words" feels a bit open to
interpretation."
PeterE [6] "... to preserve consistency, we also need to be uniform in
quoting GUC parameter names where the name is hardcoded."

~

OPTION#2. DO NOT quote hardcoded GUC names anywhere
- pro: less patch changes than quoting everything
- con: not consistent with the dynamic names, which are always quoted
- con: risk of mistaking GUC names for normal words in the message

PeterE, originally [1] said "I'm leaning toward not quoting GUC
names", but IIUC changed his opinion in [6].

~~~

Given the above, I've updated the v6 patch set to just *always* quote
GUC names. The docs are also re-written.

======
[1] https://www.postgresql.org/message-id/22998fc0-93c2-48d2-b0f9-361cd5764695%40eisentraut.org
[2] https://www.postgresql.org/message-id/4b83f9888428925e3049e24b60a73f4b94dc2368.camel%40cybertec.at
[3] https://www.postgresql.org/message-id/20231102015239.GA82553%40nathanxps13
[4] https://www.postgresql.org/message-id/20231107145821.GA779199%40nathanxps13
[5] https://www.postgresql.org/message-id/CAHut%2BPtqTao%2BOKRxGcCzUxt9h9d0%3DTQZZoRjMYe3xe0-O7_hsQ%40mail.gmail.com
[6] https://www.postgresql.org/message-id/1704b2cf-2444-484a-a7a4-2ba79f72951d%40eisentraut.org

Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

Re: GUC names in messages

От
Peter Eisentraut
Дата:
On 04.01.24 07:53, Peter Smith wrote:
>> Now that I read this again, I think this is wrong.
>>
>> We should decide the quoting for a category, not the actual content.
>> Like, quote all file names; do not quote keywords.
>>
>> This led to the attempted patch to decide the quoting of GUC parameter
>> names dynamically based on the actual content, which no one really
>> liked.  But then, to preserve consistency, we also need to be uniform in
>> quoting GUC parameter names where the name is hardcoded.
>>
> 
> I agree. By attempting to define when to and when not to use quotes it
> has become overcomplicated.
> 
> Earlier in the thread, I counted how quotes were used in the existing
> messages [5]; there were ~39 quoted and 164 not quoted. Based on that
> we chose to stay with the majority, and leave all the unquoted ones so
> only adding quotes "when necessary". In hindsight, that was probably
> the wrong choice because it opened a can of worms about what "when
> necessary" even means (e.g. what about underscores, mixed case etc).
> 
> Certainly one simple rule "just quote everything" is easiest to follow.

I've been going through the translation updates for PG17 these days and 
was led back around to this issue.  It seems we left it in an 
intermediate state that no one was really happy with and which is 
arguably as inconsistent or more so than before.

I think we should accept your two patches

v6-0001-GUC-names-docs.patch
v6-0002-GUC-names-add-quotes.patch

which effectively everyone was in favor of and which seem to be the most 
robust and sustainable solution.

(The remaining three patches from the v6 set would be PG18 material at 
this point.)




Re: GUC names in messages

От
Alvaro Herrera
Дата:
On 2024-May-16, Peter Eisentraut wrote:

> I think we should accept your two patches
> 
> v6-0001-GUC-names-docs.patch
> v6-0002-GUC-names-add-quotes.patch
> 
> which effectively everyone was in favor of and which seem to be the most
> robust and sustainable solution.

I think we should also take patch 0005 in pg17, which reduces the number
of strings to translate.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"XML!" Exclaimed C++.  "What are you doing here? You're not a programming
language."
"Tell that to the people who use me," said XML.
https://burningbird.net/the-parable-of-the-languages/



Re: GUC names in messages

От
Daniel Gustafsson
Дата:
> On 16 May 2024, at 13:35, Peter Eisentraut <peter@eisentraut.org> wrote:

> I think we should accept your two patches

I agree with this.

> v6-0001-GUC-names-docs.patch

+1

> v6-0002-GUC-names-add-quotes.patch

- errmsg("WAL generated with full_page_writes=off was replayed "
+ errmsg("WAL generated with \"full_page_writes=off\" was replayed "

I'm not a fan of this syntax, but I at the same time can't offer a better idea
so this isn't an objection but a hope that it can be made even better during
the v18 cycle.

--
Daniel Gustafsson




Re: GUC names in messages

От
Daniel Gustafsson
Дата:
> On 16 May 2024, at 13:56, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> I think we should also take patch 0005 in pg17, which reduces the number
> of strings to translate.

Agreed, lessening the burden on translators is always a good idea.

--
Daniel Gustafsson




Re: GUC names in messages

От
Tom Lane
Дата:
Daniel Gustafsson <daniel@yesql.se> writes:
> On 16 May 2024, at 13:35, Peter Eisentraut <peter@eisentraut.org> wrote:
>> - errmsg("WAL generated with full_page_writes=off was replayed "
>> + errmsg("WAL generated with \"full_page_writes=off\" was replayed "

> I'm not a fan of this syntax, but I at the same time can't offer a better idea
> so this isn't an objection but a hope that it can be made even better during
> the v18 cycle.

Yeah ... formally correct would be something like

    errmsg("WAL generated with \"full_page_writes\"=\"off\" was replayed "

but that's a bit much for my taste.

            regards, tom lane



Re: GUC names in messages

От
Peter Smith
Дата:
On Thu, May 16, 2024 at 9:35 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 04.01.24 07:53, Peter Smith wrote:
> >> Now that I read this again, I think this is wrong.
> >>
> >> We should decide the quoting for a category, not the actual content.
> >> Like, quote all file names; do not quote keywords.
> >>
> >> This led to the attempted patch to decide the quoting of GUC parameter
> >> names dynamically based on the actual content, which no one really
> >> liked.  But then, to preserve consistency, we also need to be uniform in
> >> quoting GUC parameter names where the name is hardcoded.
> >>
> >
> > I agree. By attempting to define when to and when not to use quotes it
> > has become overcomplicated.
> >
> > Earlier in the thread, I counted how quotes were used in the existing
> > messages [5]; there were ~39 quoted and 164 not quoted. Based on that
> > we chose to stay with the majority, and leave all the unquoted ones so
> > only adding quotes "when necessary". In hindsight, that was probably
> > the wrong choice because it opened a can of worms about what "when
> > necessary" even means (e.g. what about underscores, mixed case etc).
> >
> > Certainly one simple rule "just quote everything" is easiest to follow.
>
> I've been going through the translation updates for PG17 these days and
> was led back around to this issue.  It seems we left it in an
> intermediate state that no one was really happy with and which is
> arguably as inconsistent or more so than before.
>
> I think we should accept your two patches
>
> v6-0001-GUC-names-docs.patch
> v6-0002-GUC-names-add-quotes.patch
>
> which effectively everyone was in favor of and which seem to be the most
> robust and sustainable solution.
>
> (The remaining three patches from the v6 set would be PG18 material at
> this point.)

Thanks very much for taking an interest in resurrecting this thread.

It was always my intention to come back to this when the dust had
settled on PG17. But it would be even better if the docs for the rule
"just quote everything", and anything else you deem acceptable, can be
pushed sooner.

Of course, there will still be plenty more to do for PG18, including
locating examples in newly pushed code for messages that have slipped
through the cracks during the last few months using different formats,
and other improvements, but those tasks should become easier if we can
get some of these v6 patches out of the way first.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: GUC names in messages

От
Peter Eisentraut
Дата:
On 17.05.24 05:31, Peter Smith wrote:
>> I think we should accept your two patches
>>
>> v6-0001-GUC-names-docs.patch
>> v6-0002-GUC-names-add-quotes.patch
>>
>> which effectively everyone was in favor of and which seem to be the most
>> robust and sustainable solution.
>>
>> (The remaining three patches from the v6 set would be PG18 material at
>> this point.)
> Thanks very much for taking an interest in resurrecting this thread.
> 
> It was always my intention to come back to this when the dust had
> settled on PG17. But it would be even better if the docs for the rule
> "just quote everything", and anything else you deem acceptable, can be
> pushed sooner.
> 
> Of course, there will still be plenty more to do for PG18, including
> locating examples in newly pushed code for messages that have slipped
> through the cracks during the last few months using different formats,
> and other improvements, but those tasks should become easier if we can
> get some of these v6 patches out of the way first.

I committed your 0001 and 0002 now, with some small fixes.

There has also been quite a bit of new code, of course, since you posted 
your patches, so we'll probably find a few more things that could use 
adjustment.

I'd be happy to consider the rest of your patch set after beta1 and/or 
for PG18.




Re: GUC names in messages

От
Peter Smith
Дата:
On Fri, May 17, 2024 at 9:57 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 17.05.24 05:31, Peter Smith wrote:
> >> I think we should accept your two patches
> >>
> >> v6-0001-GUC-names-docs.patch
> >> v6-0002-GUC-names-add-quotes.patch
> >>
> >> which effectively everyone was in favor of and which seem to be the most
> >> robust and sustainable solution.
> >>
> >> (The remaining three patches from the v6 set would be PG18 material at
> >> this point.)
> > Thanks very much for taking an interest in resurrecting this thread.
> >
> > It was always my intention to come back to this when the dust had
> > settled on PG17. But it would be even better if the docs for the rule
> > "just quote everything", and anything else you deem acceptable, can be
> > pushed sooner.
> >
> > Of course, there will still be plenty more to do for PG18, including
> > locating examples in newly pushed code for messages that have slipped
> > through the cracks during the last few months using different formats,
> > and other improvements, but those tasks should become easier if we can
> > get some of these v6 patches out of the way first.
>
> I committed your 0001 and 0002 now, with some small fixes.
>
> There has also been quite a bit of new code, of course, since you posted
> your patches, so we'll probably find a few more things that could use
> adjustment.
>
> I'd be happy to consider the rest of your patch set after beta1 and/or
> for PG18.
>

Thanks for pushing!

I'll try to dedicate more time to this sometime soon to go through all
the code again to track down those loose ends.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: GUC names in messages

От
Peter Smith
Дата:
On Fri, May 17, 2024 at 9:57 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 17.05.24 05:31, Peter Smith wrote:
> >> I think we should accept your two patches
> >>
> >> v6-0001-GUC-names-docs.patch
> >> v6-0002-GUC-names-add-quotes.patch
> >>
> >> which effectively everyone was in favor of and which seem to be the most
> >> robust and sustainable solution.
> >>
> >> (The remaining three patches from the v6 set would be PG18 material at
> >> this point.)
> > Thanks very much for taking an interest in resurrecting this thread.
> >
> > It was always my intention to come back to this when the dust had
> > settled on PG17. But it would be even better if the docs for the rule
> > "just quote everything", and anything else you deem acceptable, can be
> > pushed sooner.
> >
> > Of course, there will still be plenty more to do for PG18, including
> > locating examples in newly pushed code for messages that have slipped
> > through the cracks during the last few months using different formats,
> > and other improvements, but those tasks should become easier if we can
> > get some of these v6 patches out of the way first.
>
> I committed your 0001 and 0002 now, with some small fixes.
>
> There has also been quite a bit of new code, of course, since you posted
> your patches, so we'll probably find a few more things that could use
> adjustment.
>
> I'd be happy to consider the rest of your patch set after beta1 and/or
> for PG18.

Thanks for pushing some of those v6 patches. Here is the new patch set v7*.

I have used a homegrown script/regex to help identify all the GUC
names that still needed quoting. Many of these occurrences are from
recently pushed code -- i.e. they are more recent than that v6-0002
patch previously pushed [1].

The new GUC quoting patches are separated by different GUC types only
to simplify my processing of them.

v7-0001 = Add quotes for GUCs - bool
v7-0002 = Add quotes for GUCs - int
v7-0003 = Add quotes for GUCs - real
v7-0004 = Add quotes for GUCs - string
v7-0005 = Add quotes for GUCs - enum

The other v7 patches are just carried forward unchanged from v6:

v7-0006 = fix case for IntervalStyle
v7-0007 = fix case for Datestyle
v7-0008 = make common translatable message strings

~~~~

STATUS

Here is the status of these v7* patches,  and remaining works to do:

* AFAIK those first 5 ("Add quotes") patches can be pushed ASAP in
PG17. If anybody finds more GUCs still not quoted then those are
probably somehow accidentally missed by me and should be fixed.

* The remaining 3 patches may wait until PG18.

* The patch 0008 ("make common translatable message strings") may be
OK to be pushed as-is. OTOH, this is the tip of another iceberg so I
expect if we look harder there will be many many more candidates to
turn into common messages. There may also be examples where  'similar'
messages can use identical common text, but those will require more
discussion/debate case-by-case

* Another remaining task is to check current usage and improve the
consistency of how some of the GUC values have been quoted. Refer to
mail from Kyotaro-san [2] for examples of this.

======
[1] v6-0001,0002 were already pushed.
https://www.postgresql.org/message-id/55ab714f-86e3-41a3-a1d2-a96a115db8bd%40eisentraut.org

[2] https://www.postgresql.org/message-id/20240520.165613.189183526936651938.horikyota.ntt%40gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения