Обсуждение: Inconsistent printf placeholders

Поиск
Список
Период
Сортировка

Inconsistent printf placeholders

От
Kyotaro Horiguchi
Дата:
Hello.

A recent commit 6612185883 introduced two error messages that are
identical in text but differ in their placeholders.

-            pg_fatal("could not read file \"%s\": read only %d of %d bytes",
-                     filename, (int) rb, (int) st.st_size);
+            pg_fatal("could not read file \"%s\": read only %zd of %lld bytes",
+                     filename, rb, (long long int) st.st_size);
...
-            pg_fatal("could not read file \"%s\": read only %d of %d bytes",
+            pg_fatal("could not read file \"%s\": read only %d of %u bytes",
                      rf->filename, rb, length);

I'd be happy if the two messages kept consistency. I suggest aligning
types instead of making the messages different, as attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Вложения

Re: Inconsistent printf placeholders

От
Daniel Gustafsson
Дата:
> On 14 Mar 2024, at 05:20, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

> I'd be happy if the two messages kept consistency. I suggest aligning
> types instead of making the messages different, as attached.

I've only skimmed this so far but +1 on keeping the messages the same where
possible to reduce translation work.  Adding a comment on the message where the
casting is done to indicate that it is for translation might reduce the risk of
it "getting fixed" down the line.

--
Daniel Gustafsson




Re: Inconsistent printf placeholders

От
Peter Eisentraut
Дата:
On 14.03.24 05:20, Kyotaro Horiguchi wrote:
> A recent commit 6612185883 introduced two error messages that are
> identical in text but differ in their placeholders.
> 
> -            pg_fatal("could not read file \"%s\": read only %d of %d bytes",
> -                     filename, (int) rb, (int) st.st_size);
> +            pg_fatal("could not read file \"%s\": read only %zd of %lld bytes",
> +                     filename, rb, (long long int) st.st_size);
> ...
> -            pg_fatal("could not read file \"%s\": read only %d of %d bytes",
> +            pg_fatal("could not read file \"%s\": read only %d of %u bytes",
>                        rf->filename, rb, length);
> 
> I'd be happy if the two messages kept consistency. I suggest aligning
> types instead of making the messages different, as attached.

If you want to make them uniform, then I suggest the error messages 
should both be "%zd of %zu bytes", which are the actual types read() 
deals with.




Re: Inconsistent printf placeholders

От
Kyotaro Horiguchi
Дата:
Thank you for the suggestions.

At Thu, 14 Mar 2024 13:45:41 +0100, Daniel Gustafsson <daniel@yesql.se> wrote in 
> I've only skimmed this so far but +1 on keeping the messages the same where
> possible to reduce translation work.  Adding a comment on the message where the
> casting is done to indicate that it is for translation might reduce the risk of
> it "getting fixed" down the line.

Added a comment "/* cast xxx to avoid extra translatable messages */.

At Thu, 14 Mar 2024 14:02:46 +0100, Peter Eisentraut <peter@eisentraut.org> wrote in 
> If you want to make them uniform, then I suggest the error messages

Yeah. Having the same messages with only the placeholders changed is
not very pleasing during translation. If possible, I would like to
align them.

> should both be "%zd of %zu bytes", which are the actual types read()
> deals with.

I have considered only the two messages. Actually, buffile.c and md.c
are already like that. The attached aligns the messages in
pg_combinebackup.c and reconstruct.c with the precedents.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Вложения

Re: Inconsistent printf placeholders

От
David Rowley
Дата:
On Fri, 15 Mar 2024 at 15:27, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> I have considered only the two messages. Actually, buffile.c and md.c
> are already like that. The attached aligns the messages in
> pg_combinebackup.c and reconstruct.c with the precedents.

This looks like a worthy cause to make translator work easier.

I don't want to widen the goalposts or anything, but just wondering if
you'd searched for any others that could get similar treatment?

I only just had a quick look at the following.

$ cat src/backend/po/fr.po | grep -E "^msgid\s" | sed -E
's/%[a-zA-Z]+/\%/g' | sort | uniq -d -c
     31 msgid ""
      2 msgid "could not accept SSL connection: %"
      2 msgid "could not initialize LDAP: %"
      2 msgid "could not look up local user ID %: %"
      2 msgid "could not open file \"%\": %"
      2 msgid "could not read file \"%\": read % of %"
      2 msgid "could not read from log segment %, offset %: %"
      2 msgid "could not read from log segment %, offset %: read % of %"
      2 msgid "index % out of valid range, 0..%"
      2 msgid "invalid value for parameter \"%\": %"
      2 msgid "%%% is outside the valid range for parameter \"%\" (% .. %)"
      2 msgid "must be owner of large object %"
      2 msgid "oversize GSSAPI packet sent by the client (% > %)"
      2 msgid "permission denied for large object %"
      2 msgid "string is too long for tsvector (% bytes, max % bytes)"
      2 msgid "timestamp out of range: \"%\""
      2 msgid "Valid values are between \"%\" and \"%\"."

I've not looked at how hard it would be with any of the above to
determine how hard it would be to make the formats consistent.  The
3rd last one seems similar enough that it might be worth doing
together with this?

David



Re: Inconsistent printf placeholders

От
Kyotaro Horiguchi
Дата:
At Fri, 15 Mar 2024 16:01:28 +1300, David Rowley <dgrowleyml@gmail.com> wrote in 
> On Fri, 15 Mar 2024 at 15:27, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> > I have considered only the two messages. Actually, buffile.c and md.c
> > are already like that. The attached aligns the messages in
> > pg_combinebackup.c and reconstruct.c with the precedents.
> 
> This looks like a worthy cause to make translator work easier.
> 
> I don't want to widen the goalposts or anything, but just wondering if
> you'd searched for any others that could get similar treatment?
> 
> I only just had a quick look at the following.
> 
> $ cat src/backend/po/fr.po | grep -E "^msgid\s" | sed -E
> 's/%[a-zA-Z]+/\%/g' | sort | uniq -d -c
>      31 msgid ""
>       2 msgid "could not accept SSL connection: %"
>       2 msgid "could not initialize LDAP: %"
>       2 msgid "could not look up local user ID %: %"
...
> I've not looked at how hard it would be with any of the above to
> determine how hard it would be to make the formats consistent.  The
> 3rd last one seems similar enough that it might be worth doing
> together with this?


I checked for that kind of msgids in a bit more intensive way. The
numbers are the line numbers in ja.po of backend. I didn't check the
same for other modules.

> ###: invalid timeline %lld
>        @ backup/walsummaryfuncs.c:95
>      invalid timeline %u
>        @ repl_gram.y:318 repl_gram.y:359

In the first case, the %lld can be more than 2^31.

In the second case, %u is uint32. However, the bigger issue is
checking if the uint32 value is negative:

repl_gram.c: 147
>    if ((yyvsp[0].uintval) <= 0)
>        ereport(ERROR,
>                (errcode(ERRCODE_SYNTAX_ERROR),
>                 errmsg("invalid timeline %u", (yyvsp[0].uintval))));

which cannot happen. I think we can simply remove the useless error
check.

> ###: could not read file \"%s\": read %d of %zu
>        @ ../common/controldata_utils.c:116 ../common/controldata_utils.c:119 access/transam/xlog.c:3417
access/transam/xlog.c:4278replication/logical/origin.c:750 replication/logical/origin.c:789
replication/logical/snapbuild.c:2040replication/slot.c:2218 replication/slot.c:2259 replication/walsender.c:660
utils/cache/relmapper.c:833
>      could not read file \"%s\": read %d of %lld
>        @ access/transam/twophase.c:1372
>      could not read file \"%s\": read %zd of %zu
>        @ backup/basebackup.c:2102
> ###: oversize GSSAPI packet sent by the client (%zu > %zu)
>        @ libpq/be-secure-gssapi.c:351
>      oversize GSSAPI packet sent by the client (%zu > %d)
>        @ libpq/be-secure-gssapi.c:575
> ###: compressed segment file \"%s\" has incorrect uncompressed size %d, skipping
>        @ pg_receivewal.c:362
>      compressed segment file \"%s\" has incorrect uncompressed size %zu, skipping
>        @ pg_receivewal.c:448
> ###: could not read file \"%s\": read only %zd of %lld bytes
>        @ pg_combinebackup.c:1304
>      could not read file \"%s\": read only %d of %u bytes
>        @ reconstruct.c:514


We can "fix" them the same way.  I debated whether to use ssize_t for
read() and replace all instances of size_t with Size. However, in the
end, I decided to only keep it consistent with the surrounding code.

> ###: index %d out of valid range, 0..%d
>        @ utils/adt/varlena.c:3220 utils/adt/varlena.c:3287
>      index %lld out of valid range, 0..%lld
>        @ utils/adt/varlena.c:3251 utils/adt/varlena.c:3323
> ###: string is too long for tsvector (%d bytes, max %d bytes)
>        @ tsearch/to_tsany.c:194 utils/adt/tsvector.c:277 utils/adt/tsvector_op.c:1126
>      string is too long for tsvector (%ld bytes, max %ld bytes)
>        @ utils/adt/tsvector.c:223

We can unify them and did in the attached, but I'm not sure that's
sensible..

> ###: could not look up local user ID %d: %s
>        @ ../port/user.c:43 ../port/user.c:79
>      could not look up local user ID %ld: %s
>        @ libpq/auth.c:1886

Both of the above use uid_t (defined as int) as user ID and it is
explicitly cast to "int" in the first case and to long in the second,
which seems mysterious. Although I'm not sure if there's a possibility
of uid_t being widened in the future, I unified them to the latter way
for robustness.

> ###: error while cloning relation \"%s.%s\" (\"%s\" to \"%s\"): %m
>        @ file.c:44
>      error while cloning relation \"%s.%s\" (\"%s\" to \"%s\"): %s
>        @ file.c:65
> 
> The latter seems to be changed to %m by reassiging save_errno to errno, as done in other places.
> 
> ###: could not get data directory using %s: %m
>        @ option.c:448
>      could not get data directory using %s: %s
>        @ option.c:452
> ###: could not get control data using %s: %m
>        @ controldata.c:129 controldata.c:199
>      could not get control data using %s: %s
>        @ controldata.c:175 controldata.c:507
> ###: %s: %m
>        @ copy.c:401 psqlscanslash.l:805 psqlscanslash.l:817 psqlscanslash.l:835
>      %s: %s
>        @ command.c:2728 copy.c:388

In these cases, %m can be replaced by %s by using
wait_result_to_str(-1), but I'm not sure it is sensible. (I didn't do
that in the attached)

Finally, it doesn't seem possible to sensibly unify everything that
follows.

> ###: could not initialize LDAP: %s
>        @ libpq/auth.c:2327
>      could not initialize LDAP: %m
>        @ libpq/auth.c:2345
> ###: Valid values are between \"%d\" and \"%d\".
>        @ access/common/reloptions.c:1616
>      Valid values are between \"%f\" and \"%f\".
>        @ access/common/reloptions.c:1636
> ###: permission denied for large object %s
>        @ catalog/aclchk.c:2745
>      permission denied for large object %u
>        @ libpq/be-fsstubs.c:871 storage/large_object/inv_api.c:297 storage/large_object/inv_api.c:309
storage/large_object/inv_api.c:506storage/large_object/inv_api.c:617 storage/large_object/inv_api.c:807
 
> ###: commutator operator %s is already the commutator of operator %s
>        @ catalog/pg_operator.c:739
>      commutator operator %s is already the commutator of operator %u
>        @ catalog/pg_operator.c:744
> ###: must be owner of large object %s
>        @ catalog/aclchk.c:2877
>      must be owner of large object %u
>        @ catalog/objectaddress.c:2511 libpq/be-fsstubs.c:329
> ###: could not open file \"%s\": %m
>        @ replication/slot.c:2186 replication/walsender.c:628 replication/walsender.c:3051 storage/file/copydir.c:151
storage/file/fd.c:803storage/file/fd.c:3510 storage/file/fd.c:3740 storage/file/fd.c:3830 storage/smgr/md.c:661
utils/cache/relmapper.c:818utils/cache/relmapper.c:935 utils/error/elog.c:2085 utils/init/miscinit.c:1526
utils/init/miscinit.c:1660utils/init/miscinit.c:1737 utils/misc/guc.c:4712 utils/misc/guc.c:4762
 
>      could not open file \"%s\": %s
>        @ ../port/open.c:115
> ###: %d%s%s is outside the valid range for parameter \"%s\" (%d .. %d)
>        @ utils/misc/guc.c:3179
>      %g%s%s is outside the valid range for parameter \"%s\" (%g .. %g)
>        @ utils/misc/guc.c:3215
> ###: could not accept SSL connection: %m
>        @ libpq/be-secure-openssl.c:503
>      could not accept SSL connection: %s
>        @ libpq/be-secure-openssl.c:546
> ###: invalid value for parameter \"%s\": %d
>        @ utils/adt/regexp.c:716 utils/adt/regexp.c:725 utils/adt/regexp.c:1082 utils/adt/regexp.c:1146
utils/adt/regexp.c:1155utils/adt/regexp.c:1164 utils/adt/regexp.c:1173 utils/adt/regexp.c:1853 utils/adt/regexp.c:1862
utils/adt/regexp.c:1871utils/misc/guc.c:6761 utils/misc/guc.c:6795
 
>      invalid value for parameter \"%s\": %g
>        @ utils/misc/guc.c:6829
> ###: could not close file \"%s\": %m
>        @ bbstreamer_file.c:138 pg_recvlogical.c:650
>      could not close file \"%s\": %s
>        @ receivelog.c:227 receivelog.c:317 receivelog.c:688
> ###: could not fsync file \"%s\": %m
>        @ pg_recvlogical.c:204
>      could not fsync file \"%s\": %s
>        @ receivelog.c:775 receivelog.c:1022 walmethods.c:1206
> ###: could not read from input file: %s
>        @ compress_lz4.c:628 compress_lz4.c:647 compress_none.c:97 compress_none.c:140
>      could not read from input file: %m
>        @ compress_zstd.c:373 pg_backup_custom.c:655
> ###: could not get pg_ctl version data using %s: %m
>        @ exec.c:47
>      could not get pg_ctl version data using %s: %s
>        @ exec.c:51
> ###: negator operator %s is already the negator of operator %s
>        @ catalog/pg_operator.c:807
>      negator operator %s is already the negator of operator %u
>        @ catalog/pg_operator.c:812
> ###: timestamp out of range: \"%s\"
>        @ access/transam/xlogrecovery.c:4937 utils/adt/timestamp.c:202 utils/adt/timestamp.c:455
>      timestamp out of range: \"%g\"
>        @ utils/adt/timestamp.c:762 utils/adt/timestamp.c:774

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Вложения

Re: Inconsistent printf placeholders

От
Kyotaro Horiguchi
Дата:
At Fri, 15 Mar 2024 16:20:27 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> I checked for that kind of msgids in a bit more intensive way. The
> numbers are the line numbers in ja.po of backend. I didn't check the
> same for other modules.
> 
> > ###: invalid timeline %lld
> >        @ backup/walsummaryfuncs.c:95
> >      invalid timeline %u
> >        @ repl_gram.y:318 repl_gram.y:359

"The numbers are the line numbers in ja.po" is wrong. Correctly, it
should be written as:

The information consists of two lines: the first of them is the msgid,
and the second indicates the locations where they appear.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Inconsistent printf placeholders

От
Peter Eisentraut
Дата:
On 15.03.24 08:20, Kyotaro Horiguchi wrote:
> diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
> @@ -1369,8 +1369,8 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
>                        errmsg("could not read file \"%s\": %m", path)));
>           else
>               ereport(ERROR,
> -                    (errmsg("could not read file \"%s\": read %d of %lld",
> -                            path, r, (long long int) stat.st_size)));
> +                    (errmsg("could not read file \"%s\": read %zd of %zu",
> +                            path, r, (Size) stat.st_size)));
>       }
>   
>       pgstat_report_wait_end();

This might be worse, because stat.st_size is of type off_t, which could 
be smaller than Size/size_t.

> diff --git a/src/backend/libpq/be-secure-gssapi.c b/src/backend/libpq/be-secure-gssapi.c
> index bc04e78abb..68645b4519 100644
> --- a/src/backend/libpq/be-secure-gssapi.c
> +++ b/src/backend/libpq/be-secure-gssapi.c
> @@ -572,9 +572,9 @@ secure_open_gssapi(Port *port)
>           if (input.length > PQ_GSS_RECV_BUFFER_SIZE)
>           {
>               ereport(COMMERROR,
> -                    (errmsg("oversize GSSAPI packet sent by the client (%zu > %d)",
> +                    (errmsg("oversize GSSAPI packet sent by the client (%zu > %zu)",
>                               (size_t) input.length,
> -                            PQ_GSS_RECV_BUFFER_SIZE)));
> +                            (size_t) PQ_GSS_RECV_BUFFER_SIZE)));
>               return -1;
>           }
>   

Might be better to add that cast to the definition of 
PQ_GSS_RECV_BUFFER_SIZE instead, so that all code can benefit.

> diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
> index 7474f5bd67..baa76280b9 100644
> --- a/src/backend/replication/repl_gram.y
> +++ b/src/backend/replication/repl_gram.y
> @@ -312,11 +312,6 @@ timeline_history:
>                   {
>                       TimeLineHistoryCmd *cmd;
>   
> -                    if ($2 <= 0)
> -                        ereport(ERROR,
> -                                (errcode(ERRCODE_SYNTAX_ERROR),
> -                                 errmsg("invalid timeline %u", $2)));
> -
>                       cmd = makeNode(TimeLineHistoryCmd);
>                       cmd->timeline = $2;
>   
> @@ -352,13 +347,7 @@ opt_slot:
>   
>   opt_timeline:
>               K_TIMELINE UCONST
> -                {
> -                    if ($2 <= 0)
> -                        ereport(ERROR,
> -                                (errcode(ERRCODE_SYNTAX_ERROR),
> -                                 errmsg("invalid timeline %u", $2)));
> -                    $$ = $2;
> -                }
> +                { $$ = $2; }
>                   | /* EMPTY */            { $$ = 0; }
>               ;
>   

I don't think this is correct.  It loses the check for == 0.

> diff --git a/src/backend/tsearch/to_tsany.c b/src/backend/tsearch/to_tsany.c
> index 88cba58cba..9d21178107 100644
> --- a/src/backend/tsearch/to_tsany.c
> +++ b/src/backend/tsearch/to_tsany.c
> @@ -191,7 +191,8 @@ make_tsvector(ParsedText *prs)
>       if (lenstr > MAXSTRPOS)
>           ereport(ERROR,
>                   (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
> -                 errmsg("string is too long for tsvector (%d bytes, max %d bytes)", lenstr, MAXSTRPOS)));
> +                 /* cast values to avoid extra translatable messages */
> +                 errmsg("string is too long for tsvector (%ld bytes, max %ld bytes)", (long) lenstr, (long)
MAXSTRPOS)));
>   
>       totallen = CALCDATASIZE(prs->curwords, lenstr);
>       in = (TSVector) palloc0(totallen);

I think it would be better instead to change the message in tsvectorin() 
to *not* use long.  The size of long is unportable, so I would rather 
avoid using it at all.

> diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
> index 8d28dd42ce..5de490b569 100644
> --- a/src/backend/utils/adt/varlena.c
> +++ b/src/backend/utils/adt/varlena.c
> @@ -3217,8 +3217,9 @@ byteaGetByte(PG_FUNCTION_ARGS)
>       if (n < 0 || n >= len)
>           ereport(ERROR,
>                   (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
> -                 errmsg("index %d out of valid range, 0..%d",
> -                        n, len - 1)));
> +                 /* cast values to avoid extra translable messages */
> +                 errmsg("index %lld out of valid range, 0..%lld",
> +                        (long long)n, (long long) len - 1)));
>   
>       byte = ((unsigned char *) VARDATA_ANY(v))[n];

I think this is taking it too far.  We shouldn't try to make all similar 
messages use the same placeholders.  If the underlying types are 
different, we should use them.  Adding more casts makes the code less 
robust overall.  The size_t/ssize_t cleanup is different, because there 
the types were arguably wrong to begin with, and by using the right 
types we move toward more consistency.

> diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c
> index 6f0814d9ac..feb4d5dcf4 100644
> --- a/src/bin/pg_combinebackup/pg_combinebackup.c
> +++ b/src/bin/pg_combinebackup/pg_combinebackup.c
> @@ -1301,8 +1301,9 @@ slurp_file(int fd, char *filename, StringInfo buf, int maxlen)
>           if (rb < 0)
>               pg_fatal("could not read file \"%s\": %m", filename);
>           else
> -            pg_fatal("could not read file \"%s\": read only %zd of %lld bytes",
> -                     filename, rb, (long long int) st.st_size);
> +            /* cast st_size to avoid extra translatable messages */
> +            pg_fatal("could not read file \"%s\": read only %zd of %zu bytes",
> +                     filename, rb, (size_t) st.st_size);
>       }
>   
>       /* Adjust buffer length for new data and restore trailing-\0 invariant */

Similar to above, casting off_t to size_t is dubious.

> diff --git a/src/port/user.c b/src/port/user.c
> index 7444aeb64b..9364bdb69e 100644
> --- a/src/port/user.c
> +++ b/src/port/user.c
> @@ -40,8 +40,8 @@ pg_get_user_name(uid_t user_id, char *buffer, size_t buflen)
>       }
>       if (pwerr != 0)
>           snprintf(buffer, buflen,
> -                 _("could not look up local user ID %d: %s"),
> -                 (int) user_id,
> +                 _("could not look up local user ID %ld: %s"),
> +                 (long) user_id,
>                    strerror_r(pwerr, pwdbuf, sizeof(pwdbuf)));
>       else
>           snprintf(buffer, buflen,

Also dubious use of "long" here.




Re: Inconsistent printf placeholders

От
Kyotaro Horiguchi
Дата:
Thank you for looking this.

At Tue, 19 Mar 2024 10:50:23 +0100, Peter Eisentraut <peter@eisentraut.org> wrote in 
> On 15.03.24 08:20, Kyotaro Horiguchi wrote:
> > diff --git a/src/backend/access/transam/twophase.c
> > b/src/backend/access/transam/twophase.c
> > @@ -1369,8 +1369,8 @@ ReadTwoPhaseFile(TransactionId xid, bool
> > missing_ok)
> >                        errmsg("could not read file \"%s\":
> >                        %m", path)));
> >           else
> >               ereport(ERROR,
> > -                    (errmsg("could not read file \"%s\":
> > -                    read %d of %lld",
> > -                            path, r, (long long
> > -                            int) stat.st_size)));
> > + (errmsg("could not read file \"%s\": read %zd of %zu",
> > + path, r, (Size) stat.st_size)));
> >       }
> >         pgstat_report_wait_end();
> 
> This might be worse, because stat.st_size is of type off_t, which
> could be smaller than Size/size_t.

I think you were trying to mention that off_t could be wider than
size_t and you're right in that point. I thought that it is safe since
we are trying to read the whole content of the file at once here into
palloc'ed memory.

However, on second thought, if st_size is out of the range of ssize_t,
and palloc accepts that size, at least on Linux, read(2) reads only
0x7ffff000 bytes and raches the error reporting. Addition to that,
this size was closer to the memory allocation size limit than I had
thought.

As the result, I removed the change. However, I kept the change of the
type of variable "r" and corresponding placeholder %zd.

> > diff --git a/src/backend/libpq/be-secure-gssapi.c
> > b/src/backend/libpq/be-secure-gssapi.c
> > index bc04e78abb..68645b4519 100644
> > --- a/src/backend/libpq/be-secure-gssapi.c
> > +++ b/src/backend/libpq/be-secure-gssapi.c
> > @@ -572,9 +572,9 @@ secure_open_gssapi(Port *port)
> >           if (input.length > PQ_GSS_RECV_BUFFER_SIZE)
> >           {
> >               ereport(COMMERROR,
> > -                    (errmsg("oversize GSSAPI packet sent
> > -                    by the client (%zu > %d)",
> > + (errmsg("oversize GSSAPI packet sent by the client (%zu > %zu)",
> >                               (size_t) input.length,
> > -                            PQ_GSS_RECV_BUFFER_SIZE)));
> > + (size_t) PQ_GSS_RECV_BUFFER_SIZE)));
> >               return -1;
> >           }
> >   
> 
> Might be better to add that cast to the definition of
> PQ_GSS_RECV_BUFFER_SIZE instead, so that all code can benefit.

As far as I see, the only exceptional uses I found were a comparison
with int values, and being passed as an OM_uint32 (to one of the
parameters of gss_wrap_size_limit()). Therefore, I agree that it is
beneficial. By the way, we currently define Size as the same as size_t
(since 1998). Is it correct to consider Size as merely for backward
compatibility and we should use size_t for new code?  I used size_t in
the modified part in the attached patch.

> > diff --git a/src/backend/replication/repl_gram.y
> > b/src/backend/replication/repl_gram.y
> > index 7474f5bd67..baa76280b9 100644
> > --- a/src/backend/replication/repl_gram.y
> > +++ b/src/backend/replication/repl_gram.y
> > @@ -312,11 +312,6 @@ timeline_history:
> >                   {
> >                       TimeLineHistoryCmd *cmd;
> >   -                    if ($2 <= 0)
> > -                        ereport(ERROR,
> > -                                (errcode(ERRCODE_SYNTAX_ERROR),
> > -                                 errmsg("invalid
> > -                                 timeline %u",
> > -                                 $2)));
> > -
...
> I don't think this is correct.  It loses the check for == 0.

Ugh. It's my mistake. So we need to consider unifying the messages
again. In walsummaryfuncs.c, %lld is required, but it's silly for the
uses in repl_gram.y. Finally, I chose not to change anything here.

> > diff --git a/src/backend/tsearch/to_tsany.c
> > b/src/backend/tsearch/to_tsany.c
> > index 88cba58cba..9d21178107 100644
> > --- a/src/backend/tsearch/to_tsany.c
> > +++ b/src/backend/tsearch/to_tsany.c
> > @@ -191,7 +191,8 @@ make_tsvector(ParsedText *prs)
> >       if (lenstr > MAXSTRPOS)
> >           ereport(ERROR,
> >                   (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
> > -                 errmsg("string is too long for tsvector (%d
> > -                 bytes, max %d bytes)", lenstr, MAXSTRPOS)));
> > + /* cast values to avoid extra translatable messages */
> > + errmsg("string is too long for tsvector (%ld bytes, max %ld bytes)",
> > (long) lenstr, (long) MAXSTRPOS)));
> >         totallen = CALCDATASIZE(prs->curwords, lenstr);
> >       in = (TSVector) palloc0(totallen);
> 
> I think it would be better instead to change the message in
> tsvectorin() to *not* use long.  The size of long is unportable, so I
> would rather avoid using it at all.

The casts to long are tentative only to adjust to the corresponding
placeholder, and in this context, portability concerns are not
applicable. However, those casts are apparently useless. As you
suggested, I tried to change tsvectorin() instead, but there's a
problem here.

tsvector.c:224
>     errmsg("string is too long for tsvector (%ld bytes, max %ld bytes)",
>            (long) (cur - tmpbuf), (long) MAXSTRPOS)));

cur and tmpbuf are pointers. The byte width of the subtraction results
varies by architecture. However, the surrounding code apparently
assumes that the difference fits within an int. I added a cast to int
for the pointer arithmetic here. (Although I'm not sure this is the
right direction.)

> > diff --git a/src/backend/utils/adt/varlena.c
> > b/src/backend/utils/adt/varlena.c
> > index 8d28dd42ce..5de490b569 100644
> > --- a/src/backend/utils/adt/varlena.c
> > +++ b/src/backend/utils/adt/varlena.c
> > @@ -3217,8 +3217,9 @@ byteaGetByte(PG_FUNCTION_ARGS)
> >       if (n < 0 || n >= len)
> >           ereport(ERROR,
> >                   (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
> > -                 errmsg("index %d out of valid range, 0..%d",
> > -                        n, len - 1)));
> > + /* cast values to avoid extra translable messages */
> > + errmsg("index %lld out of valid range, 0..%lld",
> > + (long long)n, (long long) len - 1)));
> >         byte = ((unsigned char *) VARDATA_ANY(v))[n];
> 
> I think this is taking it too far.  We shouldn't try to make all
> similar messages use the same placeholders.  If the underlying types
> are different, we should use them.  Adding more casts makes the code
> less robust overall.  The size_t/ssize_t cleanup is different, because
> there the types were arguably wrong to begin with, and by using the
> right types we move toward more consistency.

Ouch! Understood. They treat byte and bit locations accordingly. I
agree that it's too far. Removed.

> > diff --git a/src/bin/pg_combinebackup/pg_combinebackup.c b/src/bin/pg_combinebackup/pg_combinebackup.c
> > index 6f0814d9ac..feb4d5dcf4 100644
> > --- a/src/bin/pg_combinebackup/pg_combinebackup.c
> > +++ b/src/bin/pg_combinebackup/pg_combinebackup.c
> > -            pg_fatal("could not read file \"%s\": read only %zd of %lld bytes",
> > -                     filename, rb, (long long int) st.st_size);
> > +            /* cast st_size to avoid extra translatable messages */
> > +            pg_fatal("could not read file \"%s\": read only %zd of %zu bytes",
> > +                     filename, rb, (size_t) st.st_size);
> >       }
> >         /* Adjust buffer length for new data and restore trailing-\0 invariant */
> 
> Similar to above, casting off_t to size_t is dubious.

The same discussion regarding the change in twophase.c is also
applicable to this change. I applied the same amendment.

> > diff --git a/src/port/user.c b/src/port/user.c
> > index 7444aeb64b..9364bdb69e 100644
> > --- a/src/port/user.c
> > +++ b/src/port/user.c
> > @@ -40,8 +40,8 @@ pg_get_user_name(uid_t user_id, char *buffer, size_t
> > buflen)
> >       }
> >       if (pwerr != 0)
> >           snprintf(buffer, buflen,
> > -                 _("could not look up local user ID %d: %s"),
> > -                 (int) user_id,
> > + _("could not look up local user ID %ld: %s"),
> > +                 (long) user_id,
> >                    strerror_r(pwerr, pwdbuf, sizeof(pwdbuf)));
> >       else
> >           snprintf(buffer, buflen,
> 
> Also dubious use of "long" here.

Okay, used %d instead. In addition to that, I removed the casts from
uid_t expecting that compilers will detect the change of the
definition of uid_t.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Вложения