Обсуждение: A couple of gripes about the gettext plurals patch
I see that the recently committed changes typically use ngettext in this style: ereport(msglevel, /* translator: %d always has a value larger than 1 */ (errmsg(ngettext("dropcascades to %d other object", "drop cascades to %d other objects", numReportedClient + numNotReportedClient), numReportedClient + numNotReportedClient), This is bogus: errmsg expects that it should itself feed its first argument through gettext(), not receive an argument that is already translated. That's at the least a waste of cycles, and it's not entirely impossible that double translation could end up with a just plain wrong result. A simple fix would be to use errmsg_internal() in these cases, but I wonder if we should instead invent nerrmsg() or something like that. Anyway, there are some other usages besides errmsg(ngettext()) that are broken in the same way and will also require consideration. I'm also wondering whether PGAC_CHECK_GETTEXT() should be made to check for ngettext() instead of bind_textdomain_codeset(). Which one was added later? regards, tom lane
On Sunday 26 April 2009 21:29:20 Tom Lane wrote: > ereport(msglevel, > /* translator: %d always has a value larger than 1 */ > (errmsg(ngettext("drop cascades to %d other object", > "drop cascades to %d other objects", > numReportedClient + numNotReportedClient), > numReportedClient + numNotReportedClient), > > This is bogus: errmsg expects that it should itself feed its first > argument through gettext(), not receive an argument that is already > translated. That's at the least a waste of cycles, and it's not > entirely impossible that double translation could end up with a just > plain wrong result. I think we can live with this for now, and we have lived with this sort of issue in other places for a while. We should consider reshuffling the interfaces a bit as you describe to reduce the problem. But I think this is not something we always avoid completely. > I'm also wondering whether PGAC_CHECK_GETTEXT() should be made to > check for ngettext() instead of bind_textdomain_codeset(). Which > one was added later? I checked this in the gettext changelog, and bind_textdomain_code() came (slightly) later, so we're OK.
Peter Eisentraut <peter_e@gmx.net> writes: > On Sunday 26 April 2009 21:29:20 Tom Lane wrote: >> This is bogus: errmsg expects that it should itself feed its first >> argument through gettext(), not receive an argument that is already >> translated. That's at the least a waste of cycles, and it's not >> entirely impossible that double translation could end up with a just >> plain wrong result. > I think we can live with this for now, and we have lived with this sort of > issue in other places for a while. We should consider reshuffling the > interfaces a bit as you describe to reduce the problem. The issue of double translation is really a minor point; what is bothering me is that we've got such an ad-hoc, non-compile-time-checkable approach here. Zdenek's discovery today that some of the format strings are flat-out wrong http://archives.postgresql.org/pgsql-hackers/2009-05/msg00946.php surprises me not in the least. I think that we need to have a more carefully designed interface before we allow any more plural translations to be put in place, not after. regards, tom lane
On Monday 25 May 2009 22:02:47 Tom Lane wrote: > The issue of double translation is really a minor point; what is > bothering me is that we've got such an ad-hoc, > non-compile-time-checkable approach here. Zdenek's discovery > today that some of the format strings are flat-out wrong > http://archives.postgresql.org/pgsql-hackers/2009-05/msg00946.php > surprises me not in the least. See response there why this is the way it is. Note also that gcc's format argument checking can see through ngettext() quite well, so as far as I can tell, we are not exposed to accidental format string mismatches. Example code: #include <stdarg.h> #include <stdio.h> #include <libintl.h> extern void errmsg(const char *fmt, ...) __attribute__((format(printf,1,2))); void errmsg(const char *fmt, ...) { va_list ap; va_start(ap, fmt); vfprintf(stderr, fmt, ap); va_end(ap); } int main(int argc, char *argv[]) { errmsg(ngettext("got %d argument, namely %s\n", "got %d arguments, first ist %s\n", argc), argc, argv[0]); return 0; } I tried throwing various kinds of subtle garbage into the errmsg/ngettext line, but it was all discovered by gcc -Wall.
Peter Eisentraut <peter_e@gmx.net> writes: > I tried throwing various kinds of subtle garbage into the errmsg/ngettext > line, but it was all discovered by gcc -Wall. I experimented with this and found that indeed both format strings are checked ... if you have a reasonably recent libintl.h AND you have specified --enable-nls. Otherwise it all goes to heck, apparently because the compiler doesn't try to look through our substitute definition #define ngettext(s,p,n) ((n) == 1 ? (s) : (p)) So I'm still of the opinion that we need some work here. I think that instead of this #define we need an actual function that we can hang a couple of __attribute_format_arg__ markers on. Otherwise things are going to slip by us. (Not sure about you, but I don't build with --enable-nls by default.) regards, tom lane
Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > I tried throwing various kinds of subtle garbage into the errmsg/ngettext > > line, but it was all discovered by gcc -Wall. > > I experimented with this and found that indeed both format strings are > checked ... if you have a reasonably recent libintl.h AND you have > specified --enable-nls. Otherwise it all goes to heck, apparently > because the compiler doesn't try to look through our substitute > definition > > #define ngettext(s,p,n) ((n) == 1 ? (s) : (p)) > > So I'm still of the opinion that we need some work here. I think > that instead of this #define we need an actual function that we can > hang a couple of __attribute_format_arg__ markers on. Otherwise > things are going to slip by us. (Not sure about you, but I don't > build with --enable-nls by default.) TODO item? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
On Tuesday 26 May 2009 21:05:29 Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > I tried throwing various kinds of subtle garbage into the errmsg/ngettext > > line, but it was all discovered by gcc -Wall. > > I experimented with this and found that indeed both format strings are > checked ... if you have a reasonably recent libintl.h AND you have > specified --enable-nls. Otherwise it all goes to heck, apparently > because the compiler doesn't try to look through our substitute > definition > > #define ngettext(s,p,n) ((n) == 1 ? (s) : (p)) I can't reproduce that. Do you have a concrete example? Different compiler versions, perhaps? > So I'm still of the opinion that we need some work here. I think > that instead of this #define we need an actual function that we can > hang a couple of __attribute_format_arg__ markers on. That would appear to be the logical solution, but since I can't reproduce the problem, I will have trouble to work on this.
Peter Eisentraut <peter_e@gmx.net> writes: > On Tuesday 26 May 2009 21:05:29 Tom Lane wrote: >> I experimented with this and found that indeed both format strings are >> checked ... if you have a reasonably recent libintl.h AND you have >> specified --enable-nls. Otherwise it all goes to heck, apparently >> because the compiler doesn't try to look through our substitute >> definition >> >> #define ngettext(s,p,n) ((n) == 1 ? (s) : (p)) > I can't reproduce that. Do you have a concrete example? Different compiler > versions, perhaps? Hmm, I was experimenting with Fedora 10's gcc (4.3.2) ... what are you testing? I don't have the test case anymore but will see if I can reconstruct it. regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes: > On Tuesday 26 May 2009 21:05:29 Tom Lane wrote: >> I experimented with this and found that indeed both format strings are >> checked ... if you have a reasonably recent libintl.h AND you have >> specified --enable-nls. Otherwise it all goes to heck, apparently >> because the compiler doesn't try to look through our substitute >> definition >> >> #define ngettext(s,p,n) ((n) == 1 ? (s) : (p)) > I can't reproduce that. Do you have a concrete example? Different compiler > versions, perhaps? After further experimentation I think I must have gotten confused yesterday. gcc 4.3.2 does seem to understand that it should check both format strings in the ?: construct. What was not making that check is the relic 2.95.3 version that I use for trailing-edge compatibility checking. However, 2.95.3 appears to handle only one format_arg() attribute per function, and so the proposed alternative coding doesn't help it very much either. Unless there's some intermediate version that can do two format_arg()s but doesn't look through ?:, it's probably not worth changing. So on the whole I withdraw that line of complaint. But I'm still not happy, because I've thought of another one ;-) To wit, the current coding fails to respect the gettext domain when working with pluralized messages. regards, tom lane
On Thursday 28 May 2009 00:54:32 Tom Lane wrote: > To wit, the current > coding fails to respect the gettext domain when working with pluralized > messages. The ngettext() calls use the default textdomain that main.c sets up. The PLs use dngettext(). Is that not correct?
Peter Eisentraut <peter_e@gmx.net> writes: > On Thursday 28 May 2009 00:54:32 Tom Lane wrote: >> To wit, the current >> coding fails to respect the gettext domain when working with pluralized >> messages. > The ngettext() calls use the default textdomain that main.c sets up. The PLs > use dngettext(). Is that not correct? If that's okay, why didn't we adopt that approach for the mainline errmsg processing? Or more to the point: I think it's a seriously bad idea that ereports in PLs need to be coded differently from those in the core backend, especially with respect to a relatively-little-used feature. Want to make a side bet on how long till the first bug gets committed? regards, tom lane