Discussion: BUG #17254: Crash with 0xC0000409 in pg_stat_statements when pg_stat_tmp\pgss_query_texts.stat exceeded 2GB.


BUG #17254: Crash with 0xC0000409 in pg_stat_statements when pg_stat_tmp\pgss_query_texts.stat exceeded 2GB.

From:
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17254
Logged by:          yusuke egashira
Email address:      egashira.yusuke@fujitsu.com
PostgreSQL version: 14.0
Operating system:   Windows Server 2019
Description:

The database server crashed with 0xC0000409 in pg_stat_statements.

2021-10-28 14:15:14.244 JST [4980] LOG:  server process (PID 2556) was
terminated by exception 0xC0000409
2021-10-28 14:15:14.244 JST [4980] DETAIL:  Failed process was running:
select count(1) from pg_stat_statements;
2021-10-28 14:15:14.244 JST [4980] HINT:  See C include file "ntstatus.h"
for a description of the hexadecimal value.

This crash happened with PostgreSQL 14.0 for Windows downloaded from
https://www.enterprisedb.com/download-postgresql-binaries.

According to our investigation, when pg_stat_statements reads
pg_stat_tmp\pgss_query_texts.stat, the server crashes with 0xC0000409
if the size of pgss_query_texts.stat exceeds about 2GB.
Our user was running tasks that executed many very long SQL statements
(INSERT statements with over 100,000 parameters), and the server
crashed whenever pgss_query_texts.stat grew larger than 2GB.

Strangely, although I had created the crashdumps directory under PGHOME,
no minidump file was produced.
# Doesn't a crash with 0xC0000409 produce a dump?
I retrieved a minidump from a postgres.exe rebuilt with dumping via
Windows Error Reporting enabled (see the CHANGED diff below).
The exception appears to have been raised by an invalid argument passed to _read().


* CHANGED:
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index e58e24a646..319c6ab62e 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -64,9 +64,6 @@ main(int argc, char *argv[])
 	 * If supported on the current platform, set up a handler to be called if
 	 * the backend/postmaster crashes with a fatal signal or exception.
 	 */
-#if defined(WIN32) && defined(HAVE_MINIDUMP_TYPE)
-	pgwin32_install_crashdump_handler();
-#endif
 
 	progname = get_progname(argv[0]);
 
@@ -247,8 +244,6 @@ startup_hacks(const char *progname)
 			exit(1);
 		}
 
-		/* In case of general protection fault, don't show GUI popup box */
-		SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);


* STACK_TEXT:
ucrtbase!invoke_watson+0x18
ucrtbase!invalid_parameter+0x81
ucrtbase!invalid_parameter_noinfo+0x9
ucrtbase!_read+0x3aedc
pg_stat_statements!qtext_load_file+0x112
pg_stat_statements!pg_stat_statements_internal+0x3ae
pg_stat_statements!pg_stat_statements_1_9+0x17
postgres!ExecMakeTableFunctionResult+0x205
postgres!FunctionNext+0x62
postgres!ExecScanFetch+0x101
postgres!fetch_input_tuple+0x91
postgres!agg_retrieve_direct+0x245
postgres!ExecAgg+0x156
postgres!standard_ExecutorRun+0x13a
pg_stat_statements!pgss_ExecutorRun+0xa1
postgres!PortalRunSelect+0x96
postgres!PortalRun+0x1ef
postgres!exec_simple_query+0x627
postgres!PostgresMain+0x6e9
postgres!BackendRun+0x44
postgres!SubPostmasterMain+0x254
postgres!main+0x43b
postgres!__scrt_common_main_seh+0x10c
kernel32!BaseThreadInitThunk+0x14
ntdll!RtlUserThreadStart+0x21



The MSDN documentation says that the upper limit of _read()'s size argument
is INT_MAX (about 2GB), but the size obtained from fstat() exceeded this
limit, so I think the server crash was caused by the resulting
invalid-parameter exception.

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/read?view=msvc-160
> If buffer is NULL, or if buffer_size > INT_MAX, the invalid parameter
handler is invoked.

Until PostgreSQL 13, fstat() failed and returned an ERROR when the file was
larger than 2GB, but as a result of improvements to fstat() on Windows in
PostgreSQL 14, _read() can now be handed a size beyond its limit, which
causes the crash.
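
As a minimal illustration of the failure mode (a Windows-only sketch, not
PostgreSQL code; the file name test.dat is hypothetical), the UCRT validates
the size argument before transferring any data, and any value above INT_MAX
invokes the invalid parameter handler. In a default release build that
handler raises the fail-fast exception 0xC0000409; installing our own
handler lets us observe the rejection instead of crashing:

#include <fcntl.h>
#include <io.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static void
on_invalid_parameter(const wchar_t *expr, const wchar_t *func,
					 const wchar_t *file, unsigned int line,
					 uintptr_t reserved)
{
	/* expr/func/file are only populated by debug CRT builds */
	fwprintf(stderr, L"CRT rejected an invalid parameter\n");
	exit(1);
}

int
main(void)
{
	static char buf[4096];
	int			fd;

	_set_invalid_parameter_handler(on_invalid_parameter);

	fd = _open("test.dat", _O_RDONLY | _O_BINARY);
	if (fd < 0)
		return 1;

	/*
	 * Any size above INT_MAX is rejected up front, per the MSDN page
	 * quoted above; the actual file size is irrelevant.
	 */
	_read(fd, buf, (unsigned int) INT_MAX + 1U);
	return 0;
}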


Re: BUG #17254: Crash with 0xC0000409 in pg_stat_statements when pg_stat_tmp\pgss_query_texts.stat exceeded 2GB.

From:
Juan José Santamaría Flecha
Date:
Thanks for the report. 

On Fri, Oct 29, 2021 at 12:52 PM PG Bug reporting form <noreply@postgresql.org> wrote:

The MSDN documentation says that the upper limit of _read()'s size argument
is INT_MAX (about 2GB), but the size obtained from fstat() exceeded this
limit, so I think the server crash was caused by the resulting
invalid-parameter exception.

https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/read?view=msvc-160
> If buffer is NULL, or if buffer_size > INT_MAX, the invalid parameter
handler is invoked.

Until PostgreSQL 13, fstat() failed and returned an ERROR when the file was
larger than 2GB, but as a result of improvements to fstat() on Windows in
PostgreSQL 14, _read() can now be handed a size beyond its limit, which
causes the crash.

The value of MaxAllocHugeSize is far beyond INT_MAX when _WIN64 is defined [1]. Shouldn't the limit for a slurp be MaxAllocSize?
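
For reference, both limits come from src/include/utils/memutils.h
(PostgreSQL 14; quoted from memory, so check the tree for the exact
comments):

#define MaxAllocSize		((Size) 0x3fffffff) /* 1 gigabyte - 1 */

#define MaxAllocHugeSize	(SIZE_MAX / 2)

So on a 64-bit Windows build the query-text buffer may legally be allocated
far larger than the INT_MAX bytes that a single _read() call will accept.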


Regards,

Juan José Santamaría Flecha

Re: BUG #17254: Crash with 0xC0000409 in pg_stat_statements when pg_stat_tmp\pgss_query_texts.stat exceeded 2GB.

From:
Juan José Santamaría Flecha
Date:

On Fri, Oct 29, 2021 at 9:43 PM Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> wrote:

The value of MaxAllocHugeSize is far beyond INT_MAX when _WIN64 is defined [1]. Shouldn't the limit for a slurp be MaxAllocSize?

Please find attached a patch for that.

Regards,

Juan José Santamaría Flecha 

Re: BUG #17254: Crash with 0xC0000409 in pg_stat_statements when pg_stat_tmp\pgss_query_texts.stat exceeded 2GB.

From:
Juan José Santamaría Flecha
Date:
On Sat, Oct 30, 2021 at 10:24 AM Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> wrote:

Please find attached a patch for that.

Regards,

Juan José Santamaría Flecha 
Now, with 100% more patch attached. 
Attachments
Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> writes:
> Now, with 100% more patch attached.

That seems like a pretty poor solution.  It will cause pg_stat_statements
to fail altogether as soon as the stats file exceeds 1GB.  (Admittedly,
failing is better than crashing, but not by that much.)  Worse, it causes
that to happen on EVERY platform, not only Windows where the problem is.

I think instead, we need to turn the subsequent one-off read() call into a
loop that reads no more than INT_MAX bytes at a time.  It'd be possible
to restrict that to Windows, but probably no harm in doing it the same
way everywhere.

A different line of thought is that maybe we shouldn't be letting the
file get so big in the first place.  Letting every backend have its
own copy of a multi-gigabyte stats file is going to be problematic,
and not only on Windows.  It looks like the existing logic just considers
the number of hash table entries, not their size ... should we rearrange
things to keep a running count of the space used?
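
Something along these lines, perhaps (a hypothetical sketch, not actual
pg_stat_statements code: pgssSharedState and its spinlock exist today,
while total_text_bytes and text_budget_bytes are invented for
illustration):

/*
 * Keep a running total of query-text bytes in the module's shared state,
 * updated wherever entries are stored, evicted, or garbage-collected, and
 * let it drive gc decisions instead of reasoning purely from the number
 * of hash-table entries.
 */
static bool
need_gc_qtexts_by_size(pgssSharedState *pgss, Size text_budget_bytes)
{
	Size		total;

	SpinLockAcquire(&pgss->mutex);
	total = pgss->total_text_bytes;	/* hypothetical counter */
	SpinLockRelease(&pgss->mutex);

	return total > text_budget_bytes;
}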

            regards, tom lane



Re: BUG #17254: Crash with 0xC0000409 in pg_stat_statements when pg_stat_tmp\pgss_query_texts.stat exceeded 2GB.

From:
Juan José Santamaría Flecha
Date:

On Sat, Oct 30, 2021 at 6:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

That seems like a pretty poor solution.  It will cause pg_stat_statements
to fail altogether as soon as the stats file exceeds 1GB.  (Admittedly,
failing is better than crashing, but not by that much.)  Worse, it causes
that to happen on EVERY platform, not only Windows where the problem is.

I don't think it is a Windows-only problem; even on POSIX platforms it might not be safe to read() more than 2GB in a single call.
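
For instance, POSIX leaves read() with a count greater than SSIZE_MAX
implementation-defined, and Linux caps any single transfer at 0x7ffff000
bytes, so a one-shot slurp of a multi-gigabyte file comes up short there as
well. A small illustration (assumes Linux and a hypothetical 3GB file at
/tmp/bigfile):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	size_t		want = (size_t) 3 * 1024 * 1024 * 1024; /* 3GB */
	char	   *buf = malloc(want);
	int			fd = open("/tmp/bigfile", O_RDONLY);
	ssize_t		got;

	if (buf == NULL || fd < 0)
		return 1;

	/*
	 * Linux transfers at most 0x7ffff000 (2,147,479,552) bytes per call,
	 * so this returns a short count rather than the full 3GB.
	 */
	got = read(fd, buf, want);
	printf("requested %zu, got %zd\n", want, got);
	return 0;
}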
 
I think instead, we need to turn the subsequent one-off read() call into a
loop that reads no more than INT_MAX bytes at a time.  It'd be possible
to restrict that to Windows, but probably no harm in doing it the same
way everywhere.

Seems reasonable to me, can such a change be back-patched?
 
A different line of thought is that maybe we shouldn't be letting the
file get so big in the first place.  Letting every backend have its
own copy of a multi-gigabyte stats file is going to be problematic,
and not only on Windows.  It looks like the existing logic just considers
the number of hash table entries, not their size ... should we rearrange
things to keep a running count of the space used?

+1. There should be a mechanism to limit the effective memory size.

Regards,

Juan José Santamaría Flecha
Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> writes:
> On Sat, Oct 30, 2021 at 6:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think instead, we need to turn the subsequent one-off read() call into a
>> loop that reads no more than INT_MAX bytes at a time.  It'd be possible
>> to restrict that to Windows, but probably no harm in doing it the same
>> way everywhere.

> Seems reasonable to me, can such a change be back-patched?

Don't see why not.

>> A different line of thought is that maybe we shouldn't be letting the
>> file get so big in the first place.  Letting every backend have its
>> own copy of a multi-gigabyte stats file is going to be problematic,
>> and not only on Windows.  It looks like the existing logic just considers
>> the number of hash table entries, not their size ... should we rearrange
>> things to keep a running count of the space used?

> +1. There should be a mechanism to limit the effective memory size.

This, on the other hand, would likely be something for HEAD only.
But now that we've seen a field complaint, it seems like a good
thing to pursue.

            regards, tom lane



I wrote:
> Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> writes:
>> On Sat, Oct 30, 2021 at 6:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I think instead, we need to turn the subsequent one-off read() call into a
>>> loop that reads no more than INT_MAX bytes at a time.  It'd be possible
>>> to restrict that to Windows, but probably no harm in doing it the same
>>> way everywhere.

>> Seems reasonable to me, can such a change be back-patched?

> Don't see why not.

Here's a quick patch for that.  I don't have any ability to check it
on Windows, but the logic is easy to verify by reducing the arbitrary
constant to something small.  (I used 1GB, not INT_MAX, because I figured
we ought to read in multiples of a filesystem block if possible.)

            regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 07fe0e7cda..726ba59e2b 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2125,6 +2125,7 @@ qtext_load_file(Size *buffer_size)
     char       *buf;
     int            fd;
     struct stat stat;
+    Size        nread;

     fd = OpenTransientFile(PGSS_TEXT_FILE, O_RDONLY | PG_BINARY);
     if (fd < 0)
@@ -2165,23 +2166,35 @@ qtext_load_file(Size *buffer_size)
     }

     /*
-     * OK, slurp in the file.  If we get a short read and errno doesn't get
-     * set, the reason is probably that garbage collection truncated the file
-     * since we did the fstat(), so we don't log a complaint --- but we don't
-     * return the data, either, since it's most likely corrupt due to
-     * concurrent writes from garbage collection.
+     * OK, slurp in the file.  Windows fails if we try to read more than
+     * INT_MAX bytes at once, and other platforms might not like that either,
+     * so read a very large file in 1GB segments.
      */
-    errno = 0;
-    if (read(fd, buf, stat.st_size) != stat.st_size)
+    nread = 0;
+    while (nread < stat.st_size)
     {
-        if (errno)
-            ereport(LOG,
-                    (errcode_for_file_access(),
-                     errmsg("could not read file \"%s\": %m",
-                            PGSS_TEXT_FILE)));
-        free(buf);
-        CloseTransientFile(fd);
-        return NULL;
+        int            toread = Min(1024 * 1024 * 1024, stat.st_size - nread);
+
+        /*
+         * If we get a short read and errno doesn't get set, the reason is
+         * probably that garbage collection truncated the file since we did
+         * the fstat(), so we don't log a complaint --- but we don't return
+         * the data, either, since it's most likely corrupt due to concurrent
+         * writes from garbage collection.
+         */
+        errno = 0;
+        if (read(fd, buf + nread, toread) != toread)
+        {
+            if (errno)
+                ereport(LOG,
+                        (errcode_for_file_access(),
+                         errmsg("could not read file \"%s\": %m",
+                                PGSS_TEXT_FILE)));
+            free(buf);
+            CloseTransientFile(fd);
+            return NULL;
+        }
+        nread += toread;
     }

     if (CloseTransientFile(fd) != 0)
@@ -2189,7 +2202,7 @@ qtext_load_file(Size *buffer_size)
                 (errcode_for_file_access(),
                  errmsg("could not close file \"%s\": %m", PGSS_TEXT_FILE)));

-    *buffer_size = stat.st_size;
+    *buffer_size = nread;
     return buf;
 }


Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> writes:
> On Sun, Oct 31, 2021 at 6:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Here's a quick patch for that.  I don't have any ability to check it
>> on Windows, but the logic is easy to verify by reducing the arbitrary
>> constant to something small.  (I used 1GB, not INT_MAX, because I figured
>> we ought to read in multiples of a filesystem block if possible.)

> I have tested the patch in Windows and it works as expected.

Great, thanks for testing!

            regards, tom lane



Re: BUG #17254: Crash with 0xC0000409 in pg_stat_statements when pg_stat_tmp\pgss_query_texts.stat exceeded 2GB.

From:
Juan José Santamaría Flecha
Date:

On Sun, Oct 31, 2021 at 6:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Here's a quick patch for that.  I don't have any ability to check it
on Windows, but the logic is easy to verify by reducing the arbitrary
constant to something small.  (I used 1GB, not INT_MAX, because I figured
we ought to read in multiples of a filesystem block if possible.)

 I have tested the patch in Windows and it works as expected.

Regards,

Juan José Santamaría Flecha

RE: BUG #17254: Crash with 0xC0000409 in pg_stat_statements when pg_stat_tmp\pgss_query_texts.stat exceeded 2GB.

From:
"egashira.yusuke@fujitsu.com"
Date:
Juan José Santamaría Flecha <juanjo.santamaria@gmail.com> writes:
> On Sun, Oct 31, 2021 at 6:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Here's a quick patch for that.  I don't have any ability to check it
>> on Windows, but the logic is easy to verify by reducing the arbitrary
>> constant to something small.  (I used 1GB, not INT_MAX, because I figured
>> we ought to read in multiples of a filesystem block if possible.)

> I have tested the patch in Windows and it works as expected.

I have also applied the patch to our environment and confirmed that it no longer crashes.
Thank you for your prompt work.

Regards.
Yusuke Egashira.