Discussion: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault
BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault
From
PG Bug reporting form
Date:
The following bug has been logged on the website:
Bug reference: 15815
Logged by: Steve I
Email address: postgres-ca@byerquest.com
PostgreSQL version: 9.6.12
Operating system: Amazon Aurora
Description:
AWS Aurora, stable on 9.6.6/8 for a year, updated to 9.6.12:
LOG: server process (PID 31294) was terminated by signal 11: Segmentation fault
DETAIL: Failed process was running:
simplified >>> SELECT value FROM {table} WHERE lower(substring(value,1,1000)) >= '{string}'
LOG: terminating any other active server processes
FATAL: Can't handle storage runtime process crash
This specific SQL causes a segfault on our dataset 100% of the time. If I change any part of it, it doesn't: e.g. remove lower, or substring, or change > to <, or change any part of the string. We have a few other variations, but this example is the most often reported and the most reproducible.
Guidance on whether this is a known issue, how to provide additional information to further trace it in an AWS environment, or how to work around it is most appreciated.
PG Bug reporting form <noreply@postgresql.org> writes:
> AWS Aurora, stable on 9.6.6/8 for a year, updated to 9.6.12:
> LOG: server process (PID 31294) was terminated by signal 11: Segmentation
> fault
> DETAIL: Failed process was running:
> simplified >>> SELECT value FROM {table} WHERE lower(substring(value,1,1000))
> >= '{string}'
> LOG: terminating any other active server processes
Huh. Can you get a stack trace from that?
https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
Also, could we see the definition of the table (psql \d would be helpful)?
regards, tom lane
On Tue, May 21, 2019 at 11:27, PG Bug reporting form <noreply@postgresql.org> wrote:
>
> AWS Aurora, stable on 9.6.6/8 for a year, updated to 9.6.12:
>
Aurora is a Postgres fork, so you should report it to Amazon. However, ...
> LOG: server process (PID 31294) was terminated by signal 11: Segmentation
> fault
> DETAIL: Failed process was running:
> simplified >>> SELECT value FROM {table} WHERE lower(substring(value,1,1000))
> >= '{string}'
> LOG: terminating any other active server processes
> FATAL: Can't handle storage runtime process crash
>
Could you reproduce it with stock Postgres? Could you provide a test case?
--
Euler Taveira Timbira - http://www.timbira.com.br/
PostgreSQL: Consulting, Development, 24x7 Support and Training
I probably need AWS hands to get a trace from behind the curtain. Started a thread there https://forums.aws.amazon.com/thread.jspa?threadID=303488&tstart=0
Column | Type | Modifiers
----------------+---------+------------------------------------------------------
a | integer | not null default nextval('{table3}_seq'::regclass)
b | integer |
c | integer |
d | text |
Indexes:
… PRIMARY KEY, btree (a)
… UNIQUE CONSTRAINT, btree (b, c)
… btree (b)
… btree (c)
… btree (lower("substring"(d, 1, 1000)) text_pattern_ops, b)
… btree (lower("substring"(d, 1, 1000)), b)
Foreign-key constraints:
… FOREIGN KEY (b) REFERENCES {table2}(b)
… FOREIGN KEY (c) REFERENCES {table1}(c)
The data is over 200 GB, so it will take time to run tests on stock Postgres.
On Tue, May 21, 2019 at 8:46 AM Euler Taveira <euler@timbira.com.br> wrote:
Steve <postgres-ca@byrequest.com> writes:
> Column | Type | Modifiers
> ----------------+---------+------------------------------------------------------
> a | integer | not null default
> nextval('{table3}_seq'::regclass)
> b | integer |
> c | integer |
> d | text |
> Indexes:
> … PRIMARY KEY, btree (a)
> … UNIQUE CONSTRAINT, btree (b, c)
> … btree (b)
> … btree (c)
> … btree (lower("substring"(d, 1, 1000)) text_pattern_ops, b)
> … btree (lower("substring"(d, 1, 1000)), b)
> Foreign-key constraints:
> … FOREIGN KEY (b) REFERENCES {table2}(b)
> … FOREIGN KEY (c) REFERENCES {table1}(c)
Hm, so this query is probably using the last of those indexes ---
could we see EXPLAIN output to confirm that?
If so, a plausible explanation is that a portion of that index is corrupt,
although it's certainly not very nice that you're getting a crash rather
than an error report.
If you're in a hurry to restore functionality, dropping and recreating
that index would likely make the problem go away ... but it would also
destroy the evidence we'd need to find the cause of the crash. So if
you can hold off till we see the stack trace, that'd be nice.
regards, tom lane
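A minimal sketch of how one might check this hypothesis on stock PostgreSQL: recreate the two expression indexes from the \d output on a throwaway table and ask the planner which one it picks. The names demo_t, demo_t_d_pattern and demo_t_d_default, and the literal 'some string', are hypothetical stand-ins for the redacted {table} and '{string}'. The foreign keys and the single-column indexes are omitted. Because the text_pattern_ops operator class provides only the ~<~, ~<=~, =, ~>=~ and ~>~ operators, a plain >= comparison can only be served by the index built with the default operator class, which is the last one listed. The plan should therefore name demo_t_d_default.

```sql
-- Hypothetical stand-in for the redacted {table}; columns follow the \d output.
CREATE TABLE demo_t (
    a serial PRIMARY KEY,
    b integer,
    c integer,
    d text,
    UNIQUE (b, c)
);

-- The two expression indexes from the \d output.
CREATE INDEX demo_t_d_pattern ON demo_t (lower(substring(d, 1, 1000)) text_pattern_ops, b);
CREATE INDEX demo_t_d_default ON demo_t (lower(substring(d, 1, 1000)), b);

ANALYZE demo_t;

-- Discourage a sequential scan so the index choice is visible even on a tiny test table.
SET enable_seqscan = off;

-- The WHERE clause repeats the indexed expression exactly, so the planner can match it.
EXPLAIN SELECT d FROM demo_t WHERE lower(substring(d, 1, 1000)) >= 'some string';
```

On the real table, running the same EXPLAIN without changing enable_seqscan is what Tom is asking for.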
Yeah, I agree, and it is... Our first step was to regenerate all the indexes, but the segfault persists.
We're reproducing the case in a database restored from a snapshot. Perhaps I'll reindex again there. Since we have an AWS snapshot, we can jump back pretty fast to retest.
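Rebuilding indexes, as described above, can be done with REINDEX or by dropping and recreating an index by hand. The following is a minimal sketch. The names demo_t and demo_t_d_default are hypothetical placeholders for the redacted table and for its default-opclass expression index on lower(substring(d, 1, 1000)), b.

Per the PostgreSQL documentation, REINDEX blocks writes to the table while it runs. It also blocks reads that try to use the index being rebuilt, which matters on a table of this size. Running it on the restored snapshot first keeps the original instance, and any evidence in its indexes, untouched.

```sql
-- Rebuild only the suspect expression index (hypothetical name).
REINDEX INDEX demo_t_d_default;

-- Or rebuild every index on the table (hypothetical name).
REINDEX TABLE demo_t;

-- Alternatively, drop and recreate the index by hand with the same definition.
DROP INDEX demo_t_d_default;
CREATE INDEX demo_t_d_default ON demo_t (lower(substring(d, 1, 1000)), b);
```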
BINGO! An AWS Development Manager just stepped in… They've identified the problem and are deploying a patch release: https://forums.aws.amazon.com/thread.jspa?messageID=901775
On Tue, May 21, 2019 at 9:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Steve <postgres-ca@byrequest.com> writes:
> BINGO, An AWS Development Manager just stepped in… They've identified the
> problem and deploying a patch release
> https://forums.aws.amazon.com/thread.jspa?messageID=901775
Oh, so it was their bug not ours? Sure wish there was more detail there.
regards, tom lane
Re: BUG #15815: Upgraded from 9.6.8 > 9.6.12 on AWS Aurora: SELECTs causing segmentation fault
From
Michael Paquier
Date:
On Tue, May 21, 2019 at 03:25:11PM -0400, Tom Lane wrote:
> Steve <postgres-ca@byrequest.com> writes:
>> BINGO, An AWS Development Manager just stepped in… They've identified the
>> problem and deploying a patch release
>> https://forums.aws.amazon.com/thread.jspa?messageID=901775
>
> Oh, so it was their bug not ours? Sure wish there was more detail there.
Aurora uses a different engine than Postgres as far as I understood,
so we are likely not impacted by that. Let's see if we get any
feedback.
--
Michael
Patched Aurora from 1.5.0 to 1.5.1 (which fixes an issue with index prefetch). The issue appears to be fully resolved.
Thanks for the help leading into this.
On Tue, May 21, 2019 at 7:21 PM Michael Paquier <michael@paquier.xyz> wrote: