Thread: SIGSEGV happens over once a day

SIGSEGV happens over once a day

From
Richard Yen
Date:
Hi all,

I'm experiencing signal 11 (segmentation fault) failures on the
master node of a 3-node Slony-I cluster.  In the past week, we've
averaged a little more than one segfault per day (11 times in the
past 10 days, including today).  Any ideas what's going on?

Would anyone know how to track this issue?

Don't know if attaching log output might help, but it's very similar
to the following (the responses to those threads didn't help us,
though):
http://archives.postgresql.org/pgsql-general/2004-06/msg01204.php
http://www.thescripts.com/forum/thread422225.html

Here's the machine where postgres is faulting:
db1 (Dell 6650):
master Slony-I node
PostgreSQL version: 7.4.6
OS: Debian Linux 3.1
CPU: Xeon 4 X 2.5GHz
RAM: 8 GB
DISK:
      / 4 x 18 GB drive: raid 10
      /db/data/base 12 x 36 GB: raid 10
      /db/data/pg_xlog 2 x 73 GB: raid 1

The other two machines don't die, but they're set up pretty much the
same way.  The only difference is that db2 is running 8.1.3.

So what seems odd to me is that db1 and db3 are pretty much identical
(db3 has a 1.40GHz Xeon instead of a 2.5GHz, and some RAM
differences), yet postgres dies all the time on db1, but has yet to
die on db2 or db3, so I'm guessing maybe it's an UPDATE/INSERT/etc.?

Everything was running fine until last Tuesday, when this started.
We've created no new stored procedures and made no other changes of
any sort.

We've rebooted the db1 machine, but to no avail.  Any other suggestions?

Let me know if you need other info...

Any help would be greatly appreciated!
--Richard


Re: SIGSEGV happens over once a day

From
"Joshua D. Drake"
Date:
Richard Yen wrote:
> Hi all,
>
> I'm experiencing signal 11 (segmentation fault) failures on the master
> node of a 3-node Slony-I cluster.  In the past week, we've averaged a
> little more than one segfault per day (11 times in the past 10,
> including today).  Any ideas what's going on?
>
> Would anyone know how to track this issue?
>
> Don't know if attaching log output might help, but it's very similar to
> the following (the responses to those threads didn't help us, though):
> http://archives.postgresql.org/pgsql-general/2004-06/msg01204.php
> http://www.thescripts.com/forum/thread422225.html
>
> Here's the machine where postgres is faulting:
> db1 (Dell 6650):
> master Slony-I node
> postgreSQL version: 7.4.6
> OS: Debian Linux 3.1
> CPU: Xeon 4 X 2.5GHz
> RAM: 8 GB
> DISK:
>      / 4 x 18 GB drive: raid 10
>      /db/data/base 12 x 36 GB: raid 10
>      /db/data/pg_xlog 2 x 73 GB: raid 1
>
> The other two machines don't die, but they're set up pretty much the
> same way.  The only difference is that db2 is running 8.1.3.
>
> So what seems odd to me is that db1 and db3 are pretty much identical
> (db3 has a 1.40GHz Xeon instead of a 2.5GHz, and some RAM differences),
> yet postgres dies all the time on db1, but has yet to die on db2 or db3,
> so I'm guessing maybe it's an UPDATE/INSERT/etc.?
>
> Everything was running fine until last Tuesday, when this happened.
> We've created no new stored procedures, made no changes, or anything of
> the sort.
>
> We've rebooted the db1 machine, but to no avail.  Any other suggestions?
>
> Let me know if you need other info...
>
> Any help would be greatly appreciated!

It sounds like a hardware problem. You could run GDB against a core dump
and see what it produces.

Sincerely,

Joshua D. Drake


> --Richard


--

            === The PostgreSQL Company: Command Prompt, Inc. ===
      Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
      Providing the most comprehensive  PostgreSQL solutions since 1997
                     http://www.commandprompt.com/



Re: SIGSEGV happens over once a day

From
Martijn van Oosterhout
Date:
On Thu, May 11, 2006 at 12:07:04PM -0700, Richard Yen wrote:
> Hi all,
>
> I'm experiencing signal 11 (segmentation fault) failures on the
> master node of a 3-node Slony-I cluster.  In the past week, we've
> averaged a little more than one segfault per day (11 times in the
> past 10, including today).  Any ideas what's going on?
>
> Would anyone know how to track this issue?

Best thing to do is:

- Make sure your postgres is compiled with debug info, either in the
binary or split out.
- Enable core dumps prior to starting the postmaster.
"ulimit -S -c unlimited" should do it.

When it crashes next you should get a core file in the data directory.
Run: gdb /path/to/postmaster /path/to/corefile
Type: bt
and post the result. That should pin-point it.
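
The steps above could be wrapped in a couple of small shell helpers, roughly
like the sketch below. This is only an illustration: the PGDATA and
POSTMASTER_BIN values and both function names are assumptions made up for
the example, not taken from this thread, and the core file's name and
location depend on the kernel settings on the machine.

```shell
#!/bin/sh
# Sketch only. Both paths are ASSUMED example values; adjust for your install.
PGDATA="${PGDATA:-/db/data}"
POSTMASTER_BIN="${POSTMASTER_BIN:-/usr/local/pgsql/bin/postmaster}"

# Raise the soft core-size limit and print the resulting value.
# Run this in the same shell that later starts the postmaster,
# since the limit is inherited by child processes.
enable_core_dumps() {
    ulimit -S -c unlimited 2>/dev/null ||
        echo "could not raise core limit (hard limit may be too low)" >&2
    ulimit -S -c
}

# Print the gdb command that dumps a backtrace for the given core file.
# -batch makes gdb exit when done; -ex bt runs the "bt" command.
backtrace_cmd() {
    printf 'gdb -batch -ex bt %s %s\n' "$POSTMASTER_BIN" "$1"
}
```

Typical use would be to call enable_core_dumps, start the postmaster from
that same shell, and after the next crash run the command printed by
backtrace_cmd "$PGDATA/core" (assuming the core file is simply named "core").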

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
