Discussion: server process segfaulting
Hi all,

I have a bit of a problem I need some help with. I have a piece of software (a web thing) that uses PostgreSQL, and there is a particular piece of code in it which seems to crash the postgres server with SIGSEGV.

So first of all, are there any common gotchas that make postgres crash that I'm not aware of? The logs from the point where it is dying are below. The last queries before the segfault are coming from a trigger I wrote in plpython to do referential integrity checking for inherited tables (I posted about it before writing said code). That leads me to believe that this is probably a problem with plpython. So does anyone know anything about plpython and segfaults?

Next question. I found this:

http://snaga.org/pgsql/cvsweb.cgi/pgsql/src/pl/plpython/TODO?rev=1.1.1.1&content-type=text/x-cvsweb-markup&hideattic=0&only_with_tag=DT0_0

In point 3 it seems to suggest that if the schema of any of the tables change, then the plpython functions will need to be recreated. It doesn't actually say whether or not "making postgres unhappy" == segfault. I would like to try this and see if it fixes my problem, but I'm more than a little concerned about postgres removing all my triggers if I drop the function. Will postgres drop the triggers? If it does, is there an easy way to do what that document is suggesting and rebuild the triggers as I go?

And ultimately, if plpython can't be made to work for this task, what's the best way forward? I had a quick look at the plpython source and I don't think it's something I'll be able to hack on in the short term. Am I better off writing a C module to do what I need to do?

Any feedback much appreciated.

Oh, and:

PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.2 (Mandrake Linux 9.1 3.2.2-3mdk)

from

$ rpm -q postgresql postgresql-server postgresql-contrib
postgresql-7.3.2-5mdk
postgresql-server-7.3.2-5mdk
postgresql-contrib-7.3.2-5mdk

Thanks,

James.
The log:

May 14 18:11:31 pirate postgres[11599]: [43] NOTICE: ('running foreign key check',)
May 14 18:11:31 pirate postgres[11599]: [44] LOG: query: select * from referential_constraint where foreign_key ilike $1
May 14 18:11:31 pirate postgres[11599]: [45] LOG: query: select count(*) as count from "domain" where "id" = $1
May 14 18:11:31 pirate postgres[11421]: [7] LOG: server process (pid 11599) was terminated by signal 11
May 14 18:11:31 pirate postgres[11421]: [8] LOG: terminating any other active server processes
May 14 18:11:31 pirate postgres[11421]: [9] LOG: all server processes terminated; reinitializing shared memory and semaphores
May 14 18:11:31 pirate postgres[11600]: [10] LOG: database system was interrupted at 2003-05-14 18:02:23 EST
May 14 18:11:31 pirate postgres[11600]: [11] LOG: checkpoint record is at 0/345728C
May 14 18:11:31 pirate postgres[11600]: [12] LOG: redo record is at 0/345728C; undo record is at 0/0; shutdown TRUE
May 14 18:11:31 pirate postgres[11600]: [13] LOG: next transaction id: 55444; next oid: 157552
May 14 18:11:31 pirate postgres[11600]: [14] LOG: database system was not properly shut down; automatic recovery in progress
May 14 18:11:32 pirate postgres[11600]: [15] LOG: redo starts at 0/34572CC
May 14 18:11:32 pirate postgres[11600]: [16] LOG: ReadRecord: record with zero length at 0/345F590
May 14 18:11:32 pirate postgres[11600]: [17] LOG: redo done at 0/345F4DC
May 14 18:11:34 pirate postgres[11600]: [18] LOG: database system is ready
James Gregory <james@anchor.net.au> writes:
> The logs from the point where it is dying are below. The last queries
> before the segfault are coming from a trigger I wrote in plpython to do
> referential integrity checking for inherited tables (I posted about it
> before writing said code).

Um. There was a report that plpython triggers get confused if you try to apply the same trigger procedure to multiple tables (it tries to use the first table's row descriptor with all the other tables, and yes that can lead to a segfault). AFAIR this is still unfixed in CVS tip --- someone had volunteered to produce a fix, but it has not materialized yet. In the meantime, you need to make a separate trigger function for each table :-(

> In point 3 it seems to suggest that if the schema of any of the tables
> change, then the plpython functions will need to be recreated.

I don't think you need to recreate them, just start a fresh session. The cached row descriptors are only cached within a backend.

			regards, tom lane
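To make the failure mode Tom describes concrete, here is a toy Python model. It is a sketch under stated assumptions, not plpython's actual C code: the `Relation` and `BuggyTriggerCache` classes, the table `alias`, and all OIDs are hypothetical (only `domain` appears in the log above). The cache keeps one row descriptor per trigger function, so a second table with a different rowtype is read through the first table's descriptor.

```python
# Toy model (hypothetical, NOT plpython's real C code) of a trigger-function
# cache that keeps the row descriptor of the first table it fires on.
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    oid: int        # hypothetical table OID
    columns: tuple  # column names in physical order (the "row descriptor")


class BuggyTriggerCache:
    """Caches one row descriptor per trigger *function*, ignoring the table."""

    def __init__(self):
        self._descriptors = {}  # function name -> cached column tuple

    def fire(self, func_name, relation, row_values):
        # The first call caches *this* table's descriptor; later calls reuse
        # it even when the same function fires on a different table.
        desc = self._descriptors.setdefault(func_name, relation.columns)
        # Indexing the row by the cached descriptor: Python raises IndexError
        # when it runs off the end; C code would read out of bounds instead.
        return {name: row_values[i] for i, name in enumerate(desc)}


domain = Relation(oid=1001, columns=("id", "name", "owner"))
alias = Relation(oid=1002, columns=("id", "target"))  # hypothetical table

cache = BuggyTriggerCache()
print(cache.fire("fk_check", domain, (1, "example.org", "james")))
try:
    # Same function attached to a table with a different rowtype.
    cache.fire("fk_check", alias, (7, 1))
except IndexError as exc:
    print("stale descriptor used for second table:", exc)

# Workaround from the thread: a separate trigger function per table,
# so each function name gets its own cache entry.
cache = BuggyTriggerCache()
cache.fire("fk_check_domain", domain, (1, "example.org", "james"))
print(cache.fire("fk_check_alias", alias, (7, 1)))
```

In Python the mismatch surfaces as an `IndexError`; in C the same out-of-range read is undefined behaviour, which is consistent with the signal 11 in the log. If the second table had more columns than the first, this model would not raise at all and would silently drop or mislabel fields.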
On Thu, 2003-05-15 at 01:53, Tom Lane wrote:
> James Gregory <james@anchor.net.au> writes:
> > The logs from the point where it is dying are below. The last queries
> > before the segfault are coming from a trigger I wrote in plpython to do
> > referential integrity checking for inherited tables (I posted about it
> > before writing said code).
>
> Um. There was a report that plpython triggers get confused if you try
> to apply the same trigger procedure to multiple tables (it tries to use
> the first table's row descriptor with all the other tables, and yes that
> can lead to a segfault).

Is it only plpython that has the problem? If I wanted to fix this where would I start looking? presumably pgsql/src/plpython/plpython.c. Do you have a link with more info about the bug by any chance?

Many thanks for your help. My code is exhibiting exactly that behaviour, so it sounds like that's what the problem is.

James.
James Gregory <james@anchor.net.au> writes:
> On Thu, 2003-05-15 at 01:53, Tom Lane wrote:
>> Um. There was a report that plpython triggers get confused if you try
>> to apply the same trigger procedure to multiple tables (it tries to use
>> the first table's row descriptor with all the other tables, and yes that
>> can lead to a segfault).

> Is it only plpython that has the problem?

I'm not sure. It's only been reported against plpython, but it seems possible that our other PLs might have the same bug. I'd only be willing to bet that plpgsql doesn't have it, because that's the most heavily used PL and someone woulda noticed by now...

> If I wanted to fix this where
> would I start looking? presumably pgsql/src/plpython/plpython.c. Do you
> have a link with more info about the bug by any chance?

Not offhand, but if you search the PG list archives you will find the bug report. I think it was back around the beginning of this year.

If fading memory serves, I suggested a quick-hack solution of including the target table's OID into the Python name of the function (so that triggers on different tables are automatically different Python objects) but whoever it was that was promising to do the legwork wanted to look for a cleaner approach.

At this point I've lost faith in whoever-it-was, and would gladly accept a patch based on the quick-hack approach.

			regards, tom lane
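A minimal sketch of the quick-hack fix described above, again as a hypothetical Python model rather than a patch against plpython.c: cache entries are keyed by (function OID, table OID), and the Python-level name embeds the table OID, so triggers on different tables become different objects. The naming format `__plpython_<name>_<fnoid>_<reloid>`, the class names, and all OIDs are invented for illustration.

```python
# Hypothetical model of the "include the table OID" fix; names, OIDs and
# the naming format are invented for illustration, not taken from plpython.c.
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    oid: int
    columns: tuple


class PerRelationTriggerCache:
    """One cache entry per (function OID, table OID) pair."""

    def __init__(self):
        self._entries = {}  # (fn_oid, rel_oid) -> (python_name, columns)

    @staticmethod
    def python_name(proname, fn_oid, rel_oid):
        # Embedding the table OID makes each per-table version a distinct
        # Python object, as suggested in the thread.
        return f"__plpython_{proname}_{fn_oid}_{rel_oid}"

    def fire(self, proname, fn_oid, relation, row_values):
        key = (fn_oid, relation.oid)
        if key not in self._entries:
            # First firing on this table: cache this table's own descriptor.
            self._entries[key] = (
                self.python_name(proname, fn_oid, relation.oid),
                relation.columns,
            )
        name, desc = self._entries[key]
        return name, dict(zip(desc, row_values))


domain = Relation(oid=1001, columns=("id", "name", "owner"))
alias = Relation(oid=1002, columns=("id", "target"))

cache = PerRelationTriggerCache()
print(cache.fire("fk_check", 5000, domain, (1, "example.org", "james")))
print(cache.fire("fk_check", 5000, alias, (7, 1)))
```

Because the key includes the table, one trigger function attached to two tables no longer shares a descriptor. In this model an entry could still go stale if a table is itself altered mid-session; dropping the cache (a new session) is the only reset it offers.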
The problem is that the information stored in the dictionary TD[] is probably shared by all invocations of the function within the transaction. It is similar to the problem where all invocations of a particular function share a common SD[] within the scope of a connection. Whether this is a bug or a feature is debatable. Handling the memory scope is very tricky.

This is an educated guess. I have not looked at the plpython code itself, although I can vouch for the behaviour.

elein

On Wednesday 14 May 2003 19:39, Tom Lane wrote:
> James Gregory <james@anchor.net.au> writes:
> > On Thu, 2003-05-15 at 01:53, Tom Lane wrote:
> >> Um. There was a report that plpython triggers get confused if you try
> >> to apply the same trigger procedure to multiple tables (it tries to use
> >> the first table's row descriptor with all the other tables, and yes that
> >> can lead to a segfault).
> >
> > Is it only plpython that has the problem?
>
> I'm not sure. It's only been reported against plpython, but it seems
> possible that our other PLs might have the same bug. I'd only be
> willing to bet that plpgsql doesn't have it, because that's the most
> heavily used PL and someone woulda noticed by now...
>
> > If I wanted to fix this where
> > would I start looking? presumably pgsql/src/plpython/plpython.c. Do you
> > have a link with more info about the bug by any chance?
>
> Not offhand, but if you search the PG list archives you will find the bug
> report. I think it was back around the beginning of this year.
>
> If fading memory serves, I suggested a quick-hack solution of including
> the target table's OID into the Python name of the function (so that
> triggers on different tables are automatically different Python objects)
> but whoever it was that was promising to do the legwork wanted to look
> for a cleaner approach.
>
> At this point I've lost faith in whoever-it-was, and would gladly accept
> a patch based on the quick-hack approach.
>
> 			regards, tom lane

-- 
=============================================================
elein@varlena.com     Database Consulting     www.varlena.com
PostgreSQL General Bits   http:/www.varlena.com/GeneralBits/
   "Free your mind the rest will follow" -- en vogue
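The SD behaviour elein describes can be pictured with a toy model of one session (hypothetical Python, not plpython internals; `Session` and `running_total` are invented names): each function gets a private `SD` dictionary that survives between calls for the life of the session, while `GD` is shared by all functions in the session. In this model `TD` is rebuilt on every call; whether the real implementation does that is exactly what elein is guessing about, so treat it as an assumption.

```python
# Toy model of plpython's per-function SD and session-wide GD dictionaries
# (hypothetical Python, not the real implementation).


class Session:
    """One backend session: state here lives until the session ends."""

    def __init__(self):
        self.GD = {}   # shared by every function called in this session
        self._sd = {}  # function name -> that function's private SD

    def call(self, func_name, body, **trigger_data):
        SD = self._sd.setdefault(func_name, {})  # persists across calls
        TD = dict(trigger_data)  # assumption: rebuilt fresh for every call
        return body(SD, self.GD, TD)


def running_total(SD, GD, TD):
    # A running aggregate: the total survives between calls because SD does.
    SD["total"] = SD.get("total", 0) + TD["delta"]
    return SD["total"]


s = Session()
print(s.call("running_total", running_total, delta=2))
print(s.call("running_total", running_total, delta=3))
print(Session().call("running_total", running_total, delta=1))
```

Starting a new `Session` resets every dictionary, which is why per-session state like this can be cleared simply by reconnecting.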
We had this problem in spades when implementing the C interface for Informix. The problem is in allocation, but it is a design problem, not a bug. The problem is local to plpython and not plpgsql because only plpython creates storage that can be accessed between function calls. Special care must be taken to ensure that NEW and OLD, and some of the context storage, are local to the instance and not to the statement.

There are several possible scopes for memory:

    session, connection, transaction, statement*, one call

* The semantics for statement are very hairy because of the theoretical infinite nesting of subselects and function calls.

The place to attack the problem starts with the C interfaces. When allocating memory and attaching it to the function call info, how long does it last? Is the function writer expected to clean up? Allocated data's scope should be well defined to be the single instance of the function. Understanding these things and then examining the semantics of Python dictionaries should lead to an understanding of sorts.

I take advantage of the plpython SD storage as it stands and work around its limitations. This will be fodder for my talk at OSCON on running aggregates.

If anyone really wants to tackle this, be prepared. The memory scope issues are not simple, but they should be easier in PostgreSQL than in Informix because of the frontend/backend model. D'Arcy should be involved, and I'd really like to go over scoping issues in more detail and perhaps help others avoid some of the worst pitfalls, since I've already fallen into them.

Hmmm. I think I could be clearer. If anyone is interested I can write something up.

elein
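The list of memory scopes above can be modelled as nested lifetimes, each owning its own storage that is discarded when the scope ends. This is a hypothetical Python sketch (`ScopedStore` and its methods are invented for illustration) in the spirit of per-call NEW/OLD data living in the innermost scope while session data persists; it is not how plpython or PostgreSQL's memory management is actually implemented.

```python
# Hypothetical model of storage tied to nested lifetimes (not PostgreSQL's
# actual memory-context code): each scope owns a dict freed on exit.
from contextlib import contextmanager


class ScopedStore:
    LEVELS = ("session", "connection", "transaction", "statement", "call")

    def __init__(self):
        self._stack = []  # innermost scope last: (level, storage dict)

    @contextmanager
    def scope(self, level):
        if level not in self.LEVELS:
            raise ValueError(f"unknown scope {level!r}")
        self._stack.append((level, {}))
        try:
            yield self._stack[-1][1]
        finally:
            self._stack.pop()  # this scope's storage is discarded here

    def lookup(self, key):
        # Search from the innermost scope outwards.
        for _level, storage in reversed(self._stack):
            if key in storage:
                return storage[key]
        raise KeyError(key)


store = ScopedStore()
with store.scope("session") as session:
    session["cache"] = "kept for the whole session"
    for row in ({"id": 1}, {"id": 2}):
        with store.scope("call") as call:
            call["NEW"] = row  # per-call trigger data, innermost scope
            print(store.lookup("NEW"), store.lookup("cache"))
    # Both calls have ended: no NEW survives, session data is still visible.
    print("NEW" in session, store.lookup("cache"))
```

Because `scope("statement")` can be entered again inside itself, the stack also reflects the point above that statement scope can nest arbitrarily deep through subselects and function calls.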
From: Tom Lane <tgl@sss.pgh.pa.us>
To: James Gregory <james@anchor.net.au>
cc: elein@varlena.com, pgsql-general@postgresql.org
Subject: Re: [GENERAL] server process segfaulting
Date: Tue, 03 Jun 2003 22:47:59 -0400
In reply to James Gregory's message dated "04 Jun 2003 11:27:22 +1000"

James Gregory <james@anchor.net.au> writes:
> Is it worth tracing that through or is this not the problem?

I believe the problem has been diagnosed as follows: the plpython stuff is assuming that any one trigger function will be used with only one tuple descriptor. Apply the same trigger function to two relations with different rowtypes, and you get a crash, because the initially-cached tuple descriptor is wrong for the second relation. It has nothing to do with storage allocation.

I'm not sure whether the problem occurs with any PL languages besides plpython --- it seems like it could be a generic issue. I don't believe that plpgsql suffers from it, because it's too widely used: we'd have heard reports if it had the problem. But pltcl etc could have the same problem for all I know.

You can check the archives once Marc gets the pieces put back together...

			regards, tom lane
Elein Mustain <elein@tulip.norcov.com> writes:
> The problem
> is local to plpython and not plpgsql because only
> plpython creates storage that can be accessed between
> function calls.

When I looked at it, I thought that it could be solved trivially by instantiating a separate Python object for each per-relation version of a trigger function. But not being a Python user, I didn't try to fix it because I couldn't test it very well.

			regards, tom lane