Обсуждение: AtAbort_Portsl problem
Hi, While ago, I reported a problem regarding exec_execute_message crash in transaction abort state if sync message issued right after parse, bind and execute message (which is normal if used with pgpool). After further investigation, I concluded that there's a serious problem with unnamed portal handling. The previous unnamed portal remains until next exec_bind_message call(it calls CreatePortal which drops unnamed portal to replace it with new one). If the next call to exec_parse_message fails by some reasons (for example, parse error), the previous unnamed portal remains with some data trashed by AtAbort_Portals. The reason why I see exec_execute_message crash is, it looks into the broken portal, especially portal->stmts: is_xact_command = IsTransactionStmtList(portal->stmts); unfortunately which was already freed by PortalReleaseCachedPlan call in AtAbort_Portals because portal->cplan and portal->stmts belong to the same memory context in some cases. The reason why we don't see the problem until now is probably a) just lucky b) libpq/drivers do not issue execute if parse fails. But bug is bug anyway, I believe we need to fix this. One of fixes I can think of is, set NULL to portal->stmts in AtAbort_Portsl(see attached patches against CVS Head). This will not result in memory leak since either portal->stmts belongs to the same memory context of portal->cplan (in this case the memory context will be dropped by PortalReleaseCachedPlan) or belongs to the same one of portal heap meory, which will be dropped at the same time when the portal itself dropped. -- Tatsuo Ishii SRA OSS, Inc. Japan Index: src/backend/utils/mmgr/portalmem.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/mmgr/portalmem.c,v retrieving revision 1.115 diff -c -r1.115 portalmem.c *** src/backend/utils/mmgr/portalmem.c 2 Jan 2010 16:57:58 -0000 1.115 --- src/backend/utils/mmgr/portalmem.c 17 Jan 2010 13:19:07 -0000 *************** *** 684,689 **** --- 684,700 ---- PortalReleaseCachedPlan(portal); /* + * Remove the reference to PlannedStmts and/or utility + * statements. This is neccessary since + * PortalReleaseCachedPlan might release memory context which + * stmts belogs to and subsequent call to exec_execute_message + * looks into stmts to check if it's a transaction + * command(which is allow to call even if in an aborted + * transaction state), which might refer to freed memory. + */ + portal->stmts = NULL; + + /* * Any resources belonging to the portal will be released in the * upcoming transaction-widecleanup; they will be gone before we run * PortalDrop.
Tatsuo Ishii <ishii@postgresql.org> writes: > While ago, I reported a problem regarding exec_execute_message crash > in transaction abort state if sync message issued right after parse, > bind and execute message (which is normal if used with pgpool). After > further investigation, I concluded that there's a serious problem with > unnamed portal handling. I was never able to get your test case to work (or, rather, crash). Does this patch address the original issue, or is this something different? regards, tom lane
> Tatsuo Ishii <ishii@postgresql.org> writes: > > While ago, I reported a problem regarding exec_execute_message crash > > in transaction abort state if sync message issued right after parse, > > bind and execute message (which is normal if used with pgpool). After > > further investigation, I concluded that there's a serious problem with > > unnamed portal handling. > > I was never able to get your test case to work (or, rather, crash). > Does this patch address the original issue, or is this something > different? The patch addresses the original issue. The reason why you didn't see crash was just you were lucky, I believe. I'm sure that your exec_execute_message looks into already-freed-memory. -- Tatsuo Ishii SRA OSS, Inc. Japan
Tatsuo Ishii <ishii@postgresql.org> writes: > The patch addresses the original issue. The reason why you didn't see > crash was just you were lucky, I believe. I'm sure that your > exec_execute_message looks into already-freed-memory. [ shrug... ] If it did, it would have crashed, because I invariably build with --cassert-enabled -> CLOBBER_FREED_MEMORY. I do now see the risk path you are talking about, I think: 1. bind to some fully_planned prepared statement, causing the Portal to link directly to a CachedPlan's statement list; 2. invalidate the prepared statement, so that the CachedPlanSource drops its reference to the CachedPlan; 3. AbortTransaction, so that AtAbort_Portals runs PortalReleaseCachedPlan. This makes the CachedPlan's reference countgo to zero, so it drops its memory. Now the Portal's stmts pointer is dangling; 4. try to execute the Portal. I do not believe it's possible to make this happen through libpq alone, at least not without a great deal more hacking than you did on it. libpq doesn't ever put very much in between binding a portal and executing it. But a non-libpq-based client could easily do it if it intermixed binding and executing a few different portals. Also step 2 might happen unluckily through a SI reset triggered by some concurrent session. I don't like the proposed patch though since it closes only one of the paths by which a Portal might drop its reference to a CachedPlan. I think the right place to clear portal->stmts is in PortalReleaseCachedPlan. regards, tom lane