Discussion: 9.6, background worker processes, and PL/Java
Hi,

I have a report of a PL/Java crash in 9.6 where the stack trace suggests it was trying to initialize in a background worker process (not sure why that even happened, yet), and by my first glance, it seems to have crashed dereferencing MyProcPort, which I am guessing a BGW might not always have (?).

So, as I try to get up to speed on this PostgreSQL feature, it seems to me that I have up to three different cases that I may need to make PL/Java detect and respond appropriately to. (If you see me veering into any misconceptions, please let me know.)

1. A worker explicitly created with Register... or RegisterDynamic... that has not called ...InitializeConnection... and so isn't any particular user or connected to any database.

2. A worker explicitly created that has called ...Initialize... and therefore is connected to some database as some user. (So, is there a MyProcPort in this case?)

3. A worker implicitly created for a parallel query plan (and therefore associated with a database and a user). Does this have a MyProcPort?

Case 1, I think I at most need to detect and ereport. It is hard to imagine how it could even arise, as without a database connection there's no pg_extension, pg_language, or pg_proc, but I suppose it could happen if someone misguidedly puts libpljava in shared_preload_libraries, or some other bgw code inexplicably loads it. It's a non-useful case, as PL/Java has nothing to do without a database connection and sqlj schema.

Case 2 might be worth supporting, but I may need to account for anything that differs in this environment from a normal connected backend.

Case 3 seems most likely. It should only be possible by invoking a declared Java function that somebody marked parallel-safe, right? In the parallel-unsafe or -restricted cases, PL/Java can only find itself invoked within the leader process?

Such a leader process can only be a normal backend? Or perhaps also a case-2 explicitly created BGW that is executing a query?
My main question is, what state do I need to examine at startup in order to distinguish these cases? Do I detect I'm in a BGW by a non-null MyBgworkerEntry? If it's there, do I detect whether I have a database and an identity by checking for a MyProcPort, or some other way?

As for declaring functions parallel-unsafe, -restricted, or -safe, I assume there should be no problems with PL/Java functions with the default designation of unsafe. There should be no essential problem if someone declares a function -restricted (provided PL/Java itself can be audited to make sure it doesn't do any of the things restricted functions can't do), as it will only be running in the leader process anyway.

Even if somebody marks a PL/Java function safe (hard as it is to imagine a good case for that), it shouldn't really break anything; as the workers are separate processes, this should be safe. Any imagined speed advantage of the parallel query is likely to evaporate while the several processes load their own JVMs, but nothing should outright break.

That leads me to:

Are BGWs for parallel queries born fresh for each query, or do they get pooled and reused?

If pooled, can they be reused across backends/database connections/identities, or only by the backend that created them? If reusable across contexts, that's a dealbreaker and I'd have to have PL/Java reject any parallel-safe declaration, but a pool tied to a connection should be ok (and better yet, allow amortizing the JVM startup cost).

If pooled, and tied to the backend that started them, do they need to do anything special to detect when the leader has executed SET ROLE or SET SESSION AUTHORIZATION?

If all of this is covered to death in some document I obviously haven't read, please feel free to point me to it.

Thanks!
-Chap
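A minimal sketch of how the three cases above (plus an ordinary client backend) might be told apart at startup. So that it compiles outside the server, it reads a hypothetical `ProcSnapshot` struct instead of the real globals; the mapping of each field to backend state (`MyBgworkerEntry != NULL`, `MyDatabaseId != InvalidOid`, and a parallel-worker test such as `IsParallelWorker()`) is an assumption to check against the 9.6 headers, not a confirmed recipe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the backend state PL/Java would inspect. */
typedef struct ProcSnapshot
{
    bool has_bgworker_entry;   /* assumed: MyBgworkerEntry != NULL */
    bool has_database;         /* assumed: MyDatabaseId != InvalidOid */
    bool is_parallel_worker;   /* assumed: e.g. IsParallelWorker() */
} ProcSnapshot;

/* Hypothetical names for the cases discussed in the message above. */
typedef enum ProcKind
{
    PROC_CLIENT_BACKEND,       /* ordinary connected backend */
    PROC_BGW_UNCONNECTED,      /* case 1: no database, no user */
    PROC_BGW_CONNECTED,        /* case 2: has connected to a database */
    PROC_PARALLEL_WORKER       /* case 3: started for a parallel plan */
} ProcKind;

static ProcKind
classify_process(const ProcSnapshot *s)
{
    /* No worker entry at all: a regular backend serving a client. */
    if (!s->has_bgworker_entry)
        return PROC_CLIENT_BACKEND;

    /* Parallel workers are background workers too, so test them first. */
    if (s->is_parallel_worker)
        return PROC_PARALLEL_WORKER;

    /* Explicitly registered worker: connected or not? */
    return s->has_database ? PROC_BGW_CONNECTED : PROC_BGW_UNCONNECTED;
}
```

In this sketch only `PROC_BGW_UNCONNECTED` would lead to an early ereport; the other kinds have a database and a user to work with.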
On 10/25/16 18:56, Chapman Flack wrote:
> If pooled, and tied to the backend that started them, do they need
> to do anything special to detect when the leader has executed
> SET ROLE or SET SESSION AUTHORIZATION?

Let me guess ... such information is *not* synchronized across workers, and that'd be why the manual says "functions must be marked PARALLEL RESTRICTED if they access ... client connection state ..."?

That's probably a resounding 'no' for declaring any PL/Java function SAFE, then.

And if changing "the transaction state even temporarily (e.g. a PL/pgsql function which establishes an EXCEPTION block to catch errors)" is enough to require UNSAFE, then it may be that RESTRICTED is off limits too, as there are places PL/Java does that internally.

I take it that example refers not to just any use of PG_TRY/PG_CATCH, but only to those uses where an internal subtransaction is used to allow execution to continue?

If a person writes a function in some language (SQL, for example), declares it PARALLEL SAFE but is lying because it calls another function (in Java, say) that is PARALLEL UNSAFE or RESTRICTED, does PostgreSQL detect or prevent that, or is it just considered an unfortunate mistake by the goofball who declared the first function safe?

And if that's not already prevented, could it be worth adding code in the PL/Java call handler to detect such a situation and make sure it ends in a meaningful ereport and not something worse?

-Chap
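Whatever PostgreSQL itself checks, a function is in effect only as parallel-permissive as the most restrictive function it calls. A minimal sketch of that "most restrictive wins" rule, using the `pg_proc.proparallel` codes `'s'`, `'r'`, and `'u'` (safe, restricted, unsafe); the helper names are hypothetical, and treating an unrecognized code as unsafe is a design choice of the sketch.

```c
#include <assert.h>

/* pg_proc.proparallel codes (PROPARALLEL_SAFE/RESTRICTED/UNSAFE in 9.6). */
#define PARALLEL_SAFE       's'
#define PARALLEL_RESTRICTED 'r'
#define PARALLEL_UNSAFE     'u'

/* Rank a code by restrictiveness: higher means more restrictive. */
static int
parallel_rank(char code)
{
    switch (code)
    {
        case PARALLEL_SAFE:
            return 0;
        case PARALLEL_RESTRICTED:
            return 1;
        default:
            return 2;           /* 'u', or anything unrecognized */
    }
}

/*
 * Effective label of a function declared with 'declared' that calls the
 * functions whose codes are in callees[0..ncallees-1].
 */
static char
effective_parallel_label(char declared, const char *callees, int ncallees)
{
    static const char by_rank[] = {PARALLEL_SAFE, PARALLEL_RESTRICTED,
                                   PARALLEL_UNSAFE};
    int worst = parallel_rank(declared);

    for (int i = 0; i < ncallees; i++)
    {
        int r = parallel_rank(callees[i]);

        if (r > worst)
            worst = r;
    }
    return by_rank[worst];
}
```

For example, an SQL function declared `'s'` that calls a Java function declared `'r'` comes out as `'r'`, which is the mislabeling the question describes.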
On Wed, Oct 26, 2016 at 4:26 AM, Chapman Flack <chap@anastigmatix.net> wrote:
> Hi,
>
> I have a report of a PL/Java crash in 9.6 where the stack trace
> suggests it was trying to initialize in a background worker
> process (not sure why that even happened, yet), and by my first
> glance, it seems to have crashed dereferencing MyProcPort, which
> I am guessing a BGW might not always have (?).
>
> [...]
>
> 3. A worker implicitly created for a parallel query plan (and therefore
> associated with a database and a user). Does this have a MyProcPort?

No, parallel workers in a parallel query don't have MyProcPort.

> [...]
>
> Are BGWs for parallel queries born fresh for each query, or do they
> get pooled and reused?

Born fresh for each query.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
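A parallel worker has no `MyProcPort`, so startup code that reads the database and user names from it needs a guard. A minimal sketch, using a hypothetical `PortNames` struct that mirrors only the `database_name` and `user_name` fields of the real `Port`; the `"?"` placeholder is an assumption, and resolving the names from the database and user Oids instead would need catalog access that may not be available this early in a worker.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the two fields of struct Port used here. */
typedef struct PortNames
{
    const char *database_name;
    const char *user_name;
} PortNames;

/*
 * Format "user@database" for a process-title-like label.  A NULL port
 * (as in a parallel worker) or a missing field yields a placeholder
 * instead of a NULL dereference.
 */
static void
describe_session(const PortNames *port, char *buf, size_t buflen)
{
    const char *db = (port && port->database_name) ? port->database_name : "?";
    const char *user = (port && port->user_name) ? port->user_name : "?";

    snprintf(buf, buflen, "%s@%s", user, db);
}
```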
On Wed, Oct 26, 2016 at 7:39 AM, Chapman Flack <chap@anastigmatix.net> wrote:
> On 10/25/16 18:56, Chapman Flack wrote:
>
>> If pooled, and tied to the backend that started them, do they need
>> to do anything special to detect when the leader has executed
>> SET ROLE or SET SESSION AUTHORIZATION?
>
> Let me guess ... such information is *not* synchronized across workers,
> and that'd be why the manual says "functions must be marked PARALLEL
> RESTRICTED if they access ... client connection state ..."?

All the GUCs are synchronised between leader and worker backends.

> [...]
>
> If a person writes a function in some language (SQL, for example),
> declares it PARALLEL SAFE but is lying because it calls another
> function (in Java, say) that is PARALLEL UNSAFE or RESTRICTED,
> does PostgreSQL detect or prevent that, or is it just considered
> an unfortunate mistake by the goofball who declared the first
> function safe?

No, we don't detect that explicitly before initiating parallelism; however, there are checks in the code that will report an error if you do something unsafe in a worker, for example, performing any write operation in a worker.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
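Since the labels are not checked up front, a language handler could itself refuse to run a function whose own label forbids running in a worker, turning a mislabeled caller into a clear error. A minimal sketch of that check, with the `proparallel` code and an is-parallel-worker flag passed in as plain values; in real handler code the message would go to `ereport(ERROR, ...)`, and obtaining the flag (for example via `IsParallelWorker()`) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Returns true if the function may run in this process.  Otherwise
 * writes an explanatory message into msg and returns false.
 */
static bool
check_parallel_invocation(const char *funcname, char proparallel,
                          bool in_parallel_worker, char *msg, size_t msglen)
{
    /* Outside a worker anything goes; inside, only PARALLEL SAFE ('s'). */
    if (!in_parallel_worker || proparallel == 's')
        return true;

    snprintf(msg, msglen,
             "function %s is not PARALLEL SAFE but was called in a parallel "
             "worker; check the labels of the functions that call it",
             funcname);
    return false;
}
```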
On 26 October 2016 at 06:56, Chapman Flack <chap@anastigmatix.net> wrote:
> My main question is, what state do I need to examine at startup
> in order to distinguish these cases?

To detect being loaded via shared_preload_libraries, test

IsPostmasterEnvironment && !IsUnderPostmaster

See src/backend/utils/init/globals.c

> Do I detect I'm in a BGW by
> a non-null MyBgworkerEntry?

Use IsBackgroundWorker, same place as above.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
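The two flags named above distinguish three situations, which is also why `IsUnderPostmaster` alone cannot separate the postmaster from a single-user (standalone) backend: neither sets it. A minimal sketch of that decoding, with the flags passed in as booleans so it runs standalone; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum LoadSite
{
    LOAD_IN_POSTMASTER,     /* e.g. via shared_preload_libraries at startup */
    LOAD_UNDER_POSTMASTER,  /* a backend or worker started by the postmaster */
    LOAD_STANDALONE         /* single-user backend: no postmaster at all */
} LoadSite;

/* Arguments stand in for IsPostmasterEnvironment and IsUnderPostmaster. */
static LoadSite
load_site(bool is_postmaster_environment, bool is_under_postmaster)
{
    if (is_under_postmaster)
        return LOAD_UNDER_POSTMASTER;
    return is_postmaster_environment ? LOAD_IN_POSTMASTER : LOAD_STANDALONE;
}
```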
On 10/26/16 07:04, Amit Kapila wrote:
> No, parallel workers in a parallel query don't have MyProcPort.

Ok ... it turns out I was using MyProcPort as a quick way to grab database_name and user_name (very early in startup, for a purpose analogous to setting a 'ps' process title), and that seemed more lightweight than other methods of getting the database and user Oids and mapping those to the corresponding names. But I guess I can change that easily enough.

> ...
>> Are BGWs for parallel queries born fresh for each query, or do they
>> get pooled and reused?
>
> Born fresh for each query.

Yikes. But ok, if there's ever a reason to try to make a "safe" Java function, I see there is a parallel_setup_cost GUC that could be used to inform the planner of the higher cost when BGWs have to start JVMs, so it probably wouldn't make parallel plans often, but still could if analysis showed a sufficient advantage.

On 10/26/16 07:15, Amit Kapila wrote:
> All the GUCs are synchronised between leader and worker backends.

Ah, thanks. I have now found README.parallel, so I much better understand what is synchronized, and what operations are allowed or not. :)

On 10/26/16 07:42, Craig Ringer wrote:
>
> To detect being loaded via shared_preload_libraries, test
>
> IsPostmasterEnvironment && !IsUnderPostmaster

Hmm, IsUnderPostmaster is PGDLLIMPORTed but IsPostmasterEnvironment isn't, so I'm out of luck on Windows. Is there another way I can check?

>> Do I detect I'm in a BGW by a non-null MyBgworkerEntry?
>
> Use IsBackgroundWorker, same place as above.

Also not PGDLLIMPORTed. MyBgworkerEntry is, though. It does appear to be initialized to NULL. Can I get away with checking that, since I can't see IsBackgroundWorker?

I now see what caused the reported crash.
It was a parallel query that did not make any use of PL/Java functions, but the group leader had used them before, so the library was loaded there. ParallelWorkerMain therefore loaded it in the worker process too, and _PG_init got called and tried to refer to stuff that wasn't set up yet, because library loading comes pretty early in ParallelWorkerMain.

I think I could easily fix that by having the library init code just bail right after defining the custom GUCs, if InitializingParallelWorker is true.

Alas, InitializingParallelWorker isn't PGDLLIMPORTed either. This isn't my day. Is there a way I can successfully infer that on Windows?

I guess I can just bail from initialization early when in *any* kind of background worker, and just leave the rest to be done when called through the language handler, if ever.

This would be so much easier if Visual Studio were not a thing.

-Chap
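A minimal sketch of the fallback described above: define the custom GUCs unconditionally, then return early from `_PG_init` in any background worker, using the exported `MyBgworkerEntry` as the test. Everything here is a hypothetical stand-in so the sketch runs outside the server (`MyBgworkerEntry_standin`, `define_custom_gucs`, `initialize_pljava`, `ensure_initialized`); the assumption is that the call handler finishes initialization lazily if it is ever invoked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the exported MyBgworkerEntry (NULL outside bgworkers). */
static const void *MyBgworkerEntry_standin = NULL;

static bool gucs_defined = false;
static bool fully_initialized = false;

/* Hypothetical pieces of PL/Java's startup. */
static void define_custom_gucs(void) { gucs_defined = true; }
static void initialize_pljava(void) { fully_initialized = true; }

/* Shape of _PG_init: GUCs always, the rest only outside background workers. */
static void
pljava_pg_init(void)
{
    define_custom_gucs();

    /*
     * In any background worker (a parallel worker included), stop here;
     * too little may be set up yet for the rest of initialization.
     */
    if (MyBgworkerEntry_standin != NULL)
        return;

    initialize_pljava();
}

/* What the language call handler would do before running any function. */
static void
ensure_initialized(void)
{
    if (!fully_initialized)
        initialize_pljava();
}
```

The design choice is to make the deferred path the same one an ordinary backend would take on first call, so a worker that never runs a Java function never pays for initialization.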
On 27 October 2016 at 09:22, Chapman Flack <chap@anastigmatix.net> wrote:
> Hmm, IsUnderPostmaster is PGDLLIMPORTed but IsPostmasterEnvironment isn't,
> so I'm out of luck on Windows. Is there another way I can check?
>
>>> Do I detect I'm in a BGW by a non-null MyBgworkerEntry?
>>
>> Use IsBackgroundWorker, same place as above.
>
> Also not PGDLLIMPORTed. MyBgworkerEntry is, though. It does appear to be
> initialized to NULL. Can I get away with checking that, since I can't see
> IsBackgroundWorker?
>
> [...]
>
> I think I could easily fix that by having the library init code just bail
> right after defining the custom GUCs, if InitializingParallelWorker
> is true.
>
> Alas, InitializingParallelWorker isn't PGDLLIMPORTed either. This isn't
> my day. Is there a way I can successfully infer that on Windows?

Please submit a patch to make them all PGDLLIMPORT. They clearly should be, for use in bgworkers. I'd consider that a bugfix personally and hope it can be backpatched to the stable branches.

It's not going to break anything since nothing external that runs on Windows can previously have been referring to these symbols.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services