Re: Perform streaming logical transactions by background workers and parallel apply
От | Masahiko Sawada |
---|---|
Тема | Re: Perform streaming logical transactions by background workers and parallel apply |
Дата | |
Msg-id | CAD21AoDo+yUwNq6nTrvE2h9bB2vZfcag=jxWc7QxuWCmkDAqcA@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Perform streaming logical transactions by background workers and parallel apply (Amit Kapila <amit.kapila16@gmail.com>) |
Ответы |
Re: Perform streaming logical transactions by background workers and parallel apply
(Amit Kapila <amit.kapila16@gmail.com>)
Re: Perform streaming logical transactions by background workers and parallel apply (Amit Kapila <amit.kapila16@gmail.com>) RE: Perform streaming logical transactions by background workers and parallel apply ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
Список | pgsql-hackers |
On Fri, Apr 28, 2023 at 11:51 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Apr 26, 2023 at 4:11 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > On Wednesday, April 26, 2023 5:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > > > > > > IIUC, that assert will fail in case of any error raised between > > > ApplyWorkerMain()->logicalrep_worker_attach()->before_shmem_exit() and > > > ApplyWorkerMain()->InitializeApplyWorker()->BackgroundWorkerInitializeC > > > onnectionByOid()->InitPostgres(). > > > > Thanks for reporting the issue. > > > > I think the problem is that it tried to release locks in > > logicalrep_worker_onexit() before the initialization of the process is complete > > because this callback function was registered before the init phase. So I think we > > can add a conditional statement before releasing locks. Please find an attached > > patch. > > > > Alexander, does the proposed patch fix the problem you are facing? > Sawada-San, and others, do you see any better way to fix it than what > has been proposed? I'm concerned that the idea of relying on IsNormalProcessingMode() might not be robust since if we change the meaning of IsNormalProcessingMode() some day it would silently break again. So I prefer using something like InitializingApplyWorker, or another idea would be to do cleanup work (e.g., fileset deletion and lock release) in a separate callback that is registered after connecting to the database. While investigating this issue, I've reviewed the code around callbacks and worker termination etc and I found a problem. A parallel apply worker calls the before_shmem_exit callbacks in the following order: 1. ShutdownPostgres() 2. logicalrep_worker_onexit() 3. pa_shutdown() Since the worker is detached during logicalrep_worker_onexit(), MyLogicalReplication->leader_pid is an invalid when we call pa_shutdown(): static void pa_shutdown(int code, Datum arg) { Assert(MyLogicalRepWorker->leader_pid != InvalidPid); SendProcSignal(MyLogicalRepWorker->leader_pid, PROCSIG_PARALLEL_APPLY_MESSAGE, InvalidBackendId); Also, if the parallel apply worker fails shm_toc_lookup() during the initialization, it raises an error (because of noError = false) but ends up a SEGV as MyLogicalRepWorker is still NULL. I think that we should not use MyLogicalRepWorker->leader_pid in pa_shutdown() but instead store the leader's pid to a static variable before registering pa_shutdown() callback. And probably we can remember the backend id of the leader apply worker to speed up SendProcSignal(). FWIW, we might need to be careful about the timing when we call logicalrep_worker_detach() in the worker's termination process. Since we rely on IsLogicalParallelApplyWorker() for the parallel apply worker to send ERROR messages to the leader apply worker, if an ERROR happens after logicalrep_worker_detach(), we will end up with the assertion failure. if (IsLogicalParallelApplyWorker()) SendProcSignal(pq_mq_parallel_leader_pid, PROCSIG_PARALLEL_APPLY_MESSAGE, pq_mq_parallel_leader_backend_id); else { Assert(IsParallelWorker()); It normally would be a should-no-happen case, though. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
В списке pgsql-hackers по дате отправления:
Следующее
От: Masahiko SawadaДата:
Сообщение: Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode