Re: Transactions involving multiple postgres foreign servers, take 2
From | Masahiko Sawada |
---|---|
Subject | Re: Transactions involving multiple postgres foreign servers, take 2 |
Date | |
Msg-id | CAD21AoAyJnnuEt1i0X=aQ7aBFNqCgujToi_U3f8VzNS2ayLU0w@mail.gmail.com |
In reply to | Re: Transactions involving multiple postgres foreign servers, take 2 (Masahiro Ikeda <ikedamsh@oss.nttdata.com>) |
List | pgsql-hackers |
Sorry for the late reply.

On Mon, Jul 5, 2021 at 3:29 PM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
>
> On 2021/06/30 10:05, Masahiko Sawada wrote:
> > On Fri, Jun 25, 2021 at 9:53 AM Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote:
> >>
> >> Hi Jamison-san, sawada-san,
> >>
> >> Thanks for testing!
> >>
> >> FWIW, I tested using pgbench with the "--rate=" option to check whether the
> >> server can execute transactions at a stable throughput. As sawada-san said,
> >> the latest patch resolves the second phase of 2PC asynchronously, so
> >> it's difficult to keep the throughput stable without the "--rate=" option.
> >>
> >> I also wondered what to do when the error happens, because increasing
> >> "max_prepared_foreign_transactions" doesn't help. Since overload can
> >> cause the error, would it be better to add this case to the HINT message?
> >>
> >> BTW, if sawada-san has already developed running the resolver processes in
> >> parallel, why not measure the performance improvement? Although Robert-san,
> >> Tsunakawa-san and others are discussing which architecture is best, one
> >> discussion point is that there is a performance risk in adopting the
> >> asynchronous approach. If we have promising solutions, I think we can move
> >> the discussion forward.
>
> > Yeah, if we can asynchronously resolve the distributed transactions
> > without worrying about the max_prepared_foreign_transactions error, it
> > would be good. But we will need synchronous resolution at some point,
> > so I think we at least need to discuss it now.
> >
> > I've attached the new version of the patch that incorporates the comments
> > from Fujii-san and Ikeda-san I got so far. We launch a resolver
> > process per foreign server, committing prepared foreign transactions
> > on foreign servers in parallel. To get better performance with
> > the current architecture, we could have multiple resolver processes per
> > foreign server, but that seems hard to tune in practice.
> > Perhaps it is better if we simply have a pool of resolver processes and
> > assign a resolver process to the resolution of one distributed
> > transaction at a time? That way, we would need to launch only as many
> > resolver processes as there are concurrent backends using 2PC.
>
> Thanks for updating the patches.
>
> I have tested on my local laptop and the summary is as follows.

Thank you for testing!

> (1) The latest patch (v37) improves throughput by 1.5 times compared to v36.
>
> I expected it to improve by 2.0 times because the workload is such that one
> transaction accesses two remote servers... I think the reason is that the disk
> is the bottleneck and I couldn't prepare a separate disk for each postgresql
> server. If I could, I think the performance could improve by 2.0 times.
>
> (2) With the latest patch (v37), throughput with foreign_twophase_commit = required
> is about 36% of the case with foreign_twophase_commit = disabled.
>
> Although the throughput is improved, the absolute performance is not good. It
> may be the fate of 2PC. I think the reason is that the number of WAL writes
> increases a lot, and the disk writes on my laptop are the bottleneck. I would
> like to know the results of testing in richer environments if someone can do so.
>
> (3) The latest patch (v37) has no overhead with foreign_twophase_commit =
> disabled. On the contrary, performance improved by 3%; it may be within
> the margin of error.
>
> The test details are as follows.
>
> # condition
>
> * 1 coordinator and 3 foreign servers
> * 4 instances shared one ssd disk
> * one transaction queries two different foreign servers
>
> ``` fxact_update.pgbench
> \set id random(1, 1000000)
>
> \set partnum 3
> \set p1 random(1, :partnum)
> \set p2 ((:p1 + 1) % :partnum) + 1
>
> BEGIN;
> UPDATE part:p1 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
> UPDATE part:p2 SET md5 = md5(clock_timestamp()::text) WHERE id = :id;
> COMMIT;
> ```
>
> * pgbench generates the load.
> I increased ${RATE} little by little until the "maximum
> number of foreign transactions reached" error happened.
>
> ```
> pgbench -f fxact_update.pgbench -R ${RATE} -c 8 -j 8 -T 180
> ```
>
> * parameters
> max_prepared_transactions = 100
> max_prepared_foreign_transactions = 200
> max_foreign_transaction_resolvers = 4
>
> # test source code patterns
>
> 1. 2pc patches (v36) based on 6d0eb385 (foreign_twophase_commit = required).
> 2. 2pc patches (v37) based on 2595e039 (foreign_twophase_commit = required).
> 3. 2pc patches (v37) based on 2595e039 (foreign_twophase_commit = disabled).
> 4. 2595e039 without 2pc patches (v37).
>
> # results
>
> 1. tps = 241.8000
>    latency average = 10.413 ms
>
> 2. tps = 359.017519 (1.5 times compared to 1.; 36% of 3.)
>    latency average = 15.427 ms
>
> 3. tps = 987.372220 (1.03 times compared to 4.)
>    latency average = 8.102 ms
>
> 4. tps = 955.984574
>    latency average = 8.368 ms
>
> The disk is the bottleneck in my environment because disk utilization is almost
> 100% in every pattern. If a disk for each instance could be prepared, I think
> we could expect more performance improvements.

The performance still seems not good. I'll also test using your script.

> >> In my understanding, there are three improvement ideas. The first is to make
> >> the resolver processes run in parallel. The second is to send "COMMIT/ABORT
> >> PREPARED" to the remote servers in bulk. The third is to stop syncing the WAL
> >> in remove_fdwxact() after resolution is done, which I addressed in the mail
> >> sent on June 3rd at 13:56. Since the third idea has not yet been discussed,
> >> I may be misunderstanding something.
>
> > Yes, those optimizations are promising. On the other hand, they could
> > introduce complexity to the code and APIs. I'd like to keep the first
> > version simple. I think we need to discuss them at this stage but can
> > leave the implementation of both parallel execution and batch
> > execution as future improvements.
>
> OK, I agree.
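[Editor's note] As a rough illustration of the second idea (sending "COMMIT/ABORT PREPARED" in bulk), a resolver could first build the finishing command for each prepared foreign transaction and then dispatch the commands asynchronously per connection (e.g. with libpq's PQsendQuery) before waiting for all the results. The helper below is only a hedged sketch, not the patch's actual code; the gid format `fx_1000_16389` is invented for illustration. Note that the actual SQL keyword is ROLLBACK PREPARED, not ABORT PREPARED.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the patch): build the SQL command a
 * resolver would send to a foreign server to finish one prepared
 * transaction. "gid" is the identifier used at PREPARE TRANSACTION time.
 * A bulk implementation could build one such command per prepared
 * transaction, send them all asynchronously over each server's
 * connection, and only then collect the results.
 */
static int
build_resolution_command(char *buf, size_t buflen, const char *gid, int commit)
{
    return snprintf(buf, buflen, "%s PREPARED '%s'",
                    commit ? "COMMIT" : "ROLLBACK", gid);
}
```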
> > For the third idea, I think the implementation was wrong; it removes
> > the state file and then flushes the WAL record. I think these should be
> > performed in the reverse order. Otherwise, the FdwXactState entry could be
> > left on the standby if the server crashes between them. I might be
> > missing something, though.
>
> Oh, I see. I think you're right, though what you wanted to say is that it
> should flush the WAL records first and then remove the state file. If
> "COMMIT/ABORT PREPARED" statements are executed in bulk, it seems enough to
> sync the WAL only once and then remove all the related state files.
>
> BTW, I built the binary with -O2 and got the following warning.
> It needs to be fixed.
>
> ```
> fdwxact.c: In function 'PrepareAllFdwXacts':
> fdwxact.c:897:13: warning: 'flush_lsn' may be used uninitialized in this
> function [-Wmaybe-uninitialized]
>   897 |             canceled = SyncRepWaitForLSN(flush_lsn, false);
>       |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ```

Thank you for the report. I'll fix it in the next version of the patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
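[Editor's note] The crash-safety ordering discussed above can be modeled with a toy sketch: the WAL record logging the removal must be durable before the on-disk state file is unlinked, otherwise a crash between the two steps loses the removal record while the file is already gone. All names below are invented for illustration; this is not the patch's code, which would use PostgreSQL's real XLogFlush() and file APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;              /* simplified stand-in */

static XLogRecPtr wal_flushed_upto = 0;   /* how far WAL is durable */
static bool       state_file_present = true;

static XLogRecPtr
log_removal_record(void)
{
    return 100;                           /* pretend LSN of the removal record */
}

static void
flush_wal(XLogRecPtr upto)
{
    if (upto > wal_flushed_upto)
        wal_flushed_upto = upto;
}

/* Correct order: flush the WAL record first, then remove the state file. */
static void
remove_fdwxact_state(void)
{
    XLogRecPtr lsn = log_removal_record();

    flush_wal(lsn);                       /* removal is durable in WAL ...   */
    state_file_present = false;           /* ... before the file disappears  */

    /* invariant: the file never vanishes before its record is durable */
    assert(wal_flushed_upto >= lsn);
}
```

As for the compiler warning quoted above, initializing the variable at its declaration (e.g. `XLogRecPtr flush_lsn = InvalidXLogRecPtr;`) is the usual way to silence -Wmaybe-uninitialized, though the actual fix in the patch may differ.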