On Thu, Feb 8, 2024 at 6:33 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Sawada-san,
>
> Thanks for making the v3 patchset. I have also benchmarked the case [1].
> The results below are the average of five runs; the result is almost the
> same even when the median is used for the comparison. On my env, no
> regression can be seen.
>
> HEAD (1e285a5)    HEAD + v3 patches    difference
> 10910.722 ms      10714.540 ms         around 1.8%
>
Thank you for doing the performance test!
> Also, here are minor comments for the v3 set.
>
> 01.
> bh_nodeidx_entry and ReorderBufferMemTrackState is missing in typedefs.list.
Will add them.
>
> 02. ReorderBufferTXNSizeCompare
> Should we assert {ta, tb} are not NULL?
Not sure we really need it as other binaryheap users don't have such checks.
On Tue, Feb 6, 2024 at 2:45 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > I've run a benchmark test that I shared before[1]. Here are results of
> > decoding a transaction that has 1M subtransactions, each of which has 1
> > INSERT:
> >
> > HEAD:
> > 1810.192 ms
> >
> > HEAD w/ patch:
> > 2001.094 ms
> >
> > I set logical_decoding_work_mem to a value large enough not to evict
> > any transactions. I can see about a 10% performance regression in
> > this case.
>
> Thanks for running. I think this workload is the worst and an extreme case that
> would not occur on a real system (such a system should be fixed), so we
> can say that the regression is up to -10%. I felt it could be negligible, but
> what do others think?
I think this performance regression is not acceptable. In this
workload, one transaction has 10k subtransactions and the logical
decoding becomes quite slow if logical_decoding_work_mem is not big
enough. Therefore, it's a legitimate and common approach to increase
logical_decoding_work_mem to speed up the decoding. However, with this
patch, the decoding becomes slower than today. It's a bad idea in
general to optimize an extreme case while sacrificing the normal (or
more common) cases.
Therefore, I've improved the algorithm so that we don't touch the
max-heap at all if the number of transactions is small enough. I've
run a benchmark test with two workloads:
workload-1, decode single transaction with 800k tuples (normal.sql):
* without spill
HEAD: 13235.136 ms
v3 patch: 14320.082 ms
v4 patch: 13300.665 ms
* with spill
HEAD: 22970.204 ms
v3 patch: 23625.649 ms
v4 patch: 23304.366 ms
workload-2, decode one transaction with 100k subtransactions (many-subtxn.sql):
* without spill
HEAD: 345.718 ms
v3 patch: 409.686 ms
v4 patch: 353.026 ms
* with spill
HEAD: 136718.313 ms
v3 patch: 2675.539 ms
v4 patch: 2734.981 ms
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com