Re: Possible performance regression in version 10.1 with pgbench read-write tests.

From: Thomas Munro
Subject: Re: Possible performance regression in version 10.1 with pgbench read-write tests.
Date:
Msg-id: CAEepm=31A_tpsgdHP8evosoesq9qBNt5_dTk4CR+TYs5Wzr4AQ@mail.gmail.com
In reply to: Re: Possible performance regression in version 10.1 with pgbench read-write tests.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Possible performance regression in version 10.1 with pgbench read-write tests.  (Thomas Munro <thomas.munro@enterprisedb.com>)
Re: Possible performance regression in version 10.1 with pgbench read-write tests.  (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-hackers
On Sun, Jul 22, 2018 at 8:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2018-07-20 16:43:33 -0400, Tom Lane wrote:
>>> On my RHEL6 machine, with unmodified HEAD and 8 sessions (since I've
>>> only got 8 cores) but other parameters matching Mithun's example,
>>> I just got
>
>> It's *really* common to have more actual clients than cpus for oltp
>> workloads, so I don't think it's insane to test with more clients.
>
> I finished a set of runs using similar parameters to Mithun's test except
> for using 8 clients, and another set using 72 clients (but, being
> impatient, 5-minute runtime) just to verify that the results wouldn't
> be markedly different.  I got TPS numbers like this:
>
>                                 8 clients       72 clients
>
> unmodified HEAD                 16112           16284
> with padding patch              16096           16283
> with SysV semas                 15926           16064
> with padding+SysV               15949           16085
>
> This is on RHEL6 (kernel 2.6.32-754.2.1.el6.x86_64), hardware is dual
> 4-core Intel E5-2609 (Sandy Bridge era).  This hardware does show NUMA
> effects, although no doubt less strongly than Mithun's machine.
>
> I would like to see some other results with a newer kernel.  I tried to
> repeat this test on a laptop running Fedora 28, but soon concluded that
> anything beyond very short runs was mainly going to tell me about thermal
> throttling :-(.  I could possibly get repeatable numbers from, say,
> 1-minute SELECT-only runs, but that would be a different test scenario,
> likely one with a lot less lock contention.

I did some testing on 2-node, 4-node and 8-node systems running Linux
3.10.something (slightly newer but still ancient).  Only the 8-node
box (= same one Mithun used) shows the large effect (the 2-node box
may be a tiny bit faster patched but I'm calling that noise for now...
it's not slower, anyway).

On the problematic box, I also tried some different strides (char
padding[N - sizeof(sem_t)]) and was surprised by the result:

Unpatched = ~35k TPS
64 byte stride = ~35k TPS
128 byte stride = ~42k TPS
4096 byte stride = ~47k TPS

Huh.  PG_CACHE_LINE_SIZE is 128, but the true cache line size on this
system is 64 bytes.  That exaggeration turned out to do something
useful, though I can't explain it.
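
For readers who want to repeat the stride experiment, here is a minimal sketch of the padding idea in isolation. The names PaddedSem, SEMA_STRIDE and sema_stride are made up for illustration and are not taken from the patch or from the PostgreSQL tree; the only thing borrowed from the message is the char padding[N - sizeof(sem_t)] expression. It assumes N is larger than sizeof(sem_t) and a multiple of its alignment.

```c
#include <assert.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical stride under test; the message tried 64, 128 and 4096. */
#define SEMA_STRIDE 128

/* One semaphore per array slot, padded out to SEMA_STRIDE bytes. */
typedef struct PaddedSem
{
    sem_t       sem;
    char        padding[SEMA_STRIDE - sizeof(sem_t)];
} PaddedSem;

/* Refuse to compile if the padding does not produce the intended stride. */
static_assert(sizeof(PaddedSem) == SEMA_STRIDE, "unexpected semaphore stride");

/* Byte distance between slot i and slot i + 1 of a PaddedSem array. */
static size_t
sema_stride(const PaddedSem *array, size_t i)
{
    return (size_t) ((const char *) &array[i + 1] - (const char *) &array[i]);
}
```

Changing SEMA_STRIDE and rebuilding is enough to move between the strides listed above. Note that the stride alone does not guarantee alignment: the array's starting address also has to be a multiple of the stride for each semaphore to sit at the start of its own block.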

While looking for discussion of 128 byte cache effects I came across
the Intel "L2 adjacent cache line prefetcher"[1].  Maybe this, or some
of the other prefetchers (enabled in the BIOS) or related stuff could
be at work here.  It could be microarchitecture-dependent (this is an
old Westmere box), though I found a fairly recent discussion about a
similar effect[2] that mentions more recent hardware.  The spatial
prefetcher reference can be found in the Optimization Manual[3].
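
To make the prefetcher hypothesis concrete: the Optimization Manual describes the spatial prefetcher as trying to complete each fetched line with its partner in a 128-byte aligned chunk. The sketch below only does the address arithmetic for that description; it does not measure anything, and the helper names are invented for illustration. It assumes the semaphore array starts on a 128-byte boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Size of the aligned chunk the spatial prefetcher is described as completing. */
#define PAIR_SIZE 128

/* True if two byte offsets fall within the same 128-byte aligned chunk. */
static bool
same_line_pair(size_t off_a, size_t off_b)
{
    return off_a / PAIR_SIZE == off_b / PAIR_SIZE;
}

/*
 * True if slots i and i + 1 of an array with the given stride share a
 * 128-byte chunk, assuming the array base is 128-byte aligned.
 */
static bool
neighbours_share_pair(size_t stride, size_t i)
{
    return same_line_pair(i * stride, (i + 1) * stride);
}
```

Under these assumptions, a 64-byte stride still leaves every even-numbered slot in the same 128-byte chunk as its successor, while strides of 128 bytes and above never do. That is consistent with the jump between the 64 and 128 byte results, but it is only arithmetic and says nothing about why 4096 does better still.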

[1] https://software.intel.com/en-us/articles/disclosure-of-hw-prefetcher-control-on-some-intel-processors
[2] https://groups.google.com/forum/#!msg/mechanical-sympathy/i3-M2uCYTJE/P7vyoOTIAgAJ
[3] https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

-- 
Thomas Munro
http://www.enterprisedb.com

