Обсуждение: Reduce ProcArrayLock contention

Поиск

Список

Период

Сортировка

Reduce ProcArrayLock contention

От

Amit Kapila

Дата:

29 июня 2015 г., 18:27:43

I have been working on to analyze different ways to reduce

the contention around ProcArrayLock. I have evaluated mainly

2 ideas, first one is to partition the ProcArrayLock (the basic idea

is to allow multiple clients (equal to number of ProcArrayLock partitions)

to perform ProcArrayEndTransaction and then wait for all of them at

GetSnapshotData time) and second one is to have a mechanism to

GroupClear the Xid during ProcArrayEndTransaction() and the second

idea clearly stands out in my tests, so I have prepared the patch for that

to further discuss here.

The idea behind second approach (GroupClear Xid) is, first try to get

ProcArrayLock conditionally in ProcArrayEndTransaction(), if we get

the lock then clear the advertised XID, else set the flag (which indicates

that advertised XID needs to be clear for this proc) and push this

proc to pendingClearXidList. Except one proc, all other proc's will

wait for their Xid to be cleared. The only allowed proc will attempt the

lock acquiration, after acquring the lock, pop all of the requests off the

list using compare-and-swap, servicing each one before moving to

next proc, and clearing their Xids. After servicing all the requests

on pendingClearXidList, release the lock and once again go through
the saved pendingClearXidList and wake all the processes waiting
for their Xid to be cleared. To set the appropriate value for
ShmemVariableCache->latestCompletedXid, we need to advertise
latestXid incase proc needs to be pushed to pendingClearXidList.

Attached patch implements the above idea.

Performance Data

-----------------------------

RAM - 500GB

8 sockets, 64 cores(Hyperthreaded128 threads total)

Non-default parameters

------------------------------------

max_connections = 150

shared_buffers=8GB

min_wal_size=10GB

max_wal_size=15GB

checkpoint_timeout =35min

maintenance_work_mem = 1GB

checkpoint_completion_target = 0.9

wal_buffers = 256MB

pgbench setup

------------------------

scale factor - 300

Data is on magnetic disk and WAL on ssd.

pgbench -M prepared tpc-b

Head : commit 51d0fe5d

Patch -1 : group_xid_clearing_at_trans_end_rel_v1

Client Count/TPS	1	8	16	32	64	128
HEAD	814	6092	10899	19926	23636	17812
Patch-1	1086	6483	11093	19908	31220	28237

The graph for the data is attached.

Points about performance data

---------------------------------------------

1. Gives good performance improvement at or greater than 64 clients

and give somewhat moderate improvement at lower client count. The

reason is that because the contention around ProcArrayLock is mainly

seen at higher client count. I have checked that at higher client-count,

it started behaving lockless (which means performance with patch is

equivivalent to if we just comment out ProcArrayLock in

ProcArrayEndTransaction()).

2. There is some noise in this data (at 1 client count, I don't expect

much difference).

3. I have done similar tests on power-8 m/c and found similar gains.

4. The gains are visible when the data fits in shared_buffers as for other

workloads I/O starts dominating.

5. I have seen that effect of Patch is much more visible if we keep

autovacuum = off (do manual vacuum after each run) and keep

wal_writer_delay to lower value (say 20ms). I have not included that

data here, but if somebody is interested, I can do the detailed tests

against HEAD with those settings and share the results.

Here are steps used to take data (there are repeated for each reading)

--------------------------------------------------------------------------------------------------------

1. Start Server

2. dropdb postgres

3. createdb posters

4. pgbench -i -s 300 postgres

5. pgbench -c $threads -j $threads -T 1800 -M prepared postgres

6. checkpoint

7. Stop Server

Thanks to Robert Haas for having discussion (offlist) about the idea

and suggestions to improve it and also Andres Freund for having

discussion and sharing thoughts about this idea at PGCon.

Suggestions?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Client_Count/Patch_ver (TPS)	1	8	16	32	64	128
HEAD	972	6004	11060	20074	23839	17798
Patch	1005	6260	11368	20318	30775	30215

Commitid – 253de7e1
Client Count/No. Of Runs (tps)	128
Run-1	208011
Run-2	471598
Run-3	218295

Commitid – 0e141c0f
Client Count/No. Of Runs (tps)	128
Run-1	222839
Run-2	469483
Run-3	215791

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Reduce ProcArrayLock contention

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения