Discussion: Dead Space Map version 3 (simplified)
Attached is an updated DSM patch. I've kept only the core function of DSM
and dropped the other, more complicated features in this release.

VACUUM finishes faster with the patch, but that's to be expected: DSM vacuum
sweeps only pages that have many dead tuples and leaves some of them
behind after vacuum.

I'll examine the sweep behavior and the performance from now on.

* Features
  - DSM tracks pages worth vacuuming using 1 bit per page.
    The threshold is two dead tuples or 2kB of dead space.
  - DSM is constructed at page flush. Almost all of the work is done by
    bgwriter if it is properly configured.
  - The 'VACUUM' command uses DSM. 'VACUUM ALL' always scans all pages.
  - This includes the n_dead_tuples statistics fix.
    http://momjian.us/mhonarc/patches/msg00002.html

* Configuration
  - max_dsm_relations (=1000)
    Counterpart to max_fsm_relations, but counts tables only;
    indexes are not tracked by DSM.
  - max_dsm_pages (=1024000)
    Counterpart to max_fsm_pages. The default value is set to
    5 times max_fsm_pages at initdb.
  - min_dsm_target (=8MB)
    Minimum size of tables whose dead space is tracked,
    to avoid tracking small tables, including system catalogs.

* Limitations
  - XID-wraparound vacuum is still required. VACUUM with DSM cannot
    update relfrozenxid, so we sometimes need a full scan.
  - No recovery support. All contents of DSM and FSM are lost on crash.
  - DSM uses fixed-size memory allocated at server start. We cannot change
    the value on the fly. If we want that feature, we need something like
    a shared-memory allocator or a swap-supported memory management module.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
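As a rough illustration of the bookkeeping described in the post above, the following Python sketch models a one-bit-per-page map with the stated thresholds and works out what the default settings imply. It is not taken from the patch: the class and function names are hypothetical, the thresholds are assumed to be inclusive, and the 8 kB block size is assumed to be the PostgreSQL default.

```python
# Sketch only: names are hypothetical and not from the DSM patch.
BLOCK_SIZE = 8192                    # assumption: default PostgreSQL block size

MAX_DSM_PAGES = 1_024_000            # default quoted in the post
MIN_DSM_TARGET = 8 * 1024 * 1024     # 8MB, quoted in the post

DEAD_TUPLE_THRESHOLD = 2             # "two dead tuples"
DEAD_SPACE_THRESHOLD = 2 * 1024      # "2kB of dead space"


def worth_vacuuming(dead_tuples: int, dead_bytes: int) -> bool:
    """Decide whether a page earns its bit (assumes inclusive thresholds)."""
    return (dead_tuples >= DEAD_TUPLE_THRESHOLD
            or dead_bytes >= DEAD_SPACE_THRESHOLD)


class DeadSpaceBitmap:
    """One bit per heap page, packed eight pages to a byte."""

    def __init__(self, npages: int) -> None:
        self.npages = npages
        self.bits = bytearray((npages + 7) // 8)

    def mark(self, pageno: int) -> None:
        self.bits[pageno // 8] |= 1 << (pageno % 8)

    def clear(self, pageno: int) -> None:
        self.bits[pageno // 8] &= ~(1 << (pageno % 8)) & 0xFF

    def is_marked(self, pageno: int) -> bool:
        return bool(self.bits[pageno // 8] & (1 << (pageno % 8)))

    def pages_to_vacuum(self) -> list:
        """Pages a DSM-driven VACUUM would visit, in block order."""
        return [p for p in range(self.npages) if self.is_marked(p)]


# Arithmetic implied by the defaults, under the assumptions above.
bitmap_bytes = MAX_DSM_PAGES // 8                  # bitmap payload only
covered_bytes = MAX_DSM_PAGES * BLOCK_SIZE         # heap the map can describe
min_tracked_pages = MIN_DSM_TARGET // BLOCK_SIZE   # smallest tracked table

dsm = DeadSpaceBitmap(16)
for pageno, (tuples, nbytes) in {3: (2, 100), 5: (1, 500), 10: (0, 4096)}.items():
    if worth_vacuuming(tuples, nbytes):
        dsm.mark(pageno)

print(dsm.pages_to_vacuum(), bitmap_bytes, covered_bytes, min_tracked_pages)
```

Under these assumptions the default max_dsm_pages needs 128,000 bytes of bitmap and can describe 8000 MB of heap, and min_dsm_target excludes tables smaller than 1024 pages. Any per-relation or per-chunk overhead in the real patch is not described in the post and is not counted here.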
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

ITAGAKI Takahiro wrote:
> Attached is an updated DSM patch. I've left the core function of DSM only
> and dropped other complicated features in this release.
> [...]

--
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
On 3/30/07, ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote:
> Attached is an updated DSM patch. I've left the core function of DSM only
> and dropped other complicated features in this release.
I was testing this patch when I got this server crash. The patch is applied
on the current CVS HEAD. I thought you would be interested in this.
The patch worked for smaller scaling factors, and the crash is reproducible.
Test: pgbench -s 90 -i -F 95 postgres
Stack:
(gdb) bt
#0 0x001d37a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00213955 in raise () from /lib/tls/libc.so.6
#2 0x00215319 in abort () from /lib/tls/libc.so.6
#3 0x082dc04f in ExceptionalCondition (conditionName=0x83a7ad7 "!(victim)", errorType=0x83a7622 "FailedAssertion",
fileName=0x83a7487 "deadspace.c", lineNumber=1080) at assert.c:51
#4 0x0821eb29 in dsm_create_chunk (dsmrel=0xb7bcd744, key=0xbff589c0) at deadspace.c:1080
#5 0x0821d473 in dsm_record_state (rnode=0xaee02698, pageno=98304, state=DSM_LOW) at deadspace.c:333
#6 0x0821d29e in DegradeDeadSpaceState (rel=0xaee02698, buffer=10645) at deadspace.c:254
#7 0x0817b542 in lazy_scan_heap (onerel=0xaee02698, vacrelstats=0x9f5f2e0, Irel=0x9f5f4dc, nindexes=1, iter=0x9f5f57c)
at vacuumlazy.c:586
#8 0x0817a733 in lazy_vacuum_rel (onerel=0xaee02698, vacstmt=0x9f39c94) at vacuumlazy.c:209
#9 0x08174e5c in vacuum_rel (relid=16388, vacstmt=0x9f39c94, expected_relkind=114 'r') at vacuum.c:1107
#10 0x0817421c in vacuum (vacstmt=0x9f39c94, relids=0x0, isTopLevel=1 '\001') at vacuum.c:401
#11 0x0823d90b in ProcessUtility (parsetree=0x9f39c94, queryString=0x9f62f94 "vacuum analyze", params=0x0,
isTopLevel=1 '\001', dest=0x9f39cf0, completionTag=0xbff5a040 "") at utility.c:929
#12 0x0823bdd6 in PortalRunUtility (portal=0x9f60f8c, utilityStmt=0x9f39c94, isTopLevel=1 '\001', dest=0x9f39cf0,
completionTag=0xbff5a040 "") at pquery.c:1170
#13 0x0823bf0a in PortalRunMulti (portal=0x9f60f8c, isTopLevel=1 '\001', dest=0x9f39cf0, altdest=0x9f39cf0,
completionTag=0xbff5a040 "") at pquery.c:1262
#14 0x0823b6df in PortalRun (portal=0x9f60f8c, count=2147483647, isTopLevel=1 '\001', dest=0x9f39cf0, altdest=0x9f39cf0,
completionTag=0xbff5a040 "") at pquery.c:809
#15 0x082365df in exec_simple_query (query_string=0x9f399d4 "vacuum analyze") at postgres.c:956
#16 0x08239e43 in PostgresMain (argc=4, argv=0x9ecfc94, username=0x9ecfc64 "perf") at postgres.c:3503
#17 0x08204e84 in BackendRun (port=0x9ee3628) at postmaster.c:2987
#18 0x08204493 in BackendStartup (port=0x9ee3628) at postmaster.c:2614
#19 0x0820228b in ServerLoop () at postmaster.c:1214
#20 0x08201c66 in PostmasterMain (argc=3, argv=0x9ecdc50) at postmaster.c:967
#21 0x081a9e0b in main (argc=3, argv=0x9ecdc50) at main.c:188
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
Thank you for reporting!

I noticed that I need to examine more closely the case where DSM relations
or DSM chunks are exhausted. I'll do more tests for DSM.

"Pavan Deolasee" <pavan.deolasee@gmail.com> wrote:
> I was testing this patch when got this server crash. The patch is applied
> on the current CVS HEAD. I thought you would be interested in this.
>
> The patch worked for smaller scaling factor and its reproducible.
>
> Test: pgbench -s 90 -i -F 95 postgres
>
> #3 0x082dc04f in ExceptionalCondition (conditionName=0x83a7ad7 "!(victim)",
>    errorType=0x83a7622 "FailedAssertion",
>    fileName=0x83a7487 "deadspace.c", lineNumber=1080) at assert.c:51

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
ITAGAKI Takahiro wrote:
> Attached is an updated DSM patch. I've left the core function of DSM only
> and dropped other complicated features in this release.

We discussed it a long time ago already, but I really wish the DSM
didn't need a fixed-size shared memory area. It's one more thing the DBA
needs to tune manually. It also means we need to have an algorithm for
deciding what to keep in the DSM and what to leave out. And I don't see
a good way to extend the current approach to implement the
index-only scans that we've been talking about, and the same goes for
recovery. :(

The way you update the DSM is quite interesting. When a page is dirtied,
the BM_DSM_DIRTY flag is set in the buffer descriptor. The corresponding
bit in the DSM is set lazily in FlushBuffer whenever BM_DSM_DIRTY is
set. That's a clever way to avoid contention on updates. But does it
work for tables that have a small hot part that's updated very
frequently? That's exactly the scenario where the DSM is the most
useful. Hot pages stay in the buffer cache because they're frequently
accessed, which means that FlushBuffer isn't getting called for them and
the bits in the DSM aren't getting set until checkpoint. This could lead
to unnecessary bloating of the hot part. A straightforward fix would be
to scan the buffer cache for buffers marked with BM_DSM_DIRTY to update
the DSM before starting the vacuum scan.

It might not be a problem in practice, but it bothers me that the DSM
isn't 100% accurate. You end up with a page that has dead tuples on it
marked as non-dirty in the DSM, at least when a page is vacuumed but
there are some RECENTLY_DEAD tuples on it that become dead later on.
There might be other scenarios as well.

If I'm reading the code correctly, DSM makes no attempt to keep the
chunks ordered by block number. If that's the case, vacuum needs to be
modified because it currently relies on the fact that blocks are scanned
and the dead tuple list is therefore populated in order.

--
  Heikki Linnakangas
  EnterpriseDB  http://www.enterprisedb.com
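The ordering concern in the last paragraph of the message above can be shown with a small sketch. It assumes that dead tuple IDs are looked up with a binary search while indexes are cleaned, which is how lazy vacuum of that period is understood to work; the data and helper names below are made up for illustration and are not from the patch.

```python
from bisect import bisect_left

# Hypothetical dead tuples per heap block: block number -> line offsets.
dead_by_block = {7: [1, 4], 2: [3], 5: [2, 6]}


def collect_dead_tids(scan_order):
    """Build the dead-tuple list in the order the blocks are visited."""
    return [(blk, off) for blk in scan_order for off in dead_by_block[blk]]


def tid_reaped(dead_tids, tid):
    """Binary-search lookup; only correct if dead_tids is sorted."""
    i = bisect_left(dead_tids, tid)
    return i < len(dead_tids) and dead_tids[i] == tid


# Visiting blocks in arbitrary chunk order leaves the list unsorted ...
unordered = collect_dead_tids([7, 2, 5])
# ... while a block-order scan produces a sorted list for free.
ordered = collect_dead_tids(sorted(dead_by_block))

print(tid_reaped(unordered, (2, 3)))
print(tid_reaped(ordered, (2, 3)))
```

With the unordered list, the lookup for tuple (2, 3) misses even though the tuple is in the list, so its index entries would be kept. Scanning DSM chunks in block order, or sorting the list before index cleanup, would avoid that in this model.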
Heikki Linnakangas wrote:
> The way you update the DSM is quite interesting. When a page is dirtied,
> the BM_DSM_DIRTY flag is set in the buffer descriptor. The corresponding
> bit in the DSM is set lazily in FlushBuffer whenever BM_DSM_DIRTY is
> set. That's a clever way to avoid contention on updates. But does it
> work for tables that have a small hot part that's updated very
> frequently?

I don't think there is a problem. Bloat produces pages containing
unneeded space that will not be accessed, and those pages will soon be
registered in the DSM. Or are you assuming that, even as the table grows,
all pages are accessed equally?

--
Hiroki Kataoka <kataoka@interwiz.jp>

"Hiroki Kataoka" <kataoka@interwiz.jp> writes:
> I think there is no problem. Bloating will make pages including the
> unnecessary area which will not be accessed. Soon, those pages will be
> registered into DSM.

Except the whole point of the DSM is to let us vacuum those pages *before*
that happens...

--
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
Gregory Stark wrote:
> "Hiroki Kataoka" <kataoka@interwiz.jp> writes:
>
>> I think there is no problem. Bloating will make pages including the
>> unnecessary area which will not be accessed. Soon, those pages will be
>> registered into DSM.
>
> Except the whole point of the DSM is to let us vacuum those pages *before*
> that happens...

You are right. However, insisting on perfection often costs performance,
and deferring some processing often improves it. Even if a hot page is not
vacuumed, dead tuples are not generated without bound: for one hot page,
the dead tuples that linger unnecessarily amount to at most about one page.
That page is also soon registered in the DSM at checkpoint, which acts as
a fail-safe. Isn't some compromise needed for a first version of DSM
vacuum?

--
Hiroki Kataoka <kataoka@interwiz.jp>
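The hot-page scenario debated in the messages above can be modelled with a toy buffer pool. This is a sketch under simplifying assumptions, not the patch's code: the classes are hypothetical, `dsm_dirty` stands in for the BM_DSM_DIRTY flag, a Python set stands in for the DSM, and the dead-space threshold check is omitted. `sync_dsm_before_vacuum` corresponds to the pre-vacuum buffer scan that Heikki suggested.

```python
class Buffer:
    """A cached page; dsm_dirty stands in for the BM_DSM_DIRTY flag."""

    def __init__(self, pageno):
        self.pageno = pageno
        self.dsm_dirty = False


class BufferPool:
    """Toy model: the DSM is just the set of page numbers with their bit set."""

    def __init__(self):
        self.buffers = {}
        self.dsm = set()

    def dirty_page(self, pageno):
        # An update only flags the buffer; the DSM itself is not touched,
        # which is what avoids contention on the update path.
        buf = self.buffers.get(pageno)
        if buf is None:
            buf = self.buffers[pageno] = Buffer(pageno)
        buf.dsm_dirty = True

    def flush_buffer(self, pageno):
        # The DSM bit is set lazily, when the page is written out.
        buf = self.buffers[pageno]
        if buf.dsm_dirty:
            self.dsm.add(pageno)
            buf.dsm_dirty = False

    def sync_dsm_before_vacuum(self):
        # Suggested fix: sweep the pool for flagged buffers before vacuum.
        for buf in self.buffers.values():
            if buf.dsm_dirty:
                self.dsm.add(buf.pageno)
                buf.dsm_dirty = False


pool = BufferPool()
pool.dirty_page(42)        # cold page: dirtied once ...
pool.flush_buffer(42)      # ... then written out, so its bit gets set
for _ in range(1000):
    pool.dirty_page(1)     # hot page: updated constantly, never flushed

before = set(pool.dsm)     # what a DSM-driven vacuum would see now
pool.sync_dsm_before_vacuum()
after = set(pool.dsm)      # what it would see after the extra sweep

print(sorted(before), sorted(after))
```

In this model the hot page is invisible to vacuum until either a flush (for example at checkpoint) or the extra sweep happens, which is the trade-off the two sides of the discussion weigh differently.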
This needs additional changes for memory management, and we don't have time
to do that for 8.3, sorry.

This has been saved for the 8.4 release:

    http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

ITAGAKI Takahiro wrote:
> Attached is an updated DSM patch. I've left the core function of DSM only
> and dropped other complicated features in this release.
> [...]

--
  Bruce Momjian  <bruce@momjian.us>  http://momjian.us
  EnterpriseDB  http://www.enterprisedb.com