Discussion: HOT: Incomplete issues

HOT: Incomplete issues

From: ITAGAKI Takahiro
Date:

Hi,

I'm testing the HOT patches, applied to CVS HEAD.
http://archives.postgresql.org/pgsql-patches/2007-05/msg00065.php
I found a few issues in the patch. Some of them might already have
been fixed, but I'll report them anyway for reference.

Apart from the issues below, I don't see any problems with a TPC-C workload
or a pgbench power test.


- MVCC-safe CLUSTER
When I clustered a table with HOT-updated tuples, I saw the following error
message. The latest posted HOT patch does not support MVCC-safe CLUSTER.
| ERROR: unexpected HeapTupleSatisfiesVacuum result

- Number of unremovable tuples reported by VACUUM VERBOSE
HOT-updated tuples (HEAPTUPLE_DEAD_CHAIN) are counted as "kept", and
VACUUM VERBOSE reports them as "cannot be removed yet". However, we can
actually remove them: the data space of HOT-updated tuples can be reused,
but their item pointers must be kept. It would be better to report them
as two different figures, for example, unremovable tuples and unreusable
item pointers.
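
The reporting split proposed above can be sketched in C. This is a minimal illustration under stated assumptions: the `SlotState` classification, the `VacuumReport` counters and `tally_page()` are hypothetical names invented here, not code from the HOT patch or from PostgreSQL's VACUUM. Dead chain members go into their own "unreusable item pointers" figure instead of inflating "cannot be removed yet".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of what VACUUM decides for each slot on a
 * page. SLOT_DEAD_CHAIN stands in for the patch's HEAPTUPLE_DEAD_CHAIN. */
typedef enum
{
    SLOT_LIVE,          /* visible tuple: nothing to report */
    SLOT_RECENTLY_DEAD, /* dead, but possibly still visible to a snapshot */
    SLOT_DEAD,          /* dead and removable */
    SLOT_DEAD_CHAIN     /* dead HOT-updated tuple: item pointer must stay */
} SlotState;

typedef struct
{
    long removable_tuples;          /* "removable" */
    long unremovable_tuples;        /* "cannot be removed yet" */
    long unreusable_item_pointers;  /* proposed separate figure */
} VacuumReport;

/* Tally one page's slots into the three reported figures. */
static VacuumReport
tally_page(const SlotState *slots, size_t nslots)
{
    VacuumReport report = {0, 0, 0};

    assert(slots != NULL || nslots == 0);
    for (size_t i = 0; i < nslots; i++)
    {
        switch (slots[i])
        {
            case SLOT_LIVE:
                break;
            case SLOT_RECENTLY_DEAD:
                report.unremovable_tuples++;
                break;
            case SLOT_DEAD:
                report.removable_tuples++;
                break;
            case SLOT_DEAD_CHAIN:
                /* Not lumped in with "cannot be removed yet". */
                report.unreusable_item_pointers++;
                break;
        }
    }
    return report;
}
```

For example, a page holding one live tuple, one dead tuple, one recently-dead tuple and two dead chain members would report 1 removable tuple, 1 tuple that cannot be removed yet, and 2 unreusable item pointers.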

- ANALYZE and statistics of dead rows
Since redirected and redirect-dead item pointers are counted as "dead rows",
we overestimate the number of dead rows. This distorts the statistics and
misleads autovacuum: if autovacuum runs ANALYZE, the number of dead tuples
appears to jump suddenly, which triggers an unnecessary VACUUM on the next
autovacuum cycle.
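
To make the overestimate concrete, here is a minimal sketch contrasting the counting described above with one that only counts line pointers still referencing a dead tuple. The `ItemState` values and both functions are hypothetical simplifications for illustration; they are not the patch's or ANALYZE's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified states of a line pointer on a heap page. */
typedef enum
{
    ITEM_UNUSED,
    ITEM_LIVE,          /* points to a live tuple */
    ITEM_DEAD,          /* points to a dead tuple that still has storage */
    ITEM_REDIRECT,      /* HOT redirect to another offset; no tuple of its own */
    ITEM_REDIRECT_DEAD  /* redirect-dead: only the line pointer remains */
} ItemState;

/* Behaviour described in the report: redirects are counted as dead rows. */
static long
dead_rows_counting_redirects(const ItemState *items, size_t n)
{
    long dead = 0;

    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (items[i] == ITEM_DEAD || items[i] == ITEM_REDIRECT ||
            items[i] == ITEM_REDIRECT_DEAD)
            dead++;
    return dead;
}

/* Alternative: count only line pointers that still reference a dead tuple. */
static long
dead_rows_tuples_only(const ItemState *items, size_t n)
{
    long dead = 0;

    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (items[i] == ITEM_DEAD)
            dead++;
    return dead;
}
```

On a page with one genuinely dead tuple plus one redirect and one redirect-dead pointer, the first function reports 3 dead rows and the second reports 1.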

- Trigger of auto-analyze
HOT updates do not affect the pgstat_info->trans->tuples_inserted/deleted
fields, so auto-analyze will be triggered less frequently. However, this
might actually be appropriate, because a HOT update means that no indexed
columns were changed by the update. If the columns used in WHERE clauses
are unchanged, we don't have to re-analyze the relation.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



Re: HOT: Incomplete issues

From: "Pavan Deolasee"
Date:


On 6/26/07, ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> wrote:
> Hi,
>
> I'm testing the HOT patches, applied to CVS HEAD.


Thanks a lot for your tests. I am posting a revised patch on -patches.
Please use that for further testing.

In the last few days, many people have reviewed the patch, including
Simon, Heikki, Greg and Korry. I shall post a separate mail summarizing
the changes since the last revision.



> - MVCC-safe CLUSTER
> When I clustered a table with HOT-updated tuples, I saw the following error
> message. The latest posted HOT patch does not support MVCC-safe CLUSTER.
> | ERROR: unexpected HeapTupleSatisfiesVacuum result


Yes, this is a known issue. Heikki has posted a patch to resolve this
conflict.


> - Number of unremovable tuples reported by VACUUM VERBOSE
> HOT-updated tuples (HEAPTUPLE_DEAD_CHAIN) are counted as "kept", and
> VACUUM VERBOSE reports them as "cannot be removed yet". However, we can
> actually remove them: the data space of HOT-updated tuples can be reused,
> but their item pointers must be kept. It would be better to report them
> as two different figures, for example, unremovable tuples and unreusable
> item pointers.

We cannot remove a HEAPTUPLE_DEAD_CHAIN tuple because, even though it is
dead, it might be the only way to reach the live tuple at the end of the chain.
The chain-pruning logic should remove most such tuples before VACUUM runs
on the page, but a few might still be left. We cannot reuse their data space
just yet either, because then we would lose the xmax/xmin check. Also, with
several redirecting line pointers, the HOT chain would become very complex
and unmanageable.

There are in fact quite a few scenarios here:

1. A dead tuple which is part of a HOT chain cannot be removed
2. A dead tuple which is marked LP_DELETE is removed and reported as "removable"
3. A redirect-dead line pointer is removed and reported as "removable"

In case 3, no real tuple is being removed; the tuple might already have
been reused or vacuumed. So the report could be slightly misleading.

Another problem with the current reporting is that if the original dead tuple
is tracked by a separate LP_DELETE line pointer and the original root offset
is redirect-dead, it might be reported as "removable" twice: once for the
LP_DELETE tuple and again for the redirect-dead line pointer. Maybe we should
report the redirect-dead offsets as "removable redirected offsets" and not
count them among the "removable" tuples?

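The reporting change suggested above can be sketched as a per-page tally that keeps redirect-dead offsets in their own counter, so a dead tuple tracked by both an LP_DELETE line pointer and a redirect-dead root is counted only once as a removable tuple. `VacItem`, `VacCounts`, `count_vac_items()` and `current_removable()` are hypothetical names; the three item kinds follow the numbered cases above.

```c
#include <assert.h>
#include <stddef.h>

/* The three cases listed above (hypothetical names). */
typedef enum
{
    VAC_DEAD_IN_CHAIN,   /* case 1: dead HOT chain member, cannot be removed */
    VAC_DEAD_LP_DELETE,  /* case 2: dead tuple marked LP_DELETE, removed */
    VAC_REDIRECT_DEAD    /* case 3: redirect-dead line pointer, removed */
} VacItem;

typedef struct
{
    long nonremovable_tuples;
    long removable_tuples;
    long removable_redirected_offsets;  /* proposed separate figure */
} VacCounts;

static VacCounts
count_vac_items(const VacItem *items, size_t n)
{
    VacCounts counts = {0, 0, 0};

    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        switch (items[i])
        {
            case VAC_DEAD_IN_CHAIN:
                counts.nonremovable_tuples++;
                break;
            case VAC_DEAD_LP_DELETE:
                counts.removable_tuples++;
                break;
            case VAC_REDIRECT_DEAD:
                counts.removable_redirected_offsets++;
                break;
        }
    }
    return counts;
}

/* The current reporting folds both removed kinds into "removable". */
static long
current_removable(const VacCounts *counts)
{
    return counts->removable_tuples + counts->removable_redirected_offsets;
}
```

If one dead tuple appears as both an LP_DELETE item and a redirect-dead root, the current scheme reports it as 2 removable tuples, while the split reports 1 removable tuple and 1 removable redirected offset.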

> - ANALYZE and statistics of dead rows
> Since redirected and redirect-dead item pointers are counted as "dead rows",
> we overestimate the number of dead rows. This distorts the statistics and
> misleads autovacuum: if autovacuum runs ANALYZE, the number of dead tuples
> appears to jump suddenly, which triggers an unnecessary VACUUM on the next
> autovacuum cycle.


A redirect-dead line pointer consumes 4 bytes of dead space in a page. If a table is full of
redirect-dead line pointers, we should trigger VACUUM on the table. Maybe we can maintain
separate statistics for redirect-dead line pointers and give them lower weight
when deciding whether to vacuum.
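
One way to give redirect-dead line pointers lower significance, as suggested above, is to weight each one by the space it holds relative to a dead tuple, then apply the usual threshold of base threshold plus scale factor times reltuples. The weighting scheme and the names `redirect_dead_weight()` and `needs_vacuum()` are assumptions for illustration; only the 4-byte size of a line pointer comes from the message.

```c
#include <assert.h>
#include <stdbool.h>

#define LINE_POINTER_BYTES 4.0   /* size of one line pointer, per the message */

/*
 * Hypothetical weight: space held by a redirect-dead pointer relative to a
 * dead tuple, which holds its data plus its own line pointer.
 */
static double
redirect_dead_weight(double avg_tuple_bytes)
{
    assert(avg_tuple_bytes > 0.0);
    return LINE_POINTER_BYTES / (avg_tuple_bytes + LINE_POINTER_BYTES);
}

/* Vacuum when weighted dead items exceed base + scale_factor * reltuples. */
static bool
needs_vacuum(long dead_tuples, long redirect_dead, double reltuples,
             long base_threshold, double scale_factor, double redirect_weight)
{
    double effective_dead = dead_tuples + redirect_weight * redirect_dead;
    double threshold = base_threshold + scale_factor * reltuples;

    return effective_dead > threshold;
}
```

With 96-byte tuples each redirect-dead pointer counts as about 0.04 of a dead tuple, so 1000 of them add roughly 40 to the dead count instead of 1000; a table made up mostly of redirect-dead pointers still crosses the threshold eventually.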
 

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com