Обсуждение: Adjusting index special storage for pg_filedump's convenience

Поиск

Список

Период

Сортировка

Adjusting index special storage for pg_filedump's convenience

От

Tom Lane

Дата:

09 апреля 2007 г., 16:10:53

Historically, pg_filedump
http://sources.redhat.com/rhdb/utilities.html
has relied on the size of a page's special space to determine which kind
of index it is looking at (btree, hash, etc) so that it can dump the
contents of the special space nicely.  This is pretty ugly of course,
but there isn't a whole lot of other context available.  (Before you
say "why not look at the metapage?", consider that we may be looking at
a segment file that doesn't contain the metapage, and gist and gin don't
use one anyway.)  As of 8.2 it's entirely broken for gist because gist
and btree now have the same-size special space, ie 16 bytes; and it
looks like bitmap indexes will too.

We put in a workaround a long time ago to make it possible to tell the
difference between btree and hash special space, which are also the same
size: there's an unused 16 bits in hash special space that we fill with
a specific value.  As of 8.2 this doesn't work as well as it used to,
because the corresponding space in a btree page is now used for a vacuum
cycle ID and so there's 1 chance in 65536 of a false match.  Still, it's
a lot better than nothing.

I'd like to tweak things for 8.3 so that pg_filedump can work reasonably
well again.  It looks like the hash solution would work for gist, gin,
and bitmap: rearranging fields would allow us to put in a 16-bit ID
field in all three cases.  (For bitmap, I'm assuming that
bm_hrl_words_used could be reduced to 16 bits without problem --- it is
a per-page count not something larger, right?)

One problem with that is that with four special values, there'd be 1
chance in 16384 of misidentifying a btree page because of chance values
of the vacuum cycle ID.  This can be improved a bit if we put the flags
fields (for those index types that have 'em) in a consistent place too:
we can disbelieve that an index is of type X if it doesn't have a flags
value that fits.  I don't see any way to make it completely bulletproof
without enlarging the special space, which seems an unreasonable price
to pay.  But even one chance in 16K is way better than the current
situation.

Thoughts, objections, better ideas?
        regards, tom lane

Re: Adjusting index special storage for pg_filedump's convenience

От

Gavin Sherry

Дата:

10 апреля 2007 г., 00:56:03

On Mon, 9 Apr 2007, Tom Lane wrote:

> We put in a workaround a long time ago to make it possible to tell the
> difference between btree and hash special space, which are also the same
> size: there's an unused 16 bits in hash special space that we fill with
> a specific value.  As of 8.2 this doesn't work as well as it used to,
> because the corresponding space in a btree page is now used for a vacuum
> cycle ID and so there's 1 chance in 65536 of a false match.  Still, it's
> a lot better than nothing.

Sounds... reasonable. Especially if you add the flags test below.

>
> I'd like to tweak things for 8.3 so that pg_filedump can work reasonably
> well again.  It looks like the hash solution would work for gist, gin,
> and bitmap: rearranging fields would allow us to put in a 16-bit ID
> field in all three cases.  (For bitmap, I'm assuming that
> bm_hrl_words_used could be reduced to 16 bits without problem --- it is
> a per-page count not something larger, right?)

Yes, I've reduced this already but hadn't in previous patches, from
memory. I'd add a filler of uint16 now. Got a number I should use?

> One problem with that is that with four special values, there'd be 1
> chance in 16384 of misidentifying a btree page because of chance values
> of the vacuum cycle ID.  This can be improved a bit if we put the flags
> fields (for those index types that have 'em) in a consistent place too:
> we can disbelieve that an index is of type X if it doesn't have a flags
> value that fits.  I don't see any way to make it completely bulletproof
> without enlarging the special space, which seems an unreasonable price
> to pay.  But even one chance in 16K is way better than the current
> situation.

Sounds like the only workable approach.

Thanks,

Gavin

Re: Adjusting index special storage for pg_filedump's convenience

От

Tom Lane

Дата:

10 апреля 2007 г., 01:17:22

Gavin Sherry <swm@alcove.com.au> writes:
> On Mon, 9 Apr 2007, Tom Lane wrote:
>> ... I don't see any way to make it completely bulletproof
>> without enlarging the special space, which seems an unreasonable price
>> to pay.  But even one chance in 16K is way better than the current
>> situation.

> Sounds like the only workable approach.

Actually, I realized after writing that that it *is* possible to make it
bulletproof: all we have to do is make the BTCycleId wrap around at a
little less than 64K, which adds about one line of code and doesn't
materially change its reliability.  That leaves a few bitpatterns free
for IDs of other index types with no chance of collision.  I made hash
use 0xFF80 and gist 0xFF81; please use 0xFF82 for bitmaps.  (GIN turns
out not to need a code because its special space is a different size,
so we can tell it apart anyway.)

See patch already committed here:
http://archives.postgresql.org/pgsql-committers/2007-04/msg00125.php
        regards, tom lane

Re: Adjusting index special storage for pg_filedump's convenience

От

Heikki Linnakangas

Дата:

16 апреля 2007 г., 08:42:18

Tom Lane wrote:
> Gavin Sherry <swm@alcove.com.au> writes:
>> On Mon, 9 Apr 2007, Tom Lane wrote:
>>> ... I don't see any way to make it completely bulletproof
>>> without enlarging the special space, which seems an unreasonable price
>>> to pay.  But even one chance in 16K is way better than the current
>>> situation.
> 
>> Sounds like the only workable approach.
> 
> Actually, I realized after writing that that it *is* possible to make it
> bulletproof: all we have to do is make the BTCycleId wrap around at a
> little less than 64K, which adds about one line of code and doesn't
> materially change its reliability.  That leaves a few bitpatterns free
> for IDs of other index types with no chance of collision.  I made hash
> use 0xFF80 and gist 0xFF81; please use 0xFF82 for bitmaps.  (GIN turns
> out not to need a code because its special space is a different size,
> so we can tell it apart anyway.)
> 
> See patch already committed here:
> http://archives.postgresql.org/pgsql-committers/2007-04/msg00125.php

That's a clever trick, but I can't help thinking we really should have 
an explicit field in the page header to indicate what kind of a page it 
is. It would make life simpler for any external tools that want to peek 
into pages, including migration utilities after a release or two. We've 
also been talking about setting hint bits and doing some kind of retail 
vacuuming in bgwriter with HOT. To do that, we need to identify heap 
pages in the bgwriter. While heap pages can currently be identified by 
the fact that they don't have a special area, it feels hackish, and we 
might want to do something like that for index pages too in the future.

We now have a 16-bit pd_flags field in the page header. We could use a 
few bits from that.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

Re: Adjusting index special storage for pg_filedump's convenience

От

Tom Lane

Дата:

16 апреля 2007 г., 15:13:21

Heikki Linnakangas <heikki@enterprisedb.com> writes:
> That's a clever trick, but I can't help thinking we really should have 
> an explicit field in the page header to indicate what kind of a page it 
> is.

I think we should save the pd_flags field for cases where we really need
it ...
        regards, tom lane

Re: Adjusting index special storage for pg_filedump's convenience

От

Zdenek Kotala

Дата:

16 апреля 2007 г., 19:09:26

Heikki Linnakangas wrote:

> 
> That's a clever trick, but I can't help thinking we really should have 
> an explicit field in the page header to indicate what kind of a page it 
> is. It would make life simpler for any external tools that want to peek 
> into pages, including migration utilities after a release or two. We've 
> also been talking about setting hint bits and doing some kind of retail 
> vacuuming in bgwriter with HOT. To do that, we need to identify heap 
> pages in the bgwriter. While heap pages can currently be identified by 
> the fact that they don't have a special area, it feels hackish, and we 
> might want to do something like that for index pages too in the future.
> 
> We now have a 16-bit pd_flags field in the page header. We could use a 
> few bits from that.
> 

+1

or add one extra field
    Zdenek

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Adjusting index special storage for pg_filedump's convenience

Adjusting index special storage for pg_filedump's convenience

Re: Adjusting index special storage for pg_filedump's convenience

Re: Adjusting index special storage for pg_filedump's convenience

Re: Adjusting index special storage for pg_filedump's convenience

Re: Adjusting index special storage for pg_filedump's convenience

Re: Adjusting index special storage for pg_filedump's convenience