Re: queries on xmin

Поиск
Список
Период
Сортировка
От Greg Stark
Тема Re: queries on xmin
Дата
Msg-id 4136ffa0906110410i471a238fp68748af73711d9d1@mail.gmail.com
обсуждение исходный текст
Ответ на queries on xmin  (Matt Amos <zerebubuth@gmail.com>)
Ответы Re: queries on xmin  (Brett Henderson <brett@bretth.com>)
Список pgsql-general
On Thu, Jun 11, 2009 at 11:25 AM, Matt Amos<zerebubuth@gmail.com> wrote:
>
> what problems are we going to create for ourselves if we create a
> btree index on xmin casted to int4? would it be as efficient to use a
> hash index, create a temporary table of txids that we're querying with
> a hash index and do an explicit join? have i missed the point
> entirely?

Wow, I'm quite shocked that we don't already detect attempts to create
an index on xmin or xmax. There's no way that'll work properly since
those fields can change spontaneously when, for example vacuum runs or
for xmax when things like UPDATE or SELET FOR SHARE or SELECT FOR
UPDATE are used.

Incidentally the reason there's no btree opclass is because xids don't
monotonically increase. They wrap around. So periodically you would
lose updates or see them repeatedly whenever the xid wrapped around
and the old transactions appear to be in the future.

If you never run updates and are never interested in tuples that are
old enough to be frozen then perhaps you could mostly get away with
it. But I really don't think it's a good idea.

Much better would be to store a user-space column with somethin like
txid or a timestamp and use that directly. That way you have control
over the behaviour of the column.

Another option to consider would be including a boolean column
"dumped" defaulted to false. Then you could have a partial index on
the primary key or date "WHERE NOT dumped". Then when you dump you can
"SELECT FOR UPDATE * WHERE NOT dumped" and then when you're done
"UPDATE SET dumped = 't' ". Alternately you could use "UPDATE SET
dumped='t' WHERE NOT dumped RETURNING *" which is basically
equivalent.

That would create twice as much traffic in the table which would make
vacuums much more important. But it would mean you could quickly acces
undumped records using the index and know that your process doesn't
depend on a following a strict clock schedule.


--
Gregory Stark
http://mit.edu/~gsstark/resume.pdf

В списке pgsql-general по дате отправления:

Предыдущее
От: hubert depesz lubaczewski
Дата:
Сообщение: Re: Different handling of PL/pgSQL for-loop variables in 8.3.7 vs. 8.2.7 ???
Следующее
От: Marko Kreen
Дата:
Сообщение: Re: queries on xmin