Discussion: Bulkdelete and Vacuum operations on custom index

Bulkdelete and Vacuum operations on custom index

From:
Carsten Kropf
Date:
Hi all,
I am currently implementing some index access methods on top of PostgreSQL. So far it works fine. However, I am now implementing bulk deletion and vacuum for the structure, and I don't know exactly how to achieve this, because it would be much easier to just collect statistics in bulkdelete and to do the "real deal" of deleting the particular entries from my structures when vacuum is called on the index. Is it legitimate to do this: just collect statistics, pass the statistics and the items to be deleted back to the caller in main memory, and perform the real deletion of entries in vacuum? It would be much easier for me, given the general structure I have here.
As far as I understand the documentation, vacuum is called when bulkdelete returns statistics indicating that some entries have been removed. If I now added some extra information (namely the tuples that have been deleted) to the statistics, passed all of this data to vacuum (as GiST does, except that GiST only sets a boolean flag), and then deleted all of the collected entries (stored somewhere in main memory), would that still be OK?
Actually, I don't know exactly whether this can be done properly, but it would help me a lot, because if I used the standard approach of deleting in bulkdelete and reorganizing in vacuum, I would probably run into trouble and would have to rethink all my algorithms (which I had tested in main memory in a separate project before).
Thanks in advance.

Best regards
    Carsten Kropf

Re: Bulkdelete and Vacuum operations on custom index

From:
Tom Lane
Date:
Carsten Kropf <ckropf2@fh-hof.de> writes:
> I am currently implementing some index access methods on top of
> PostgreSQL. Until now, it is pretty fine and working
> properly. However, I am now doing the implementation of bulk deletion
> and vacuum of the structure. I don't know exactly, how to achieve this
> because it would be much easier to just collect statistics in
> bulkdelete and to implement the "real deal" of deleting the particular
> entries from my structures when vacuum is called on the index. Is it
> legitimate to do this: just collect statistics and pass the statistics
> and items to be deleted in main memory back to the caller and perform
> the real deletion of entries in vacuum?

No.  You *must* make the index entries go away during bulkdelete,
because the heap tuples they are pointing at will be deleted as soon
as it returns.  If you don't do this, and there's a crash before the
vacuum finishes, you have dangling index entries pointing at nonexistent
heap entries, which will lead to big trouble later.  I think you
probably don't even need a crash to have trouble --- consider a
concurrent indexscan query that finds one of those index entries and
tries to visit the heap tuple from it.

The other problem with your sketch is that you can't assume you have an
indefinitely large amount of working memory available.

Perhaps you could set a flag on each deleted index tuple during
bulkdelete (with scans knowing to ignore marked tuples) and then do the
physical reorganization at vacuum cleanup.  This would imply doing a
full scan of the index during cleanup (to find the dead entries) but we
do similar things in btree indexes and the performance seems to be OK.

BTW, this seems a bit off-topic for pgsql-general.  You'd be better
off asking such questions in -hackers.

            regards, tom lane