Making hash indexes worthwhile
From | Jeff Janes |
---|---|
Subject | Making hash indexes worthwhile |
Date | |
Msg-id | f67928030910041741n7f9b6bd8u9ef3b9fa0852ba6@mail.gmail.com |
Responses | Re: Making hash indexes worthwhile |
List | pgsql-hackers |
I see that the docs were recently changed: they used to discourage hash indexes both because there were no known uses in which they were meaningfully better than btree and because they aren't recoverable, and now they discourage them only because they are not recoverable. Does that mean there are now uses in which hash indexes are provably better than btree, if one is willing to overlook the lack of recoverability? If so, what are those situations?

I've played around a bit with hash indexes, and it seems to me that making them generally worthwhile will take (at least) reducing or entirely doing away with the heavyweight locks. I have a few ideas, which are not necessarily independent of each other.

===metablock===

If each bucket knows how many bits of the hash value are significant for the entries in that bucket, then there will be a way to detect races from the metablock to the main bucket block. Rather than chaining locks from the metablock to the bucket, the process can compute the bucket and remember the number of significant bits in that bucket according to the metablock, then release the metablock lock before attempting to acquire the bucket lock. When the bucket lock is obtained, if the bucket's number of significant bits is greater than the number remembered from the metablock, then it can drop the bucket lock and try again. (A sketch of this retry loop appears at the end of this mail.)

Once you don't need to chain locks from the metablock to the bucket, the lock on the metablock can probably be lowered to just a buffer content lock rather than a heavyweight lock. Even better, maybe the metablock can be cached in the relcache. Since the changes to the metablock seem to be entirely unidirectional (other than things like index dropping and rebuilding, which I think the normal relcache flushing mechanisms will take care of), it should always be possible to detect when you have a too-stale copy and have to go get the current one.

===get rid of heavy locks===

While there are several potential places for deadlock, it seems like the main one is that the hash index scan code returns control to the executor while holding locks on the hash buckets/pages, so you can get hybrid deadlocks where one side of the blockage is in hash-code space and the other in executor-space. The obvious way to fix this would be to read the entire bucket's worth of qualifying tids into local memory and release the bucket lock before returning control to the executor, kind of like btree does with pages. The main problem here is that a logical bucket can have an unbounded number of overflow pages, so if a bucket has too many tids with the sought-for full hash value (more than will fit in work_mem), we would have to be prepared to spill to disk. (A sketch of such a tid buffer also appears below.) Is that an acceptable trade-off?

Obviously there are a lot of subtleties to work out with insertions and bucket splits, but if having read-only scans occasionally need to spill to disk is a no-go, then there is no point in trying to work the other things out.

Thanks for any feedback,

Jeff
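
===sketch: metablock-to-bucket retry===

This is only a minimal, self-contained sketch of the retry protocol described above; every type and lock primitive here is made up for illustration (the real code would use the hash AM's metapage and buffer locks), so treat it as pseudocode that happens to compile.

#include <stdint.h>

typedef uint32_t HashValue;

/* Hypothetical stand-ins for the metapage and a bucket's primary page. */
typedef struct MetaPage
{
    int      low_bits;    /* mask width for buckets not yet split this round */
    int      high_bits;   /* mask width for buckets already split this round */
    uint32_t max_bucket;  /* highest bucket number currently in use */
} MetaPage;

typedef struct BucketPage
{
    int significant_bits; /* how many hash bits this bucket's entries share */
} BucketPage;

/* Toy lock primitives; the proposal is that the metapage one could be a
 * mere buffer content lock rather than a heavyweight lock. */
static void lock_meta(MetaPage *m)   { (void) m; }
static void unlock_meta(MetaPage *m) { (void) m; }

static BucketPage toy_buckets[8];  /* placeholder storage for the sketch */
static BucketPage *lock_bucket(uint32_t n) { return &toy_buckets[n & 7]; }
static void unlock_bucket(BucketPage *b)   { (void) b; }

/* Linear-hashing bucket computation as the metapage sees it, also
 * reporting how many bits were used so the caller can detect a race. */
static uint32_t
compute_bucket(const MetaPage *meta, HashValue hv, int *bits_used)
{
    uint32_t bucket = hv & ((1U << meta->high_bits) - 1);

    *bits_used = meta->high_bits;
    if (bucket > meta->max_bucket)
    {
        bucket = hv & ((1U << meta->low_bits) - 1);
        *bits_used = meta->low_bits;
    }
    return bucket;
}

/* Find and lock the right bucket without chaining locks from the
 * metapage: remember the bit count, drop the metapage lock, take the
 * bucket lock, and retry from scratch if the bucket split underneath us. */
BucketPage *
find_bucket(MetaPage *meta, HashValue hv)
{
    for (;;)
    {
        int         remembered_bits;
        uint32_t    bucket_no;
        BucketPage *bucket;

        lock_meta(meta);
        bucket_no = compute_bucket(meta, hv, &remembered_bits);
        unlock_meta(meta);      /* released before touching the bucket */

        bucket = lock_bucket(bucket_no);
        if (bucket->significant_bits <= remembered_bits)
            return bucket;      /* no split raced past us; caller unlocks */

        /* The bucket split after we dropped the metapage lock, so our
         * entry may now live in a sibling bucket: recompute and retry. */
        unlock_bucket(bucket);
    }
}

The property doing the work is that significant_bits only ever grows, so staleness is always detectable after the fact; that same monotonicity is what would let a relcache-cached copy of the metablock be validated lazily instead of locked up front.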
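
===sketch: tid buffer with spill===

Again, purely hypothetical names; the point is just the shape of a buffer that collects a bucket chain's qualifying tids while the bucket lock is held and falls back to a temp file once a work_mem-style budget is exceeded, so the lock can be released before control returns to the executor.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a heap tuple pointer. */
typedef struct TidEntry
{
    unsigned int   block;
    unsigned short offset;
} TidEntry;

/* Collects tids in memory up to a fixed budget, then spills the
 * overflow to an anonymous temp file. */
typedef struct TidBuffer
{
    TidEntry *mem;     /* in-memory portion */
    size_t    n;       /* entries currently in mem */
    size_t    budget;  /* max in-memory entries (work_mem analogue) */
    FILE     *spill;   /* NULL until the budget is exceeded */
} TidBuffer;

static TidBuffer *
tidbuf_create(size_t budget)
{
    TidBuffer *buf = malloc(sizeof(TidBuffer));

    buf->mem = malloc(budget * sizeof(TidEntry));
    buf->n = 0;
    buf->budget = budget;
    buf->spill = NULL;
    return buf;
}

/* The scan would call this once per qualifying tid while walking the
 * bucket's primary page and its overflow pages, then drop the bucket
 * lock; the executor afterwards reads tids from here, never from pages. */
static void
tidbuf_add(TidBuffer *buf, TidEntry tid)
{
    if (buf->spill == NULL && buf->n >= buf->budget)
        buf->spill = tmpfile();       /* budget exceeded: start spilling */

    if (buf->spill != NULL)
        fwrite(&tid, sizeof(tid), 1, buf->spill);
    else
        buf->mem[buf->n++] = tid;
}

int
main(void)
{
    TidBuffer *buf = tidbuf_create(4);

    /* Pretend these came from a bucket chain with 6 qualifying entries;
     * the last two land in the temp file instead of memory. */
    for (unsigned int i = 0; i < 6; i++)
        tidbuf_add(buf, (TidEntry){ .block = i, .offset = 1 });

    printf("%zu tids in memory, spill file %s\n",
           buf->n, buf->spill ? "created" : "not needed");
    return 0;
}

In the real thing the budget would presumably be derived from work_mem and the spill target would be the backend's existing temp-file machinery; tmpfile() is just standing in for that here.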