Shreya Bhargava wrote:
> 1. Populate the table with 80 million tuples.
> 2. Create HASH index on the table.
> 3. Clear both the Linux cache & psql buffers.
> (exiting psql and restarting it cleared the psql buffers;
> to clear the Linux cache, we used the drop_caches command)
> 4. Start psql.
> 5. Select on an integer in the range of values in the table.
> (all test numbers were big ones, like 98934599)
> 6. Record the time.
> 7. Exit psql.
> 8. Drop caches (as described above).
> 9. Repeat 4-8 for different numbers.
> 10. Drop Hash index.
> 11. Create Btree index and repeat 3-9.
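For reference, steps 3-8 of the procedure can be sketched as shell commands. The cluster location, database name, and table/column names are assumptions, not from the post:

```shell
# Sketch of steps 3-8; $PGDATA, database "bench", and table/column
# names "t"/"k" are assumptions, not from the post.

# Clearing PostgreSQL's buffer cache requires a server restart
# (quitting psql alone does not do it):
pg_ctl -D "$PGDATA" restart

# Drop the Linux page cache (the interface behind "drop_caches"):
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches

# Start psql, time one probe, and exit:
psql -d bench <<'EOF'
\timing on
SELECT * FROM t WHERE k = 98934599;
EOF
```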
It seems you're mostly measuring the overhead of starting a backend,
populating the relcache etc.
Restarting psql doesn't clear the postgres shared buffer cache. Or did
you mean that you restarted postgres?
Anyway, I don't think it's interesting to test with cleared caches.
Surely the metapage and first 1-2 levels of the b-tree would stay cached
all the time in real life.
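A back-of-envelope sketch of why, for the 80M-row table in question: assuming roughly 250 index entries fit on an 8 kB page (the fanout here is an assumption, not a measurement), the entire non-leaf part of the b-tree comes to only about 10 MiB, which would stay resident in any realistic workload:

```python
import math

N = 80_000_000   # tuples, per the test setup
FANOUT = 250     # assumed entries per 8 kB index page -- an assumption

# page counts per level, from the leaf level up to the single root
levels = []
pages = math.ceil(N / FANOUT)
while pages > 1:
    levels.append(pages)
    pages = math.ceil(pages / FANOUT)
levels.append(1)  # root page

depth = len(levels)            # page reads for one fully uncached probe
upper_pages = sum(levels[1:])  # every page above the leaf level
upper_bytes = upper_pages * 8192

print(f"depth={depth}, non-leaf pages={upper_pages} "
      f"(~{upper_bytes / 2**20:.1f} MiB)")
```

With these assumed numbers the tree is four levels deep, so a cold probe pays four page reads, but everything above the leaves fits in a few MiB of cache.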
> From the results obtained, the average of all the hash probes is 141.8 ms, the average for btree is 168.5 ms, a
> difference of about 27 ms. The standard deviations are about 23, so this is a statistically significant difference.
I don't trust those numbers much, but in any case I don't think that
edge is big enough to justify the existence of hash indexes.
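Whether a ~27 ms gap with a standard deviation of ~23 is actually significant depends on how many probes were run, which the post doesn't state. A stdlib-only sketch of the smallest per-group sample size at which the gap reaches two standard errors, assuming independent groups of equal size and equal spread (both assumptions):

```python
import math

mean_hash, mean_btree, sd = 141.8, 168.5, 23.0   # ms, from the post
gap = mean_btree - mean_hash                     # ~26.7 ms

def se_diff(n):
    # standard error of the difference of two independent means,
    # n probes per index, equal sd in both groups (assumptions --
    # the post doesn't say how many probes were timed)
    return sd * math.sqrt(2.0 / n)

# smallest per-group n at which the gap clears two standard errors
n = 2
while gap < 2 * se_diff(n):
    n += 1
print(n)
```

With these numbers at least six probes per index type are needed before the difference clears the two-standard-error bar, so the claim hinges on an unstated sample count.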
If you're looking for a use case where hash index is faster, I'd suggest
using a data type with an expensive comparison function. Like long
multi-byte strings in UTF-8 encoding.
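To see why that favors hashing: a b-tree descent re-compares the full key at each level, while a hash probe computes the hash once and normally does a single equality check. A Python model (illustrative only, not PostgreSQL internals) that counts both operations over 2**16 long keys sharing a 100-character prefix:

```python
import bisect

class Key:
    """A string key that counts comparisons and hash computations,
    standing in for a type with an expensive comparison function."""
    comparisons = 0
    hash_calls = 0

    def __init__(self, s):
        self.s = s

    def __lt__(self, other):
        Key.comparisons += 1
        return self.s < other.s

    def __eq__(self, other):
        Key.comparisons += 1
        return self.s == other.s

    def __hash__(self):
        Key.hash_calls += 1
        return hash(self.s)

# 2**16 "long" keys with a common 100-char prefix, already sorted
keys = [Key("x" * 100 + format(i, "08d")) for i in range(2 ** 16)]
probe = Key(keys[12345].s)

# b-tree-style probe: binary search touches ~log2(N) full keys
Key.comparisons = 0
pos = bisect.bisect_left(keys, probe)
btree_cmps = Key.comparisons           # ~17 comparisons for 2**16 keys

# hash-style probe: hash the key once, then one equality check
table = {k: i for i, k in enumerate(keys)}
Key.comparisons = Key.hash_calls = 0
hit = table[probe]
hash_cmps, hash_hashes = Key.comparisons, Key.hash_calls

print(btree_cmps, hash_cmps, hash_hashes)
```

Each of those ~17 comparisons has to scan past the shared prefix, so the more expensive the comparison, the bigger the hash index's edge.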
--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com