Index on TEXT versus CHAR(32)... fast exact TEXT matching

Поиск
Список
Период
Сортировка
От Jon Lapham
Тема Index on TEXT versus CHAR(32)... fast exact TEXT matching
Дата
Msg-id 4139BD78.40903@jandr.org
обсуждение исходный текст
Ответы Re: Index on TEXT versus CHAR(32)... fast exact TEXT matching  (Pierre-Frédéric Caillaud<lists@boutiquenumerique.com>)
Список pgsql-general
I have a table that stores TEXT information.  I need query this table to
find *exact* matches to the TEXT... no regular expressions, no LIKE
queries, etc.  The TEXT could be from 1 to 10000+ characters in length,
quite variable.  If it matters, the TEXT may contain UNICODE characters...

Example:
CREATE TABLE a (id SERIAL, thetext TEXT);
SELECT id FROM a WHERE thetext='Some other text';

One way I thought to optimize this process would be to store an MD5 hash
of the "thetext" column and query on that:

CREATE TABLE a (id SERIAL, thetext TEXT, thetext_md5 CHAR(32));
SELECT id FROM a WHERE thetext_m5d=md5('Some other text');

Now, obviously I would want to build an INDEX on either "thetext" or
"thetext_md5", depending on which way I decide to make the table.

My question is, what is the absolute fastest way to find the exact match
to a TEXT column?  Any amount of pre-processing is fine (such as
calculating the MD5's of all the TEXT tuples), but the query must be
extremely fast.

Has anyone compared (theoretical or practical) the performance of
querying a TEXT-based INDEX versus a CHAR(32)-based INDEX?  Is my MD5
idea a waste of time?  Is there something better than MD5?  Would it be
better to store the "fingerprint" of the TEXT as an integer somehow, so
that the INDEX could be based on a INT* column?

Thanks for any help!

-Jon

--
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
  Jon Lapham  <lapham@jandr.org>                Rio de Janeiro, Brasil
  Personal: http://www.jandr.org/
***-*--*----*-------*------------*--------------------*---------------


В списке pgsql-general по дате отправления:

Предыдущее
От: Jan Wieck
Дата:
Сообщение: Re: How to determine a database is intact?
Следующее
От: Lincoln Yeoh
Дата:
Сообщение: Re: postgres "on in the internet"