poor pefrormance with regexp searches on large tables

Поиск
Список
Период
Сортировка
От Grzegorz Blinowski
Тема poor pefrormance with regexp searches on large tables
Дата
Msg-id CAF=aNMHHzBKE8_P0LykDb8Bvk7eHKdytcZ9xAkz=gtpeTj87+w@mail.gmail.com
обсуждение исходный текст
Ответы Re: poor pefrormance with regexp searches on large tables  ("Tomas Vondra" <tv@fuzzy.cz>)
Re: poor pefrormance with regexp searches on large tables  (pasman pasmański <pasman.p@gmail.com>)
Re: poor pefrormance with regexp searches on large tables  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Список pgsql-performance
Dear All,

I have some problems with regexp queries performance - common sense tells me that my queries should run faster than they do.

The database - table in question has 590 K records, table's size is 3.5GB. I am effectively querying a single attribute "subject" which has an average size of 2KB, so we are doing a query on ~1GB of data. The query looks more or less like this:

SELECT T.tender_id FROM archive_tender T WHERE
(( T.subject !~* '\\mpattern1.*\\M' ) AND ( T.subject ~* '\\mpattern2\\M' OR [4-5 more similar terms]  ) AND T.erased = 0 AND T.rejected = 0
ORDER BY
   tender_id DESC
LIMIT
    10000;

The planner shows seq scan on subject which is OK with regexp match.

Now, the query above takes about 60sec to execute; exactly: 70s for the first run and 60s for the next runs. In my opinion this is too long: It should take 35 s to read the whole table into RAM (assuming 100 MB/s transfers - half the HDD  benchmarked speed). With 12 GB of RAM the whole table should be easily buffered on the operating system level. The regexp match on 1 GB of data takes 1-2 s (I benchmarked it with a simple pcre test). The system is not in the production mode, so there is no additional database activity (no reads, no updates, effectively db is  read-only)

To summarize: any idea how to speed up this query? (please, don't suggest regexp indexing - in this application it would be too time consuming to implement them, and besides - as above - I think that Postgres should do better here even with seq-scan).

Server parameters:
RAM: 12 GB
Cores: 8
HDD: SATA; shows 200 MB/s transfer speed
OS: Linux 64bit; Postgres 8.4


Some performance params from postgresql.conf:
max_connections = 16
shared_buffers = 24MB                 
temp_buffers = 128MB                  
max_prepared_transactions = 50        
work_mem = 128MB                         
maintenance_work_mem = 1GB 
effective_cache_size = 8GB

Database is vacuumed.


Regards,

Greg

В списке pgsql-performance по дате отправления:

Предыдущее
От: Robert Ayrapetyan
Дата:
Сообщение: Re: Performance die when COPYing to table with bigint PK
Следующее
От: "Tomas Vondra"
Дата:
Сообщение: Re: poor pefrormance with regexp searches on large tables