Performance regression between 8.3 and 8.4 on heavy text indexing

Поиск
Список
Период
Сортировка
От gael@pilotsystems.net (Gaël Le Mignot)
Тема Performance regression between 8.3 and 8.4 on heavy text indexing
Дата
Msg-id plop871vn5b84g.fsf@aoskar.kilobug.org
обсуждение исходный текст
Ответы Re: Performance regression between 8.3 and 8.4 on heavy text indexing  (Guillaume Smet <guillaume.smet@gmail.com>)
Список pgsql-performance
Hello,

We are using PostgreSQL to index a huge collection (570 000) of articles for a french daily newspaper (Libération). We
usemassively the full text search feature. I attach to this mail the schema of the database we use. 

Overall, we have very interesting performances, except in a few cases, when combining a full text match with a lot of
matcheswith a date order and a limit (which is a very common use case, asking for the 50 more recent articles speaking
abouta famous person, for example). 

The reason of this mail is what we noticed a performance drop from PostgreSQL 8.3 to PostgreSQL 8.4.

In order to try to locate the performance cost, I changed a few settings in 8.4 to have the same values than in 8.3
(andrerun analyze after) :: 

 cursor_tuple_fraction = 1.0
 default_statistics_target = 10

We the modified settings, the peformance drop is much lower, but still
present. Here are the statistics  on replaying sequentially a bunch of
real-life queries to the two versions of the database :

With 8.3 ::

 7334 queries, average time is 0.20 s
 6 queries above 20.00 s (0.08 %)
 20 queries above 10.00 s (0.27 %)
 116 queries above 2.00 s (1.58 %)
 top ten:  15.09 15.15 15.19 16.60 20.40 63.05 67.89 78.21 90.30 97.56

With 8.4 ::

 7334 queries, average time is 0.23 s
 12 queries above 20.00 s (0.16 %)
 24 queries above 10.00 s (0.33 %)
 112 queries above 2.00 s (1.53 %)
 top ten:  31.76 31.94 32.63 47.21 48.80 63.50 79.57 83.36 96.44 113.61


Here is an example query that is significantly slower in 8.4 (31.76 seconds) than in 8.3 (10.52 seconds) ::

 SELECT classname, id FROM libeindex WHERE (classname = 'article' AND (source IN ('methode','nica') AND (keywords_tsv
@@plainto_tsquery('french', 'assassinat') AND fulltext_tsv @@ to_tsquery('french', 'claude & duviau')))) ORDER BY
publicationDateDESC,pageNumber ASC LIMIT 50 

And the explain on it :

With 8.3 ::

 Limit  (cost=752.67..752.67 rows=1 width=24)
   ->  Sort  (cost=752.67..752.67 rows=1 width=24)
         Sort Key: publicationdate, pagenumber
         ->  Bitmap Heap Scan on libeindex  (cost=748.64..752.66 rows=1 width=24)
               Recheck Cond: ((keywords_tsv @@ '''assassinat'''::tsquery) AND (fulltext_tsv @@ '''claud'' &
''duviau'''::tsquery))
               Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND ((classname)::text = 'article'::text))
               ->  BitmapAnd  (cost=748.64..748.64 rows=1 width=0)
                     ->  Bitmap Index Scan on keywords_index  (cost=0.00..48.97 rows=574 width=0)
                           Index Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
                     ->  Bitmap Index Scan on fulltext_index  (cost=0.00..699.42 rows=574 width=0)
                           Index Cond: (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery)
 (11 rows)

With 8.4 ::

 Limit  (cost=758.51..758.51 rows=1 width=24)
   ->  Sort  (cost=758.51..758.51 rows=1 width=24)
         Sort Key: publicationdate, pagenumber
         ->  Bitmap Heap Scan on libeindex  (cost=14.03..758.50 rows=1 width=24)
               Recheck Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
               Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND (fulltext_tsv @@ '''claud'' &
''duviau'''::tsquery)AND ((classname)::text = 'article'::text)) 
               ->  Bitmap Index Scan on keywords_index  (cost=0.00..14.03 rows=192 width=0)
                     Index Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
 (8 rows)

More informations on the setup :

- postgresql 8.3.7 from Debian Lenny ;

- postgresql 8.4.0 from Debian Lenny backports ;

- rurnning in a Xen virtual machine, using 64-bits kernel ;

- 2 cores of a 2GHz Core2Quad and 2Gb of RAM dedicated to the VM.

If you need additional informations, we'll gladly provide them. If you have any tips or advises so we could make the
8.4behave as least as good as the 8.3 it would be very nice. 

Hoping this can help you to improve this great software.

Regards,

--
Gaël Le Mignot - gael@pilotsystems.net
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel : +33 1 44 53 05 55 - www.pilotsystems.net
Gérez vos contacts et vos newsletters : www.cockpit-mailing.com

В списке pgsql-performance по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: [PERFORMANCE] how to set wal_buffers
Следующее
От: Robert Haas
Дата:
Сообщение: Re: number of rows estimation for bit-AND operation