Re: pg_trgm word_similarity query does not use index for input strings longer than 8 characters

Поиск
Список
Период
Сортировка
От pgsql-performance@jhacker.de
Тема Re: pg_trgm word_similarity query does not use index for input strings longer than 8 characters
Дата
Msg-id 144fe629-99b1-c773-7d37-5c1fca5d911a@mailbox.org
обсуждение исходный текст
Ответ на Re: pg_trgm word_similarity query does not use index for input strings longer than 8 characters  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
Thank you both a lot for the insights and your input.

 > Yeah, this test case seems very unrealistic, both as to table size
 > and as to the lack of variability of the table entries.

The example was based on real data with a more complicated query which
prompted me to investigate the issue. The distinction between slow and
fast queries is not as clear cut as with the generated data, but the
general problem remains.

 >> Since you have SSDs, you should tune "random_page_cost = 1.1".

I tested different values of random_page_cost with various queries. Too
small values increased the execution time again, due to too eager index
usage. I identified the optimum for my use case at 1.4. This solved my
problem, thanks.

Regards
Jonathan

On 07.12.21 18:08, Tom Lane wrote:
> Laurenz Albe <laurenz.albe@cybertec.at> writes:
>> On Tue, 2021-11-30 at 22:38 +0100, pgsql-performance@jhacker.de wrote:
>>> INSERT INTO song (artist, title)
>>> SELECT 'artist','title'
>>> FROM generate_series(1,10000);
>>>
>>> \set query '12345678'
>>>
>>> -- This query is slow
>>> EXPLAIN ANALYZE
>>> SELECT song.artist, song.title
>>> FROM song
>>> WHERE (song.artist %> :'query' OR song.title %> :'query')
>>> ;
> 
>> The table is quite small; with a bigger table, the test would be more meaningful.
> 
> Yeah, this test case seems very unrealistic, both as to table size
> and as to the lack of variability of the table entries.  I think the
> latter is causing the indexscans to take less time than they otherwise
> might, because none of the extracted trigrams find any matches.
> 
>> Since you have SSDs, you should tune "random_page_cost = 1.1".
> 
> Right.  Poking at gincostestimate a bit, I see that for this
> operator the indexscan cost estimate is basically driven by the
> number of trigrams extracted from the query string (nine in this
> test case) and the index size; those lead to a predicted number
> of index page fetches that's then scaled by random_page_cost.
> That's coming out to make it look more expensive than the seqscan.
> It's actually not more expensive, but that's partially because
> page fetch costs are really zero in this test case (everything
> will stay in shared buffers the whole time), and partially because
> the unrealistic data pattern is leading to not having to look at
> as much of the index as gincostestimate expected.
> 
> In general, it appears correct that longer query strings lead to a
> higher index cost estimate, because they produce more trigrams so
> there's more work for the index match to do.  (At some level, a
> longer query means more work in the seqscan case too; but our cost
> models are inadequate to predict that.)
> 
>             regards, tom lane
> 



В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: pg_trgm word_similarity query does not use index for input strings longer than 8 characters
Следующее
От: Imre Samu
Дата:
Сообщение: Re: PostgreSQLv14 TPC-H performance GCC vs Clang