Re: Fix gin index cost estimation

From: Ronan Dunklau
Subject: Re: Fix gin index cost estimation
Date:
Msg-id: 1924159.usQuhbGJ8B@aivenronan
In reply to: Re: Fix gin index cost estimation  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Fix gin index cost estimation  (Ronan Dunklau <ronan.dunklau@aiven.io>)
List: pgsql-hackers
Thank you for looking at it.

> I looked this over briefly.  I think you are correct to charge an
> initial-search cost per searchEntries count, but don't we also need to
> scale up by arrayScans, similar to the "corrections for cache effects"?
> 
> +     * We model index descent costs similarly to those for btree, but we also
> +     * need an idea of the tree_height.
> +     * We use numEntries / numEntryPages as the fanout factor.
> 
> I'm not following that calculation?  It seems like it'd be correct
> only for a tree height of 1, although maybe I'm just misunderstanding
> this (overly terse, perhaps) comment.

I don't really understand why that would work only with a tree height of one?
Every entry page contains a certain number of entries, so computing the average
number of entries per page seems like a good approximation of the fanout. But I
may have misunderstood what is done in other index types.

For consistency, maybe we should just use a hard coded value of 100 for the 
fanout factor, similarly to what we do for other index types.

But I realised that another approach might be better suited: since we want to 
charge a CPU cost for every page visited, basing that on the already estimated 
entryPagesFetched and dataPagesFetched would be better, instead of copying what 
is done for other index types and estimating the tree height. It would also be 
simpler, as we wouldn't need to estimate the tree height at all.

I will submit a patch doing that.
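To make the idea concrete, here is a minimal sketch (not the actual patch): the function name, the way the two page counts are combined, and the 50x-per-page multiplier borrowed from btree's descent heuristic are all assumptions for illustration.

```c
#include <math.h>

/* Mirrors PostgreSQL's default cpu_operator_cost GUC value. */
static const double cpu_operator_cost = 0.0025;

/*
 * Illustrative sketch only: charge a per-page CPU cost based on the
 * page-fetch estimates gincostestimate has already computed, instead
 * of estimating the entry-tree height.  The 50x multiplier per page
 * is an assumption borrowed from the btree descent-cost heuristic.
 */
static double
gin_descent_cpu_cost(double entryPagesFetched, double dataPagesFetched)
{
	return (entryPagesFetched + dataPagesFetched) * 50.0 * cpu_operator_cost;
}
```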

> 
> +     * We charge descentCost once for every entry
> +     */
> +    if (numTuples > 1)
> +    {
> +        descentCost = ceil(log(numTuples) / log(2.0)) * cpu_operator_cost;
> +        *indexStartupCost += descentCost * counts.searchEntries;
> +    }
> 
> I had to read this twice before absorbing the point of the numTuples
> test.  Maybe help the reader a bit:
> 
> +    if (numTuples > 1)                /* ensure positive log() */
> 

OK. On second read, I think that part was actually wrong: what we care about 
here is not the number of tuples, but the number of entries.

> Personally I'd duplicate the comments from nbtreecostestimate rather
> than just assuming the reader will go consult them.  For that matter,
> why didn't you duplicate nbtree's logic for charging for SA scans?
> This bit seems just as relevant for GIN:
> 
>      * If there are ScalarArrayOpExprs, charge this once per SA scan.  The
>      * ones after the first one are not startup cost so far as the overall
>      * plan is concerned, so add them only to "total" cost.
> 

You're right. So what we need to do here is scale whatever we charge as 
startup cost by the number of array scans when adding it to the total cost.
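Concretely, the correction could mirror btree's handling, as in this sketch (arrayScans plays the role of nbtree's num_sa_scans; the function and parameter names are assumptions, not the committed code):

```c
/*
 * Sketch of the SA-scan correction: the descent cost for the first
 * array scan counts toward startup cost; the repeats count only
 * toward total cost, as in btcostestimate.
 */
static void
charge_descent_cost(double descentCost, double searchEntries,
					double arrayScans,
					double *indexStartupCost, double *indexTotalCost)
{
	double		perScan = descentCost * searchEntries;

	*indexStartupCost += perScan;			/* first scan only */
	*indexTotalCost += perScan * arrayScans;	/* every scan */
}
```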

> Keep in mind also that pgindent will have its own opinions about how to
> format these comments, and it can out-stubborn you.  Either run the
> comments into single paragraphs, or if you really want them to be two
> paras then leave an empty comment line between.  Another formatting
> nitpick is that you seem to have added a number of unnecessary blank
> lines.

Thanks, noted.

I'll submit a new patch as soon as I've resolved some of the problems I have 
when accounting for ScalarArrayOpExprs.

Best regards,

-- 
Ronan Dunklau
