Re: Performance on inserts

From	Alfred Perlstein
Subject	Re: Performance on inserts
Date
Msg-id	20000825085219.V1209@fw.wintelcom.net
In reply to	Re: Performance on inserts  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Performance on inserts  (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers
* Tom Lane <tgl@sss.pgh.pa.us> [000825 08:25] wrote:
> 
> In other words, we do a linear scan over all the pages containing equal
> key values, in the hope of finding one where we can shoehorn in the new
> entry.  If the keys are roughly equal-sized, that hope is usually vain,
> and we end up splitting the rightmost page that contains any keys
> equal to the target key.  Subsequent inserts of the same key value fill
> first the left-half split page, then the right-half, then split the
> right-half page again and repeat --- but for each insert we first had
> to scan over all the preceding pages containing equal keys.  That means
> inserting N equal keys takes O(N^2) time, for large enough N.
> 
> After thinking about this for a while, I propose a probabilistic
> solution.  Suppose that we add to the above loop a random-number check
> that causes us to fall out of the loop with probability 1/3 or so;
> that is, if the current page is full of the target key, then we move
> right to check the next page with probability 2/3, but with probability
> 1/3 we give up and split the current page to insert the new key.
> 
> With this approach we'd hardly ever move more than, say, 10 pages before
> splitting, so the time to insert a new key is bounded no matter how many
> duplicates it has.
> 
> The reason for using a random decision is that if we always stopped
> after scanning a fixed number of pages, then the space freed up by the
> previous decision to split would always be just out of reach, and we'd
> end up with lots of half-full pages of the same key.  With a random
> decision we will visit and fill both the left and right halves of a
> previously split page.
> 
> Comments?  Is there a better way?  What's the best probability to use?


I'm unsure if it's possible, but somehow storing the last place one
'gave up' and decided to split the page could offer a useful starting
point for the next insert -- sort of amortizing the split work amongst
multiple requests.  For some reason it looks like your algorithm might
cause problems because it simply gives up after 10 pages?

hope this helps,
-Alfred
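The probabilistic move-right-or-split loop Tom describes can be sketched as a small simulation. This is a hypothetical model, not the actual btree code: pages are just entry counts, `PAGE_CAPACITY` and the 2/3 move-right probability are the illustrative parameters from the post, and each insert of a duplicate key starts its scan at the leftmost page holding that key.

```python
import random

PAGE_CAPACITY = 10   # hypothetical number of entries per page
MOVE_RIGHT_P = 2/3   # probability of moving right past a full page

def insert_equal_key(pages, rng):
    """Insert one duplicate key; return how many pages were scanned."""
    i = 0
    scanned = 1
    while pages[i] >= PAGE_CAPACITY:
        # Current page is full of the target key: move right with
        # probability 2/3, otherwise give up and split this page.
        if i + 1 < len(pages) and rng.random() < MOVE_RIGHT_P:
            i += 1
            scanned += 1
        else:
            # Split: half the entries stay, half go to a new right sibling;
            # the new key then fits in the left half.
            left = pages[i] // 2
            pages.insert(i + 1, pages[i] - left)
            pages[i] = left
            break
    pages[i] += 1
    return scanned

rng = random.Random(42)
pages = [0]
scans = [insert_equal_key(pages, rng) for _ in range(10000)]
print(sum(pages))                  # all 10000 inserts landed somewhere
print(sum(scans) / len(scans))     # expected scan length stays small
```

Because the stop decision is geometric with probability 1/3 per full page, the expected number of pages visited per insert is a small constant (about 3) regardless of how many duplicates already exist, which is the O(N^2)-to-O(N) improvement claimed above.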

