Re: Hash Anti Join performance degradation

Поиск
Список
Период
Сортировка
От panam
Тема Re: Hash Anti Join performance degradation
Дата
Msg-id 1306426087061-4429125.post@n5.nabble.com
обсуждение исходный текст
Ответ на Re: Hash Anti Join performance degradation  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Ответы Re: Hash Anti Join performance degradation  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Re: Hash Anti Join performance degradation  (Cédric Villemain <cedric.villemain.debian@gmail.com>)
Список pgsql-performance
Hi all,


Cédric Villemain-3 wrote:
>
> without explaining further why the antijoin has bad performance
> without cluster, I wonder why you don't use this query :
>
> SELECT  b.id,
>                   max(m.id)
> FROM box b, message m
> WHERE m.box_id = b.id
> GROUP BY b.id;
>
> looks similar and fastest.
>
I actually did use a similar strategy in the meantime (during my problem
with the "left join" query we are talking about here all the time) for
mitigation.
It was
SELECT MAX(e.id) FROM event_message e WHERE e.box_id = id
and it performed worse in comparison to the "left join" query in the general
case (i.e. before my problems began).
At the end of this post is an explanation why I think I cannot use the
solution you suggested above.


Kevin Grittner wrote:
>
>  Each connection can allocate work_mem, potentially several times.
> On a machines without hundreds of GB of RAM, that pair of settings
> could cause severe swapping.
>
Indeed, thanks for the warning. These settings are not for production but to
exclude a performance degradation because of small cache sizes.


Kevin Grittner wrote:
>
> I think you would need a left join to actually get identical
> results:
>
> SELECT  b.id, max(m.id)
>   FROM box b
>   LEFT JOIN message m ON m.box_id = b.id
>   GROUP BY b.id;
>
> But yeah, I would expect this approach to be much faster.  Rather
> easier to understand and harder to get wrong, too.
>
>
Correct, it is much faster, even with unclustered ids.
However, I think I cannot use it because of the way that query is generated
(by hibernate).
The (simplyfied) base query is just

SELECT b.id from box

the subquery

(SELECT  m1.id FROM message m1
   LEFT JOIN message m2
      ON (m1.box_id = m2.box_id  AND m1.id < m2.id )
   WHERE m2.id IS NULL AND m1.box_id = b.id) as lastMessageId

is due to a hibernate formula (containing more or less plain SQL) to
determine the last message id for that box. It ought to return just one row,
not multiple. So I am constrained to the subquery in all optimization
attemps (I cannot combine them as you did), at least I do not see how. If
you have an idea for a more performant subquery though, let me know, as this
can easily be replaced.

Thanks for your help and suggestions
panam

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Hash-Anti-Join-performance-degradation-tp4420974p4429125.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

В списке pgsql-performance по дате отправления:

Предыдущее
От: Merlin Moncure
Дата:
Сообщение: Re: The shared buffers challenge
Следующее
От: Claudio Freire
Дата:
Сообщение: Re: The shared buffers challenge