Question about subselect/IN performance

Поиск
Список
Период
Сортировка
От T.H.
Тема Question about subselect/IN performance
Дата
Msg-id id3d3s$c8s$1@news.hub.org
обсуждение исходный текст
Ответы Re: Question about subselect/IN performance  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Список pgsql-performance
I have a query that's running an IN/Subselect that joins three different
tables and gets a list of IDs to compare against... the subselect
basically looks for records through a join table based on the 3rd
table's name, similar to:

... IN (SELECT id FROM foo, foo_bar, bar
         WHERE foo.id = foo_bar.foo_id
             AND bar.id = foo_bar.bar_id
             AND bar.name = "something") ...

This is all nested in a fairly complex query, and several of these
subselects operate on different tables within the query. The whole
thing, on some high-cardinality cases, can take 2.5 seconds to run
(clearly something can be done about that).

So in this example, the cardinality of the bar table is very low, and
fairly constant, something on the order of 5-7 records. In an
optimization attempt, I reduced the joins in the subselect from 2 to 1
by passing in the ID of the bar with the correct name, which I can
easily cache application-side or pre-fetch in a single query. Now it
looks like this:

... IN (SELECT id FROM foo, foo_bar
          WHERE foo.id = foo_bar.foo_id
              AND foo_bar.bar_id = 1) ...

Crazy thing is, that single optimization reduced the query time
significantly, from 2.5-3 seconds down to 40-60ms.

Does anyone have any kind of explanation for this? Are the inner
workings of the IN clause taking the plan for the subselect into account
when running, and doing something clever with it? Any insight on the
internal mechanisms of IN or subselects in Postgres would be greatly
appreciated if anyone knows more.

Also, are there any better ways you can think of doing such an IN query,
using non-subselect means that might be more efficient?

Thanks in advance, any advice/help understanding this better is greatly
appreciated.

В списке pgsql-performance по дате отправления:

Предыдущее
От: Maciek Sakrejda
Дата:
Сообщение: Re: Simple database, multiple instances?
Следующее
От: "Pierre C"
Дата:
Сообщение: Re: Simple database, multiple instances?