Re: Use cases for lateral that do not involve a set returning function

Поиск

Список

Период

Сортировка

От	AJ Welch
Тема	Re: Use cases for lateral that do not involve a set returning function
Дата	10 декабря 2014 г. 04:21:18
Msg-id	CAO-RzR+Yw1uOMVLn6zKGk6+pnsJVmCDN9V2okK-rNxeD8T4jMA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Use cases for lateral that do not involve a set returning function (Albe Laurenz <laurenz.albe@wien.gv.at>)
Список	pgsql-general

Дерево обсуждения

Thanks for the response. Yea, using lateral there definitely reads better to me than using a correlated subquery. And it makes sense that performance is ok since you're filtering on a specific person's id (as you hinted at with `WHERE p.id = ...`) and the nested loop forced by `order by...limit 1` presumably only loops once.

However, I would consider that more of an OLTP style query where I'm kind of more interested in OLAP style queries as the referenced post was building a funnel analysis over an events table. I don't think their approach will scale. I guess it's kind of a specific question but that post got me wondering if there are any use cases for lateral outside of SRFs that do not generate a nested loop AND can not be achieved without lateral? Basically, something I could use in an OLAP query that I couldn't use prior to 9.3.

Also, just for the heck of it, I took a look at the explain plans for both of your queries without `WHERE p.id = ...` to see how they would scale.

Correlated subquery:

QUERY PLAN

Hash Right Join (cost=163943.00..5932603426739.67 rows=7499298 width=24)

Hash Cond: (n.people_id = p.people_id)

Join Filter: (NOT (SubPlan 1))

-> Seq Scan on names n (cost=0.00..320543.41 rows=14998595 width=18)

Filter: (now() > validfrom)

-> Hash (cost=77028.00..77028.00 rows=5000000 width=10)

-> Seq Scan on people p (cost=0.00..77028.00 rows=5000000 width=10)

SubPlan 1

-> Seq Scan on names n2 (cost=0.00..395543.88 rows=1 width=0)

Filter: ((validfrom > n.validfrom) AND (people_id = p.people_id) AND (now() > validfrom))

Lateral:

Nested Loop Left Join (cost=358043.67..1790218514528.00 rows=5000000 width=24)

-> Seq Scan on people p (cost=0.00..77028.00 rows=5000000 width=10)

-> Limit (cost=358043.67..358043.67 rows=1 width=18)

-> Sort (cost=358043.67..358043.68 rows=4 width=18)

Sort Key: n.validfrom

-> Seq Scan on names n (cost=0.00..358043.65 rows=4 width=18)

Filter: ((people_id = p.people_id) AND (now() > validfrom))

Granted I haven't set up any indexes, but it looks like the correlated subquery, after an initial access of the names table before the join, accesses the names table again for each (person, name) pair after the join (in the join filter). So it's worse than just 2 scans per person. Indeed, the lateral subquery seems better because it accesses the person table and then the names table once for each person. However, I might instead do something like this to access each table just once:

explain

select *

from people p

left join (

select *, rank() over(partition by people_id

order by validfrom desc) as rank

from names

) n on p.people_id = n.people_id

and n.rank = 1

QUERY PLAN

Hash Right Join (cost=3427870.26..3942220.56 rows=5000000 width=36)

Hash Cond: (n.people_id = p.people_id)

-> Subquery Scan on n (cost=3263927.26..3751430.31 rows=75000 width=26)

Filter: (n.rank = 1)

-> WindowAgg (cost=3263927.26..3563929.14 rows=15000094 width=18)

-> Sort (cost=3263927.26..3301427.49 rows=15000094 width=18)

Sort Key: names.people_id, names.validfrom

-> Seq Scan on names (cost=0.00..245542.94 rows=15000094 width=18)

-> Hash (cost=77028.00..77028.00 rows=5000000 width=10)

-> Seq Scan on people p (cost=0.00..77028.00 rows=5000000 width=10)

Thanks,

http://chartio.com/

https://www.linkedin.com/in/ajw0100

On Tue, Dec 9, 2014 at 2:24 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

AJ Welch wrote:
> http://blog.heapanalytics.com/postgresqls-powerful-new-join-type-lateral/
>
> I suspected some of the claims in the post may not have been accurate. This one in particular:
>
> "Without lateral joins, we would need to resort to PL/pgSQL to do this analysis. Or, if our data set
> were small, we could get away with complex, inefficient queries."
>
>
> The sum(1) and order by time limit 1 approach seemed less than ideal to me and I thought this analysis
> could be done with normal left joins instead of lateral left joins. So I came up with a proof of
> concept:
>
> https://github.com/ajw0100/snippets/tree/master/SQL/lateral
>
>
> Is my conclusion in the README correct? Does anything beyond select...from...where force a nested
> loop? In that case, is lateral really only useful with set returning functions as the docs suggest?
> Does anyone know of any use cases for lateral that do not involve a set returning function?

Only recently I used lateral joins to optimize a query.

This is a sample of how the query looked bfore:

SELECT ...
FROM people p
LEFT JOIN names n
ON (n.people_id = p.people_id
AND current_timestamp > n.validfrom
AND NOT EXISTS (SELECT 1 FROM names n2
WHERE n2.people_id = p.people_id
AND current_timestamp > n2.validfrom
AND n2.validfrom > n.validfrom)
)
WHERE p.id = ...

So basically it is supposed to find the latest valid name for a person.

This required two scans of the "names" table per "person" record.

I rewrote it as

SELECT ...
FROM people p
LEFT JOIN LATERAL (SELECT * FROM names n
WHERE n.people_id = p.people_id
AND current_timestamp > n.validfrom
ORDER BY n.validfrom DESC LIMIT 1) n
ON TRUE
WHERE p.id = ...

With the correct index this touched fewer blocks and worked faster.
Also, though this is of course a matter of taste, it is more readable.

Of course this forces a nested loop, but that is not bad as such.
In my case it was not problem (I tried to hint at that with the WHERE clause).

So yes, I think that LATERAL is useful even without set returning functions.

Yours,
Laurenz Albe

В списке pgsql-general по дате отправления:

Предыдущее

От: Eric Svenson
Дата: 10 декабря 2014 г., 03:51:01
Сообщение: Re: Fwd: Fwd: Problem with pg_dump and decimal mark

Следующее

От: István
Дата: 10 декабря 2014 г., 04:22:52
Сообщение: Weird CPU utilization patterns with Postgres

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Use cases for lateral that do not involve a set returning function

Предыдущее

Следующее