Обсуждение: DISTINCT ON without ORDER BY
Hi, I was going through the queries of an SQL application and came across queries like: SELECT * FROM foo WHERE id in (SELECT max(id) FROM foo GROUP BY bar); I thought, here's a case where this could be better written using DISTINCT ON, since then you avoid the self-join: SELECT DISTINCT ON (bar) * FROM ORDER BY bar, id DESC; However, this was slower because the original query could use a hash aggregate whereas the new query needed to do a sort. The way DISTINCT ON is defined it requires an ORDER BY whereas semantically the ordering on the first attribute is just a by product of the old implementation. Is there a way to acheive the above result without a sort and without a self-join? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Martijn van Oosterhout <kleptog@svana.org> writes: > SELECT * FROM foo > WHERE id in (SELECT max(id) FROM foo GROUP BY bar); > > Is there a way to acheive the above result without a sort and without a > self-join? Something like SELECT bar, (magic_agg_func(foo)).* FROM foo GROUP BY bar where you define an aggregate function magic_agg_func to remember the whole record for the largest value of id. Something like: postgres=# create function magic_transition(a,a) returns a as 'select case when $1.aid > $2.aid then $1 else $2 end' languagesql; postgres=# create aggregate magic (a) (sfunc = magic_transition, stype = a); Not sure it'll be faster though. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
On 2009-04-19, Martijn van Oosterhout <kleptog@svana.org> wrote: > Hi, > > I was going through the queries of an SQL application and came across > queries like: > > SELECT * FROM foo > WHERE id in (SELECT max(id) FROM foo GROUP BY bar); > > I thought, here's a case where this could be better written using > DISTINCT ON, since then you avoid the self-join: > > SELECT DISTINCT ON (bar) * FROM > ORDER BY bar, id DESC; > > However, this was slower because the original query could use a hash > aggregate whereas the new query needed to do a sort. The way DISTINCT > ON is defined it requires an ORDER BY whereas semantically the ordering > on the first attribute is just a by product of the old implementation. > > Is there a way to acheive the above result without a sort and without a > self-join? anyway you could possibly write an agregate function that returns a copy of the row with the highest id?
On Tue, Apr 21, 2009 at 12:11:26PM +0000, Jasen Betts wrote: > > Is there a way to acheive the above result without a sort and without a > > self-join? > > anyway you could possibly write an agregate function that returns a > copy of the row with the highest id? Put that way it sounds like something for a window function. Not sure if they can use a HashAggrgate though. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.