Обсуждение: SQL advice needed
Hi, I have a volatile function that returns multiple rows. It may also return nothing. Now, I want to write an SQL statement that calls this function until it returns an empty result set and returns all the rows. So, in principle I want to: WITH RECURSIVE t AS ( SELECT * FROM xx() UNION ALL SELECT * FROM xx() ) SELECT * FROM t; But that's not recursive because the union all part lacks a reference to t. Next I tried this: WITH RECURSIVE t AS ( SELECT * FROM xx() UNION ALL SELECT * FROM xx() WHERE EXISTS (SELECT 1 FROM t) ) SELECT * FROM t; But the reference to t is not allowed in a subquery. What's the best (or at least a working) way to achieve what I want? I can do it in plpgsql. But that would mean to accumulate the complete result in memory first, right? I need to avoid that. Thanks, Torsten
On Mon, Mar 17, 2014 at 3:21 PM, Torsten Förtsch <torsten.foertsch@gmx.net> wrote: > Hi, > > I have a volatile function that returns multiple rows. It may also > return nothing. Now, I want to write an SQL statement that calls this > function until it returns an empty result set and returns all the rows. > > So, in principle I want to: > > WITH RECURSIVE > t AS ( > SELECT * FROM xx() > UNION ALL > SELECT * FROM xx() > ) > SELECT * FROM t; > > But that's not recursive because the union all part lacks a reference to t. > > Next I tried this: > > WITH RECURSIVE > t AS ( > SELECT * FROM xx() > UNION ALL > SELECT * FROM xx() WHERE EXISTS (SELECT 1 FROM t) > ) > SELECT * FROM t; > > But the reference to t is not allowed in a subquery. > > What's the best (or at least a working) way to achieve what I want? > > I can do it in plpgsql. But that would mean to accumulate the complete > result in memory first, right? I need to avoid that. I would test that assumption. This is better handled in loop IMO. LOOP RETURN QUERY SELECT * FROM xx(); IF NOT found THEN RETURN; END IF; END LOOP; merlin
Torsten Förtsch wrote > Hi, > > I have a volatile function that returns multiple rows. It may also > return nothing. Now, I want to write an SQL statement that calls this > function until it returns an empty result set and returns all the rows. > > What's the best (or at least a working) way to achieve what I want? > > I can do it in plpgsql. But that would mean to accumulate the complete > result in memory first, right? I need to avoid that. You are describing procedural logic. If you need intermediate steps before "returns all the rows" then either those intermediate steps stay in memory OR you stick them on a table somewhere and, when your procedure is done, send back a cursor over that, possibly temporary, table. I don't think abusing WITH/RECURSIVE is going to be viable. You should also consider whether you can do what you need using set-logic (i.e., pure SQL). At worse it will be a learning exercise and a performance comparator. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/SQL-advice-needed-tp5796431p5796436.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On 17/03/14 21:42, Merlin Moncure wrote: >> I can do it in plpgsql. But that would mean to accumulate the complete >> > result in memory first, right? I need to avoid that. > I would test that assumption. This is better handled in loop IMO. > > LOOP > RETURN QUERY SELECT * FROM xx(); > IF NOT found > THEN > RETURN; > END IF; > END LOOP; At least according to the manual it is stored in memory: <cite> Note: The current implementation of RETURN NEXT and RETURN QUERY stores the entire result set before returning from the function, as discussed above. That means that if a PL/pgSQL function produces a very large result set, performance might be poor: data will be written to disk to avoid memory exhaustion, but the function itself will not return until the entire result set has been generated. A future version of PL/pgSQL might allow users to define set-returning functions that do not have this limitation. Currently, the point at which data begins being written to disk is controlled by the work_mem configuration variable. Administrators who have sufficient memory to store larger result sets in memory should consider increasing this parameter. </cite> I didn't test that, though. Torsten
On Mon, Mar 17, 2014 at 4:20 PM, Torsten Förtsch <torsten.foertsch@gmx.net> wrote: > On 17/03/14 21:42, Merlin Moncure wrote: >>> I can do it in plpgsql. But that would mean to accumulate the complete >>> > result in memory first, right? I need to avoid that. >> I would test that assumption. This is better handled in loop IMO. >> >> LOOP >> RETURN QUERY SELECT * FROM xx(); >> IF NOT found >> THEN >> RETURN; >> END IF; >> END LOOP; > > At least according to the manual it is stored in memory: > > <cite> > Note: The current implementation of RETURN NEXT and RETURN QUERY stores > the entire result set before returning from the function, as discussed > above. That means that if a PL/pgSQL function produces a very large > result set, performance might be poor: data will be written to disk to > avoid memory exhaustion, but the function itself will not return until > the entire result set has been generated. A future version of PL/pgSQL > might allow users to define set-returning functions that do not have > this limitation. Currently, the point at which data begins being written > to disk is controlled by the work_mem configuration variable. > Administrators who have sufficient memory to store larger result sets in > memory should consider increasing this parameter. > </cite> > > I didn't test that, though. The manual says describes the exact opposite of what you said you thought it does -- large result sets are paged out to disk, not stored in memory (this is a feature). CTEs use a similar tactic so it's a wash. The performance overhead of a tuplestore is probably not as bad as you think -- just test it out some simulated results and monitor performance. Either way, work_mem controls it. It's generally dangerous to crank work_mem to huge values but it's ok to set it temporarily via SET to huge values (say to 1GB) for a query particularly if you know that it's only getting issued by one caller at a time. merlin