Обсуждение: Re: [PATCHES] updated patch for selecting large results sets in psql using cursors
<chrisnospam@1006.org> writes: > here comes the latest version (version 7) of the patch to handle large > result sets with psql. As previously discussed, a cursor is used > for SELECT queries when \set FETCH_COUNT some_value > 0 Wait a minute. What I thought we had agreed to was a patch to make commands sent with \g use a cursor. This patch changes SendQuery so that *every* command executed via psql is treated this way. That's a whole lot bigger behavioral change than I think is warranted. regards, tom lane
Re: [PATCHES] updated patch for selecting large results sets in psql using cursors
От
Peter Eisentraut
Дата:
Tom Lane wrote: > Wait a minute. What I thought we had agreed to was a patch to make > commands sent with \g use a cursor. This patch changes SendQuery > so that *every* command executed via psql is treated this way. That's what I remembered. I don't think we want to introduce a difference between ; and \g. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes: > Tom Lane wrote: >> Wait a minute. What I thought we had agreed to was a patch to make >> commands sent with \g use a cursor. This patch changes SendQuery >> so that *every* command executed via psql is treated this way. > That's what I remembered. I don't think we want to introduce a > difference between ; and \g. Have we measured the performance impact, then? The last time I profiled psql, GetVariable was already a hotspot, and this introduces another call of it into the basic query loop whether you use the feature or not. regards, tom lane
On Mon, 2006-08-28 at 13:45 -0400, Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > Tom Lane wrote: > > > Wait a minute. What I thought we had agreed to was a patch to make > > > commands sent with \g use a cursor. This patch changes SendQuery > > > so that *every* command executed via psql is treated this way. > > > That's what I remembered. I don't think we want to introduce a > > difference between ; and \g. > > Have we measured the performance impact, then? The last time I profiled > psql, GetVariable was already a hotspot, and this introduces another > call of it into the basic query loop whether you use the feature or not. > > regards, tom lane Hi, after agreeing on using a \set variable, I proposed to have it influence "\g" as well as ";", because I thought that would be the most expected behaviour. IMHO I'm with Peter, that introducing a difference between "\g" and ";" would go against the principle of least surprise. Performance-wise I took for granted without checking that GetVariable's running time would be negligible. [looks at the code] I see it's it's basically two function calls with a loop over a linked list of values (in the order of 10) doing strcmps and one strtol. It is not quite clear to me what the impact of this is. I could imagine it would show up only if you perform lots of trivial queries through psql. I'm going to benchmark something now and report back. Anyway, regardless the benchmark, I feel it's somehow not clean to have a variable introduce a difference between "\g" and ";". [goes benchmarking...] Bye, Chris. -- Chris Mair http://www.1006.org
> Performance-wise I took for granted without checking that GetVariable's > running time would be negligible. > > [looks at the code] > > I see it's it's basically two function calls with a loop over a linked > list of values (in the order of 10) doing strcmps and one strtol. > It is not quite clear to me what the impact of this is. I could > imagine it would show up only if you perform lots of trivial queries > through psql. I'm going to benchmark something now and report back. > > Anyway, regardless the benchmark, I feel it's somehow not clean to have > a variable introduce a difference between "\g" and ";". > > [goes benchmarking...] Ok, so I ran a file containing 1 million lines of "select 1;" through psql (discarding the output). 5 runs each with the patch and with the patch removed (the if() in SendQuery commented). These are the results in seconds user time of psql on a Pentium M 2.0 GHz (real time was longer, since the postmaster process was on the same machine). patch | count | avg | stddev -------+-------+---------+-------------------f | 5 | 16.6722 | 0.359759919946455t | 5 | 17.2762 | 0.259528803796329 The conclusion is that, yes, the overhead is measurable, albeit with a very synthetic benchmark (of the type MySQL wins ;). Basically I'm loosing 0.6 usec on each query line (when FETCH_COUNT is not there and therefore psql need to scan the whole variables list in GetVariable() for nothing). Not sure if that's acceptable (I'd say yes, but then again, I'm not a cycle counter type of programmer *cough* Java *cough* ;)... Opinions? Bye, Chris. -- Chris Mair http://www.1006.org
Chris Mair <chrisnospam@1006.org> writes: > The conclusion is that, yes, the overhead is measurable, albeit with > a very synthetic benchmark (of the type MySQL wins ;). OK, so about 3 or 4% overhead added to extremely short queries. That's not enough to kill this patch, but it's still annoying ... and as I mentioned, there are some existing calls of GetVariable that are executed often enough to be a problem too. It strikes me that having to do GetVariable *and* strtol and so on for these special variables is pretty silly; the work should be done during the infrequent times they are set, rather than the frequent times they are read. I propose that we add the equivalent of a GUC assign_hook to psql's variable facility, attach an assign hook function to each special variable, and have the assign hook transpose the value into an internal variable that can be read with no overhead. If we do that then the cost of the FETCH_COUNT patch will be unmeasurable, and I think we'll see a few percent savings overall in psql runtime from improving the existing hotspot uses of GetVariable. Barring objections, I'll hack on this this evening ... regards, tom lane
> > The conclusion is that, yes, the overhead is measurable, albeit with > > a very synthetic benchmark (of the type MySQL wins ;). > > OK, so about 3 or 4% overhead added to extremely short queries. More accurately, that 3 or 4% overhead is added to about all queries (we're talking *psql*-only running time). It's just that for anything but short queries, psql running time totally dwarfs regarding to postmaster running time anyway. > That's not enough to kill this patch, but it's still annoying ... > and as I mentioned, there are some existing calls of GetVariable > that are executed often enough to be a problem too. > > It strikes me that having to do GetVariable *and* strtol and so on > for these special variables is pretty silly; the work should be done > during the infrequent times they are set, rather than the frequent > times they are read. I propose that we add the equivalent of a GUC > assign_hook to psql's variable facility, attach an assign hook function > to each special variable, and have the assign hook transpose the > value into an internal variable that can be read with no overhead. > If we do that then the cost of the FETCH_COUNT patch will be > unmeasurable, and I think we'll see a few percent savings overall in > psql runtime from improving the existing hotspot uses of GetVariable. > > Barring objections, I'll hack on this this evening ... Might help. Take into account the strtol is not critical at all for FETCH_COUNT, since when it's actually set, we're supposed to retrieving big data where a strtol doesn't matter anyway. The overhead comes from scanning the linked list for nothing in the normal case (when it's not set). I don't know how the other uses factor in here, but I see it's called at least twice more on average calls to SendQuery. Bye, Chris. -- Chris Mair http://www.1006.org
Peter Eisentraut wrote: > Tom Lane wrote: > > Wait a minute. What I thought we had agreed to was a patch to make > > commands sent with \g use a cursor. This patch changes SendQuery > > so that *every* command executed via psql is treated this way. > > That's what I remembered. I don't think we want to introduce a > difference between ; and \g. I am confused. I assume \g and ; should be affected, like Peter says. Tom, what *every* command are you talking about? You mean \d? -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Peter Eisentraut wrote: >> Tom Lane wrote: >>> Wait a minute. What I thought we had agreed to was a patch to make >>> commands sent with \g use a cursor. > I am confused. I assume \g and ; should be affected, like Peter says. > Tom, what *every* command are you talking about? You mean \d? Like I said, I thought we were intending to modify \g's behavior only; that was certainly the implication of the discussion of "\gc". regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Peter Eisentraut wrote: > >> Tom Lane wrote: > >>> Wait a minute. What I thought we had agreed to was a patch to make > >>> commands sent with \g use a cursor. > > > I am confused. I assume \g and ; should be affected, like Peter says. > > Tom, what *every* command are you talking about? You mean \d? > > Like I said, I thought we were intending to modify \g's behavior only; > that was certainly the implication of the discussion of "\gc". OK, got it. I just don't see the value to doing \g and not ;. I think the \gc case was a hack when he didn't have \set. Now that we have \set, we should be consistent. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > OK, got it. I just don't see the value to doing \g and not ;. I think > the \gc case was a hack when he didn't have \set. Now that we have > \set, we should be consistent. I'm willing to accept this if we can make sure we aren't adding any overhead --- see my proposal elsewhere in the thread for fixing that. regards, tom lane
> > > I am confused. I assume \g and ; should be affected, like Peter says. > > > Tom, what *every* command are you talking about? You mean \d? > > > > Like I said, I thought we were intending to modify \g's behavior only; > > that was certainly the implication of the discussion of "\gc". At some point you OKed the "\g and ;" proposal. I admit I was quick adding the "and ;" part, but it seemed so natural once we agreed on using a variable. > OK, got it. I just don't see the value to doing \g and not ;. I think > the \gc case was a hack when he didn't have \set. Now that we have > \set, we should be consistent. I agree with this statement. If we have a variable that switches just between two versions of \g, we could have gone with using \u (or whatever) in the first place. In the mean time I have been converted by the variable camp, and I think the variable should change "\g" and ";" together, consistently. If we find we can't live with the performance overhead of that if(FETCH_COUNT), it is still not clear why we would be better off moving it into the \g code path only. Is it because presumably \g is used less often in existing psql scripts? Bye, Chris. -- Chris Mair http://www.1006.org
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > OK, got it. I just don't see the value to doing \g and not ;. I think > > the \gc case was a hack when he didn't have \set. Now that we have > > \set, we should be consistent. > > I'm willing to accept this if we can make sure we aren't adding any > overhead --- see my proposal elsewhere in the thread for fixing that. Right, if \g has overhead, I don't want people to start using ; because it is faster. That is the kind of behavior that makes us look sloppy. -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +