Обсуждение: FILTER/WITHIN GROUP vs. expressions; is a HINT possible here?
Folks: SELECT device_id, count(*)::INT as present, count(*)::INT FILTER (WHERE valid) as valid_count, mode()::INT WITHIN GROUP (order by val) as mode, percentile_disc(0.5)::INT WITHIN GROUP (order by val) asmedian FROM dataflow_0913 GROUP BY device_id ORDER BY device_id; ERROR: syntax error at or near "FILTER" LINE 4: count(*)::INT FILTER (WHERE valid) as valid_count, The error is right, that's invalid syntax. I can't insert a ::INT between the aggregate() and FILTER. However, the error message is also rather confusing to the user; they're likely to look for their mistake in the wrong place. The same goes for WITHIN GROUP (and OVER, too, I think). Is there some kind of possible HINT we could add to make this easier to debug? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Folks:
SELECT
device_id,
count(*)::INT as present,
count(*)::INT FILTER (WHERE valid) as valid_count,
mode()::INT WITHIN GROUP (order by val) as mode,
percentile_disc(0.5)::INT WITHIN GROUP (order by val)
as median
FROM dataflow_0913
GROUP BY device_id
ORDER BY device_id;
ERROR: syntax error at or near "FILTER"
LINE 4: count(*)::INT FILTER (WHERE valid)
as valid_count,
The error is right, that's invalid syntax. I can't insert a ::INT
between the aggregate() and FILTER. However, the error message is also
rather confusing to the user; they're likely to look for their mistake
in the wrong place. The same goes for WITHIN GROUP (and OVER, too, I
think).
SELECT count(*)::int OVER ()
FROM ( VALUES (1),(2),(3) ) src;
Is there some kind of possible HINT we could add to make this easier to
debug?
Do you have a suggested hint so that the effort to make it work can be compared to its usefulness?
For kicks I ran the following - since "val::type" is simply another syntax for "type(val)"...
SELECT ceil(count(*)) OVER ()
FROM ( VALUES (1),(2),(3) ) src
SQL Error: ERROR: OVER specified, but ceil is not a window function nor an aggregate function
LINE 1: SELECT ceil(count(*)) OVER ()
The non-hint has been around as long as window functions and hasn't really come up as an issue - not enough so to motivate a change at least. Is the idiomatic usage of FILTER and WITHIN GROUP making this error more likely?
The foot-gun in your blog post is more problematic but also seemingly impossible to avoid except through education of the user. It would not be unreasonable to accept that the current error is acting like a canary and forcing the user to go read the documentation on OVER/FILTER/WITHIN GROUP and learn to write the expression as a single unit.
If this is not covered adequately enough in the documentation then that should be remedied. Did you evaluate the documentation in that light while preparing your blog post?
David J.
On 04/16/2015 01:01 PM, David G. Johnston wrote: > If this is not covered adequately enough in the documentation then that > should be remedied. Did you evaluate the documentation in that light > while preparing your blog post? Your response seems very defensive. I'm unclear on why discussing improving an error message should get such a hostile response; it certainly doesn't invite a serious reply. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 04/16/2015 01:01 PM, David G. Johnston wrote:
> If this is not covered adequately enough in the documentation then that
> should be remedied. Did you evaluate the documentation in that light
> while preparing your blog post?
Your response seems very defensive. I'm unclear on why discussing
improving an error message should get such a hostile response; it
certainly doesn't invite a serious reply.
Mostly because I have no clue how difficult it would be...
I'm all for improvement but obviously if it was that simple you would have just proposed the desired wording and asked someone to commit it. I was trying to understand (and communicate my understanding) why it wasn't done better in the first place.
Beyond that there is still the need to teach the user the correct syntax of these functions - and not only point out when they get it wrong so they blindly fix their typo - in order to better avoid the situation where the syntax is valid but the outcome is unexpected. Two separate problems of which the later seems more important than the former - and probably somewhat easier to implement.
David J.
On 04/16/2015 01:18 PM, David G. Johnston wrote: > On Thu, Apr 16, 2015 at 1:06 PM, Josh Berkus <josh@agliodbs.com > <mailto:josh@agliodbs.com>>wrote: > > On 04/16/2015 01:01 PM, David G. Johnston wrote: > > If this is not covered adequately enough in the documentation then that > > should be remedied. Did you evaluate the documentation in that light > > while preparing your blog post? > > Your response seems very defensive. I'm unclear on why discussing > improving an error message should get such a hostile response; it > certainly doesn't invite a serious reply. > > > Mostly because I have no clue how difficult it would be... > > I'm all for improvement but obviously if it was that simple you would > have just proposed the desired wording and asked someone to commit it. > I was trying to understand (and communicate my understanding) why it > wasn't done better in the first place. Right. First, I don't know that it's possible for the parser to figure out that error as opposed to actual mistakes in the filter clause, or just straight gibberish. Second, I'm not sure how to write a succinct HINT which gets the point across. However, the error we're giving for windowing suggests that we can improve things. In the example you quote: SQL Error: ERROR: OVER specified, but ceil is not a window function nor an aggregate function LINE 1: SELECT ceil(count(*)) OVER () ... that's actually a lot clearer error. However, we still get the same error with casting: josh=# select count(*)::INT OVER (order by device_id, val) from dataflow_0913 ; ERROR: syntax error at or near "OVER" LINE 1: select count(*)::INT OVER (order by device_id, val) from dat... ... so maybe we can't. > Beyond that there is still the need to teach the user the correct syntax > of these functions - and not only point out when they get it wrong so > they blindly fix their typo - in order to better avoid the situation > where the syntax is valid but the outcome is unexpected. Two separate > problems of which the later seems more important than the former - and > probably somewhat easier to implement. Absolutely. Hence my blog post; I'm hoping that people will at least get a google hit on the error text. The problem is, in the main docs, how do we reasonably document *combining* features? In Syntax? My head hurts just thinking about it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > ERROR: syntax error at or near "FILTER" > LINE 4: count(*)::INT FILTER (WHERE valid) as valid_count, > The error is right, that's invalid syntax. I can't insert a ::INT > between the aggregate() and FILTER. However, the error message is also > rather confusing to the user; they're likely to look for their mistake > in the wrong place. The same goes for WITHIN GROUP (and OVER, too, I > think). > Is there some kind of possible HINT we could add to make this easier to > debug? Probably not; or at least, IMO it would be a fundamental misjudgment to go in with the idea of improving this one single case. It'd be cool if we could improve reporting of syntax errors in general, though. Unfortunately Bison doesn't provide a whole heckuva lot of support for that. The only simple thing it offers is %error-verbose, which I've experimented with in the past and come away with the impression that it'd be completely useless for most people :-(. It seems to basically add information about which tokens would be valid next at the point of the error report. That isn't any help for the sort of user conceptual error you're talking about here. You could imagine teaching yyerror() to have some SQL-specific knowledge that it could apply by comparing the current lookahead token to the current parse state stack ... but neither of those things are exposed to it by Bison. We could probably get the lookahead token indirectly by inspecting the lexer's state, but it's not clear that's enough to do much. In this particular example, that would mean showing the exact same hint whenever a syntax error is reported at an OVER token; and I'm afraid that it'd be mighty hard to write a hint for that that wouldn't frequently do more harm than good. (Note in particular that since OVER, FILTER, and WITHIN are unreserved keywords, they might be used as simple column/table/function names.) regards, tom lane
On 04/16/2015 04:36 PM, Tom Lane wrote: > You could imagine teaching yyerror() to have some SQL-specific knowledge > that it could apply by comparing the current lookahead token to the > current parse state stack ... but neither of those things are exposed > to it by Bison. We could probably get the lookahead token indirectly > by inspecting the lexer's state, but it's not clear that's enough to > do much. In this particular example, that would mean showing the exact > same hint whenever a syntax error is reported at an OVER token; and Yeah, and that's what we *don't* want to do. That's why I was wondering if it was even possible to figure out the "extra syntax" case. > I'm afraid that it'd be mighty hard to write a hint for that that > wouldn't frequently do more harm than good. (Note in particular that > since OVER, FILTER, and WITHIN are unreserved keywords, they might be > used as simple column/table/function names.) No kidding. Heck, I just spent 1/2 hour figuring out a bug which was actually caused by the fact that a user created a view called "mode". -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com