Обсуждение: (PATCH) Adding CORRESPONDING to Set Operations
Adding CORRESPONDING to Set Operations Initial patch, filename: corresponding_clause_v2.patch This patch adds CORRESPONDING clause to set operations according to SQL20nn standard draft as Feature F301, "CORRESPONDING in query expressions" Corresponding clause either contains a BY(...) clause or not. If it doesn't have a BY(...) clause the usage is as follows. SELECT 1 a, 2 b, 3 c UNION CORRESPONDING 4 b, 5 d, 6 c, 7 f; with output: b c ----- 2 3 4 6 i.e. matching column names are filtered and are only output from the whole set operation clause. If we introduce a BY(...) clause, then column names are further intersected with that BY clause: SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) 4 b, 5 d, 6 c, 7 f; with output: b -- 2 4 This patch compiles and tests successfully with master branch. It has been tested only on Pardus Linux i686 ( Kernel 2.6.37.6 #1 SMP i686 i686 i386 GNU/Linux) This patch includes documentation and add one regression file. This patch addresses the following TODO item: SQL Commands: Add CORRESPONDING BY to UNION/INTERSECT/EXCEPT Best Regards, Kerem KAT
Вложения
On Wed, October 19, 2011 15:01, Kerem Kat wrote: > Adding CORRESPONDING to Set Operations > Initial patch, filename: corresponding_clause_v2.patch I had a quick look at the behaviour of this patch. Btw, the examples in your email were typoed (one select is missing): > SELECT 1 a, 2 b, 3 c UNION CORRESPONDING 4 b, 5 d, 6 c, 7 f; should be: SELECT 1 a, 2 b, 3 c UNION CORRESPONDING select 4 b, 5 d, 6 c, 7 f; and > SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) 4 b, 5 d, 6 c, 7 f; should be: SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) select 4 b, 5 d, 6 c, 7 f; > But there is also a small bug, I think: the order in the CORRESPONDING BY list should be followed, according to the standard (foundation, p. 408): "2) If <corresponding column list> is specified, then let SL be a <select list> of those <column name>s explicitly appearing in the <corresponding column list> in the order that these <column name>s appear in the <corresponding column list>. Every <column name> in the <corresponding column list> shall be a <column name> of both T1 and T2." That would make this wrong, I think: SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(c,b) select 5 d, 6 c, 7 f, 4 b ; b | c ---+---2 | 34 | 6 (2 rows) i.e., I think it should show columns in the order c, b (and not b, c); the order of the CORRESPONDING BY phrase. (but maybe I'm misreading the text of the standard; I find it often difficult to follow) Thanks, Erik Rijkers
On Mon, Oct 24, 2011 at 20:52, Erik Rijkers <er@xs4all.nl> wrote: > On Wed, October 19, 2011 15:01, Kerem Kat wrote: >> Adding CORRESPONDING to Set Operations >> Initial patch, filename: corresponding_clause_v2.patch > > I had a quick look at the behaviour of this patch. > > Btw, the examples in your email were typoed (one select is missing): > >> SELECT 1 a, 2 b, 3 c UNION CORRESPONDING 4 b, 5 d, 6 c, 7 f; > should be: > SELECT 1 a, 2 b, 3 c UNION CORRESPONDING select 4 b, 5 d, 6 c, 7 f; > > and > >> SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) 4 b, 5 d, 6 c, 7 f; > should be: > SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) select 4 b, 5 d, 6 c, 7 f; >> Yes you are correct, mea culpa. > > > > But there is also a small bug, I think: the order in the CORRESPONDING BY list should be followed, > according to the standard (foundation, p. 408): > > "2) If <corresponding column list> is specified, then let SL be a <select list> of those <column > name>s explicitly appearing in the <corresponding column list> in the order that these > <column name>s appear in the <corresponding column list>. Every <column name> in the > <corresponding column list> shall be a <column name> of both T1 and T2." > > That would make this wrong, I think: > > SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(c,b) select 5 d, 6 c, 7 f, 4 b ; > > b | c > ---+--- > 2 | 3 > 4 | 6 > (2 rows) > > i.e., I think it should show columns in the order c, b (and not b, c); the order of the > CORRESPONDING BY phrase. > > (but maybe I'm misreading the text of the standard; I find it often difficult to follow) > It wasn't a misread, I checked the draft, in my version same explanation is at p.410. I have corrected the ordering of the targetlists of subqueries. And added 12 regression tests for column list ordering. Can you confirm that the order has changed for you? > > Thanks, > > > Erik Rijkers > > Regards, Kerem KAT
Вложения
On Tue, October 25, 2011 19:49, Kerem Kat wrote: > On Mon, Oct 24, 2011 at 20:52, Erik Rijkers <er@xs4all.nl> wrote: >> On Wed, October 19, 2011 15:01, Kerem Kat wrote: >>> Adding CORRESPONDING to Set Operations > I have corrected the ordering of the targetlists of subqueries. And > added 12 regression > tests for column list ordering. Can you confirm that the order has > changed for you? > Yes, this one is OK. thanks, Erik Rijkers
(pgsql 9.2devel (25 oct) with your latest CORRESPONDING patch; linux x86_64 GNU/Linux 2.6.18-274.3.1.el5) Hi, here is another peculiarity, which I think is a bug: -- first without CORRESPONDING: $ psql -Xaf null.sql select 1 a , 2 b union all select null a, 4 b ;a | b ---+---1 | 2 | 4 (2 rows) -- then with CORRESPONDING: select 1 a , 2 b union all corresponding select null a, 4 b ; psql:null.sql:9: ERROR: failed to find conversion function from unknown to integer If the null value is in a table column the error does not occur: drop table if exists t1; create table t1 (a int, b int); insert into t1 values (1,2); drop table if exists t2; create table t2 (a int, b int); insert into t2 values (null,2); select a,b from t1 union all corresponding select a,b from t2 ;a | b ---+---1 | 2 | 2 (2 rows) I'm not sure it is actually a bug; but it seems an unneccessary error. thanks, Erik Rijkers
Hi, Union with NULL error persists without the corresponding patch. Here is the output from postgres without the patch: SELECT a FROM (SELECT 1 a) foo UNION SELECT a FROM (SELECT NULL a) foo2; ERROR: failed to find conversion function from unknown to integer It is thrown from parse_coerce.c:coerce_type method. I will try to dig deep on it. Regards, Kerem KAT On Thu, Oct 27, 2011 at 15:45, Erik Rijkers <er@xs4all.nl> wrote: > (pgsql 9.2devel (25 oct) with your latest CORRESPONDING patch; > linux x86_64 GNU/Linux 2.6.18-274.3.1.el5) > > Hi, > > here is another peculiarity, which I think is a bug: > > -- first without CORRESPONDING: > > $ psql -Xaf null.sql > select 1 a , 2 b > union all > select null a, 4 b ; > a | b > ---+--- > 1 | 2 > | 4 > (2 rows) > > -- then with CORRESPONDING: > > select 1 a , 2 b > union all > corresponding > select null a, 4 b ; > psql:null.sql:9: ERROR: failed to find conversion function from unknown to integer > > > If the null value is in a table column the error does not occur: > > drop table if exists t1; create table t1 (a int, b int); insert into t1 values (1,2); > drop table if exists t2; create table t2 (a int, b int); insert into t2 values (null,2); > select a,b from t1 > union all > corresponding > select a,b from t2 ; > a | b > ---+--- > 1 | 2 > | 2 > (2 rows) > > > I'm not sure it is actually a bug; but it seems an unneccessary error. > > > thanks, > > Erik Rijkers > >
Kerem Kat <keremkat@gmail.com> writes: > Union with NULL error persists without the corresponding patch. Here > is the output from postgres without the patch: > SELECT a FROM (SELECT 1 a) foo > UNION > SELECT a FROM (SELECT NULL a) foo2; > ERROR: failed to find conversion function from unknown to integer Yeah, this is a longstanding issue that is not simple to fix without introducing other unpleasantnesses. It is not something you should try to deal with at the same time as implementing CORRESPONDING. regards, tom lane
I wrote: > Kerem Kat <keremkat@gmail.com> writes: >> Union with NULL error persists without the corresponding patch. Here >> is the output from postgres without the patch: >> SELECT a FROM (SELECT 1 a) foo >> UNION >> SELECT a FROM (SELECT NULL a) foo2; >> ERROR: failed to find conversion function from unknown to integer > Yeah, this is a longstanding issue that is not simple to fix without > introducing other unpleasantnesses. It is not something you should > try to deal with at the same time as implementing CORRESPONDING. BTW, just to clarify: although that case fails, the case Erik was complaining of does work in unmodified Postgres: regression=# select 1 a , 2 b union all select null a, 4 b ;a | b ---+---1 | 2 | 4 (2 rows) and I agree with him that it should still work with CORRESPONDING. Even though the behavior of unlabeled NULLs is less than perfect, we definitely don't want to break cases that work now. I suspect the failure means that you tried to postpone too much work to plan time. You do have to match up the columns honestly at parse time and do the necessary type coercions on them then. regards, tom lane
Kerem Kat <keremkat@gmail.com> writes: > On Thu, Oct 27, 2011 at 23:20, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> BTW, just to clarify: although that case fails, the case Erik was >> complaining of does work in unmodified Postgres: >> ... >> and I agree with him that it should still work with CORRESPONDING. > That is by design, because CORRESPONDING is implemented as subqueries: Well, this appears to me to be a counterexample sufficient to refute that implementation decision. You can inject subqueries at plan time, if that helps you make things match up, but you can't rearrange things that way at parse time, as I gather you're doing or else you would not be seeing this problem. In any case, I already pointed out to you that rearranging the parse tree that way is problematic for reverse-listing the parse tree. We don't want to see subqueries injected in the results of printing parse trees with ruleutils.c. regards, tom lane
On Thu, Oct 27, 2011 at 23:20, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> Kerem Kat <keremkat@gmail.com> writes: >>> Union with NULL error persists without the corresponding patch. Here >>> is the output from postgres without the patch: > >>> SELECT a FROM (SELECT 1 a) foo >>> UNION >>> SELECT a FROM (SELECT NULL a) foo2; > >>> ERROR: failed to find conversion function from unknown to integer > >> Yeah, this is a longstanding issue that is not simple to fix without >> introducing other unpleasantnesses. It is not something you should >> try to deal with at the same time as implementing CORRESPONDING. > > BTW, just to clarify: although that case fails, the case Erik was > complaining of does work in unmodified Postgres: > > regression=# select 1 a , 2 b > union all > select null a, 4 b ; > a | b > ---+--- > 1 | 2 > | 4 > (2 rows) > > and I agree with him that it should still work with CORRESPONDING. > Even though the behavior of unlabeled NULLs is less than perfect, > we definitely don't want to break cases that work now. I suspect > the failure means that you tried to postpone too much work to plan > time. You do have to match up the columns honestly at parse time > and do the necessary type coercions on them then. > > regards, tom lane > That is by design, because CORRESPONDING is implemented as subqueries: select 1 a , 2 b union all corresponding select null a, 4 b ; is equivalent to SELECT a, b FROM ( SELECT 1 a, 2 b ) foo UNION ALL SELECT a, b FROM ( SELECT null a, 4 b ) foo2; which gives the same error in unpatched postgres. Regards, Kerem KAT
On 25 October 2011 18:49, Kerem Kat <keremkat@gmail.com> wrote: > On Mon, Oct 24, 2011 at 20:52, Erik Rijkers <er@xs4all.nl> wrote: >> On Wed, October 19, 2011 15:01, Kerem Kat wrote: >>> Adding CORRESPONDING to Set Operations >>> Initial patch, filename: corresponding_clause_v2.patch >> >> I had a quick look at the behaviour of this patch. >> >> Btw, the examples in your email were typoed (one select is missing): >> >>> SELECT 1 a, 2 b, 3 c UNION CORRESPONDING 4 b, 5 d, 6 c, 7 f; >> should be: >> SELECT 1 a, 2 b, 3 c UNION CORRESPONDING select 4 b, 5 d, 6 c, 7 f; >> >> and >> >>> SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) 4 b, 5 d, 6 c, 7 f; >> should be: >> SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) select 4 b, 5 d, 6 c, 7 f; >>> > > Yes you are correct, mea culpa. > >> >> >> >> But there is also a small bug, I think: the order in the CORRESPONDING BY list should be followed, >> according to the standard (foundation, p. 408): >> >> "2) If <corresponding column list> is specified, then let SL be a <select list> of those <column >> name>s explicitly appearing in the <corresponding column list> in the order that these >> <column name>s appear in the <corresponding column list>. Every <column name> in the >> <corresponding column list> shall be a <column name> of both T1 and T2." >> >> That would make this wrong, I think: >> >> SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(c,b) select 5 d, 6 c, 7 f, 4 b ; >> >> b | c >> ---+--- >> 2 | 3 >> 4 | 6 >> (2 rows) >> >> i.e., I think it should show columns in the order c, b (and not b, c); the order of the >> CORRESPONDING BY phrase. >> >> (but maybe I'm misreading the text of the standard; I find it often difficult to follow) >> > > It wasn't a misread, I checked the draft, in my version same > explanation is at p.410. > I have corrected the ordering of the targetlists of subqueries. And > added 12 regression > tests for column list ordering. Can you confirm that the order has > changed for you? > > >> >> Thanks, >> >> >> Erik Rijkers >> >> > > Regards, > > Kerem KAT This explain plan doesn't look right to me: test=# explain select a,b,c from one intersect corresponding by (a,c) select a,b,c from two; QUERY PLAN ---------------------------------------------------------------------------------HashSetOp Intersect (cost=0.00..117.00rows=200 width=8) -> Append (cost=0.00..97.60 rows=3880 width=8) -> Subquery Scan on "*SELECT*3" (cost=0.00..48.80 rows=1940 width=8) -> Seq Scan on one (cost=0.00..29.40 rows=1940 width=8) -> Subquery Scan on "*SELECT* 4" (cost=0.00..48.80 rows=1940 width=8) -> Seq Scan on two (cost=0.00..29.40 rows=1940 width=8) (6 rows) If I do the same thing without the "corresponding...": test=# explain select a,b,c from one intersect select a,b,c from two; QUERY PLAN ----------------------------------------------------------------------------------HashSetOp Intersect (cost=0.00..126.70rows=200 width=12) -> Append (cost=0.00..97.60 rows=3880 width=12) -> Subquery Scan on "*SELECT*1" (cost=0.00..48.80 rows=1940 width=12) -> Seq Scan on one (cost=0.00..29.40 rows=1940 width=12) -> Subquery Scan on "*SELECT*2" (cost=0.00..48.80 rows=1940 width=12) -> Seq Scan on two (cost=0.00..29.40 rows=1940 width=12) (6 rows) So it looks like it's now seeing the two tables as the 3rd and 4th tables, even though there are only 2 tables in total. -- Thom Brown Twitter: @darkixion IRC (freenode): dark_ixion Registered Linux user: #516935 EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> This explain plan doesn't look right to me: > > test=# explain select a,b,c from one intersect corresponding by (a,c) > select a,b,c from two; > QUERY PLAN > --------------------------------------------------------------------------------- > HashSetOp Intersect (cost=0.00..117.00 rows=200 width=8) > -> Append (cost=0.00..97.60 rows=3880 width=8) > -> Subquery Scan on "*SELECT* 3" (cost=0.00..48.80 rows=1940 width=8) > -> Seq Scan on one (cost=0.00..29.40 rows=1940 width=8) > -> Subquery Scan on "*SELECT* 4" (cost=0.00..48.80 rows=1940 width=8) > -> Seq Scan on two (cost=0.00..29.40 rows=1940 width=8) > (6 rows) In the current implementation, select a,b,c from one intersect corresponding by (a,c) select a,b,c from two; is translated to equivalent select a, c from (select a,b,c from one) intersect select a, c from (select a,b,c from two); Methinks that's the reason for this explain output. Corresponding is currently implemented in the parse/analyze phase. If it were to be implemented in the planning phase, explain output would likely be as you expect it to be. > If I do the same thing without the "corresponding...": > > test=# explain select a,b,c from one intersect select a,b,c from two; > QUERY PLAN > ---------------------------------------------------------------------------------- > HashSetOp Intersect (cost=0.00..126.70 rows=200 width=12) > -> Append (cost=0.00..97.60 rows=3880 width=12) > -> Subquery Scan on "*SELECT* 1" (cost=0.00..48.80 > rows=1940 width=12) > -> Seq Scan on one (cost=0.00..29.40 rows=1940 width=12) > -> Subquery Scan on "*SELECT* 2" (cost=0.00..48.80 > rows=1940 width=12) > -> Seq Scan on two (cost=0.00..29.40 rows=1940 width=12) > (6 rows) > > So it looks like it's now seeing the two tables as the 3rd and 4th > tables, even though there are only 2 tables in total. > > -- > Thom Brown > Twitter: @darkixion > IRC (freenode): dark_ixion > Registered Linux user: #516935 > > EnterpriseDB UK: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > Regards, Kerem KAT
On 14 November 2011 11:29, Kerem Kat <keremkat@gmail.com> wrote:
> This explain plan doesn't look right to me:In the current implementation,
>
> test=# explain select a,b,c from one intersect corresponding by (a,c)
> select a,b,c from two;
> QUERY PLAN
> ---------------------------------------------------------------------------------
> HashSetOp Intersect (cost=0.00..117.00 rows=200 width=8)
> -> Append (cost=0.00..97.60 rows=3880 width=8)
> -> Subquery Scan on "*SELECT* 3" (cost=0.00..48.80 rows=1940 width=8)
> -> Seq Scan on one (cost=0.00..29.40 rows=1940 width=8)
> -> Subquery Scan on "*SELECT* 4" (cost=0.00..48.80 rows=1940 width=8)
> -> Seq Scan on two (cost=0.00..29.40 rows=1940 width=8)
> (6 rows)is translated to equivalent
select a,b,c from one intersect corresponding by (a,c) select a,b,c from two;
select a, c from (select a,b,c from one)
intersect
select a, c from (select a,b,c from two);
Methinks that's the reason for this explain output.
Corresponding is currently implemented in the parse/analyze phase. If
it were to be implemented in the planning phase, explain output would
likely be as you expect it to be.
I'm certainly no expert on what the right way to represent the plan is, but I'm still uncomfortable with its current representation. And having just tested the translated equivalent, I still don't get the same explain plan:
test=# explain select a, c from (select a,b,c from one) a
intersect
select a, c from (select a,b,c from two) b;
QUERY PLAN
---------------------------------------------------------------------------------
HashSetOp Intersect (cost=0.00..117.00 rows=200 width=8)
-> Append (cost=0.00..97.60 rows=3880 width=8)
-> Subquery Scan on "*SELECT* 1" (cost=0.00..48.80 rows=1940 width=8)
-> Seq Scan on one (cost=0.00..29.40 rows=1940 width=8)
-> Subquery Scan on "*SELECT* 2" (cost=0.00..48.80 rows=1940 width=8)
-> Seq Scan on two (cost=0.00..29.40 rows=1940 width=8)
(6 rows)
Also you probably want to update src/backend/catalog/sql_features.txt so that F301 is marked as "YES" for supporting the standard. :)
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Kerem Kat <keremkat@gmail.com> writes: > Corresponding is currently implemented in the parse/analyze phase. If > it were to be implemented in the planning phase, explain output would > likely be as you expect it to be. It's already been pointed out to you that doing this at parse time is unacceptable, because of the implications for reverse-listing of rules (views). regards, tom lane
On Mon, Nov 14, 2011 at 15:32, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Kerem Kat <keremkat@gmail.com> writes: >> Corresponding is currently implemented in the parse/analyze phase. If >> it were to be implemented in the planning phase, explain output would >> likely be as you expect it to be. > > It's already been pointed out to you that doing this at parse time is > unacceptable, because of the implications for reverse-listing of rules > (views). > > regards, tom lane > I am well aware of that thank you. Regards, Kerem KAT
On Mon, Nov 14, 2011 at 6:09 AM, Kerem Kat <keremkat@gmail.com> wrote: > On Mon, Nov 14, 2011 at 15:32, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Kerem Kat <keremkat@gmail.com> writes: >>> Corresponding is currently implemented in the parse/analyze phase. If >>> it were to be implemented in the planning phase, explain output would >>> likely be as you expect it to be. >> >> It's already been pointed out to you that doing this at parse time is >> unacceptable, because of the implications for reverse-listing of rules >> (views). >> >> regards, tom lane >> > > I am well aware of that thank you. I took a quick look at the patch and found some miscellaneous points including: - don't use // style comment - keep consistent in terms of space around parenthesis like if and foreach - ereport should have error position as long as possible, especially in syntax error - I'm not convinced that new correspoinding_union.sql test is added. I prefer to include new tests in union.sql - CORRESPONDING BY should have column name list, not expression list. It's not legal to say CORRESPONDING BY (1 + 1) - column list rule should be presented in document, too - I don't see why you call checkWellFormedRecursionWalker on corresponding clause And more than above, Tom's point is the biggest blocker. I'd suggest to rework it so that target list of Query of RangeTblEntry on the top of tree have less columns that match the filter. By that way, I guess you can keep original information as well as filtered top-most target list. Eventually you need to work on the planner, too. Though I've not read all of the patch, the design rework should be done first. I'll mark this as Waiting on Author. Regards, -- Hitoshi Harada