Обсуждение: MULTISET and additional functions for ARRAY
Postgres supports ARRAY data types well, but there are some more array functions in the SQL standard. Also, the standard has MULTISET data type, that is an unordered array. It looks easy to support additional array functions. There might be some confusion to treat multi-dimensional arrays with them, but we could treat all arrays as one-dimensional like as unnest(). MULTISET supports are more difficult. We have corresponding type IDs for each array, but we might not want to add additional IDs for multiset for each type. Any ideas for the issue? If we reuse type IDs of arrays for multisets, the multisets would have some special typmod. For example, typmod = 0 means multiset, and positive value means array with max cardinality. Note that the SQL standard doesn't mention about multi-dimensional arrays. So, we can use typmod = -1 as a free-size and free-dimensional array for backward compatibility. If we have troublesome issues to support multiset data types, I'm thinking to add multiset functions that receives ARRAY types instead at time first time, because an ARRAY is a MULTISET by definition. Some of functions for multisets seems to be useful for arrays, too. Comments and suggestions welcome. === Array functions === - [FUNCTION] cardinality(anyarray) => integer - [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray === Multiset functions === - [FUNCTION] cardinality(anymultiset) => integer - [FUNCTION] element(anymultiset) => anyelement - [FUNCTION] multiset_member_of(anymultiset, anyelement) => boolean [SYNTAX] $2 MEMBER OF $1 - [FUNCTION] multiset_is_a_set(anymultiset) => boolean [SYNTAX] $1 IS A SET - [FUNCTION] multiset_sub_multiset_of(anymultiset, anymultiset) => boolean [SYNTAX] $2 SUB MULTISET OF $1 - [FUNCTION] multiset_union[_all](anymultiset, anymultiset) => anymultiset [SYNTAX] $1 MULTISET UNION [ALL | DISTINCT]$2 - [FUNCTION] multiset_intersect[_all](anymultiset, anymultiset) => anymultiset [SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT]$2 - [FUNCTION] multiset_except[_all](anymultiset, anymultiset) => anymultiset [SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT]$2 - [AGGREGATE] collect(anyelement) => anymultiset - [AGGREGATE] fusion(anymultiset) => anymultiset - [AGGREGATE] intersection(anymultiset) => anymultiset See also secondary sources. http://waelchatila.com/2005/05/18/1116485743467.html http://farrago.sourceforge.net/design/CollectionTypes.html http://publib.boulder.ibm.com/infocenter/db2luw/v9r8/index.jsp?topic=/com.ibm.db2.luw.apdv.sqlpl.doc/doc/t0053486.html http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/conditions006.htm http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/operators006.htm -- Itagaki Takahiro
On Nov 11, 2010, at 7:02 AM, Itagaki Takahiro wrote: > MULTISET supports are more difficult. We have corresponding > type IDs for each array, but we might not want to add additional > IDs for multiset for each type. Any ideas for the issue? Why not? > If we reuse type IDs of arrays for multisets, the multisets would > have some special typmod. For example, typmod = 0 means multiset, > and positive value means array with max cardinality. Note that > the SQL standard doesn't mention about multi-dimensional arrays. > So, we can use typmod = -1 as a free-size and free-dimensional > array for backward compatibility. > > If we have troublesome issues to support multiset data types, > I'm thinking to add multiset functions that receives ARRAY > types instead at time first time, because an ARRAY is a > MULTISET by definition. An array is a superset of MULTISET, I guess? > Some of functions for multisets seems to be useful for arrays, too. > > Comments and suggestions welcome. So are you planning to implement multisets? It's a feature I'd love to see… Best, David
"David E. Wheeler" <david@kineticode.com> writes: > So are you planning to implement multisets? It's a feature I'd love to see What actual functionality does it buy? AFAICT from Itagaki-san's description, it's an array only you ignore the specific element order. So what? You can write functions that work that way now. regards, tom lane
On Nov 11, 2010, at 10:05 AM, Tom Lane wrote: >> So are you planning to implement multisets? It's a feature I'd love to see > > What actual functionality does it buy? AFAICT from Itagaki-san's > description, it's an array only you ignore the specific element order. > So what? You can write functions that work that way now. Also, no dupes. David
I think that it would be best to implement MULTISET in the same way that a TABLE is implemented. Logically and structurally they are the same thing, but that a MULTISET typically is used as a field value of a table row. Aka, a table and a multiset are just different names for a relation, loosely speaking. The association of a multiset-typed attribute of a table with said table is like the association of a child and parent table in a many-to-one. So reuse your structure for tables to hold multisets. -- Darren Duncan Itagaki Takahiro wrote: > Postgres supports ARRAY data types well, but there are some > more array functions in the SQL standard. Also, the standard > has MULTISET data type, that is an unordered array. > > It looks easy to support additional array functions. There > might be some confusion to treat multi-dimensional arrays > with them, but we could treat all arrays as one-dimensional > like as unnest(). > > MULTISET supports are more difficult. We have corresponding > type IDs for each array, but we might not want to add additional > IDs for multiset for each type. Any ideas for the issue? > > If we reuse type IDs of arrays for multisets, the multisets would > have some special typmod. For example, typmod = 0 means multiset, > and positive value means array with max cardinality. Note that > the SQL standard doesn't mention about multi-dimensional arrays. > So, we can use typmod = -1 as a free-size and free-dimensional > array for backward compatibility. > > If we have troublesome issues to support multiset data types, > I'm thinking to add multiset functions that receives ARRAY > types instead at time first time, because an ARRAY is a > MULTISET by definition. Some of functions for multisets > seems to be useful for arrays, too. > > Comments and suggestions welcome.
2010/11/11 David E. Wheeler <david@kineticode.com>: > On Nov 11, 2010, at 10:05 AM, Tom Lane wrote: > >>> So are you planning to implement multisets? It's a feature I'd love to see >> >> What actual functionality does it buy? AFAICT from Itagaki-san's >> description, it's an array only you ignore the specific element order. >> So what? You can write functions that work that way now. > > Also, no dupes. The "multi" in multiset indicates that duplicate elements are explicitly allowed and tracked. Nicolas
On Nov 11, 2010, at 10:19 AM, Darren Duncan wrote: > I think that it would be best to implement MULTISET in the same way that a TABLE is implemented. Logically and structurallythey are the same thing, but that a MULTISET typically is used as a field value of a table row. Aka, a tableand a multiset are just different names for a relation, loosely speaking. That sounds like a composite type to me. Best, David
On Nov 11, 2010, at 10:24 AM, Nicolas Barbier wrote: >> Also, no dupes. > > The "multi" in multiset indicates that duplicate elements are > explicitly allowed and tracked. D'oh! Right. D
Excerpts from David E. Wheeler's message of jue nov 11 15:45:55 -0300 2010: > On Nov 11, 2010, at 10:19 AM, Darren Duncan wrote: > > > I think that it would be best to implement MULTISET in the same way that a TABLE is implemented. Logically and structurallythey are the same thing, but that a MULTISET typically is used as a field value of a table row. Aka, a tableand a multiset are just different names for a relation, loosely speaking. > > That sounds like a composite type to me. No, it's "perpendicular" in the sense that while a composite type allows you to have different columns, this multiset thing lets you have "rows" (I initially thought about them as sets of scalars, but AFAIU they could in turn be rows) -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Nov 11, 2010, at 12:08 PM, Alvaro Herrera wrote: >> That sounds like a composite type to me. > > No, it's "perpendicular" in the sense that while a composite type allows > you to have different columns, this multiset thing lets you have "rows" > (I initially thought about them as sets of scalars, but AFAIU they could > in turn be rows) How is that different from an array of RECORDs? Best, David
On Thu, Nov 11, 2010 at 3:42 PM, David E. Wheeler <david@kineticode.com> wrote: > On Nov 11, 2010, at 12:08 PM, Alvaro Herrera wrote: > >>> That sounds like a composite type to me. >> >> No, it's "perpendicular" in the sense that while a composite type allows >> you to have different columns, this multiset thing lets you have "rows" >> (I initially thought about them as sets of scalars, but AFAIU they could >> in turn be rows) > > How is that different from an array of RECORDs? not a whole lot outside of syntax details...there is a decent summary here: http://waelchatila.com/2005/05/18/1116485743467.html I like this part: "Alternatively the SQL standard also permits the same construct with the bracket trigraphs ??( and ??)" :-D merlin
Merlin Moncure wrote: > On Thu, Nov 11, 2010 at 3:42 PM, David E. Wheeler <david@kineticode.com> wrote: >> On Nov 11, 2010, at 12:08 PM, Alvaro Herrera wrote: >> >>>> That sounds like a composite type to me. >>> No, it's "perpendicular" in the sense that while a composite type allows >>> you to have different columns, this multiset thing lets you have "rows" >>> (I initially thought about them as sets of scalars, but AFAIU they could >>> in turn be rows) >> How is that different from an array of RECORDs? I could ask the same question about a TABLE, the ordering issue aside. This is one place that SQL made things more complicated than they needed to be. Multisets have generally the same structure*and* operators (union, etc) as tables, but they use different syntax for each. A better design would be to make tables and multisets interchangeable. Its an unnecessary distinction. > not a whole lot outside of syntax details...there is a decent summary > here: http://waelchatila.com/2005/05/18/1116485743467.html > > I like this part: "Alternatively the SQL standard also permits the > same construct with the bracket trigraphs ??( and ??)" :-D As I recall, the concept of using stuff like ?( or ?) etc was so that SQL could be written in EBCDIC which natively lacks some of the bracketing characters that ASCII has. Hence, such is an alternative way to spell either { } or [ ] (I forget which). -- Darren Duncan
On Fri, Nov 12, 2010 at 03:05, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "David E. Wheeler" <david@kineticode.com> writes: >> So are you planning to implement multisets? It's a feature I'd love to see > > What actual functionality does it buy? AFAICT from Itagaki-san's > description, it's an array only you ignore the specific element order. > So what? You can write functions that work that way now. I think there are almost no difference between a multiset and an array in terms of functions I described in the first mail. However, if we have separated multiset data type, we could have special comparison operators for them; "array = array" returns true only if they have the same elements in the same order, but "multiset = multiset" only checks elements in them. Also, we could optimize on-disk structure of multiset for fast UNION operations or for dataset that has many duplicates. For example, we could use a sorted array of {value, count} pairs. If we decide to have data type IDs for multiset, I'll go for it (ex. int4, _int4, and an additional $int4), but it consumes +50% of typoids. If it is not preferable, only function support might be better at the first try. -- Itagaki Takahiro
On Fri, Nov 12, 2010 at 06:06, Darren Duncan <darren@darrenduncan.net> wrote: > This is one place that SQL made things more complicated than they needed to > be. Multisets have generally the same structure *and* operators (union, > etc) as tables, but they use different syntax for each. A better design > would be to make tables and multisets interchangeable. Its an unnecessary > distinction. We can use unnest() to convert MULTISET into TABLE, and collect() agg function from TABLE to MULTISET. I don't think they need to have the same on-disk structure; they can share operators and constructor syntax even now. -- Itagaki Takahiro
On Thu, Nov 11, 2010 at 10:02 AM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > If we reuse type IDs of arrays for multisets, the multisets would > have some special typmod. For example, typmod = 0 means multiset, > and positive value means array with max cardinality. Note that > the SQL standard doesn't mention about multi-dimensional arrays. > So, we can use typmod = -1 as a free-size and free-dimensional > array for backward compatibility. I would really like to see us fix our type system so that it doesn't require this type of awful hack. But maybe that's asking too much of a patch to implement this feature. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Nov 11, 2010 at 10:02 AM, Itagaki Takahiro > <itagaki.takahiro@gmail.com> wrote: >> If we reuse type IDs of arrays for multisets, the multisets would >> have some special typmod. For example, typmod = 0 means multiset, >> and positive value means array with max cardinality. Note that >> the SQL standard doesn't mention about multi-dimensional arrays. >> So, we can use typmod = -1 as a free-size and free-dimensional >> array for backward compatibility. > I would really like to see us fix our type system so that it doesn't > require this type of awful hack. But maybe that's asking too much of > a patch to implement this feature. The problem is not with the type system: as long as you give multisets different type OIDs from arrays, everything will work fine. It will absolutely not work to try to use typmod to make the behavior vary like that ... but Itagaki-san knew that already. regards, tom lane
On Fri, Nov 12, 2010 at 12:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Nov 11, 2010 at 10:02 AM, Itagaki Takahiro >> <itagaki.takahiro@gmail.com> wrote: >>> If we reuse type IDs of arrays for multisets, the multisets would >>> have some special typmod. For example, typmod = 0 means multiset, >>> and positive value means array with max cardinality. Note that >>> the SQL standard doesn't mention about multi-dimensional arrays. >>> So, we can use typmod = -1 as a free-size and free-dimensional >>> array for backward compatibility. > >> I would really like to see us fix our type system so that it doesn't >> require this type of awful hack. But maybe that's asking too much of >> a patch to implement this feature. > > The problem is not with the type system: as long as you give multisets > different type OIDs from arrays, everything will work fine. It will > absolutely not work to try to use typmod to make the behavior vary > like that ... but Itagaki-san knew that already. And thus you must create a THIRD copy of every entry in pg_type. That doesn't qualify as a problem? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Nov 12, 2010 at 12:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The problem is not with the type system: as long as you give multisets >> different type OIDs from arrays, everything will work fine. > And thus you must create a THIRD copy of every entry in pg_type. That > doesn't qualify as a problem? [ shrug... ] It's less of a problem than the possible alternatives. IMO anyway. OIDs are cheap ... replacing OIDs with some sort of ill-specified composite key throughout the system is not. But I'm still not convinced that this feature is useful enough to justify the implementation effort. AFAICS there's nothing here that you couldn't get with some non-default operators on regular arrays, with orders of magnitude less work and less impact on the rest of the system. The only reason to consider implementing it as a separate type category is the SQL committee decided to invent some syntax --- and given their lousy taste in syntax, I get less enthused every year about duplicating it. regards, tom lane
On Fri, Nov 12, 2010 at 12:53:09AM -0500, Robert Haas wrote: > On Fri, Nov 12, 2010 at 12:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > >> On Thu, Nov 11, 2010 at 10:02 AM, Itagaki Takahiro > >> <itagaki.takahiro@gmail.com> wrote: > >>> If we reuse type IDs of arrays for multisets, the multisets would > >>> have some special typmod. For example, typmod = 0 means multiset, > >>> and positive value means array with max cardinality. Note that > >>> the SQL standard doesn't mention about multi-dimensional arrays. > >>> So, we can use typmod = -1 as a free-size and free-dimensional > >>> array for backward compatibility. > > > >> I would really like to see us fix our type system so that it doesn't > >> require this type of awful hack. But maybe that's asking too much of > >> a patch to implement this feature. > > > > The problem is not with the type system: as long as you give multisets > > different type OIDs from arrays, everything will work fine. It will > > absolutely not work to try to use typmod to make the behavior vary > > like that ... but Itagaki-san knew that already. > > And thus you must create a THIRD copy of every entry in pg_type. That > doesn't qualify as a problem? Yes, and I've started a separate thread on this along with a page on the wiki. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Fri, Nov 12, 2010 at 00:02, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > Postgres supports ARRAY data types well, but there are some > more array functions in the SQL standard. Also, the standard > has MULTISET data type, that is an unordered array. Here is a WIP patch for multiset function supports. Note that multiset types are not added in the patch; it just adds functions and syntax. Arguments or result types of those functions are anyarray rather than anymultiset. The result type is always flatten into on-dimensional array because some functions requires per-element operations; I'm not sure how we should treat trim_array( 2D 3x3 array, 2 elements ). So, it is treated as trim_array( 9 elements array, 2 elements ) in the patch. The SQL standard defines special syntax for multiset. I added four unreserved keywords for them; A, MEMBER, MULTISET, and SUB. (I don't like such ad-hoc syntax, but it is the standard...) Some of the new expressions are just syntactic sugar for existing other expressions or new functions. For example, "$1 MEMBER OF $2" is expanded to "$1 = ANY ($2)" and "$1 IS A SET" to "multiset_is_a_set($1)". I have not researched the spec yet enough, especially NULLs in collections. I'll continue to check the details. BTW, some of the intermediate products to implement those features might be useful if exported. like array_sort() and array_unique(). If there is demand, I'll add those functions, too. Any comments for the approach or detailed features? === New features === - [FUNCTION] cardinality(anyarray) => integer - [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray - [FUNCTION] element(anyarray) => anyelement - [SYNTAX] $1 MEMBER OF $2 --> $1 = ANY ($2) - [SYNTAX] $1 SUB MULTISET OF $2 --> $1 <@ $2 - [SYNTAX] $1 IS A SET --> multiset_is_a_set($1) - [SYNTAX] $1 MULTISET UNION [ALL | DISTINCT] $2 --> multiset_union($1, $2, all?) - [SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT] $2 --> multiset_intersect($1, $2, all?) - [SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT] $2 --> multiset_except($1, $2, all?) - [AGGREGATE] collect(anyelement) => anyarray --> same as array_agg() - [AGGREGATE] fusion(anyarray) => anyarray - [AGGREGATE] intersection(anyarray) => anyarray -- Itagaki Takahiro
Вложения
Hello 2010/11/15 Itagaki Takahiro <itagaki.takahiro@gmail.com>: > On Fri, Nov 12, 2010 at 00:02, Itagaki Takahiro > <itagaki.takahiro@gmail.com> wrote: >> Postgres supports ARRAY data types well, but there are some >> more array functions in the SQL standard. Also, the standard >> has MULTISET data type, that is an unordered array. > > Here is a WIP patch for multiset function supports. Note that multiset > types are not added in the patch; it just adds functions and syntax. > Arguments or result types of those functions are anyarray rather than > anymultiset. The result type is always flatten into on-dimensional > array because some functions requires per-element operations; I'm not > sure how we should treat trim_array( 2D 3x3 array, 2 elements ). So, > it is treated as trim_array( 9 elements array, 2 elements ) in the patch. > > The SQL standard defines special syntax for multiset. I added four > unreserved keywords for them; A, MEMBER, MULTISET, and SUB. > (I don't like such ad-hoc syntax, but it is the standard...) > Some of the new expressions are just syntactic sugar for existing > other expressions or new functions. For example, "$1 MEMBER OF $2" is > expanded to "$1 = ANY ($2)" and "$1 IS A SET" to "multiset_is_a_set($1)". > > I have not researched the spec yet enough, especially NULLs in collections. > I'll continue to check the details. > > BTW, some of the intermediate products to implement those features might > be useful if exported. like array_sort() and array_unique(). If there is > demand, I'll add those functions, too. > > Any comments for the approach or detailed features? > I has not a standard, so I can't to speak about conformance with standard, but I must to say, so these functionality should be extremely useful for plpgsql programming. You can see samples a some functions on the net for array sorting, for reduce a redundant values via SQL language. I am sure, so implementation in C will be much faster. Maybe can be useful to implement a searching on sorted array. You can hold a flag if multiset is sorted or not. Regards Pavel Stehule > === New features === > - [FUNCTION] cardinality(anyarray) => integer > - [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray > - [FUNCTION] element(anyarray) => anyelement > - [SYNTAX] $1 MEMBER OF $2 --> $1 = ANY ($2) > - [SYNTAX] $1 SUB MULTISET OF $2 --> $1 <@ $2 > - [SYNTAX] $1 IS A SET --> multiset_is_a_set($1) > - [SYNTAX] $1 MULTISET UNION [ALL | DISTINCT] $2 --> > multiset_union($1, $2, all?) > - [SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT] $2 --> > multiset_intersect($1, $2, all?) > - [SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT] $2 --> > multiset_except($1, $2, all?) > - [AGGREGATE] collect(anyelement) => anyarray --> same as array_agg() > - [AGGREGATE] fusion(anyarray) => anyarray > - [AGGREGATE] intersection(anyarray) => anyarray > > -- > Itagaki Takahiro > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > >
On Mon, Nov 15, 2010 at 14:37, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Maybe can be useful to implement a searching on sorted array. > You can hold a flag if multiset is sorted or not. Are you suggesting to add an IS_SORTED bit to for each ArrayType value? It might be possible, but I'm not sure how much it is useful. Another issue for sorting is that we have 4 kinds of sorting: ASC/DESC and NULLS FIRST/LAST. I always used the default sort order (ASC NULLS LAST) in the patch, but we might need to support all of them when we export the functionality. Then, we would need 2 bits for IS_SORTED flags. -- Itagaki Takahiro
2010/11/15 Itagaki Takahiro <itagaki.takahiro@gmail.com>: > On Mon, Nov 15, 2010 at 14:37, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> Maybe can be useful to implement a searching on sorted array. >> You can hold a flag if multiset is sorted or not. > > Are you suggesting to add an IS_SORTED bit to for each ArrayType value? > It might be possible, but I'm not sure how much it is useful. It's depend on implementation. You can use this flag for large set and operation MEMBER OF. When you know, so array is sorted, then you can use a fast search. > > Another issue for sorting is that we have 4 kinds of sorting: ASC/DESC > and NULLS FIRST/LAST. I always used the default sort order (ASC NULLS LAST) > in the patch, but we might need to support all of them when we export the > functionality. Then, we would need 2 bits for IS_SORTED flags. > yes, probably Pavel > -- > Itagaki Takahiro >
On Monday 15 November 2010 09:47:15 Itagaki Takahiro wrote: > On Mon, Nov 15, 2010 at 14:37, Pavel Stehule <pavel.stehule@gmail.com> wrote: > > Maybe can be useful to implement a searching on sorted array. > > You can hold a flag if multiset is sorted or not. > > Are you suggesting to add an IS_SORTED bit to for each ArrayType value? > It might be possible, but I'm not sure how much it is useful. I actually wished for that several times and solved it by putting up is_sorted() constraints everywhere, but its a pain and restricts generallity. (One was e.g. for implementing an efficent array_union() aggregate). Andres
Itagaki Takahiro <itagaki.takahiro@gmail.com> writes: > Another issue for sorting is that we have 4 kinds of sorting: ASC/DESC > and NULLS FIRST/LAST. We have a lot more kinds than that. See USING. regards, tom lane
On Mon, Nov 15, 2010 at 10:13:40AM -0500, Tom Lane wrote: > Itagaki Takahiro <itagaki.takahiro@gmail.com> writes: > > Another issue for sorting is that we have 4 kinds of sorting: ASC/DESC > > and NULLS FIRST/LAST. > > We have a lot more kinds than that. See USING. USING pretty much gives us no chance of optimizing at all. Could we maybe see about optimizing the 99% case, which those two bits cover? Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter <david@fetter.org> writes: > On Mon, Nov 15, 2010 at 10:13:40AM -0500, Tom Lane wrote: >> Itagaki Takahiro <itagaki.takahiro@gmail.com> writes: >>> Another issue for sorting is that we have 4 kinds of sorting: ASC/DESC >>> and NULLS FIRST/LAST. >> We have a lot more kinds than that. See USING. > USING pretty much gives us no chance of optimizing at all. Could we > maybe see about optimizing the 99% case, which those two bits cover? The question is why support more than *one* kind, if you're only supporting a subset. I don't see the value of messing with stuff like NULLS FIRST if you're not going to go all the way. What's more, the alleged use for this is strictly as an internal optimization in multiset representation, so there's no reason to support more than one sort order. regards, tom lane
On fre, 2010-11-12 at 09:44 -0500, Tom Lane wrote: > But I'm still not convinced that this feature is useful enough to > justify the implementation effort. AFAICS there's nothing here that > you couldn't get with some non-default operators on regular arrays, Unique constraints would behave differently for arrays and multisets. But I suppose you could get something similar with exclusion constraints nowadays. But not for primary keys. Foreign keys also don't work easily with nondefault operators. JOIN / USING doesn't work. ORDER BY would work but look ugly. And such coding where you'd have to remember the nondefault operator everywhere would also be error prone.
Peter Eisentraut <peter_e@gmx.net> writes: > On fre, 2010-11-12 at 09:44 -0500, Tom Lane wrote: >> But I'm still not convinced that this feature is useful enough to >> justify the implementation effort. AFAICS there's nothing here that >> you couldn't get with some non-default operators on regular arrays, > Unique constraints would behave differently for arrays and multisets. > But I suppose you could get something similar with exclusion constraints > nowadays. But not for primary keys. > Foreign keys also don't work easily with nondefault operators. > JOIN / USING doesn't work. > ORDER BY would work but look ugly. And such coding where you'd have to > remember the nondefault operator everywhere would also be error prone. Hmm. Those are perhaps good arguments --- but you can't get any of those behaviors unless multisets are distinct types. The implementation proposed upthread where they're arrays with a special typmod isn't going to support this. regards, tom lane
On Mon, Nov 15, 2010 at 14:37, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> BTW, some of the intermediate products to implement those features might >> be useful if exported. like array_sort() and array_unique(). If there is >> demand, I'll add those functions, too. > > I has not a standard, so I can't to speak about conformance with > standard, but I must to say, so these functionality should be > extremely useful for plpgsql programming. You can see samples a some > functions on the net for array sorting, for reduce a redundant values > via SQL language. I am sure, so implementation in C will be much > faster. Here is a new WIP patch to support MULTISET functions. It includes no docs yet, but I'll work for it if the design is accepted. I added some non-standard functions: - set() : same as DISTINCT for rows - array_sort() : same as ORDER BY ASC NULLS LAST for rows - array_flatten() : flatten an array into one-dimensional one I chose set() for the function name because of compatibility with Oracle. The name also has been proposed in the working draft. SUBMULTISET OF now uses own submultiset_of() function instead of <@ operator because their behaviors are different at all. For example, - ARRAY[1, 1] <@ ARRAY[1] ==> true - ARRAY[1, 1] SUBMULTISET OF ARRAY[1] ==> false I removed element() function because it is not in the standard though it has been in the working draft. I think the function is not so useful because it is almost same as "array[1]". I need help for the syntax of MEMBER OF and SUBMULTISET OF. OFs in them are omittable in the standard, but I cannot resolve shift/reduce errors by bison for syntax. I added a FIXME comment in gram.y. Suggestions welcome. > Maybe can be useful to implement a searching on sorted array. > You can hold a flag if multiset is sorted or not. I've not added "is sorted" bits into array types yet. It might improve performance, but it should come separately from the patch. I think it's not so simple task because user might modify comparison operators after they store arrays with sorted bits on disks; those arrays have sorted bits, but are not sorted actually with the latest operators. === New features in ver.20101118 === - [FUNCTION] cardinality(anyarray) => integer - [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray - [FUNCTION] array_flatten(anyarray) => anyarray - [FUNCTION] array_sort(anyarray) => anyarray - [FUNCTION] set(anyarray) => anyarray - [SYNTAX] $1 IS [NOT] A SET => boolean - [SYNTAX] $1 [NOT] MEMBER OF $2 => boolean - [SYNTAX] $1 [NOT] SUBMULTISET OF $2 => boolean - [SYNTAX] $1 MULTISET UNION [ALL | DISTINCT] $2 => anyarray - [SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT] $22 => anyarray - [SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT] $22 => anyarray - [AGGREGATE] collect(anyelement) => anyarray - [AGGREGATE] fusion(anyarray) => anyarray - [AGGREGATE] intersection(anyarray) => anyarray === New unreserved keywords === A, MEMBER, MULTISET, SUBMULTISET -- Itagaki Takahiro