Обсуждение: MULTISET and additional functions for ARRAY

Поиск

Список

Период

Сортировка

MULTISET and additional functions for ARRAY

От

Itagaki Takahiro

Дата:

11 ноября 2010 г., 14:03:01

Postgres supports ARRAY data types well, but there are some
more array functions in the SQL standard. Also, the standard
has MULTISET data type, that is an unordered array.

It looks easy to support additional array functions. There
might be some confusion to treat multi-dimensional arrays
with them, but we could treat all arrays as one-dimensional
like as unnest().

MULTISET supports are more difficult. We have corresponding
type IDs for each array, but we might not want to add additional
IDs for multiset for each type. Any ideas for the issue?

If we reuse type IDs of arrays for multisets, the multisets would
have some special typmod. For example, typmod = 0 means multiset,
and positive value means array with max cardinality. Note that
the SQL standard doesn't mention about multi-dimensional arrays.
So, we can use typmod = -1 as a free-size and free-dimensional
array for backward compatibility.

If we have troublesome issues to support multiset data types,
I'm thinking to add multiset functions that receives ARRAY
types instead at time first time, because an ARRAY is a
MULTISET by definition. Some of functions for multisets
seems to be useful for arrays, too.

Comments and suggestions welcome.

=== Array functions ===
- [FUNCTION] cardinality(anyarray) => integer
- [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray

=== Multiset functions ===
- [FUNCTION] cardinality(anymultiset) => integer
- [FUNCTION] element(anymultiset) => anyelement
- [FUNCTION] multiset_member_of(anymultiset, anyelement) => boolean   [SYNTAX] $2 MEMBER OF $1
- [FUNCTION] multiset_is_a_set(anymultiset) => boolean   [SYNTAX] $1 IS A SET
- [FUNCTION] multiset_sub_multiset_of(anymultiset, anymultiset) => boolean   [SYNTAX] $2 SUB MULTISET OF $1
- [FUNCTION] multiset_union[_all](anymultiset, anymultiset) => anymultiset   [SYNTAX] $1 MULTISET UNION [ALL |
DISTINCT]$2
 
- [FUNCTION] multiset_intersect[_all](anymultiset, anymultiset) => anymultiset   [SYNTAX] $1 MULTISET INTERSECT [ALL |
DISTINCT]$2
 
- [FUNCTION] multiset_except[_all](anymultiset, anymultiset) => anymultiset   [SYNTAX] $1 MULTISET EXCEPT [ALL |
DISTINCT]$2
 
- [AGGREGATE] collect(anyelement) => anymultiset
- [AGGREGATE] fusion(anymultiset) => anymultiset
- [AGGREGATE] intersection(anymultiset) => anymultiset


See also secondary sources.
http://waelchatila.com/2005/05/18/1116485743467.html
http://farrago.sourceforge.net/design/CollectionTypes.html
http://publib.boulder.ibm.com/infocenter/db2luw/v9r8/index.jsp?topic=/com.ibm.db2.luw.apdv.sqlpl.doc/doc/t0053486.html
http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/conditions006.htm
http://download.oracle.com/docs/cd/B28359_01/server.111/b28286/operators006.htm

-- 
Itagaki Takahiro

Re: MULTISET and additional functions for ARRAY

От

"David E. Wheeler"

Дата:

11 ноября 2010 г., 16:20:27

On Nov 11, 2010, at 7:02 AM, Itagaki Takahiro wrote:

> MULTISET supports are more difficult. We have corresponding
> type IDs for each array, but we might not want to add additional
> IDs for multiset for each type. Any ideas for the issue?

Why not?

> If we reuse type IDs of arrays for multisets, the multisets would
> have some special typmod. For example, typmod = 0 means multiset,
> and positive value means array with max cardinality. Note that
> the SQL standard doesn't mention about multi-dimensional arrays.
> So, we can use typmod = -1 as a free-size and free-dimensional
> array for backward compatibility.
>
> If we have troublesome issues to support multiset data types,
> I'm thinking to add multiset functions that receives ARRAY
> types instead at time first time, because an ARRAY is a
> MULTISET by definition.

An array is a superset of MULTISET, I guess?

> Some of functions for multisets seems to be useful for arrays, too.
>
> Comments and suggestions welcome.

So are you planning to implement multisets? It's a feature I'd love to see…

Best,

David

Re: MULTISET and additional functions for ARRAY

От

Tom Lane

Дата:

11 ноября 2010 г., 17:05:59

"David E. Wheeler" <david@kineticode.com> writes:
> So are you planning to implement multisets? It's a feature I'd love to see

What actual functionality does it buy?  AFAICT from Itagaki-san's
description, it's an array only you ignore the specific element order.
So what?  You can write functions that work that way now.
        regards, tom lane

Re: MULTISET and additional functions for ARRAY

От

"David E. Wheeler"

Дата:

11 ноября 2010 г., 17:11:33

On Nov 11, 2010, at 10:05 AM, Tom Lane wrote:

>> So are you planning to implement multisets? It's a feature I'd love to see
> 
> What actual functionality does it buy?  AFAICT from Itagaki-san's
> description, it's an array only you ignore the specific element order.
> So what?  You can write functions that work that way now.

Also, no dupes.

David

Re: MULTISET and additional functions for ARRAY

От

Darren Duncan

Дата:

11 ноября 2010 г., 17:19:19

I think that it would be best to implement MULTISET in the same way that a TABLE 
is implemented.  Logically and structurally they are the same thing, but that a 
MULTISET typically is used as a field value of a table row.  Aka, a table and a 
multiset are just different names for a relation, loosely speaking.

The association of a multiset-typed attribute of a table with said table is like 
the association of a child and parent table in a many-to-one.

So reuse your structure for tables to hold multisets.

-- Darren Duncan

Itagaki Takahiro wrote:
> Postgres supports ARRAY data types well, but there are some
> more array functions in the SQL standard. Also, the standard
> has MULTISET data type, that is an unordered array.
> 
> It looks easy to support additional array functions. There
> might be some confusion to treat multi-dimensional arrays
> with them, but we could treat all arrays as one-dimensional
> like as unnest().
> 
> MULTISET supports are more difficult. We have corresponding
> type IDs for each array, but we might not want to add additional
> IDs for multiset for each type. Any ideas for the issue?
> 
> If we reuse type IDs of arrays for multisets, the multisets would
> have some special typmod. For example, typmod = 0 means multiset,
> and positive value means array with max cardinality. Note that
> the SQL standard doesn't mention about multi-dimensional arrays.
> So, we can use typmod = -1 as a free-size and free-dimensional
> array for backward compatibility.
> 
> If we have troublesome issues to support multiset data types,
> I'm thinking to add multiset functions that receives ARRAY
> types instead at time first time, because an ARRAY is a
> MULTISET by definition. Some of functions for multisets
> seems to be useful for arrays, too.
> 
> Comments and suggestions welcome.

Re: MULTISET and additional functions for ARRAY

От

Nicolas Barbier

Дата:

11 ноября 2010 г., 17:24:08

2010/11/11 David E. Wheeler <david@kineticode.com>:

> On Nov 11, 2010, at 10:05 AM, Tom Lane wrote:
>
>>> So are you planning to implement multisets? It's a feature I'd love to see
>>
>> What actual functionality does it buy?  AFAICT from Itagaki-san's
>> description, it's an array only you ignore the specific element order.
>> So what?  You can write functions that work that way now.
>
> Also, no dupes.

The "multi" in multiset indicates that duplicate elements are
explicitly allowed and tracked.

Nicolas

Re: MULTISET and additional functions for ARRAY

От

"David E. Wheeler"

Дата:

11 ноября 2010 г., 17:46:04

On Nov 11, 2010, at 10:19 AM, Darren Duncan wrote:

> I think that it would be best to implement MULTISET in the same way that a TABLE is implemented. Logically and
structurallythey are the same thing, but that a MULTISET typically is used as a field value of a table row.  Aka, a
tableand a multiset are just different names for a relation, loosely speaking. 

That sounds like a composite type to me.

Best,

David

Re: MULTISET and additional functions for ARRAY

От

"David E. Wheeler"

Дата:

11 ноября 2010 г., 17:46:20

On Nov 11, 2010, at 10:24 AM, Nicolas Barbier wrote:

>> Also, no dupes.
> 
> The "multi" in multiset indicates that duplicate elements are
> explicitly allowed and tracked.

D'oh! Right.

D

Re: MULTISET and additional functions for ARRAY

От

Alvaro Herrera

Дата:

11 ноября 2010 г., 19:08:20

Excerpts from David E. Wheeler's message of jue nov 11 15:45:55 -0300 2010:
> On Nov 11, 2010, at 10:19 AM, Darren Duncan wrote:
> 
> > I think that it would be best to implement MULTISET in the same way that a TABLE is implemented. Logically and
structurallythey are the same thing, but that a MULTISET typically is used as a field value of a table row.  Aka, a
tableand a multiset are just different names for a relation, loosely speaking.
 
> 
> That sounds like a composite type to me.

No, it's "perpendicular" in the sense that while a composite type allows
you to have different columns, this multiset thing lets you have "rows"
(I initially thought about them as sets of scalars, but AFAIU they could
in turn be rows)

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: MULTISET and additional functions for ARRAY

От

"David E. Wheeler"

Дата:

11 ноября 2010 г., 19:43:07

On Nov 11, 2010, at 12:08 PM, Alvaro Herrera wrote:

>> That sounds like a composite type to me.
> 
> No, it's "perpendicular" in the sense that while a composite type allows
> you to have different columns, this multiset thing lets you have "rows"
> (I initially thought about them as sets of scalars, but AFAIU they could
> in turn be rows)

How is that different from an array of RECORDs?

Best,

David

Re: MULTISET and additional functions for ARRAY

От

Merlin Moncure

Дата:

11 ноября 2010 г., 19:46:51

On Thu, Nov 11, 2010 at 3:42 PM, David E. Wheeler <david@kineticode.com> wrote:
> On Nov 11, 2010, at 12:08 PM, Alvaro Herrera wrote:
>
>>> That sounds like a composite type to me.
>>
>> No, it's "perpendicular" in the sense that while a composite type allows
>> you to have different columns, this multiset thing lets you have "rows"
>> (I initially thought about them as sets of scalars, but AFAIU they could
>> in turn be rows)
>
> How is that different from an array of RECORDs?

not a whole lot outside of syntax details...there is a decent summary
here: http://waelchatila.com/2005/05/18/1116485743467.html

I like this part: "Alternatively the SQL standard also permits the
same construct with the bracket trigraphs ??( and ??)"  :-D

merlin

Re: MULTISET and additional functions for ARRAY

От

Darren Duncan

Дата:

11 ноября 2010 г., 20:06:41

Merlin Moncure wrote:
> On Thu, Nov 11, 2010 at 3:42 PM, David E. Wheeler <david@kineticode.com> wrote:
>> On Nov 11, 2010, at 12:08 PM, Alvaro Herrera wrote:
>>
>>>> That sounds like a composite type to me.
>>> No, it's "perpendicular" in the sense that while a composite type allows
>>> you to have different columns, this multiset thing lets you have "rows"
>>> (I initially thought about them as sets of scalars, but AFAIU they could
>>> in turn be rows)
>> How is that different from an array of RECORDs?

I could ask the same question about a TABLE, the ordering issue aside.

This is one place that SQL made things more complicated than they needed to be.  Multisets have generally the same
structure*and* operators (union, etc) as 

tables, but they use different syntax for each.  A better design would be to 
make tables and multisets interchangeable.  Its an unnecessary distinction.

> not a whole lot outside of syntax details...there is a decent summary
> here: http://waelchatila.com/2005/05/18/1116485743467.html
> 
> I like this part: "Alternatively the SQL standard also permits the
> same construct with the bracket trigraphs ??( and ??)"  :-D

As I recall, the concept of using stuff like ?( or ?) etc was so that SQL could 
be written in EBCDIC which natively lacks some of the bracketing characters that 
ASCII has.  Hence, such is an alternative way to spell either { } or [ ] (I 
forget which).

-- Darren Duncan

Re: MULTISET and additional functions for ARRAY

От

Itagaki Takahiro

Дата:

12 ноября 2010 г., 00:27:27

On Fri, Nov 12, 2010 at 03:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "David E. Wheeler" <david@kineticode.com> writes:
>> So are you planning to implement multisets? It's a feature I'd love to see
>
> What actual functionality does it buy?  AFAICT from Itagaki-san's
> description, it's an array only you ignore the specific element order.
> So what?  You can write functions that work that way now.

I think there are almost no difference between a multiset and an array
in terms of functions I described in the first mail.

However, if we have separated multiset data type, we could have special
comparison operators for them; "array = array" returns true only if they
have the same elements in the same order, but "multiset = multiset" only
checks elements in them. Also, we could optimize on-disk structure of
multiset for fast UNION operations or for dataset that has many duplicates.
For example, we could use a sorted array of {value, count} pairs.

If we decide to have data type IDs for multiset, I'll go for it (ex. int4,
_int4, and an additional $int4), but it consumes +50% of typoids. If it
is not preferable, only function support might be better at the first try.

--
Itagaki Takahiro

Re: MULTISET and additional functions for ARRAY

От

Itagaki Takahiro

Дата:

12 ноября 2010 г., 00:36:42

On Fri, Nov 12, 2010 at 06:06, Darren Duncan <darren@darrenduncan.net> wrote:
> This is one place that SQL made things more complicated than they needed to
> be.  Multisets have generally the same structure *and* operators (union,
> etc) as tables, but they use different syntax for each.  A better design
> would be to make tables and multisets interchangeable.  Its an unnecessary
> distinction.

We can use unnest() to convert MULTISET into TABLE, and collect() agg
function from TABLE to MULTISET. I don't think they need to have the
same on-disk structure; they can share operators and constructor syntax
even now.

--
Itagaki Takahiro

Re: MULTISET and additional functions for ARRAY

От

Robert Haas

Дата:

12 ноября 2010 г., 03:53:33

On Thu, Nov 11, 2010 at 10:02 AM, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
> If we reuse type IDs of arrays for multisets, the multisets would
> have some special typmod. For example, typmod = 0 means multiset,
> and positive value means array with max cardinality. Note that
> the SQL standard doesn't mention about multi-dimensional arrays.
> So, we can use typmod = -1 as a free-size and free-dimensional
> array for backward compatibility.

I would really like to see us fix our type system so that it doesn't
require this type of awful hack.  But maybe that's asking too much of
a patch to implement this feature.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: MULTISET and additional functions for ARRAY

От

Tom Lane

Дата:

12 ноября 2010 г., 04:21:19

Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Nov 11, 2010 at 10:02 AM, Itagaki Takahiro
> <itagaki.takahiro@gmail.com> wrote:
>> If we reuse type IDs of arrays for multisets, the multisets would
>> have some special typmod. For example, typmod = 0 means multiset,
>> and positive value means array with max cardinality. Note that
>> the SQL standard doesn't mention about multi-dimensional arrays.
>> So, we can use typmod = -1 as a free-size and free-dimensional
>> array for backward compatibility.

> I would really like to see us fix our type system so that it doesn't
> require this type of awful hack.  But maybe that's asking too much of
> a patch to implement this feature.

The problem is not with the type system: as long as you give multisets
different type OIDs from arrays, everything will work fine.  It will
absolutely not work to try to use typmod to make the behavior vary
like that ... but Itagaki-san knew that already.
        regards, tom lane

Re: MULTISET and additional functions for ARRAY

От

Robert Haas

Дата:

12 ноября 2010 г., 04:53:18

On Fri, Nov 12, 2010 at 12:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Nov 11, 2010 at 10:02 AM, Itagaki Takahiro
>> <itagaki.takahiro@gmail.com> wrote:
>>> If we reuse type IDs of arrays for multisets, the multisets would
>>> have some special typmod. For example, typmod = 0 means multiset,
>>> and positive value means array with max cardinality. Note that
>>> the SQL standard doesn't mention about multi-dimensional arrays.
>>> So, we can use typmod = -1 as a free-size and free-dimensional
>>> array for backward compatibility.
>
>> I would really like to see us fix our type system so that it doesn't
>> require this type of awful hack.  But maybe that's asking too much of
>> a patch to implement this feature.
>
> The problem is not with the type system: as long as you give multisets
> different type OIDs from arrays, everything will work fine.  It will
> absolutely not work to try to use typmod to make the behavior vary
> like that ... but Itagaki-san knew that already.

And thus you must create a THIRD copy of every entry in pg_type.  That
doesn't qualify as a problem?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: MULTISET and additional functions for ARRAY

От

Tom Lane

Дата:

12 ноября 2010 г., 13:44:43

Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Nov 12, 2010 at 12:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The problem is not with the type system: as long as you give multisets
>> different type OIDs from arrays, everything will work fine.

> And thus you must create a THIRD copy of every entry in pg_type.  That
> doesn't qualify as a problem?

[ shrug... ]  It's less of a problem than the possible alternatives.
IMO anyway.  OIDs are cheap ... replacing OIDs with some sort of
ill-specified composite key throughout the system is not.

But I'm still not convinced that this feature is useful enough to
justify the implementation effort.  AFAICS there's nothing here that
you couldn't get with some non-default operators on regular arrays,
with orders of magnitude less work and less impact on the rest of the
system.  The only reason to consider implementing it as a separate type
category is the SQL committee decided to invent some syntax --- and
given their lousy taste in syntax, I get less enthused every year about
duplicating it.
        regards, tom lane

Re: MULTISET and additional functions for ARRAY

От

David Fetter

Дата:

12 ноября 2010 г., 15:41:52

On Fri, Nov 12, 2010 at 12:53:09AM -0500, Robert Haas wrote:
> On Fri, Nov 12, 2010 at 12:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Thu, Nov 11, 2010 at 10:02 AM, Itagaki Takahiro
> >> <itagaki.takahiro@gmail.com> wrote:
> >>> If we reuse type IDs of arrays for multisets, the multisets would
> >>> have some special typmod. For example, typmod = 0 means multiset,
> >>> and positive value means array with max cardinality. Note that
> >>> the SQL standard doesn't mention about multi-dimensional arrays.
> >>> So, we can use typmod = -1 as a free-size and free-dimensional
> >>> array for backward compatibility.
> >
> >> I would really like to see us fix our type system so that it doesn't
> >> require this type of awful hack.  But maybe that's asking too much of
> >> a patch to implement this feature.
> >
> > The problem is not with the type system: as long as you give multisets
> > different type OIDs from arrays, everything will work fine.  It will
> > absolutely not work to try to use typmod to make the behavior vary
> > like that ... but Itagaki-san knew that already.
> 
> And thus you must create a THIRD copy of every entry in pg_type.  That
> doesn't qualify as a problem?

Yes, and I've started a separate thread on this along with a page on
the wiki.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: MULTISET and additional functions for ARRAY

От

Itagaki Takahiro

Дата:

15 ноября 2010 г., 01:55:51

On Fri, Nov 12, 2010 at 00:02, Itagaki Takahiro
<itagaki.takahiro@gmail.com> wrote:
> Postgres supports ARRAY data types well, but there are some
> more array functions in the SQL standard. Also, the standard
> has MULTISET data type, that is an unordered array.

Here is a WIP patch for multiset function supports. Note that multiset
types are not added in the patch; it just adds functions and syntax.
Arguments or result types of those functions are anyarray rather than
anymultiset. The result type is always flatten into on-dimensional
array because some functions requires per-element operations; I'm not
sure how we should treat trim_array( 2D 3x3 array, 2 elements ). So,
it is treated as trim_array( 9 elements array, 2 elements ) in the patch.

The SQL standard defines special syntax for multiset. I added four
unreserved keywords for them; A, MEMBER, MULTISET, and SUB.
(I don't like such ad-hoc syntax, but it is the standard...)
Some of the new expressions are just syntactic sugar for existing
other expressions or new functions. For example, "$1 MEMBER OF $2" is
expanded to "$1 = ANY ($2)" and "$1 IS A SET" to "multiset_is_a_set($1)".

I have not researched the spec yet enough, especially NULLs in collections.
I'll continue to check the details.

BTW, some of the intermediate products to implement those features might
be useful if exported. like array_sort() and array_unique(). If there is
demand, I'll add those functions, too.

Any comments for the approach or detailed features?

=== New features ===
- [FUNCTION] cardinality(anyarray) => integer
- [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray
- [FUNCTION] element(anyarray) => anyelement
- [SYNTAX] $1 MEMBER OF $2 --> $1 = ANY ($2)
- [SYNTAX] $1 SUB MULTISET OF $2 --> $1 <@ $2
- [SYNTAX] $1 IS A SET  --> multiset_is_a_set($1)
- [SYNTAX] $1 MULTISET UNION [ALL | DISTINCT] $2 -->
multiset_union($1, $2, all?)
- [SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT] $2 -->
multiset_intersect($1, $2, all?)
- [SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT] $2 -->
multiset_except($1, $2, all?)
- [AGGREGATE] collect(anyelement) => anyarray --> same as array_agg()
- [AGGREGATE] fusion(anyarray) => anyarray
- [AGGREGATE] intersection(anyarray) => anyarray

--
Itagaki Takahiro

Вложения

multiset-20101115.patch

Re: MULTISET and additional functions for ARRAY

От

Pavel Stehule

Дата:

15 ноября 2010 г., 04:38:34

Hello

2010/11/15 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
> On Fri, Nov 12, 2010 at 00:02, Itagaki Takahiro
> <itagaki.takahiro@gmail.com> wrote:
>> Postgres supports ARRAY data types well, but there are some
>> more array functions in the SQL standard. Also, the standard
>> has MULTISET data type, that is an unordered array.
>
> Here is a WIP patch for multiset function supports. Note that multiset
> types are not added in the patch; it just adds functions and syntax.
> Arguments or result types of those functions are anyarray rather than
> anymultiset. The result type is always flatten into on-dimensional
> array because some functions requires per-element operations; I'm not
> sure how we should treat trim_array( 2D 3x3 array, 2 elements ). So,
> it is treated as trim_array( 9 elements array, 2 elements ) in the patch.
>
> The SQL standard defines special syntax for multiset. I added four
> unreserved keywords for them; A, MEMBER, MULTISET, and SUB.
> (I don't like such ad-hoc syntax, but it is the standard...)
> Some of the new expressions are just syntactic sugar for existing
> other expressions or new functions. For example, "$1 MEMBER OF $2" is
> expanded to "$1 = ANY ($2)" and "$1 IS A SET" to "multiset_is_a_set($1)".
>
> I have not researched the spec yet enough, especially NULLs in collections.
> I'll continue to check the details.
>
> BTW, some of the intermediate products to implement those features might
> be useful if exported. like array_sort() and array_unique(). If there is
> demand, I'll add those functions, too.
>
> Any comments for the approach or detailed features?
>

I has not a standard, so I can't to speak about conformance with
standard, but I must to say, so these functionality should be
extremely useful for plpgsql programming. You can see samples a some
functions on the net for array sorting, for reduce a redundant values
via SQL language. I am sure, so implementation in C will be much
faster. Maybe can be useful to implement a searching on sorted array.
You can hold a flag if multiset is sorted or not.

Regards

Pavel Stehule

> === New features ===
> - [FUNCTION] cardinality(anyarray) => integer
> - [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray
> - [FUNCTION] element(anyarray) => anyelement
> - [SYNTAX] $1 MEMBER OF $2 --> $1 = ANY ($2)
> - [SYNTAX] $1 SUB MULTISET OF $2 --> $1 <@ $2
> - [SYNTAX] $1 IS A SET  --> multiset_is_a_set($1)
> - [SYNTAX] $1 MULTISET UNION [ALL | DISTINCT] $2 -->
> multiset_union($1, $2, all?)
> - [SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT] $2 -->
> multiset_intersect($1, $2, all?)
> - [SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT] $2 -->
> multiset_except($1, $2, all?)
> - [AGGREGATE] collect(anyelement) => anyarray --> same as array_agg()
> - [AGGREGATE] fusion(anyarray) => anyarray
> - [AGGREGATE] intersection(anyarray) => anyarray
>
> --
> Itagaki Takahiro
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

Re: MULTISET and additional functions for ARRAY

От

Itagaki Takahiro

Дата:

15 ноября 2010 г., 07:47:24

On Mon, Nov 15, 2010 at 14:37, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Maybe can be useful to implement a searching on sorted array.
> You can hold a flag if multiset is sorted or not.

Are you suggesting to add an IS_SORTED bit to for each ArrayType value?
It might be possible, but I'm not sure how much it is useful.

Another issue for sorting is that we have 4 kinds of sorting: ASC/DESC
and NULLS FIRST/LAST. I always used the default sort order (ASC NULLS LAST)
in the patch, but we might need to support all of them when we export the
functionality. Then, we would need 2 bits for IS_SORTED flags.

-- 
Itagaki Takahiro

Re: MULTISET and additional functions for ARRAY

От

Pavel Stehule

Дата:

15 ноября 2010 г., 08:35:31

2010/11/15 Itagaki Takahiro <itagaki.takahiro@gmail.com>:
> On Mon, Nov 15, 2010 at 14:37, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Maybe can be useful to implement a searching on sorted array.
>> You can hold a flag if multiset is sorted or not.
>
> Are you suggesting to add an IS_SORTED bit to for each ArrayType value?
> It might be possible, but I'm not sure how much it is useful.

It's depend on implementation. You can use this flag for large set and
operation MEMBER OF. When you know, so array is sorted, then you can
use a fast search.

>
> Another issue for sorting is that we have 4 kinds of sorting: ASC/DESC
> and NULLS FIRST/LAST. I always used the default sort order (ASC NULLS LAST)
> in the patch, but we might need to support all of them when we export the
> functionality. Then, we would need 2 bits for IS_SORTED flags.
>

yes, probably

Pavel

> --
> Itagaki Takahiro
>

Re: MULTISET and additional functions for ARRAY

От

Andres Freund

Дата:

15 ноября 2010 г., 08:58:36

On Monday 15 November 2010 09:47:15 Itagaki Takahiro wrote:
> On Mon, Nov 15, 2010 at 14:37, Pavel Stehule <pavel.stehule@gmail.com> 
wrote:
> > Maybe can be useful to implement a searching on sorted array.
> > You can hold a flag if multiset is sorted or not.
> 
> Are you suggesting to add an IS_SORTED bit to for each ArrayType value?
> It might be possible, but I'm not sure how much it is useful.
I actually wished for that several times and solved it by putting up 
is_sorted() constraints everywhere, but its a pain and restricts generallity.

(One was e.g. for implementing an efficent array_union() aggregate).

Andres

Re: MULTISET and additional functions for ARRAY

От

Tom Lane

Дата:

15 ноября 2010 г., 14:13:48

Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:
> Another issue for sorting is that we have 4 kinds of sorting: ASC/DESC
> and NULLS FIRST/LAST.

We have a lot more kinds than that.  See USING.
        regards, tom lane

Re: MULTISET and additional functions for ARRAY

От

David Fetter

Дата:

15 ноября 2010 г., 18:14:46

On Mon, Nov 15, 2010 at 10:13:40AM -0500, Tom Lane wrote:
> Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:
> > Another issue for sorting is that we have 4 kinds of sorting: ASC/DESC
> > and NULLS FIRST/LAST.
> 
> We have a lot more kinds than that.  See USING.

USING pretty much gives us no chance of optimizing at all.  Could we
maybe see about optimizing the 99% case, which those two bits cover?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: MULTISET and additional functions for ARRAY

От

Tom Lane

Дата:

15 ноября 2010 г., 18:38:46

David Fetter <david@fetter.org> writes:
> On Mon, Nov 15, 2010 at 10:13:40AM -0500, Tom Lane wrote:
>> Itagaki Takahiro <itagaki.takahiro@gmail.com> writes:
>>> Another issue for sorting is that we have 4 kinds of sorting: ASC/DESC
>>> and NULLS FIRST/LAST.

>> We have a lot more kinds than that.  See USING.

> USING pretty much gives us no chance of optimizing at all.  Could we
> maybe see about optimizing the 99% case, which those two bits cover?

The question is why support more than *one* kind, if you're only
supporting a subset.  I don't see the value of messing with stuff like
NULLS FIRST if you're not going to go all the way.  What's more, the
alleged use for this is strictly as an internal optimization in multiset
representation, so there's no reason to support more than one sort
order.
        regards, tom lane

Re: MULTISET and additional functions for ARRAY

От

Peter Eisentraut

Дата:

16 ноября 2010 г., 21:19:35

On fre, 2010-11-12 at 09:44 -0500, Tom Lane wrote:
> But I'm still not convinced that this feature is useful enough to
> justify the implementation effort.  AFAICS there's nothing here that
> you couldn't get with some non-default operators on regular arrays,

Unique constraints would behave differently for arrays and multisets.
But I suppose you could get something similar with exclusion constraints
nowadays.  But not for primary keys.

Foreign keys also don't work easily with nondefault operators.

JOIN / USING doesn't work.

ORDER BY would work but look ugly.  And such coding where you'd have to
remember the nondefault operator everywhere would also be error prone.

Re: MULTISET and additional functions for ARRAY

От

Tom Lane

Дата:

16 ноября 2010 г., 21:51:06

Peter Eisentraut <peter_e@gmx.net> writes:
> On fre, 2010-11-12 at 09:44 -0500, Tom Lane wrote:
>> But I'm still not convinced that this feature is useful enough to
>> justify the implementation effort.  AFAICS there's nothing here that
>> you couldn't get with some non-default operators on regular arrays,

> Unique constraints would behave differently for arrays and multisets.
> But I suppose you could get something similar with exclusion constraints
> nowadays.  But not for primary keys.

> Foreign keys also don't work easily with nondefault operators.

> JOIN / USING doesn't work.

> ORDER BY would work but look ugly.  And such coding where you'd have to
> remember the nondefault operator everywhere would also be error prone.

Hmm.  Those are perhaps good arguments --- but you can't get any of
those behaviors unless multisets are distinct types.  The implementation
proposed upthread where they're arrays with a special typmod isn't going
to support this.
        regards, tom lane

Re: MULTISET and additional functions for ARRAY

От

Itagaki Takahiro

Дата:

18 ноября 2010 г., 07:52:26

On Mon, Nov 15, 2010 at 14:37, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> BTW, some of the intermediate products to implement those features might
>> be useful if exported. like array_sort() and array_unique(). If there is
>> demand, I'll add those functions, too.
>
> I has not a standard, so I can't to speak about conformance with
> standard, but I must to say, so these functionality should be
> extremely useful for plpgsql programming. You can see samples a some
> functions on the net for array sorting, for reduce a redundant values
> via SQL language. I am sure, so implementation in C will be much
> faster.

Here is a new WIP patch to support MULTISET functions. It includes
no docs yet, but I'll work for it if the design is accepted.

I added some non-standard functions:
  - set() : same as DISTINCT for rows
  - array_sort() : same as ORDER BY ASC NULLS LAST for rows
  - array_flatten() : flatten an array into one-dimensional one
I chose set() for the function name because of compatibility with
Oracle. The name also has been proposed in the working draft.

SUBMULTISET OF now uses own submultiset_of() function instead
of <@ operator because their behaviors are different at all.
For example,
  - ARRAY[1, 1] <@ ARRAY[1] ==> true
  - ARRAY[1, 1] SUBMULTISET OF ARRAY[1] ==> false

I removed element() function because it is not in the standard
though it has been in the working draft. I think the function
is not so useful because it is almost same as "array[1]".

I need help for the syntax of MEMBER OF and SUBMULTISET OF.
OFs in them are omittable in the standard, but I cannot resolve
shift/reduce errors by bison for syntax. I added a FIXME comment
in gram.y. Suggestions welcome.

> Maybe can be useful to implement a searching on sorted array.
> You can hold a flag if multiset is sorted or not.

I've not added "is sorted" bits into array types yet. It might
improve performance, but it should come separately from the patch.
I think it's not so simple task because user might modify comparison
operators after they store arrays with sorted bits on disks; those
arrays have sorted bits, but are not sorted actually with the latest
operators.

=== New features in ver.20101118 ===
 - [FUNCTION] cardinality(anyarray) => integer
 - [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray
 - [FUNCTION] array_flatten(anyarray) => anyarray
 - [FUNCTION] array_sort(anyarray) => anyarray
 - [FUNCTION] set(anyarray) => anyarray
 - [SYNTAX] $1 IS [NOT] A SET => boolean
 - [SYNTAX] $1 [NOT] MEMBER OF $2 => boolean
 - [SYNTAX] $1 [NOT] SUBMULTISET OF $2 => boolean
 - [SYNTAX] $1 MULTISET UNION [ALL | DISTINCT] $2 => anyarray
 - [SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT] $22 => anyarray
 - [SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT] $22 => anyarray
 - [AGGREGATE] collect(anyelement) => anyarray
 - [AGGREGATE] fusion(anyarray) => anyarray
 - [AGGREGATE] intersection(anyarray) => anyarray

=== New unreserved keywords ===
A, MEMBER, MULTISET, SUBMULTISET

--
Itagaki Takahiro

Вложения

multiset-20101118.patch

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: MULTISET and additional functions for ARRAY

Вложения

Вложения