Обсуждение: Aggregate "rollup"

Поиск
Список
Период
Сортировка

Aggregate "rollup"

От
mlw
Дата:
I had written a piece of code about two years ago that used the 
aggregate feature of PostgreSQL to create an array of integers from an 
aggregate, as:

select int_array_aggregate( column ) from table group by column

While it seems pointless to create an array on a select, it has a 
purpose in OLAP. For instance, suppose you do this:

create table fast_lookup as select reference, 
int_array_aggregate(result) from table group by result

The "fast_lookup" table now has all the result entries as an array in a 
single row. In the systems that I have used this, it has provided a 
dramatic improvement, especially when you have a high number of 
identical "reference" entries in a classic "one to many" table.

The question is, would a more comprehensive solution be wanted? 
Possible? Something like:

create table fast_lookup as select reference, aggregate_array( field ) 
from table group by field

Where the function aggregate_array takes any number of data types.

Any thoughts? I think I need to fix the code in the current 
/contrib/intagg anyway, so is it worth doing the extra work to included 
multiple data types?



Re: Aggregate "rollup"

От
Joe Conway
Дата:
mlw wrote:
> I had written a piece of code about two years ago that used the 
> aggregate feature of PostgreSQL to create an array of integers from an 
> aggregate, as:
> 
> select int_array_aggregate( column ) from table group by column
> 
> While it seems pointless to create an array on a select, it has a 
> purpose in OLAP. For instance, suppose you do this:
> 
> create table fast_lookup as select reference, 
> int_array_aggregate(result) from table group by result
> 
> The "fast_lookup" table now has all the result entries as an array in a 
> single row. In the systems that I have used this, it has provided a 
> dramatic improvement, especially when you have a high number of 
> identical "reference" entries in a classic "one to many" table.
> 
> The question is, would a more comprehensive solution be wanted? 
> Possible? Something like:
> 
> create table fast_lookup as select reference, aggregate_array( field ) 
> from table group by field
> 
> Where the function aggregate_array takes any number of data types.
> 
> Any thoughts? I think I need to fix the code in the current 
> /contrib/intagg anyway, so is it worth doing the extra work to included 
> multiple data types?

It's also useful in conjunction with statistically processing. There is 
a array_accum function in PL/R; I just made a post to the SQL list the 
other day on this.
(http://archives.postgresql.org/pgsql-sql/2003-03/msg00124.php)
Here's the meat of it:

CREATE OR REPLACE FUNCTION array_accum (_name, name)
RETURNS name[]
AS '$libdir/plr','array_accum'
LANGUAGE 'C';

CREATE AGGREGATE accumulate (  sfunc = array_accum,  basetype = name,  stype = _name
);

regression=# SELECT accumulate(tablename) as cruft FROM pg_tables WHERE 
tablename LIKE 'c%';
cruft
---------------------------------------
{connectby_int,connectby_text,ct,cth}
(1 row)

See:  http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html
and download at:  http://www.joeconway.com/plr/

I'd be happy to split the array functions out of PL/R and sumbit them to 
PATCHES if there is any interest.

Joe



Re: Aggregate "rollup"

От
Greg Stark
Дата:
mlw <pgsql@mohawksoft.com> writes:

> I had written a piece of code about two years ago that used the aggregate
> feature of PostgreSQL to create an array of integers from an aggregate, as:
> 
> select int_array_aggregate( column ) from table group by column

I found this and am using it extensively. It's extremely helpful, thank you.

It goes well with either the *= operators in contrib/array or the gist
indexing in contrib/intarray.

One problem I've found though is that the optimizer doesn't have any good
statistics for estimating the number of matches of such operators. It seems
like fixing that would require a lot of changes to the statistics gathered.

> create table fast_lookup as select reference, aggregate_array( field ) from
> table group by field
> 
> Where the function aggregate_array takes any number of data types.

Sure, that seems logical. Actually I already bumped into a situation where I
wanted an array of char(1). I just kludged it to use ascii() of that first
character, but it would be cleaner and perhaps better for unicode later to use
the actual character.

Someone else on the list already asked for an function that gave an array of
varchar. I think they were pointed at a general purpose function from plr.

--
greg



Re: Aggregate "rollup"

От
"Merlin Moncure"
Дата:
> -----Original Message-----
> From: mlw [mailto:pgsql@mohawksoft.com]
> Sent: Wednesday, March 05, 2003 3:47 PM
> To: pgsql-hackers@postgresql.org
> Subject: [HACKERS] Aggregate "rollup"
>
> I had written a piece of code about two years ago that used the
> aggregate feature of PostgreSQL to create an array of integers from an
> aggregate, as:
>
> select int_array_aggregate( column ) from table group by column
>

Do I understand correctly that this still follows the normal rules for
grouping, so that only like values are put in the array?

Example: column has values 1,1,1,2,2 spread over 5 rows.
Your query returns two rows with row1={1,1,1} and row2 = {2,2}...is this
correct?

Also, what if your aggregate column is different from the group column:
Table t with columns c1, c2 with 5 rows:
C1 C2
1, 1
1, 2
1, 3
2, 1
2, 2

Does select C1, int_array_aggregate( C2 ) from table group by C1 return

1, {1, 2, 3}
2, {1, 2}
??

FWIW, I think that's a pretty cool function.  This allows the backend to
telescope 1 dimension (only) out of a dataset, the most detailed one.
In certain situations with large datasets over slow connections, this
could be a big payoff.

Also, all this talk about XML has got me thinking about how to allow
basic query features to provide simple nesting services.  consider:

select C1, C2 from t for xml;  returns:
<t>
<C1>1</C1><C2>1</C2>
<C1>1</C1><C2>2</C2>
<C1>1</C1><C2>3</C2>
<C1>2</C1><C2>1</C2>
<C1>2</C1><C2>2</C2>
</t>
select C1, xml_aggregate(C2) from t for xml; returns:
<t>
<C1 value="1"><C2>1</C2><C2>2</C2><C2>3</C2></C1>
<C1 value="2"><C2>1</C2><C2>2</C2><C2>3</C2></C1>
</t>

> create table fast_lookup as select reference,
> int_array_aggregate(result) from table group by result
>
> The question is, would a more comprehensive solution be wanted?
> Possible? Something like:
>
>
> Any thoughts? I think I need to fix the code in the current
> /contrib/intagg anyway, so is it worth doing the extra work to
included
> multiple data types?

Yes.

Just a thought.
Merlin



Re: Aggregate "rollup"

От
mlw
Дата:

Merlin Moncure wrote:

>>-----Original Message-----
>>From: mlw [mailto:pgsql@mohawksoft.com]
>>Sent: Wednesday, March 05, 2003 3:47 PM
>>To: pgsql-hackers@postgresql.org
>>Subject: [HACKERS] Aggregate "rollup"
>>
>>I had written a piece of code about two years ago that used the
>>aggregate feature of PostgreSQL to create an array of integers from an
>>aggregate, as:
>>
>>select int_array_aggregate( column ) from table group by column
>>
>>    
>>
>
>Do I understand correctly that this still follows the normal rules for
>grouping, so that only like values are put in the array?
>
>Example: column has values 1,1,1,2,2 spread over 5 rows.
>Your query returns two rows with row1={1,1,1} and row2 = {2,2}...is this
>correct?
>
Actually, it is more intended to put all the entries in a "one to many" 
table in a single column, as:

create table classic_one_to_many
(   leftside    integer,   rightside    integer
);

create table fast_lookup as select leftside, 
int_array_aggregate(rightside) from classic_one_to_many group by leftside;