Обсуждение: Arrays ... need clarification....

Поиск
Список
Период
Сортировка

Arrays ... need clarification....

От
Medi Montaseri
Дата:
I can use some clarification on Array usage....

Given the following definition....

create table test ( name varchar(20) , grades integer[]);

How do I insert a new record, I see multiple ways of doing it but if one
does
not do this right, then updates will fail....

method-1:
insert into test ( name , grades) values ( 'user1', '{}');
select * from test where name = 'user1';
name | id
--------+--------
 user1   | {}
update test set grades[1] = 10 where name = 'user1';
ERROR:  Invalid array subscripts


Method-2
insert into test (name, grades) values ('user2', '{0}');
select * from test where name = 'user2';
 user2 | {0}
update test set grades[0] = 10 where name = 'user2';
// checking the result
 user2 | {10,0}
I thought Arrays are indexed from 1 (and not zero), but ....

update test set grades[1] = 20 where name = 'user2';
// check ing the output...
 user2 | {10,20}

Method-3:
insert into test (name, id[0]) values ('user3', '{0}');


So which way is the correct way....also note that I need to do some
avg(), min(), max()
on these grades, so I hope setting first element to zero is not going to
mess my statistics.

It would fee more natural to be able to say

insert into test ( name, grades ) values ( 'joe', '{}');

That is Joe gets an empty set, instead of saying
insert into test ( name, grades[0]) values ( .....
But note that if one does this, it will fail in the update.....check
this....

insert into test (name, grades) values ('foo', '{}');
select * from test where name = 'foo';
foo | {}
update test set grades[1] = 10 where name = 'foo';
ERROR:  Invalid array subscripts

Thanks


Re: Arrays ... need clarification....

От
Joe Conway
Дата:
Medi Montaseri wrote:
> I can use some clarification on Array usage....
>
> Given the following definition....
>
> create table test ( name varchar(20) , grades integer[]);
>
> How do I insert a new record, I see multiple ways of doing it but if one
> does
> not do this right, then updates will fail....

I think this is probably a misuse of arrays. You might better model this
as something like:

create table student (
  student_id serial primary key,
  name text
);
create table test (
  test_id serial primary key,
  date_taken timestamp,
  description text
);
create table grade (
  grade_id serial primary key,
  test_id int references test,
  student_id int references student,
  test_grade int
);

Now you can do something like:
select s.name, avg(g.test_grade), stddev(g.test_grade) as average from
student s, test t, grade g where s.student_id = g.student_id and
g.test_id = t.test_id and t.date_taken between '01-01-2003' and
'03-31-2003' group by s.name;

But in any case, see:
http://www.postgresql.org/docs/view.php?version=7.3&idoc=1&file=arrays.html
for more information on use of arrays.

Joe


Re: Arrays ... need clarification....

От
Medi Montaseri
Дата:
Yes....your proposed method is indeed the traditional approach....but
since PG provides
Arrays, I figured "How Nice"....

Thank anyways though...

And the link does not provide much, I stopped by there first, before
asking....
All the examples on the link are using
insert into table values ( x, x, x , ...)
instead of
insert into table (col, col, ...) values ( val, val, ...)

Joe Conway wrote:

> Medi Montaseri wrote:
>
>> I can use some clarification on Array usage....
>>
>> Given the following definition....
>>
>> create table test ( name varchar(20) , grades integer[]);
>>
>> How do I insert a new record, I see multiple ways of doing it but if
>> one does
>> not do this right, then updates will fail....
>
>
> I think this is probably a misuse of arrays. You might better model
> this as something like:
>
> create table student (
>  student_id serial primary key,
>  name text
> );
> create table test (
>  test_id serial primary key,
>  date_taken timestamp,
>  description text
> );
> create table grade (
>  grade_id serial primary key,
>  test_id int references test,
>  student_id int references student,
>  test_grade int
> );
>
> Now you can do something like:
> select s.name, avg(g.test_grade), stddev(g.test_grade) as average from
> student s, test t, grade g where s.student_id = g.student_id and
> g.test_id = t.test_id and t.date_taken between '01-01-2003' and
> '03-31-2003' group by s.name;
>
> But in any case, see:
> http://www.postgresql.org/docs/view.php?version=7.3&idoc=1&file=arrays.html
>
> for more information on use of arrays.
>
> Joe
>


Re: Arrays ... need clarification....

От
Joe Conway
Дата:
Medi Montaseri wrote:
> Yes....your proposed method is indeed the traditional approach....but
> since PG provides
> Arrays, I figured "How Nice"....

Don't get me wrong -- I like the feature (enough that I'm working on
improving it for the next release), I just don't think this is a good
application for it. But that's just IMHO ;-)

> And the link does not provide much, I stopped by there first, before
> asking....
> All the examples on the link are using
> insert into table values ( x, x, x , ...)
> instead of
> insert into table (col, col, ...) values ( val, val, ...)

I agree that the documentation on arrays is pretty limited. Again, I
hope to improve that before 7.4 is released. In your original question:

 > method-1:
 > insert into test ( name , grades) values ( 'user1', '{}');
 > select * from test where name = 'user1';
 > name | id
 > --------+--------
 > user1   | {}
 > update test set grades[1] = 10 where name = 'user1';
 > ERROR:  Invalid array subscripts

I think you need to do this instead if the array is not null, but empty:

update test set grades = '{10}' where name = 'user1';

At that point you can add elements by specifying an index of plus or
minus one from the array lower bound or upper bound:

regression=# create table test ( name varchar(20) , grades integer[]);
CREATE TABLE
regression=# insert into test ( name , grades) values ( 'user1', '{}');
INSERT 2466139 1
regression=# update test set grades = '{10}' where name = 'user1';
UPDATE 1
regression=# update test set grades[0] = 9 where name = 'user1';
UPDATE 1
regression=# update test set grades[-1] = 9 where name = 'user1';
UPDATE 1
regression=# update test set grades[2] = 9 where name = 'user1';
UPDATE 1
regression=# update test set grades[3] = 9 where name = 'user1';
UPDATE 1
regression=# select * from test;
  name  |    grades
-------+--------------
  user1 | {9,9,10,9,9}
(1 row)
regression=# select array_dims(grades) from test;
  array_dims
------------
  [-1:3]
(1 row)

Arrays default to a lower bound of 1, but you can change that by adding
elements as shown.

If you do stick with arrays, and want to do any analytics on the array
data, take a look at PL/R. It makes array operations easy because R is a
vector based language. For example:

regression=# create or replace function array_avg(int[]) returns float8
as 'mean(arg1)' language 'plr';
CREATE FUNCTION
regression=# create table test ( name varchar(20) , grades integer[]);
CREATE TABLE
regression=# insert into test ( name , grades) values ( 'user1',
'{91,87,75,96,91}');
INSERT 2466243 1
regression=# select array_avg(grades) from test;
  array_avg
-----------
         88
(1 row)

You can download a copy at http://www.joeconway.com/ if interested, but
you also need R compiled with the --enable-R-shlib option; for R see
http://cran.r-project.org/

HTH,

Joe


Re: Arrays ... need clarification....

От
Medi Montaseri
Дата:
Thank you very much Joe....related to Dr. Conway of the Perl community ?

May I also suggest the following,

The "insert into table (col, col, array_col) values ( val, val, '{}')"
construct is a good
model as it resembles a declaration with no value specified.

Also, currently as you noted, one has to fill the array sequentially, ie
no random array
access is allowed unless one has initialized it already. This appears to
be a limitation of
the underlying data structure. Perhaps you could allow for such random
accesses.

Perhaps you could also support the following

insert into Table (array[m:n]) values ('{0}')

to mean place valure zero in elements from array[m] to array[n].

Thanks

Joe Conway wrote:

> Medi Montaseri wrote:
>
>> Yes....your proposed method is indeed the traditional approach....but
>> since PG provides
>> Arrays, I figured "How Nice"....
>
>
> Don't get me wrong -- I like the feature (enough that I'm working on
> improving it for the next release), I just don't think this is a good
> application for it. But that's just IMHO ;-)
>
>> And the link does not provide much, I stopped by there first, before
>> asking....
>> All the examples on the link are using
>> insert into table values ( x, x, x , ...)
>> instead of
>> insert into table (col, col, ...) values ( val, val, ...)
>
>
> I agree that the documentation on arrays is pretty limited. Again, I
> hope to improve that before 7.4 is released. In your original question:
>
> > method-1:
> > insert into test ( name , grades) values ( 'user1', '{}');
> > select * from test where name = 'user1';
> > name | id
> > --------+--------
> > user1   | {}
> > update test set grades[1] = 10 where name = 'user1';
> > ERROR:  Invalid array subscripts
>
> I think you need to do this instead if the array is not null, but empty:
>
> update test set grades = '{10}' where name = 'user1';
>
> At that point you can add elements by specifying an index of plus or
> minus one from the array lower bound or upper bound:
>
> regression=# create table test ( name varchar(20) , grades integer[]);
> CREATE TABLE
> regression=# insert into test ( name , grades) values ( 'user1', '{}');
> INSERT 2466139 1
> regression=# update test set grades = '{10}' where name = 'user1';
> UPDATE 1
> regression=# update test set grades[0] = 9 where name = 'user1';
> UPDATE 1
> regression=# update test set grades[-1] = 9 where name = 'user1';
> UPDATE 1
> regression=# update test set grades[2] = 9 where name = 'user1';
> UPDATE 1
> regression=# update test set grades[3] = 9 where name = 'user1';
> UPDATE 1
> regression=# select * from test;
>  name  |    grades
> -------+--------------
>  user1 | {9,9,10,9,9}
> (1 row)
> regression=# select array_dims(grades) from test;
>  array_dims
> ------------
>  [-1:3]
> (1 row)
>
> Arrays default to a lower bound of 1, but you can change that by
> adding elements as shown.
>
> If you do stick with arrays, and want to do any analytics on the array
> data, take a look at PL/R. It makes array operations easy because R is
> a vector based language. For example:
>
> regression=# create or replace function array_avg(int[]) returns
> float8 as 'mean(arg1)' language 'plr';
> CREATE FUNCTION
> regression=# create table test ( name varchar(20) , grades integer[]);
> CREATE TABLE
> regression=# insert into test ( name , grades) values ( 'user1',
> '{91,87,75,96,91}');
> INSERT 2466243 1
> regression=# select array_avg(grades) from test;
>  array_avg
> -----------
>         88
> (1 row)
>
> You can download a copy at http://www.joeconway.com/ if interested,
> but you also need R compiled with the --enable-R-shlib option; for R
> see http://cran.r-project.org/
>
> HTH,
>
> Joe
>


Re: Arrays ... need clarification....

От
Joe Conway
Дата:
Medi Montaseri wrote:
> Thank you very much Joe....related to Dr. Conway of the Perl
> community ?

Nah, nobody famous in my clan ;-)

> Also, currently as you noted, one has to fill the array sequentially,
> ie no random array access is allowed unless one has initialized it
> already. This appears to be a limitation of the underlying data
> structure. Perhaps you could allow for such random accesses.

It is, in that the current implementation cannot deal with NULL
elements. I don't think that will get fixed for 7.4, but perhaps in some
future release.

Joe


Re: Arrays ... need clarification....

От
Guy Fraser
Дата:
Ah ha.

Now I see the reson for the lower bound in dims(...) output, I did not
know that it was possible to expand an array from both ends.

Is there a psudo data type that can be used as a function input type for
any kind of array?

I have built some functions that take dims output and return an integer
value {n, where bounds are from 1 to n} of elements in an integer array
and a text array but if I could make one function that handles any type
of array, that would be great.

Since you said you are improving the array features, could you make a
function like dims, that outputs the bounds as a 2D array integers of
integer sets?
ie. {{1,3},{1,4}} rather than [1:3][1:4].

This was going to be the next step of my array_size() functions, but
what I have already works for what I need.

Some of the other things I would like to see is :
 - the ability to populate an array from a set of data rows
 - the ability to output an array as a set of data rows

 From these features alone, many new array functions could be possible
using other standard features.

Most of the arrays I deal with are text arrays, so the PL/R and
int_array stuff doesn't help me.

I have hacked together some functions and sql procedures that help me do
some of these things, but I think builtin features would be more
efficient than anything I could do with PL/pgSQL. I have not written any
'C' functions yet, but do write programs in 'C' and 'PHP' that use
string based queries to work with PostgreSQL.

Joe Conway wrote:
...snip...

>
> I agree that the documentation on arrays is pretty limited. Again, I
> hope to improve that before 7.4 is released. In your original question:
>
> > method-1:
> > insert into test ( name , grades) values ( 'user1', '{}');
> > select * from test where name = 'user1';
> > name | id
> > --------+--------
> > user1   | {}
> > update test set grades[1] = 10 where name = 'user1';
> > ERROR:  Invalid array subscripts
>
> I think you need to do this instead if the array is not null, but empty:
>
> update test set grades = '{10}' where name = 'user1';
>
> At that point you can add elements by specifying an index of plus or
> minus one from the array lower bound or upper bound:
>
> regression=# create table test ( name varchar(20) , grades integer[]);
> CREATE TABLE
> regression=# insert into test ( name , grades) values ( 'user1', '{}');
> INSERT 2466139 1
> regression=# update test set grades = '{10}' where name = 'user1';
> UPDATE 1
> regression=# update test set grades[0] = 9 where name = 'user1';
> UPDATE 1
> regression=# update test set grades[-1] = 9 where name = 'user1';
> UPDATE 1
> regression=# update test set grades[2] = 9 where name = 'user1';
> UPDATE 1
> regression=# update test set grades[3] = 9 where name = 'user1';
> UPDATE 1
> regression=# select * from test;
>  name  |    grades
> -------+--------------
>  user1 | {9,9,10,9,9}
> (1 row)
> regression=# select array_dims(grades) from test;
>  array_dims
> ------------
>  [-1:3]
> (1 row)
>
> Arrays default to a lower bound of 1, but you can change that by
> adding elements as shown.
>
...snip...


Re: Arrays ... need clarification....

От
Joe Conway
Дата:
Guy Fraser wrote:
> Is there a psudo data type that can be used as a function input type for
> any kind of array?

Not really in 7.3.x, but it works in cvs now.

> I have built some functions that take dims output and return an integer
> value {n, where bounds are from 1 to n} of elements in an integer array
> and a text array but if I could make one function that handles any type
> of array, that would be great.

Also in cvs is array_lower(array anyarray, dim int) and
array_upper(array anyarray, dim int), which will give you the '1' and
the 'n' respectively.

> Since you said you are improving the array features, could you make a
> function like dims, that outputs the bounds as a 2D array integers of
> integer sets?
> ie. {{1,3},{1,4}} rather than [1:3][1:4].

Same functions mentioned above:

regression=# SELECT array_lower(ARRAY[[1,2,3],[4,5,6]],1);
  array_lower
-------------
            1
(1 row)

regression=# SELECT array_lower(ARRAY[[1,2,3],[4,5,6]],2);
  array_lower
-------------
            1
(1 row)

regression=# SELECT array_upper(ARRAY[[1,2,3],[4,5,6]],1);
  array_upper
-------------
            2
(1 row)

regression=# SELECT array_upper(ARRAY[[1,2,3],[4,5,6]],2);
  array_upper
-------------
            3
(1 row)

But I guess returning all the array bounds as an array might be a nice
function to have too.

> Some of the other things I would like to see is :
> - the ability to populate an array from a set of data rows

This works in cvs also:

CREATE TEMP TABLE arrtest_f (f0 int, f1 text, f2 float8);
insert into arrtest_f values(1,'cat1',1.21);
insert into arrtest_f values(2,'cat1',1.24);
insert into arrtest_f values(3,'cat1',1.18);
insert into arrtest_f values(4,'cat1',1.26);
insert into arrtest_f values(5,'cat1',1.15);
insert into arrtest_f values(6,'cat2',1.15);
insert into arrtest_f values(7,'cat2',1.26);
insert into arrtest_f values(8,'cat2',1.32);
insert into arrtest_f values(9,'cat2',1.30);
regression=# SELECT ARRAY(select f2 from arrtest_f order by f2) AS "ARRAY";
                      ARRAY
-----------------------------------------------
  {1.15,1.15,1.18,1.21,1.24,1.26,1.26,1.3,1.32}
(1 row)


> - the ability to output an array as a set of data rows


I submitted a function that did this, but it was rejected but its
usefulness is limited by the current table function semantics, i.e. you
can do:

   select * from output_array_as_rows(array_constant);

but there is currently no way to do this (or something equiv):

   select output_array_as_rows(tbl.array_field) from tbl;

The resolution to this issue quickly started to look like a massive
project -- I might take it on one day, but not likely before a 7.4
feature freeze.

>  From these features alone, many new array functions could be possible
> using other standard features.
>
> Most of the arrays I deal with are text arrays, so the PL/R and
> int_array stuff doesn't help me.

You can certainly pass text arrays to PL/R, and R is inherently a vector
(array) processing language. You obviously wouldn't need all the
statistical processing power of R, but I think you'd find it makes most
manipulations or arrays fairly easy.

Joe


Re: Arrays ... need clarification....

От
Guy Fraser
Дата:
Thank you for the feedback.

It would be great to see what is in cvs now make it into 7.4 .

Joe Conway wrote:

> Guy Fraser wrote:
>
>> Is there a psudo data type that can be used as a function input type
>> for any kind of array?
>
>
> Not really in 7.3.x, but it works in cvs now.

I had noticed the anyarray somewhere before but I don't think I was able
to use it [IIRC].

>
>> I have built some functions that take dims output and return an
>> integer value {n, where bounds are from 1 to n} of elements in an
>> integer array and a text array but if I could make one function that
>> handles any type of array, that would be great.
>
>
> Also in cvs is array_lower(array anyarray, dim int) and
> array_upper(array anyarray, dim int), which will give you the '1' and
> the 'n' respectively.
>
>> Since you said you are improving the array features, could you make a
>> function like dims, that outputs the bounds as a 2D array integers of
>> integer sets?
>> ie. {{1,3},{1,4}} rather than [1:3][1:4].
>
>
> Same functions mentioned above:

...snip...

>
> But I guess returning all the array bounds as an array might be a nice
> function to have too.
>
>> Some of the other things I would like to see is :
>> - the ability to populate an array from a set of data rows
>
>
> This works in cvs also:
>
> CREATE TEMP TABLE arrtest_f (f0 int, f1 text, f2 float8);
> insert into arrtest_f values(1,'cat1',1.21);
> insert into arrtest_f values(2,'cat1',1.24);
> insert into arrtest_f values(3,'cat1',1.18);
> insert into arrtest_f values(4,'cat1',1.26);
> insert into arrtest_f values(5,'cat1',1.15);
> insert into arrtest_f values(6,'cat2',1.15);
> insert into arrtest_f values(7,'cat2',1.26);
> insert into arrtest_f values(8,'cat2',1.32);
> insert into arrtest_f values(9,'cat2',1.30);
> regression=# SELECT ARRAY(select f2 from arrtest_f order by f2) AS
> "ARRAY";
>                      ARRAY
> -----------------------------------------------
>  {1.15,1.15,1.18,1.21,1.24,1.26,1.26,1.3,1.32}
> (1 row)

...snip...

>
>>  From these features alone, many new array functions could be
>> possible using other standard features.
>>
>> Most of the arrays I deal with are text arrays, so the PL/R and
>> int_array stuff doesn't help me.
>
>
> You can certainly pass text arrays to PL/R, and R is inherently a
> vector (array) processing language. You obviously wouldn't need all
> the statistical processing power of R, but I think you'd find it makes
> most manipulations or arrays fairly easy.
>
> Joe

Guy