Discussion: Foreign Keys With Arrays


Foreign Keys With Arrays

From
Ketema Harris
Date:
If I have a column that is of type int4[] (array of integers) is it possible to require that each integer within that array is a foreign key?

Re: Foreign Keys With Arrays

From
Andreas Seltenreich
Date:
Ketema Harris writes:

> If I have a column that is of type int4[] (array of integers) is it possible
> to require that each integer within that array is a foreign key?

I'm afraid you'll have to define a custom check-constraint to do that.

E.g.

--8<---------------cut here---------------start------------->8---
scratch=# create table foo (a int);
CREATE TABLE
scratch=# insert into foo values (1);
INSERT 6856486 1
scratch=# insert into foo values (2);
INSERT 6856487 1
scratch=# create table bar (a int[]);
CREATE TABLE
scratch=# create function valid_array(a int4[]) returns boolean as $$
scratch$# begin
scratch$# for i in array_lower(a, 1)..array_upper(a,1) loop
scratch$#    if not exists (select 1 from foo where foo.a = a[i]) then
scratch$#        return 'f';
scratch$#    end if;
scratch$# end loop;
scratch$# return 't';
scratch$# end $$ language plpgsql;
CREATE FUNCTION
scratch=# alter table bar add constraint foo check(valid_array(a));
ALTER TABLE
scratch=# insert into bar values ('{1,2}');
INSERT 6856494 1
scratch=# insert into bar values ('{1,3}');
ERROR:  new row for relation "bar" violates check constraint "foo"
scratch=#
--8<---------------cut here---------------end--------------->8---
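
On newer PostgreSQL versions (9.1 and later), the same check can be
written with a FOREACH loop over the array elements; a minimal sketch
of an equivalent function, assuming the foo table from the transcript
above:

create or replace function valid_array(a int4[]) returns boolean as $$
declare
    elem int4;
begin
    -- return false as soon as one element has no match in foo
    foreach elem in array a loop
        if not exists (select 1 from foo where foo.a = elem) then
            return false;
        end if;
    end loop;
    return true;
end $$ language plpgsql;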

regards,
Andreas
--

Re: Foreign Keys With Arrays

From
Andreas Seltenreich
Date:
Andreas Seltenreich writes:

> Ketema Harris writes:
>
>> If I have a column that is of type int4[] (array of integers) is it possible
>> to require that each integer within that array is a foreign key?
>
> I'm afraid you'll have to define a custom check-constraint to do that.
>
> E.g.
>
[...]
> scratch=# create function valid_array(a int4[]) returns boolean as $$
[...]

...and of course you'll need the appropriate trigger function on the
referenced table to maintain referential integrity, but I hope the
example will get you going.
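
For illustration, a minimal sketch of such a trigger on the referenced
table, rejecting deletes from foo while the value is still referenced
from an array in bar (the function and trigger names are made up; an
UPDATE trigger would be analogous):

create or replace function foo_a_restrict() returns trigger as $$
begin
    -- block the delete if any array in bar still contains old.a
    if exists (select 1 from bar where old.a = any (bar.a)) then
        raise exception 'foo.a = % is still referenced from bar', old.a;
    end if;
    return old;
end $$ language plpgsql;

create trigger foo_a_restrict
    before delete on foo
    for each row execute procedure foo_a_restrict();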

regards,
Andreas
--

Please comment on pgsql speed at handling 550,000 records

From
Srinivas Iyyer
Date:
dear group,
 I am loading a file with 550,000 lines.  This data is
not so complex. The schema is:
Table A

seq_id | seq_name  | seq_identifier
1      | ABC       | NM_0000023
....

(23k lines)


Table B:

gos_id  |  go_id  | go_cat  | go_name
1       | GO:00003 | P       | death
......
(7k)


Now table C

seq_id   | gos_id
1        |  1
1        |  2
1        |  200

...
(550K lines)




Table Temp_C
seq_name  |  go_id
ABC       | GO:9993934
ABC       | GO:3489343
....
(550,500 lines)


question:

Tables A and B populated easily without issue.

Table C is purely a relationship table.
I wrote a Python script that generates all 550K
insert statements, each with subselects inside:

insert into tablec (seq_id,gos_id) values (
(select seq_id from table a where seq_name ='xxxx'),
(select gos_id from table b where go_id = 'xxxx'));


Those 550K+ insert statements took a long time and I
had to abort the run: after 3.5 hrs, only 300K lines
had been inserted.


I wrote a plpgsql function:

create function gosup() returns integer AS '
DECLARE
    referrer_key RECORD;
BEGIN
    FOR referrer_key IN SELECT * from TEMP_C LOOP
       INSERT INTO tableC(seq_id,gos_id) values(
(select seq_id from table a where seq_name
=referrer_key.seq_name),
(select gos_id from table b where go_id =
referrer_key.go_id));
    END LOOP;
return 0;
END;
' LANGUAGE plpgsql;

This function was written to make the inserting
faster.

I used the \copy command to populate temp_c.


The ironic part is that even now it is taking over an
hour (still running as I write this).

So experts,

do you really think inserting 550K lines is inherently a
time-consuming step no matter what method you choose
(regular insert statements or a server-side procedure)?

Could anyone throw some light on DB performance in
general (whether it is Postgres, Oracle, Sybase, or
MySQL)?

Specifically, how efficient is Postgres at this?

Thank you.


Re: Please comment on pgsql speed at handling 550,000 records

From
Michael Swierczek
Date:
Dear Srinivas,
      It may be helpful to know what operating system you are using, and also the amount of system RAM, the processor type, and the hard drive sizes and speeds.
-Mike

On 2/7/06, Srinivas Iyyer <srini_iyyer_bio@yahoo.com> wrote:

> [...]

Re: Please comment on pgsql speed at handling 550,000 records

From
Tom Lane
Date:
Srinivas Iyyer <srini_iyyer_bio@yahoo.com> writes:
> [ It's not fast to do 500K commands like ]
> insert into tablec (seq_id,gos_id) values (
> (select seq_id from table a where seq_name ='xxxx'),
> (select gos_id from table b where go_id = 'xxxx'));

Don't you want something like

insert into tablec (seq_id,gos_id)
  select seq_id, gos_id from a, b where seq_name = go_id;

SQL is not a low-level language, and breaking a table-wise operation
down into bite-size parts is not the way to make it go fast.  The
startup overhead for a command is almost always going to dwarf the time
spent processing any one row, so you want to make sure you process as
many rows per command as feasible.

Also, make sure you've ANALYZEd both input tables beforehand,
else the planner may choose a poor plan for this command.
It'd be worth looking at the EXPLAIN output for the command
just to make sure nothing silly is happening.
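
Concretely, a sketch of those two steps, assuming the table and column
names from the query above:

-- refresh planner statistics on the input tables
analyze a;
analyze b;

-- inspect the plan before running the real thing
explain
insert into tablec (seq_id, gos_id)
  select seq_id, gos_id from a, b where seq_name = go_id;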

            regards, tom lane

Re: Please comment on pgsql speed at handling 550,000 records

From
Srinivas Iyyer
Date:
Hi Mike,
 I am using a Dell Precision 670:
RAM: 2 GB
Processor: two Intel® Xeon™ 3.00GHz, 2MB L2 Cache
OS: Red Hat® Enterprise Linux WS v4
Chipset: Intel E7525
Disks: 1.5TB SATA disk space (3x500GB); if I am not
wrong they are 7200RPM disks

thanks
Srini


--- Michael Swierczek <mike.swierczek@gmail.com>
wrote:

> Dear Srinivas,
>       It may be helpful to know what operating
> system you are using, and
> also the amount of system RAM, the processor type,
> and the hard drive sizes
> and speeds.
>
> -Mike
>
> On 2/7/06, Srinivas Iyyer <srini_iyyer_bio@yahoo.com> wrote:
> > [...]



Re: Please comment on pgsql speed at handling 550,000 records

From
Srinivas Iyyer
Date:
Sorry Tom, for the confusion.

seq_id is not equal to go_id:

seq_id = NM_000033
go_id = GO:049934

It was bad on my part that I wrote
seq_id = 'xxxx'
go_id = 'xxxx'; the second should have been 'yyyy'.

srini

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

> [...]



Re: Please comment on pgsql speed at handling 550,000 records

From
Tom Lane
Date:
Srinivas Iyyer <srini_iyyer_bio@yahoo.com> writes:
> Sorry Tom, for the confusion.
> seq_id is not equal to go_id

> seq_id = NM_000033
> go_id = GO:049934

So how do you know which ones are related?  If there's a table showing
the relationship, joining against that is the way to go.  You might also
want to rethink your choice of data keys --- if there isn't a simple way
to join related rows in different tables, you've chosen a poor data
representation.

A small tip is that integer or bigint keys are likely to perform better
than strings ... but this is definitely a second-order effect, and I
don't think you've yet got the first-order representational decisions
correct.

            regards, tom lane

Re: Please comment on pgsql speed at handling 550,000 records

From
Sean Davis
Date:
I would do something like this:

Create table seq_go (
    seq_name varchar,
    go_id varchar
);
Create index seq_go_seq_name_idx on seq_go(seq_name);
Create index seq_go_go_id on seq_go(go_id);

Do a copy from the text file that contains the sequence to GO mapping into
the above table.  This should be VERY fast.
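
For illustration, the load might look like this from psql (the file
name is made up; by default \copy expects tab-separated text, one row
per line):

\copy seq_go (seq_name, go_id) from 'seq2go.tsv'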

Then create table c as before.

Finally, do:

insert into tablec (seq_id,gos_id)
select a.seq_id,b.gos_id from a,b,seq_go
where seq_go.seq_name=a.seq_name and
 b.go_id=seq_go.go_id;

There may be some typos (I haven't tested this), but the insert is quite fast.
I have built these tables using (probably) the same data that you are looking
at, in a few seconds--postgres is not the limiting factor here.

Sean


On 2/7/06 1:36 PM, "Srinivas Iyyer" <srini_iyyer_bio@yahoo.com> wrote:

> [...]