Discussion: Foreign Keys With Arrays
If I have a column that is of type int4[] (array of integers) is it possible to require that each integer within that array is a foreign key?
Ketema Harris writes:
> If I have a column that is of type int4[] (array of integers) is it possible
> to require that each integer within that array is a foreign key?

I'm afraid you'll have to define a custom check constraint to do that. E.g.

--8<---------------cut here---------------start------------->8---
scratch=# create table foo (a int);
CREATE TABLE
scratch=# insert into foo values (1);
INSERT 6856486 1
scratch=# insert into foo values (2);
INSERT 6856487 1
scratch=# create table bar (a int[]);
CREATE TABLE
scratch=# create function valid_array(a int4[]) returns boolean as $$
scratch$# begin
scratch$#   for i in array_lower(a, 1)..array_upper(a, 1) loop
scratch$#     if not exists (select 1 from foo where foo.a = a[i]) then
scratch$#       return 'f';
scratch$#     end if;
scratch$#   end loop;
scratch$#   return 't';
scratch$# end $$ language plpgsql;
CREATE FUNCTION
scratch=# alter table bar add constraint foo check(valid_array(a));
ALTER TABLE
scratch=# insert into bar values ('{1,2}');
INSERT 6856494 1
scratch=# insert into bar values ('{1,3}');
ERROR:  new row for relation "bar" violates check constraint "foo"
scratch=#
--8<---------------cut here---------------end--------------->8---

regards,
Andreas
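[Editor's note: the same validation can also be expressed without an explicit loop. A sketch, assuming the `foo` table from the session above; `unnest` is only available on reasonably recent PostgreSQL versions, so this would not have worked on the servers current when this thread was written:]

```sql
create function valid_array(a int4[]) returns boolean as $$
  -- true only when no element of the array is missing from foo
  select not exists (
    select 1 from unnest(a) as elem
    where not exists (select 1 from foo where foo.a = elem)
  );
$$ language sql;
```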
Andreas Seltenreich writes:
> Ketema Harris writes:
>> If I have a column that is of type int4[] (array of integers) is it possible
>> to require that each integer within that array is a foreign key?
>
> I'm afraid you'll have to define a custom check constraint to do that.
>
> E.g.
> [...]
> scratch=# create function valid_array(a int4[]) returns boolean as $$
> [...]

...and of course you'll need the appropriate trigger function on the
referenced table to maintain referential integrity, but I hope the example
will get you going.

regards,
Andreas
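[Editor's note: a minimal sketch of the trigger Andreas alludes to, assuming the `foo`/`bar` tables from the example above; the function and trigger names are illustrative. It blocks a DELETE on `foo` while some array in `bar` still references the value; an UPDATE of `foo.a` would need analogous handling:]

```sql
create function foo_a_guard() returns trigger as $$
begin
  -- old.a = any (bar.a) is true when the deleted key appears
  -- anywhere inside the int[] column of some bar row
  if exists (select 1 from bar where old.a = any (bar.a)) then
    raise exception 'foo.a = % is still referenced by an array in bar', old.a;
  end if;
  return old;
end $$ language plpgsql;

create trigger foo_a_guard
  before delete on foo
  for each row execute procedure foo_a_guard();
```

Note this only enforces one direction of the constraint; the check constraint on `bar` handles the other.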
dear group,

I am running a file with 550,000 lines. This data is not so complex. The schema is:

Table A:
seq_id | seq_name | seq_identifier
1      | ABC      | NM_0000023
....
(23K lines)

Table B:
gos_id | go_id    | go_cat | go_name
1      | GO:00003 | P      | death
......
(7K lines)

Now table C:
seq_id | gos_id
1      | 1
1      | 2
1      | 200
...
(550K lines)

Table Temp_C:
seq_name | go_id
ABC      | GO:9993934
ABC      | GO:3489343
....
(550,500 lines)

Question: Tables A and B populated easily, no question. Table C is purely a relationship table. I wrote a python script that writes all 550K lines as insert statements with selects inside:

insert into tablec (seq_id, gos_id) values (
    (select seq_id from tablea where seq_name = 'xxxx'),
    (select gos_id from tableb where go_id = 'xxxx'));

Such 550K+ insert statements took a long time and I had to abort: after 3.5 hrs, 300K lines got inserted.

I wrote a plpgsql function:

create function gosup() returns integer AS '
DECLARE
    referrer_key RECORD;
BEGIN
    FOR referrer_key IN SELECT * FROM temp_c LOOP
        INSERT INTO tablec (seq_id, gos_id) VALUES (
            (select seq_id from tablea where seq_name = referrer_key.seq_name),
            (select gos_id from tableb where go_id = referrer_key.go_id));
    END LOOP;
    RETURN 0;
END;
' LANGUAGE plpgsql;

This function was written to make the inserting faster. I used the \copy command to create temp_c. The ironic point is that even now it is taking over an hour (still running as of now).

So, experts: do you really think inserting 550K lines is a time-consuming step no matter what method you choose (regular insert statements, or a procedure to automate the step on the server side)? Could anyone throw some light on DB performance here (no matter whether it is Postgres, Oracle, Sybase or MySQL)? Specifically, how efficient is Postgres at this?

Thank you.
Dear Srinivas,
It may be helpful to know what operating system you are using, and also the amount of system RAM, the processor type, and the hard drive sizes and speeds.
-Mike
On 2/7/06, Srinivas Iyyer <srini_iyyer_bio@yahoo.com> wrote:
> [...]
Srinivas Iyyer <srini_iyyer_bio@yahoo.com> writes:
> [ It's not fast to do 500K commands like ]
> insert into tablec (seq_id,gos_id) values (
> (select seq_id from table a where seq_name ='xxxx'),
> (select gos_id from table b where go_id = 'xxxx'));

Don't you want something like

    insert into tablec (seq_id, gos_id)
      select seq_id, gos_id from a, b where seq_name = go_id;

SQL is not a low-level language, and breaking a table-wise operation down into bite-size parts is not the way to make it go fast. The startup overhead for a command is almost always going to dwarf the time spent processing any one row, so you want to make sure you process as many rows per command as feasible.

Also, make sure you've ANALYZEd both input tables beforehand, else the planner may choose a poor plan for this command. It'd be worth looking at the EXPLAIN output for the command just to make sure nothing silly is happening.

regards, tom lane
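[Editor's note: a sketch of the ANALYZE-then-EXPLAIN check Tom describes, using the table names from this thread. For a bulk join like this you would generally want the plan to show a hash or merge join rather than a nested loop that probes one row at a time:]

```sql
ANALYZE a;
ANALYZE b;

-- show the plan without running the insert
EXPLAIN
insert into tablec (seq_id, gos_id)
  select seq_id, gos_id from a, b where seq_name = go_id;
```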
Hi Mike,

I am using a Dell Precision 670:

RAM: 2 GB
Processor: two Intel Xeon 3.00GHz, 2MB L2 Cache
OS: Red Hat Enterprise Linux WS v4
Chipset: Intel E7525
Disks: 1.5TB SATA disk space (3x500); if I am not wrong they are 7200RPM disks

thanks
Srini

--- Michael Swierczek <mike.swierczek@gmail.com> wrote:
> Dear Srinivas,
>      It may be helpful to know what operating system you are using, and
> also the amount of system RAM, the processor type, and the hard drive
> sizes and speeds.
> [...]
Sorry Tom, for the confusion.

seq_id is not equal to go_id:

seq_id = NM_000033
go_id = GO:049934

It was bad on my part to write seq_id = 'xxxx' and go_id = 'xxxx'; the second should have been 'yyyy'.

srini

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Srinivas Iyyer <srini_iyyer_bio@yahoo.com> writes:
>> [ It's not fast to do 500K commands like ]
> [...]
Srinivas Iyyer <srini_iyyer_bio@yahoo.com> writes:
> Sorry Tom, for the confusion.
> seq_id is not equal to go_id
> seq_id = NM_000033
> go_id = GO:049934

So how do you know which ones are related? If there's a table showing the relationship, joining against that is the way to go.

You might also want to rethink your choice of data keys --- if there isn't a simple way to join related rows in different tables, you've chosen a poor data representation. A small tip is that integer or bigint keys are likely to perform better than strings ... but this is definitely a second-order effect, and I don't think you've yet got the first-order representational decisions correct.

regards, tom lane
I would do something like this:

create table seq_go (
    seq_name varchar,
    go_id varchar
);
create index seq_go_seq_name_idx on seq_go (seq_name);
create index seq_go_go_id_idx on seq_go (go_id);

Do a COPY from the text file that contains the sequence-to-GO mapping into the above table. This should be VERY fast. Then create table C as before. Finally, do:

insert into tablec (seq_id, gos_id)
  select a.seq_id, b.gos_id
    from a, b, seq_go
   where seq_go.seq_name = a.seq_name
     and b.go_id = seq_go.go_id;

There may be some typos (haven't tested this), but the insert is quite fast. I have built these tables using the same data that you are looking at (probably) in a few seconds --- Postgres is not limiting in that way.

Sean

On 2/7/06 1:36 PM, "Srinivas Iyyer" <srini_iyyer_bio@yahoo.com> wrote:
> Sorry Tom, for the confusion.
> [...]