Обсуждение: Query performance

Поиск
Список
Период
Сортировка

Query performance

От
"Christian Rengstl"
Дата:
Hi everyone,

i have a table with around 57 million tuples, with the following columns: pid(varchar), crit(varchar), val1(varchar),
val2(varchar).Example: 
pid    crit    val1    val2
p1      c1      x        y
p1      c2      x        z
p1      c3      y        x
...
What i am doing is to query all val1 and val2 for one pid and all crit values:

select val1, val2, crit from mytable where pid='somepid' and crit in(select crit from myCritTable);
where myCritTable is a table that contains all crit values (around 42.000) ordered by their insertion date.


QUERY PLAN

--------------------------------------------------------------------------------
----------------------------------------------------------
 Hash IN Join  (cost=1033.67..134959.41 rows=37120 width=23) (actual time=357.11
6..356984.535 rows=37539 loops=1)
   Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
   ->  Bitmap Heap Scan on test2  (cost=232.92..132766.66 rows=37120 width=23) (
actual time=291.600..356707.737 rows=37539 loops=1)
         Recheck Cond: ((pid)::text = '1'::text)
         ->  Bitmap Index Scan on idx_test2_pid  (cost=0.00..232.92 rows=37120 w
idth=0) (actual time=234.516..234.516 rows=37539 loops=1)
               Index Cond: ((pid)::text = '1'::text)
   ->  Hash  (cost=700.20..700.20 rows=40220 width=13) (actual time=65.055..65.0
55 rows=40220 loops=1)
         ->  Seq Scan on snps_test  (cost=0.00..700.20 rows=40220 width=13) (act
ual time=0.020..30.131 rows=40220 loops=1)
 Total runtime: 357017.259 ms

Unfortunately the query takes pretty long for the big table, so maybe one of you has a suggestion on how to make it
faster.

--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230


Re: Query performance

От
"Nikolay Samokhvalov"
Дата:
On 8/3/06, Christian Rengstl <Christian.Rengstl@klinik.uni-regensburg.de> wrote:
> ...
> Unfortunately the query takes pretty long for the big table, so maybe one of you has a suggestion on how to make it
faster.
>

try smth like this:

select val1, val2, crit from mytable as a where pid='somepid' and
exists(select 1 from myCritTable as b where a.crit = b.crit);


--
Best regards,
Nikolay

Re: Query performance

От
"Christian Rengstl"
Дата:
Hi,

the complete query is the one i posted, but here comes the schema for mytable:
entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq''::regclass),
  pid varchar(15) NOT NULL,
  crit varchar(13) NOT NULL,
  val1 varchar(1),
  val2 varchar(1),
  aendat text,
  aennam varchar(8),
  CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)

myCritTable:
  crit varchar(13) NOT NULL,
  chr int2,
  aendat timestamp,
  CONSTRAINT pk_crit_master PRIMARY KEY (crit)

My server is 8.1.4. As a matter of fact, i have no idea where the text type comes from, because as you can see from
abovethere are only varchar with maximum 15 characters. 

"Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 10:34 am:
> Hi,
>
> can you post the complete query,schema- and
> table-definition,server-version etc. ?
> This will help to identity the main problem.
>
> So at the moment i'm just guessing:
>
> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
>     ->  Bitmap Heap Scan on test2  (cost=232.92..132766.66  rows=37120
> width=23)
>                          (actual time=291.600..356707.737 rows=37539 loops=1)
> This part is very expensive, but i got no clue why.
> Maybe the text-type is not so ideal.
>
> Best regards
>
> Hakan Kocaman
> Software-Development
>
> digame.de GmbH
> Richard-Byrd-Str. 4-8
> 50829 Köln
>
> Tel.: +49 (0) 221 59 68 88 31
> Fax: +49 (0) 221 59 68 88 98
> Email: hakan.kocaman@digame.de
>
>
>
>> -----Original Message-----
>> From: pgsql-general-owner@postgresql.org
>> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of
>> Christian Rengstl
>> Sent: Thursday, August 03, 2006 10:13 AM
>> To: pgsql-general@postgresql.org
>> Subject: [GENERAL] Query performance
>>
>>
>> Hi everyone,
>>
>> i have a table with around 57 million tuples, with the
>> following columns: pid(varchar), crit(varchar),
>> val1(varchar), val2(varchar). Example:
>> pid    crit    val1    val2
>> p1      c1      x        y
>> p1      c2      x        z
>> p1      c3      y        x
>> ...
>> What i am doing is to query all val1 and val2 for one pid and
>> all crit values:
>>
>> select val1, val2, crit from mytable where pid='somepid' and
>> crit in(select crit from myCritTable);
>> where myCritTable is a table that contains all crit values
>> (around 42.000) ordered by their insertion date.
>>
>>
>> QUERY PLAN
>>
>> --------------------------------------------------------------
>> ------------------
>> ----------------------------------------------------------
>>  Hash IN Join  (cost=1033.67..134959.41 rows=37120 width=23)
>> (actual time=357.11
>> 6..356984.535 rows=37539 loops=1)
>>    Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
>>    ->  Bitmap Heap Scan on test2  (cost=232.92..132766.66
>> rows=37120 width=23) (
>> actual time=291.600..356707.737 rows=37539 loops=1)
>>          Recheck Cond: ((pid)::text = '1'::text)
>>          ->  Bitmap Index Scan on idx_test2_pid
>> (cost=0.00..232.92 rows=37120 w
>> idth=0) (actual time=234.516..234.516 rows=37539 loops=1)
>>                Index Cond: ((pid)::text = '1'::text)
>>    ->  Hash  (cost=700.20..700.20 rows=40220 width=13)
>> (actual time=65.055..65.0
>> 55 rows=40220 loops=1)
>>          ->  Seq Scan on snps_test  (cost=0.00..700.20
>> rows=40220 width=13) (act
>> ual time=0.020..30.131 rows=40220 loops=1)
>>  Total runtime: 357017.259 ms
>>
>> Unfortunately the query takes pretty long for the big table,
>> so maybe one of you has a suggestion on how to make it faster.
>>
>> --
>> Christian Rengstl M.A.
>> Klinik und Poliklinik für Innere Medizin II
>> Kardiologie - Forschung
>> Universitätsklinikum Regensburg
>> B3 1.388
>> Franz-Josef-Strauss-Allee 11
>> 93053 Regensburg
>> Tel.: +49-941-944-7230
>>
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 4: Have you searched our list archives?
>>
>>                http://archives.postgresql.org
>>


--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

Re: Query performance

От
"Hakan Kocaman"
Дата:
Hi,

can you post the complete query,schema- and table-definition,server-version etc. ?
This will help to identity the main problem.

So at the moment i'm just guessing:

Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
    ->  Bitmap Heap Scan on test2  (cost=232.92..132766.66  rows=37120 width=23)
                         (actual time=291.600..356707.737 rows=37539 loops=1)
This part is very expensive, but i got no clue why.
Maybe the text-type is not so ideal.

Best regards

Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: hakan.kocaman@digame.de



> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of
> Christian Rengstl
> Sent: Thursday, August 03, 2006 10:13 AM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Query performance
>
>
> Hi everyone,
>
> i have a table with around 57 million tuples, with the
> following columns: pid(varchar), crit(varchar),
> val1(varchar), val2(varchar). Example:
> pid    crit    val1    val2
> p1      c1      x        y
> p1      c2      x        z
> p1      c3      y        x
> ...
> What i am doing is to query all val1 and val2 for one pid and
> all crit values:
>
> select val1, val2, crit from mytable where pid='somepid' and
> crit in(select crit from myCritTable);
> where myCritTable is a table that contains all crit values
> (around 42.000) ordered by their insertion date.
>
>
> QUERY PLAN
>
> --------------------------------------------------------------
> ------------------
> ----------------------------------------------------------
>  Hash IN Join  (cost=1033.67..134959.41 rows=37120 width=23)
> (actual time=357.11
> 6..356984.535 rows=37539 loops=1)
>    Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
>    ->  Bitmap Heap Scan on test2  (cost=232.92..132766.66
> rows=37120 width=23) (
> actual time=291.600..356707.737 rows=37539 loops=1)
>          Recheck Cond: ((pid)::text = '1'::text)
>          ->  Bitmap Index Scan on idx_test2_pid
> (cost=0.00..232.92 rows=37120 w
> idth=0) (actual time=234.516..234.516 rows=37539 loops=1)
>                Index Cond: ((pid)::text = '1'::text)
>    ->  Hash  (cost=700.20..700.20 rows=40220 width=13)
> (actual time=65.055..65.0
> 55 rows=40220 loops=1)
>          ->  Seq Scan on snps_test  (cost=0.00..700.20
> rows=40220 width=13) (act
> ual time=0.020..30.131 rows=40220 loops=1)
>  Total runtime: 357017.259 ms
>
> Unfortunately the query takes pretty long for the big table,
> so maybe one of you has a suggestion on how to make it faster.
>
> --
> Christian Rengstl M.A.
> Klinik und Poliklinik für Innere Medizin II
> Kardiologie - Forschung
> Universitätsklinikum Regensburg
> B3 1.388
> Franz-Josef-Strauss-Allee 11
> 93053 Regensburg
> Tel.: +49-941-944-7230
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

Re: Query performance

От
Richard Huxton
Дата:
Christian Rengstl wrote:
> Hi,
>
> the complete query is the one i posted, but here comes the schema for mytable:
> entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq''::regclass),
>   pid varchar(15) NOT NULL,
>   crit varchar(13) NOT NULL,
>   val1 varchar(1),
>   val2 varchar(1),
>   aendat text,
>   aennam varchar(8),
>   CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)
>
> myCritTable:
>   crit varchar(13) NOT NULL,
>   chr int2,
>   aendat timestamp,
>   CONSTRAINT pk_crit_master PRIMARY KEY (crit)

Still doesn't match the EXPLAIN output - where's snp_id? Where's table
test2?

> My server is 8.1.4. As a matter of fact, i have no idea where the text
 > type comes from, because as you can see from above there are only
 > varchar with maximum 15 characters.

PG is casting it to text. There's no real difference between the types
(other than the size limit) and it's not expensive.

> "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 10:34 am:
>> Hi,
>>
>> can you post the complete query,schema- and
>> table-definition,server-version etc. ?
>> This will help to identity the main problem.
>>
>> So at the moment i'm just guessing:
>>
>> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
>>     ->  Bitmap Heap Scan on test2  (cost=232.92..132766.66  rows=37120
>> width=23)
>>     (actual time=291.600..356707.737 rows=37539 loops=1)
>> This part is very expensive, but i got no clue why.

Yep, it looks like the "Bitmap Heap Scan" is at the heart of this. You
might want to increase work_mem, it could be that the bitmap is spilling
to disk (which is much slower than keeping it all in RAM)

http://www.postgresql.org/docs/8.1/static/runtime-config-resource.html#RUNTIME-CONFIG-RESOURCE-MEMORY

If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see what happens.

--
   Richard Huxton
   Archonet Ltd

Re: Query performance

От
"Hakan Kocaman"
Дата:
Hi,


> -----Original Message-----
> From: Richard Huxton [mailto:dev@archonet.com]
> Sent: Thursday, August 03, 2006 11:00 AM
> To: Christian Rengstl
> Cc: Hakan Kocaman; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Query performance
>
>
> Christian Rengstl wrote:
> > Hi,
> >
> > the complete query is the one i posted, but here comes the
> schema for mytable:
> > entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq''::regclass),
> >   pid varchar(15) NOT NULL,
> >   crit varchar(13) NOT NULL,
> >   val1 varchar(1),
> >   val2 varchar(1),
> >   aendat text,
> >   aennam varchar(8),
> >   CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)
> >
> > myCritTable:
> >   crit varchar(13) NOT NULL,
> >   chr int2,
> >   aendat timestamp,
> >   CONSTRAINT pk_crit_master PRIMARY KEY (crit)
>
> Still doesn't match the EXPLAIN output - where's snp_id?
> Where's table
> test2?
>

Yep, that bothered me too.

> > My server is 8.1.4. As a matter of fact, i have no idea
> where the text
>  > type comes from, because as you can see from above there are only
>  > varchar with maximum 15 characters.
>
> PG is casting it to text. There's no real difference between
> the types
> (other than the size limit) and it's not expensive.

But wouldn't a comparison between int4 be much cheaper.
If i see smth like "id" (here snp_id) in a fieldname it should be a int-type, i think.

>
> > "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06
> 10:34 am:
> >> Hi,
> >>
> >> can you post the complete query,schema- and
> >> table-definition,server-version etc. ?
> >> This will help to identity the main problem.
> >>
> >> So at the moment i'm just guessing:
> >>
> >> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
> >>     ->  Bitmap Heap Scan on test2  (cost=232.92..132766.66
>  rows=37120
> >> width=23)
> >>     (actual time=291.600..356707.737 rows=37539 loops=1)
> >> This part is very expensive, but i got no clue why.
>
> Yep, it looks like the "Bitmap Heap Scan" is at the heart of
> this. You
> might want to increase work_mem, it could be that the bitmap
> is spilling
> to disk (which is much slower than keeping it all in RAM)
>
> http://www.postgresql.org/docs/8.1/static/runtime-config-resou
rce.html#RUNTIME-CONFIG-RESOURCE-MEMORY

If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see what happens.

--
   Richard Huxton
   Archonet Ltd



Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: hakan.kocaman@digame.de

Re: Query performance

От
"Christian Rengstl"
Дата:
Hi,

i would rather compare int4 too, but the snp_id can be something like "abc123" unfortunately.

"Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 11:08 am:
> Hi,
>
>
>> -----Original Message-----
>> From: Richard Huxton [mailto:dev@archonet.com]
>> Sent: Thursday, August 03, 2006 11:00 AM
>> To: Christian Rengstl
>> Cc: Hakan Kocaman; pgsql-general@postgresql.org
>> Subject: Re: [GENERAL] Query performance
>>
>>
>> Christian Rengstl wrote:
>> > Hi,
>> >
>> > the complete query is the one i posted, but here comes the
>> schema for mytable:
>> > entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq''::regclass),
>> >   pid varchar(15) NOT NULL,
>> >   crit varchar(13) NOT NULL,
>> >   val1 varchar(1),
>> >   val2 varchar(1),
>> >   aendat text,
>> >   aennam varchar(8),
>> >   CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)
>> >
>> > myCritTable:
>> >   crit varchar(13) NOT NULL,
>> >   chr int2,
>> >   aendat timestamp,
>> >   CONSTRAINT pk_crit_master PRIMARY KEY (crit)
>>
>> Still doesn't match the EXPLAIN output - where's snp_id?
>> Where's table
>> test2?
>>
>
> Yep, that bothered me too.
>
>> > My server is 8.1.4. As a matter of fact, i have no idea
>> where the text
>>  > type comes from, because as you can see from above there are only
>>  > varchar with maximum 15 characters.
>>
>> PG is casting it to text. There's no real difference between
>> the types
>> (other than the size limit) and it's not expensive.
>
> But wouldn't a comparison between int4 be much cheaper.
> If i see smth like "id" (here snp_id) in a fieldname it should be a
> int-type, i think.
>
>>
>> > "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06
>> 10:34 am:
>> >> Hi,
>> >>
>> >> can you post the complete query,schema- and
>> >> table-definition,server-version etc. ?
>> >> This will help to identity the main problem.
>> >>
>> >> So at the moment i'm just guessing:
>> >>
>> >> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
>> >>     ->  Bitmap Heap Scan on test2  (cost=232.92..132766.66
>>  rows=37120
>> >> width=23)
>> >>     (actual time=291.600..356707.737 rows=37539 loops=1)
>> >> This part is very expensive, but i got no clue why.
>>
>> Yep, it looks like the "Bitmap Heap Scan" is at the heart of
>> this. You
>> might want to increase work_mem, it could be that the bitmap
>> is spilling
>> to disk (which is much slower than keeping it all in RAM)
>>
>> http://www.postgresql.org/docs/8.1/static/runtime-config-resou
> rce.html#RUNTIME-CONFIG-RESOURCE-MEMORY
>
> If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see what happens.
>
> --
>    Richard Huxton
>    Archonet Ltd
>
>
>
> Hakan Kocaman
> Software-Development
>
> digame.de GmbH
> Richard-Byrd-Str. 4-8
> 50829 Köln
>
> Tel.: +49 (0) 221 59 68 88 31
> Fax: +49 (0) 221 59 68 88 98
> Email: hakan.kocaman@digame.de
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match


--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230


Re: Query performance

От
"Hakan Kocaman"
Дата:
Hi,

maybe you should overthink your db-design, but thats a bit premature whithout your complete
table-definitions(including table-names, datatypes, indexes, foreign-key constraints,etc.)

If your are using pgadmin3 just cut'n paste the content of the window on the bottom left for
the corresponding tables.

If you're using psql try \d yur-table-name.

Best regards

Hakan Kocaman
Software-Development

digame.de GmbH
Richard-Byrd-Str. 4-8
50829 Köln

Tel.: +49 (0) 221 59 68 88 31
Fax: +49 (0) 221 59 68 88 98
Email: hakan.kocaman@digame.de



> -----Original Message-----
> From: Christian Rengstl
> [mailto:Christian.Rengstl@klinik.uni-regensburg.de]
> Sent: Thursday, August 03, 2006 11:18 AM
> To: Richard Huxton; Hakan Kocaman
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Query performance
>
>
> Hi,
>
> i would rather compare int4 too, but the snp_id can be
> something like "abc123" unfortunately.
>
> "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 11:08 am:
> > Hi,
> >
> >
> >> -----Original Message-----
> >> From: Richard Huxton [mailto:dev@archonet.com]
> >> Sent: Thursday, August 03, 2006 11:00 AM
> >> To: Christian Rengstl
> >> Cc: Hakan Kocaman; pgsql-general@postgresql.org
> >> Subject: Re: [GENERAL] Query performance
> >>
> >>
> >> Christian Rengstl wrote:
> >> > Hi,
> >> >
> >> > the complete query is the one i posted, but here comes the
> >> schema for mytable:
> >> > entry_no int8 NOT NULL DEFAULT
> nextval('entry_no_seq''::regclass),
> >> >   pid varchar(15) NOT NULL,
> >> >   crit varchar(13) NOT NULL,
> >> >   val1 varchar(1),
> >> >   val2 varchar(1),
> >> >   aendat text,
> >> >   aennam varchar(8),
> >> >   CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)
> >> >
> >> > myCritTable:
> >> >   crit varchar(13) NOT NULL,
> >> >   chr int2,
> >> >   aendat timestamp,
> >> >   CONSTRAINT pk_crit_master PRIMARY KEY (crit)
> >>
> >> Still doesn't match the EXPLAIN output - where's snp_id?
> >> Where's table
> >> test2?
> >>
> >
> > Yep, that bothered me too.
> >
> >> > My server is 8.1.4. As a matter of fact, i have no idea
> >> where the text
> >>  > type comes from, because as you can see from above
> there are only
> >>  > varchar with maximum 15 characters.
> >>
> >> PG is casting it to text. There's no real difference between
> >> the types
> >> (other than the size limit) and it's not expensive.
> >
> > But wouldn't a comparison between int4 be much cheaper.
> > If i see smth like "id" (here snp_id) in a fieldname it should be a
> > int-type, i think.
> >
> >>
> >> > "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06
> >> 10:34 am:
> >> >> Hi,
> >> >>
> >> >> can you post the complete query,schema- and
> >> >> table-definition,server-version etc. ?
> >> >> This will help to identity the main problem.
> >> >>
> >> >> So at the moment i'm just guessing:
> >> >>
> >> >> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
> >> >>     ->  Bitmap Heap Scan on test2  (cost=232.92..132766.66
> >>  rows=37120
> >> >> width=23)
> >> >>     (actual time=291.600..356707.737 rows=37539 loops=1)
> >> >> This part is very expensive, but i got no clue why.
> >>
> >> Yep, it looks like the "Bitmap Heap Scan" is at the heart of
> >> this. You
> >> might want to increase work_mem, it could be that the bitmap
> >> is spilling
> >> to disk (which is much slower than keeping it all in RAM)
> >>
> >> http://www.postgresql.org/docs/8.1/static/runtime-config-resou
> > rce.html#RUNTIME-CONFIG-RESOURCE-MEMORY
> >
> > If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see
> what happens.
> >
> > --
> >    Richard Huxton
> >    Archonet Ltd
> >
> >
> >
> > Hakan Kocaman
> > Software-Development
> >
> > digame.de GmbH
> > Richard-Byrd-Str. 4-8
> > 50829 Köln
> >
> > Tel.: +49 (0) 221 59 68 88 31
> > Fax: +49 (0) 221 59 68 88 98
> > Email: hakan.kocaman@digame.de
> >
> > ---------------------------(end of
> broadcast)---------------------------
> > TIP 9: In versions below 8.0, the planner will ignore your desire to
> >        choose an index scan if your joining column's
> datatypes do not
> >        match
>
>
> --
> Christian Rengstl M.A.
> Klinik und Poliklinik für Innere Medizin II
> Kardiologie - Forschung
> Universitätsklinikum Regensburg
> B3 1.388
> Franz-Josef-Strauss-Allee 11
> 93053 Regensburg
> Tel.: +49-941-944-7230
>
>

Re: Query performance

От
"Christian Rengstl"
Дата:
Hi,

here is the definition of the master table which is inherited by around 30 tables based on the value of chr:
CREATE TABLE snp_master
(
  entry_no int8 NOT NULL DEFAULT nextval('entry_no_seq'::regclass),
  pid varchar(15) NOT NULL,
  snp_id varchar(13) NOT NULL,
  val1 varchar(1),
  val2 varchar(1),
  chr int2 NOT NULL,
  aendat text,
  aennam varchar(8),
  CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no),
  CONSTRAINT "UNIQUE_SNP_ALLEL_MASTER" UNIQUE (pid, entry_no, snp_id)
)
WITHOUT OIDS;

The thing is that i load the data from txt files which themselves have more or less the same structure. So for every
pidof 1500 there are up to 42000 different snp_id values and for each of this combinations there are different val1 and
val2entries (all together this accounts for up to around 58 million tuples in the biggest table). MyCritTable then just
containsthe distinct snp_ids so that at least this query does not take very long any more. 
CREATE TABLE snps_master
(
  snp_id varchar(13) NOT NULL,
  chr int2,
  aendat timestamp,
  CONSTRAINT pk_snp_master PRIMARY KEY (snp_id)
)
WITHOUT OIDS;

Up to now there are no foreign key constraints, as once the data is loaded into the db there will be no update or
deleteoperations at all. I only have to export the data to different file formats for which i need the query posted
originally.

"Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 11:36 am:
> Hi,
>
> maybe you should overthink your db-design, but thats a bit premature
> whithout your complete
> table-definitions(including table-names, datatypes, indexes,
> foreign-key constraints,etc.)
>
> If your are using pgadmin3 just cut'n paste the content of the window
> on the bottom left for
> the corresponding tables.
>
> If you're using psql try \d yur-table-name.
>
> Best regards
>
> Hakan Kocaman
> Software-Development
>
> digame.de GmbH
> Richard-Byrd-Str. 4-8
> 50829 Köln
>
> Tel.: +49 (0) 221 59 68 88 31
> Fax: +49 (0) 221 59 68 88 98
> Email: hakan.kocaman@digame.de
>
>
>
>> -----Original Message-----
>> From: Christian Rengstl
>> [mailto:Christian.Rengstl@klinik.uni-regensburg.de]
>> Sent: Thursday, August 03, 2006 11:18 AM
>> To: Richard Huxton; Hakan Kocaman
>> Cc: pgsql-general@postgresql.org
>> Subject: Re: [GENERAL] Query performance
>>
>>
>> Hi,
>>
>> i would rather compare int4 too, but the snp_id can be
>> something like "abc123" unfortunately.
>>
>> "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06 11:08 am:
>> > Hi,
>> >
>> >
>> >> -----Original Message-----
>> >> From: Richard Huxton [mailto:dev@archonet.com]
>> >> Sent: Thursday, August 03, 2006 11:00 AM
>> >> To: Christian Rengstl
>> >> Cc: Hakan Kocaman; pgsql-general@postgresql.org
>> >> Subject: Re: [GENERAL] Query performance
>> >>
>> >>
>> >> Christian Rengstl wrote:
>> >> > Hi,
>> >> >
>> >> > the complete query is the one i posted, but here comes the
>> >> schema for mytable:
>> >> > entry_no int8 NOT NULL DEFAULT
>> nextval('entry_no_seq''::regclass),
>> >> >   pid varchar(15) NOT NULL,
>> >> >   crit varchar(13) NOT NULL,
>> >> >   val1 varchar(1),
>> >> >   val2 varchar(1),
>> >> >   aendat text,
>> >> >   aennam varchar(8),
>> >> >   CONSTRAINT "PK_ENTRY" PRIMARY KEY (entry_no)
>> >> >
>> >> > myCritTable:
>> >> >   crit varchar(13) NOT NULL,
>> >> >   chr int2,
>> >> >   aendat timestamp,
>> >> >   CONSTRAINT pk_crit_master PRIMARY KEY (crit)
>> >>
>> >> Still doesn't match the EXPLAIN output - where's snp_id?
>> >> Where's table
>> >> test2?
>> >>
>> >
>> > Yep, that bothered me too.
>> >
>> >> > My server is 8.1.4. As a matter of fact, i have no idea
>> >> where the text
>> >>  > type comes from, because as you can see from above
>> there are only
>> >>  > varchar with maximum 15 characters.
>> >>
>> >> PG is casting it to text. There's no real difference between
>> >> the types
>> >> (other than the size limit) and it's not expensive.
>> >
>> > But wouldn't a comparison between int4 be much cheaper.
>> > If i see smth like "id" (here snp_id) in a fieldname it should be a
>> > int-type, i think.
>> >
>> >>
>> >> > "Hakan Kocaman" <Hakan.Kocaman@digame.de> wrote on 08/03/06
>> >> 10:34 am:
>> >> >> Hi,
>> >> >>
>> >> >> can you post the complete query,schema- and
>> >> >> table-definition,server-version etc. ?
>> >> >> This will help to identity the main problem.
>> >> >>
>> >> >> So at the moment i'm just guessing:
>> >> >>
>> >> >> Hash Cond: (("outer".snp_id)::text = ("inner".snp_id)::text)
>> >> >>     ->  Bitmap Heap Scan on test2  (cost=232.92..132766.66
>> >>  rows=37120
>> >> >> width=23)
>> >> >>     (actual time=291.600..356707.737 rows=37539 loops=1)
>> >> >> This part is very expensive, but i got no clue why.
>> >>
>> >> Yep, it looks like the "Bitmap Heap Scan" is at the heart of
>> >> this. You
>> >> might want to increase work_mem, it could be that the bitmap
>> >> is spilling
>> >> to disk (which is much slower than keeping it all in RAM)
>> >>
>> >> http://www.postgresql.org/docs/8.1/static/runtime-config-resou
>> > rce.html#RUNTIME-CONFIG-RESOURCE-MEMORY
>> >
>> > If it's still at 1MB (1024), try 5MB, 10MB, 50MB and see
>> what happens.
>> >
>> > --
>> >    Richard Huxton
>> >    Archonet Ltd
>> >
>> >
>> >
>> > Hakan Kocaman
>> > Software-Development
>> >
>> > digame.de GmbH
>> > Richard-Byrd-Str. 4-8
>> > 50829 Köln
>> >
>> > Tel.: +49 (0) 221 59 68 88 31
>> > Fax: +49 (0) 221 59 68 88 98
>> > Email: hakan.kocaman@digame.de
>> >
>> > ---------------------------(end of
>> broadcast)---------------------------
>> > TIP 9: In versions below 8.0, the planner will ignore your desire to
>> >        choose an index scan if your joining column's
>> datatypes do not
>> >        match
>>
>>
>> --
>> Christian Rengstl M.A.
>> Klinik und Poliklinik für Innere Medizin II
>> Kardiologie - Forschung
>> Universitätsklinikum Regensburg
>> B3 1.388
>> Franz-Josef-Strauss-Allee 11
>> 93053 Regensburg
>> Tel.: +49-941-944-7230
>>
>>


--
Christian Rengstl M.A.
Klinik und Poliklinik für Innere Medizin II
Kardiologie - Forschung
Universitätsklinikum Regensburg
B3 1.388
Franz-Josef-Strauss-Allee 11
93053 Regensburg
Tel.: +49-941-944-7230

Re: Query performance

От
Chris Mair
Дата:
> i have a table with around 57 million tuples, with the following columns: pid(varchar), crit(varchar), val1(varchar),
val2(varchar).Example: 
> pid    crit    val1    val2
> p1      c1      x        y
> p1      c2      x        z
> p1      c3      y        x
> ...
> What i am doing is to query all val1 and val2 for one pid and all crit values:
>
> select val1, val2, crit from mytable where pid='somepid' and crit in(select crit from myCritTable);
> where myCritTable is a table that contains all crit values (around 42.000) ordered by their insertion date.

In case myCritTable doesn't change a lot and this select by contrast is
executed a lot, have you considered precomputing whether a record from
your big table has a crit value from myCritTable?

Of course this info would be invalidated each time myCritTable is
updated, so you would trade fast selects on the big table vs. slow
updates on myCritTable. Don't know wether that makes sence for you...

Bye, Chris.