Обсуждение: query performance

Поиск
Список
Период
Сортировка

query performance

От
pepone.onrez
Дата:
I have this query in a table with 150 thowsand tuples and it takes to long

    t_documentcontent._id AS _id
    FROM t_documentcontent LIMIT 50 OFFSET 80000

here is the explain output

"Limit  (cost= 100058762.30..100058799.02 rows=50 width=58) (actual time=19433.474..19433.680 rows=50 loops=1)"
"  ->  Seq Scan on t_documentcontent  (cost=100000000.00..100110772.07 rows=150807 width=58) (actual time=53.934..19402.030 rows=80050 loops=1)"
"Total runtime: 19433.748 ms"

here the table structure

CREATE TABLE t_documentcontent(
    _id varchar(60) NOT NULL,
    _filesystem varchar(60) NOT NULL,
    _parent varchar(60) NOT NULL,
    _document varchar(60) NOT NULL,
    _title varchar NOT NULL,
    _resume varchar,
    _content varchar,
    _lang integer NOT NULL,
    _creationdate timestamp NOT NULL DEFAULT now(),
    _updatedate timestamp NOT NULL DEFAULT now(),
    _indexeddate timestamp NOT NULL DEFAULT now(),
    CONSTRAINT documentcontent_pkey PRIMARY KEY (_id),
    CONSTRAINT documentcontent_filesystem_fkey
        FOREIGN KEY (_filesystem) REFERENCES t_filesystem(_id) ON UPDATE CASCADE ON DELETE NO ACTION,
    CONSTRAINT documentcontent_parent_fkey FOREIGN KEY (_parent) REFERENCES t_node(_id) ON DELETE NO ACTION,
    CONSTRAINT documentcontent_document_fkey
        FOREIGN KEY (_document) REFERENCES t_node(_id) ON UPDATE CASCADE ON DELETE NO ACTION,
    CONSTRAINT documentcontent_lang_lang_fkey FOREIGN KEY (_lang) REFERENCES t_lang(_id)) WITHOUT OIDS;

Any ideas for improve this query performance.

Re: query performance

От
"Alex Turner"
Дата:
If you haven't already, make sure you've done a vacuum full recently.  When in doubt, pg_dump the db, and reload it, and see if that helps, but this works for me:

create table overview as select _id from t_documentcontent;
alter table overview add constraint overview_pkey primary key (_id);

select * from t_documentcontent where _id in (select _id
    FROM overview LIMIT 50 OFFSET 80000);

create a trigger on insert/delete/update to keep this table in sync, or if you don't need to then just re-run the create every so often depending on your needs (I'll be happy to demonstrate the required triggers if you need it).

make sure that you have adequate RAM available for file cache, hitting the disk everytime you query will suck no matter what you do.

Alex

On Jan 13, 2008 9:58 PM, pepone. onrez < pepone.onrez@gmail.com> wrote:
I have this query in a table with 150 thowsand tuples and it takes to long

    t_documentcontent._id AS _id
    FROM t_documentcontent LIMIT 50 OFFSET 80000

here is the explain output

"Limit  (cost= 100058762.30..100058799.02 rows=50 width=58) (actual time=19433.474..19433.680 rows=50 loops=1)"
"  ->  Seq Scan on t_documentcontent  (cost=100000000.00..100110772.07 rows=150807 width=58) (actual time=53.934..19402.030 rows=80050 loops=1)"
"Total runtime: 19433.748 ms"

here the table structure

CREATE TABLE t_documentcontent(
    _id varchar(60) NOT NULL,
    _filesystem varchar(60) NOT NULL,
    _parent varchar(60) NOT NULL,
    _document varchar(60) NOT NULL,
    _title varchar NOT NULL,
    _resume varchar,
    _content varchar,
    _lang integer NOT NULL,
    _creationdate timestamp NOT NULL DEFAULT now(),
    _updatedate timestamp NOT NULL DEFAULT now(),
    _indexeddate timestamp NOT NULL DEFAULT now(),
    CONSTRAINT documentcontent_pkey PRIMARY KEY (_id),
    CONSTRAINT documentcontent_filesystem_fkey
        FOREIGN KEY (_filesystem) REFERENCES t_filesystem(_id) ON UPDATE CASCADE ON DELETE NO ACTION,
    CONSTRAINT documentcontent_parent_fkey FOREIGN KEY (_parent) REFERENCES t_node(_id) ON DELETE NO ACTION,
    CONSTRAINT documentcontent_document_fkey
        FOREIGN KEY (_document) REFERENCES t_node(_id) ON UPDATE CASCADE ON DELETE NO ACTION,
    CONSTRAINT documentcontent_lang_lang_fkey FOREIGN KEY (_lang) REFERENCES t_lang(_id)) WITHOUT OIDS;

Any ideas for improve this query performance.

Re: query performance

От
"Scott Marlowe"
Дата:
On Jan 13, 2008 8:58 PM, pepone. onrez <pepone.onrez@gmail.com> wrote:
> I have this query in a table with 150 thowsand tuples and it takes to long
>
>     t_documentcontent._id AS _id
>     FROM t_documentcontent LIMIT 50 OFFSET 80000
>
> here is the explain output
>
> "Limit  (cost= 100058762.30..100058799.02 rows=50 width=58) (actual
> time=19433.474..19433.680 rows=50 loops=1)"
> "  ->  Seq Scan on t_documentcontent  (cost=100000000.00..100110772.07
> rows=150807 width=58) (actual time=53.934..19402.030 rows=80050 loops=1)"
> "Total runtime: 19433.748 ms"

looks like you've set enable_seqscan=off.  When looking for help on
queries it's a good idea to mention such things...

with no order by, and possibly no index on t_documentcontent._id,
there's no choice but a seq scan.

try adding both.

Re: query performance

От
Tom Lane
Дата:
"Scott Marlowe" <scott.marlowe@gmail.com> writes:
> On Jan 13, 2008 8:58 PM, pepone. onrez <pepone.onrez@gmail.com> wrote:
>> t_documentcontent._id AS _id
>> FROM t_documentcontent LIMIT 50 OFFSET 80000

> with no order by, and possibly no index on t_documentcontent._id,
> there's no choice but a seq scan.

More to the point: a query with a huge OFFSET is *always* going to suck,
because there is no choice but to read through all those records before
getting to the ones you want.  You need to fundamentally rethink how you
are going about this.  I'm assuming that this is actually just one query
in a series that are intended to eventually fetch the whole table.

One solution is to set up a cursor and FETCH 50 rows at a time from it.
However that requires holding a transaction open, which might not work
well in your environment.

Another possibility, if you have a primary key on the table, is to do
something like

    SELECT ... FROM ... WHERE id > ? ORDER BY id LIMIT 50

where you leave out the WHERE clause on the first call, and on
subsequent calls '?' is the last id value seen in the prior call.

            regards, tom lane

Re: query performance

От
pepone.onrez
Дата:
Sorry Alex i forget mention that i have setscan of in my last test.

now I have set seqscan on  and indexscan on and added order by _id 

The table has an index in the _id field 

CREATE INDEX i_documentcontent_document
  ON t_documentcontent
  USING btree
  (_document);

The database was rencently vacum analyze , but not vacun full

here is the explain of 2 diferent queries , when i put a large OFFSET

 EXPLAIN ANALYZE SELECT

    t_documentcontent._id AS _id
    FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 50000

"Limit  (cost=137068.24..137068.36 rows=50 width=58) (actual time=41119.702..41119.792 rows=50 loops=1)"
"  ->  Sort  (cost=136943.24..137320.26 rows=150807 width=58) (actual time=41064.802..41100.424 rows=50050 loops=1)"
"        Sort Key: _id"
"        ->  Seq Scan on t_documentcontent  (cost= 0.00..110772.07 rows=150807 width=58) (actual time=106.679..33267.194 rows=150807 loops=1)"
"Total runtime: 41120.015 ms"

 EXPLAIN ANALYZE SELECT

    t_documentcontent._id AS _id
    FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 10000

"Limit  (cost=39839.37..40038.56 rows=50 width=58) (actual time=1172.969..1194.228 rows=50 loops=1)"
"  ->  Index Scan using i_documentcontent_id on t_documentcontent  (cost=0.00..600805.54 rows=150807 width=58) (actual time= 0.077..1189.688 rows=10050 loops=1)"
"Total runtime: 1194.316 ms"

Tom
 i using uuid for the _id field that is the primary key  add a WHERE id > ?  don 't apply
the cursor aproach is also not suitable for same of my queries

I use this query for paginate contents of a filesysstem with lots of documents avoid offset is not posible always

Re: query performance

От
"Alex Turner"
Дата:
If you have to access the data this way (with no where clause at all - which sometimes you do) then I have already provided a solution that will work reasonably well.  If you create what is essentially a materialized view of just the id field, the sequence scan will return much fewer pages than when you do it on the main table.  Then you join it to the indexed main table, and page in just the rows you need.  Voila - much faster result.  Of course we haven't really talked about how that will affect insert speed and delete speed if you trigger then up, but you haven't really talked about any requirements there.

Alex

On Jan 13, 2008 11:27 PM, pepone. onrez <pepone.onrez@gmail.com> wrote:
Sorry Alex i forget mention that i have setscan of in my last test.

now I have set seqscan on  and indexscan on and added order by _id 

The table has an index in the _id field 

CREATE INDEX i_documentcontent_document
  ON t_documentcontent
  USING btree
  (_document);

The database was rencently vacum analyze , but not vacun full

here is the explain of 2 diferent queries , when i put a large OFFSET

 EXPLAIN ANALYZE SELECT

    t_documentcontent._id AS _id
    FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 50000

"Limit  (cost=137068.24..137068.36 rows=50 width=58) (actual time=41119.702..41119.792 rows=50 loops=1)"
"  ->  Sort  (cost=136943.24..137320.26 rows=150807 width=58) (actual time=41064.802..41100.424 rows=50050 loops=1)"
"        Sort Key: _id"
"        ->  Seq Scan on t_documentcontent  (cost= 0.00..110772.07 rows=150807 width=58) (actual time=106.679..33267.194 rows=150807 loops=1)"
"Total runtime: 41120.015 ms"

 EXPLAIN ANALYZE SELECT

    t_documentcontent._id AS _id
    FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 10000

"Limit  (cost=39839.37..40038.56 rows=50 width=58) (actual time=1172.969..1194.228 rows=50 loops=1)"
"  ->  Index Scan using i_documentcontent_id on t_documentcontent  (cost=0.00..600805.54 rows=150807 width=58) (actual time= 0.077..1189.688 rows=10050 loops=1)"
"Total runtime: 1194.316 ms"

Tom
 i using uuid for the _id field that is the primary key  add a WHERE id > ?  don 't apply
the cursor aproach is also not suitable for same of my queries

I use this query for paginate contents of a filesysstem with lots of documents avoid offset is not posible always

Re: query performance

От
"Alex Turner"
Дата:
Here is a table I threw together to demonstrate the approximate speed of a materialized view in this case:

trend=# explain analyze select property_id from overview order by property_id limit 50 offset 50000;
                                                         QUERY PLAN            
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=19112.75..19112.88 rows=50 width=8) (actual time=446.048..446.125 rows=50 loops=1)
   ->  Sort  (cost=18987.75..19400.49 rows=165094 width=8) (actual time=384.788..424.433 rows=50050 loops=1)
         Sort Key: property_id
         ->  Seq Scan on overview  (cost=0.00..2501.94 rows=165094 width=8) (actual time= 0.012..88.691 rows=173409 loops=1)
 Total runtime: 447.578 ms
(5 rows)

trend=# select count(*) from overview;
 count
--------
 173409
(1 row)

trend=#

It's not great - but it's better than 47 seconds (The machine I'm running it on is far from big iron, so these results should be fairly typical for any modern x86 box - also this materialized view is almost certainly in RAM, and therefore IO speed is irrelevant).

Tom lane has already suggested another approach, whereby you order your results, then select the next 10 from the set where the id is greater than the greatest of the last one:

select id from overview order by id limit 50;

x = get row['id']  // for row 1
do something
x=get row['id']  // for row 2
do something
...
x=get row['id']  // for row 50

select id from overview where id>x order by id limit 50.

The order by is relevant and infact imperative, because you must order your results somehow, otherwise your pagination will produce different results each time you try it as database updates will affect the order the rows come back by default without an order by clause.

Let me say that again to be clear:  The order rows come back if you don't specify an order by can change!  so pulling rows without an order by is a REALLY bad idea.  This will break your pagination if a database update happens between someone viewing a page and hitting next to view the next page.

Alex

On Jan 13, 2008 11:43 PM, Alex Turner <armtuk@gmail.com> wrote:
If you have to access the data this way (with no where clause at all - which sometimes you do) then I have already provided a solution that will work reasonably well.  If you create what is essentially a materialized view of just the id field, the sequence scan will return much fewer pages than when you do it on the main table.  Then you join it to the indexed main table, and page in just the rows you need.  Voila - much faster result.  Of course we haven't really talked about how that will affect insert speed and delete speed if you trigger then up, but you haven't really talked about any requirements there.

Alex


On Jan 13, 2008 11:27 PM, pepone. onrez <pepone.onrez@gmail.com > wrote:
Sorry Alex i forget mention that i have setscan of in my last test.

now I have set seqscan on  and indexscan on and added order by _id 

The table has an index in the _id field 

CREATE INDEX i_documentcontent_document
  ON t_documentcontent
  USING btree
  (_document);

The database was rencently vacum analyze , but not vacun full

here is the explain of 2 diferent queries , when i put a large OFFSET

 EXPLAIN ANALYZE SELECT

    t_documentcontent._id AS _id
    FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 50000

"Limit  (cost=137068.24..137068.36 rows=50 width=58) (actual time=41119.702..41119.792 rows=50 loops=1)"
"  ->  Sort  (cost=136943.24..137320.26 rows=150807 width=58) (actual time=41064.802..41100.424 rows=50050 loops=1)"
"        Sort Key: _id"
"        ->  Seq Scan on t_documentcontent  (cost= 0.00..110772.07 rows=150807 width=58) (actual time=106.679..33267.194 rows=150807 loops=1)"
"Total runtime: 41120.015 ms"

 EXPLAIN ANALYZE SELECT

    t_documentcontent._id AS _id
    FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 10000

"Limit  (cost=39839.37..40038.56 rows=50 width=58) (actual time=1172.969..1194.228 rows=50 loops=1)"
"  ->  Index Scan using i_documentcontent_id on t_documentcontent  (cost=0.00..600805.54 rows=150807 width=58) (actual time= 0.077..1189.688 rows=10050 loops=1)"
"Total runtime: 1194.316 ms"

Tom
 i using uuid for the _id field that is the primary key  add a WHERE id > ?  don 't apply
the cursor aproach is also not suitable for same of my queries

I use this query for paginate contents of a filesysstem with lots of documents avoid offset is not posible always


Re: query performance

От
pepone.onrez
Дата:

Thanks Alex

I test your solution and is realy more faster.

Nested Loop  (cost=1743.31..2044.58 rows=50 width=908) (actual time=101.695..106.178 rows=50 loops=1)
  ->  HashAggregate  (cost=1743.31..1743.31 rows=50 width=108) (actual time=101.509..101.567 rows=50 loops=1)
        ->  Subquery Scan "IN_subquery"  (cost=1741.60..1743.19 rows=50 width=108) (actual time=101.327..101.456 rows=50 loops=1)
              ->  Limit  (cost=1741.60..1742.69 rows=50 width=108) (actual time=101.313..101.383 rows=50 loops=1)
                    ->  Seq Scan on overview  (cost=0.00..3283.07 rows=150807 width=108) (actual time=0.036..72.249 rows=80050 loops=1)
  ->  Index Scan using i_documentcontent_id on t_documentcontent  (cost=0.00..6.01 rows=1 width=908) (actual time=0.083..0.085 rows=1 loops=50)
        Index Cond: ((t_documentcontent._id)::text = ("outer"._id)::text)
Total runtime: 106.323 ms

I now need to see what trigers i need to add, and test the insertions. Thanks again

On Jan 14, 2008 5:54 AM, Alex Turner <armtuk@gmail.com> wrote:
Here is a table I threw together to demonstrate the approximate speed of a materialized view in this case:

trend=# explain analyze select property_id from overview order by property_id limit 50 offset 50000;
                                                         QUERY PLAN            
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=19112.75..19112.88 rows=50 width=8) (actual time=446.048..446.125 rows=50 loops=1)
   ->  Sort  (cost=18987.75..19400.49 rows=165094 width=8) (actual time=384.788..424.433 rows=50050 loops=1)
         Sort Key: property_id
         ->  Seq Scan on overview  (cost=0.00..2501.94 rows=165094 width=8) (actual time= 0.012..88.691 rows=173409 loops=1)
 Total runtime: 447.578 ms
(5 rows)

trend=# select count(*) from overview;
 count
--------
 173409
(1 row)

trend=#

It's not great - but it's better than 47 seconds (The machine I'm running it on is far from big iron, so these results should be fairly typical for any modern x86 box - also this materialized view is almost certainly in RAM, and therefore IO speed is irrelevant).

Tom lane has already suggested another approach, whereby you order your results, then select the next 10 from the set where the id is greater than the greatest of the last one:

select id from overview order by id limit 50;

x = get row['id']  // for row 1
do something
x=get row['id']  // for row 2
do something
...
x=get row['id']  // for row 50

select id from overview where id>x order by id limit 50.

The order by is relevant and infact imperative, because you must order your results somehow, otherwise your pagination will produce different results each time you try it as database updates will affect the order the rows come back by default without an order by clause.

Let me say that again to be clear:  The order rows come back if you don't specify an order by can change!  so pulling rows without an order by is a REALLY bad idea.  This will break your pagination if a database update happens between someone viewing a page and hitting next to view the next page.

Alex


On Jan 13, 2008 11:43 PM, Alex Turner <armtuk@gmail.com > wrote:
If you have to access the data this way (with no where clause at all - which sometimes you do) then I have already provided a solution that will work reasonably well.  If you create what is essentially a materialized view of just the id field, the sequence scan will return much fewer pages than when you do it on the main table.  Then you join it to the indexed main table, and page in just the rows you need.  Voila - much faster result.  Of course we haven't really talked about how that will affect insert speed and delete speed if you trigger then up, but you haven't really talked about any requirements there.

Alex


On Jan 13, 2008 11:27 PM, pepone. onrez <pepone.onrez@gmail.com > wrote:
Sorry Alex i forget mention that i have setscan of in my last test.

now I have set seqscan on  and indexscan on and added order by _id 

The table has an index in the _id field 

CREATE INDEX i_documentcontent_document
  ON t_documentcontent
  USING btree
  (_document);

The database was rencently vacum analyze , but not vacun full

here is the explain of 2 diferent queries , when i put a large OFFSET

 EXPLAIN ANALYZE SELECT

    t_documentcontent._id AS _id
    FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 50000

"Limit  (cost=137068.24..137068.36 rows=50 width=58) (actual time=41119.702..41119.792 rows=50 loops=1)"
"  ->  Sort  (cost=136943.24..137320.26 rows=150807 width=58) (actual time=41064.802..41100.424 rows=50050 loops=1)"
"        Sort Key: _id"
"        ->  Seq Scan on t_documentcontent  (cost= 0.00..110772.07 rows=150807 width=58) (actual time=106.679..33267.194 rows=150807 loops=1)"
"Total runtime: 41120.015 ms"

 EXPLAIN ANALYZE SELECT

    t_documentcontent._id AS _id
    FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 10000

"Limit  (cost=39839.37..40038.56 rows=50 width=58) (actual time=1172.969..1194.228 rows=50 loops=1)"
"  ->  Index Scan using i_documentcontent_id on t_documentcontent  (cost=0.00..600805.54 rows=150807 width=58) (actual time= 0.077..1189.688 rows=10050 loops=1)"
"Total runtime: 1194.316 ms"

Tom
 i using uuid for the _id field that is the primary key  add a WHERE id > ?  don 't apply
the cursor aproach is also not suitable for same of my queries

I use this query for paginate contents of a filesysstem with lots of documents avoid offset is not posible always



Re: query performance

От
"Alex Turner"
Дата:
If you combine it with Tom Lane's suggestion - it will go even better, something like:

select * from t_documentcontent where _id in (select _id from overview where _id>x order by _id limit 50);

Alex

On Jan 13, 2008 11:59 PM, pepone. onrez <pepone.onrez@gmail.com> wrote:

Thanks Alex

I test your solution and is realy more faster.

Nested Loop  (cost=1743.31..2044.58 rows=50 width=908) (actual time=101.695..106.178 rows=50 loops=1)
  ->  HashAggregate  (cost=1743.31..1743.31 rows=50 width=108) (actual time=101.509..101.567 rows=50 loops=1)
        ->  Subquery Scan "IN_subquery"  (cost=1741.60..1743.19 rows=50 width=108) (actual time=101.327..101.456 rows=50 loops=1)
              ->  Limit  (cost=1741.60..1742.69 rows=50 width=108) (actual time=101.313..101.383 rows=50 loops=1)
                    ->  Seq Scan on overview  (cost=0.00..3283.07 rows=150807 width=108) (actual time=0.036..72.249 rows=80050 loops=1)
  ->  Index Scan using i_documentcontent_id on t_documentcontent  (cost=0.00..6.01 rows=1 width=908) (actual time=0.083..0.085 rows=1 loops=50)
        Index Cond: ((t_documentcontent._id)::text = ("outer"._id)::text)
Total runtime: 106.323 ms

I now need to see what trigers i need to add, and test the insertions. Thanks again


On Jan 14, 2008 5:54 AM, Alex Turner < armtuk@gmail.com> wrote:
Here is a table I threw together to demonstrate the approximate speed of a materialized view in this case:

trend=# explain analyze select property_id from overview order by property_id limit 50 offset 50000;
                                                         QUERY PLAN            
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=19112.75..19112.88 rows=50 width=8) (actual time=446.048..446.125 rows=50 loops=1)
   ->  Sort  (cost=18987.75..19400.49 rows=165094 width=8) (actual time=384.788..424.433 rows=50050 loops=1)
         Sort Key: property_id
         ->  Seq Scan on overview  (cost=0.00..2501.94 rows=165094 width=8) (actual time= 0.012..88.691 rows=173409 loops=1)
 Total runtime: 447.578 ms
(5 rows)

trend=# select count(*) from overview;
 count
--------
 173409
(1 row)

trend=#

It's not great - but it's better than 47 seconds (The machine I'm running it on is far from big iron, so these results should be fairly typical for any modern x86 box - also this materialized view is almost certainly in RAM, and therefore IO speed is irrelevant).

Tom lane has already suggested another approach, whereby you order your results, then select the next 10 from the set where the id is greater than the greatest of the last one:

select id from overview order by id limit 50;

x = get row['id']  // for row 1
do something
x=get row['id']  // for row 2
do something
...
x=get row['id']  // for row 50

select id from overview where id>x order by id limit 50.

The order by is relevant and infact imperative, because you must order your results somehow, otherwise your pagination will produce different results each time you try it as database updates will affect the order the rows come back by default without an order by clause.

Let me say that again to be clear:  The order rows come back if you don't specify an order by can change!  so pulling rows without an order by is a REALLY bad idea.  This will break your pagination if a database update happens between someone viewing a page and hitting next to view the next page.

Alex


On Jan 13, 2008 11:43 PM, Alex Turner <armtuk@gmail.com > wrote:
If you have to access the data this way (with no where clause at all - which sometimes you do) then I have already provided a solution that will work reasonably well.  If you create what is essentially a materialized view of just the id field, the sequence scan will return much fewer pages than when you do it on the main table.  Then you join it to the indexed main table, and page in just the rows you need.  Voila - much faster result.  Of course we haven't really talked about how that will affect insert speed and delete speed if you trigger then up, but you haven't really talked about any requirements there.

Alex


On Jan 13, 2008 11:27 PM, pepone. onrez <pepone.onrez@gmail.com > wrote:
Sorry Alex i forget mention that i have setscan of in my last test.

now I have set seqscan on  and indexscan on and added order by _id 

The table has an index in the _id field 

CREATE INDEX i_documentcontent_document
  ON t_documentcontent
  USING btree
  (_document);

The database was rencently vacum analyze , but not vacun full

here is the explain of 2 diferent queries , when i put a large OFFSET

 EXPLAIN ANALYZE SELECT

    t_documentcontent._id AS _id
    FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 50000

"Limit  (cost=137068.24..137068.36 rows=50 width=58) (actual time=41119.702..41119.792 rows=50 loops=1)"
"  ->  Sort  (cost=136943.24..137320.26 rows=150807 width=58) (actual time=41064.802..41100.424 rows=50050 loops=1)"
"        Sort Key: _id"
"        ->  Seq Scan on t_documentcontent  (cost= 0.00..110772.07 rows=150807 width=58) (actual time=106.679..33267.194 rows=150807 loops=1)"
"Total runtime: 41120.015 ms"

 EXPLAIN ANALYZE SELECT

    t_documentcontent._id AS _id
    FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 10000

"Limit  (cost=39839.37..40038.56 rows=50 width=58) (actual time=1172.969..1194.228 rows=50 loops=1)"
"  ->  Index Scan using i_documentcontent_id on t_documentcontent  (cost=0.00..600805.54 rows=150807 width=58) (actual time= 0.077..1189.688 rows=10050 loops=1)"
"Total runtime: 1194.316 ms"

Tom
 i using uuid for the _id field that is the primary key  add a WHERE id > ?  don 't apply
the cursor aproach is also not suitable for same of my queries

I use this query for paginate contents of a filesysstem with lots of documents avoid offset is not posible always




Re: query performance

От
"Alex Turner"
Дата:
Oh - if you do this then make sure that you have the primary key index on overview too.

Alex

On Jan 14, 2008 12:53 AM, Alex Turner <armtuk@gmail.com > wrote:
If you combine it with Tom Lane's suggestion - it will go even better, something like:

select * from t_documentcontent where _id in (select _id from overview where _id>x order by _id limit 50);

Alex


On Jan 13, 2008 11:59 PM, pepone. onrez <pepone.onrez@gmail.com> wrote:

Thanks Alex

I test your solution and is realy more faster.

Nested Loop  (cost=1743.31..2044.58 rows=50 width=908) (actual time=101.695..106.178 rows=50 loops=1)
  ->  HashAggregate  (cost=1743.31..1743.31 rows=50 width=108) (actual time=101.509..101.567 rows=50 loops=1)
        ->  Subquery Scan "IN_subquery"  (cost=1741.60..1743.19 rows=50 width=108) (actual time=101.327..101.456 rows=50 loops=1)
              ->  Limit  (cost=1741.60..1742.69 rows=50 width=108) (actual time=101.313..101.383 rows=50 loops=1)
                    ->  Seq Scan on overview  (cost=0.00..3283.07 rows=150807 width=108) (actual time=0.036..72.249 rows=80050 loops=1)
  ->  Index Scan using i_documentcontent_id on t_documentcontent  (cost=0.00..6.01 rows=1 width=908) (actual time=0.083..0.085 rows=1 loops=50)
        Index Cond: ((t_documentcontent._id)::text = ("outer"._id)::text)
Total runtime: 106.323 ms

I now need to see what trigers i need to add, and test the insertions. Thanks again


On Jan 14, 2008 5:54 AM, Alex Turner < armtuk@gmail.com> wrote:
Here is a table I threw together to demonstrate the approximate speed of a materialized view in this case:

trend=# explain analyze select property_id from overview order by property_id limit 50 offset 50000;
                                                         QUERY PLAN            
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=19112.75..19112.88 rows=50 width=8) (actual time=446.048..446.125 rows=50 loops=1)
   ->  Sort  (cost=18987.75..19400.49 rows=165094 width=8) (actual time=384.788..424.433 rows=50050 loops=1)
         Sort Key: property_id
         ->  Seq Scan on overview  (cost=0.00..2501.94 rows=165094 width=8) (actual time= 0.012..88.691 rows=173409 loops=1)
 Total runtime: 447.578 ms
(5 rows)

trend=# select count(*) from overview;
 count
--------
 173409
(1 row)

trend=#

It's not great - but it's better than 47 seconds (The machine I'm running it on is far from big iron, so these results should be fairly typical for any modern x86 box - also this materialized view is almost certainly in RAM, and therefore IO speed is irrelevant).

Tom lane has already suggested another approach, whereby you order your results, then select the next 10 from the set where the id is greater than the greatest of the last one:

select id from overview order by id limit 50;

x = get row['id']  // for row 1
do something
x=get row['id']  // for row 2
do something
...
x=get row['id']  // for row 50

select id from overview where id>x order by id limit 50.

The order by is relevant and infact imperative, because you must order your results somehow, otherwise your pagination will produce different results each time you try it as database updates will affect the order the rows come back by default without an order by clause.

Let me say that again to be clear:  The order rows come back if you don't specify an order by can change!  so pulling rows without an order by is a REALLY bad idea.  This will break your pagination if a database update happens between someone viewing a page and hitting next to view the next page.

Alex


On Jan 13, 2008 11:43 PM, Alex Turner <armtuk@gmail.com > wrote:
If you have to access the data this way (with no where clause at all - which sometimes you do) then I have already provided a solution that will work reasonably well.  If you create what is essentially a materialized view of just the id field, the sequence scan will return much fewer pages than when you do it on the main table.  Then you join it to the indexed main table, and page in just the rows you need.  Voila - much faster result.  Of course we haven't really talked about how that will affect insert speed and delete speed if you trigger then up, but you haven't really talked about any requirements there.

Alex


On Jan 13, 2008 11:27 PM, pepone. onrez <pepone.onrez@gmail.com > wrote:
Sorry Alex i forget mention that i have setscan of in my last test.

now I have set seqscan on  and indexscan on and added order by _id 

The table has an index in the _id field 

CREATE INDEX i_documentcontent_document
  ON t_documentcontent
  USING btree
  (_document);

The database was rencently vacum analyze , but not vacun full

here is the explain of 2 diferent queries , when i put a large OFFSET

 EXPLAIN ANALYZE SELECT

    t_documentcontent._id AS _id
    FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 50000

"Limit  (cost=137068.24..137068.36 rows=50 width=58) (actual time=41119.702..41119.792 rows=50 loops=1)"
"  ->  Sort  (cost=136943.24..137320.26 rows=150807 width=58) (actual time=41064.802..41100.424 rows=50050 loops=1)"
"        Sort Key: _id"
"        ->  Seq Scan on t_documentcontent  (cost= 0.00..110772.07 rows=150807 width=58) (actual time=106.679..33267.194 rows=150807 loops=1)"
"Total runtime: 41120.015 ms"

 EXPLAIN ANALYZE SELECT

    t_documentcontent._id AS _id
    FROM t_documentcontent ORDER BY _id LIMIT 50 OFFSET 10000

"Limit  (cost=39839.37..40038.56 rows=50 width=58) (actual time=1172.969..1194.228 rows=50 loops=1)"
"  ->  Index Scan using i_documentcontent_id on t_documentcontent  (cost=0.00..600805.54 rows=150807 width=58) (actual time= 0.077..1189.688 rows=10050 loops=1)"
"Total runtime: 1194.316 ms"

Tom
 i using uuid for the _id field that is the primary key  add a WHERE id > ?  don 't apply
the cursor aproach is also not suitable for same of my queries

I use this query for paginate contents of a filesysstem with lots of documents avoid offset is not posible always





Re: query performance

От
Alban Hertroys
Дата:
On Jan 14, 2008, at 3:58 AM, pepone.onrez wrote:

> I have this query in a table with 150 thowsand tuples and it takes
> to long
>
>     t_documentcontent._id AS _id
>     FROM t_documentcontent LIMIT 50 OFFSET 80000


You want an ORDER BY there. Not only will it probably speed things
up, without it there's no guaranteed order in the results returned.

As table records have no specific order and updates and inserts on
that table take the first free position, you risk ending up showing
some records twice in your set (same id on different 'pages') and
missing others because you've already skipped past them when there's
concurrent access.

There's still some risk for that if you order, but as you seem to
order on a sequence-generated column, updates aren't a problem for
you and inserts end up at the end anyway. But only if you order them.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,478f1e139491365710960!