Re: Query planner question

Поиск
Список
Период
Сортировка
От Alban Hertroys
Тема Re: Query planner question
Дата
Msg-id 0C77865F-BB43-44C6-AFDC-E52BD46B82B3@gmail.com
обсуждение исходный текст
Ответ на Re: Query planner question  (Soni M <diptatapa@gmail.com>)
Ответы Re: Query planner question  (Soni M <diptatapa@gmail.com>)
Список pgsql-general
On 23 Aug 2014, at 4:34, Soni M <diptatapa@gmail.com> wrote:
> On Fri, Aug 22, 2014 at 9:10 PM, Alban Hertroys <haramrae@gmail.com> wrote:
> On 22 August 2014 14:26, Soni M <diptatapa@gmail.com> wrote:
> > Currently we have only latest_transmission_id as FK, described here :
> > TABLE "ticket" CONSTRAINT "fkcbe86b0c6ddac9e" FOREIGN KEY
> > (latest_transmission_id) REFERENCES transmission_base(transmission_id)
> >
> > Change the query to include only FK still result the same:
> > explain select t.ticket_id ,
> > tb.transmission_id
> > from ticket t,
> > transmission_base tb
> > where t.latest_transmission_id = tb.transmission_id
> > and tb.parse_date > ('2014-07-31');
> >                                                             QUERY PLAN
> >
----------------------------------------------------------------------------------------------------------------------------------
> >  Hash Join  (cost=113928.06..2583606.96 rows=200338 width=8)
> >    Hash Cond: (t.latest_transmission_id = tb.transmission_id)
> >    ->  Seq Scan on ticket t  (cost=0.00..1767767.26 rows=69990826 width=8)
> >    ->  Hash  (cost=108923.38..108923.38 rows=400374 width=4)
> >          ->  Index Scan using transmission_base_by_parse_date on
> > transmission_base tb  (cost=0.00..108923.38 rows=400374 width=4)
> >                Index Cond: (parse_date > '2014-07-31 00:00:00'::timestamp
> > without time zone)
> > (6 rows)
>
> Do you have an index on ticket (latest_transmission_id)?
>
> Yes, both t.latest_transmission_id and tb.transmission_id is indexed.
>
> Indexes:
>     "transmission_base_pkey" PRIMARY KEY, btree (transmission_id) CLUSTER
> Indexes:
>     "ticket_by_latest_transmission" btree (latest_transmission_id)

Okay, so we got those indexes. So much for the low-hanging fruit.

From the above plan we learn that the database estimates[1] that 400k rows from transmission match your condition
(parse_date> '2014-07-31’). The ticket table has a foreign key to that table, which suggests a 1:n relationship. It
alsohas around 70M rows, or at least the database seems to think that about that amount will match those 400k
transmissions.

That means that if on average 175 (=70M/400k) ticket ID’s match a transmission ID, the database would be needing all
those70M rows anyway - and even if it only needs every 175th row, a sequential scan is not a particularly inefficient
wayto go about this. 
The alternative is a whole lot of index lookups, probably not in the same order as either the index or the rows on
disk,meaning quite a bit of random disk I/O. 

I’m suspecting that the cost estimates for this query with seq-scans disabled aren’t very different, provided doing so
comesup with a comparable plan. 

Things you might want to verify/try:
* Are those estimated numbers of rows accurate? If not, is autovacuum (or scheduled vacuum) keeping up with the amount
ofdata churn on these tables? Do you collect a sufficiently large sample for the statistics? 
* How much bloat is in these tables/indexes?
* Did you change planner settings (such as disabling bitmap scans; I kind of expected one here) or did you change cost
estimateparameters? 
* Does it help to put an index on transmission (parse_date, transmission_id)?
* If none of that helps, we’re going to need the output of explain analyze - that will probably take long to create, so
youmight as well start with that and do the other stuff at the side. 

What kind of hardware are these disks on? Is it possible that disk I/O on this particular machine is relatively slow
(relativeto the seq/random cost factor for disk access as specified in your postgresql.conf)? 

Cheers,

Alban Hertroys

[1] You did not provide explain analyse output, so we only have estimates to work with.
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



В списке pgsql-general по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Re: Query planner question
Следующее
От: Anderson Valadares
Дата:
Сообщение: AIX GCC