any hope for my big query?

From: Ben
Subject: any hope for my big query?
Date:
Msg-id: Pine.LNX.4.64.0609281505360.21293@GRD.cube42.tai.silentmedia.com
Responses: Re: any hope for my big query?  ("Jim C. Nasby" <jim@nasby.net>)
Re: any hope for my big query?  (Edoardo Ceccarelli <eddy@axa.it>)
Re: any hope for my big query?  (Simon Riggs <simon@2ndquadrant.com>)
Re: any hope for my big query?  ("Jim C. Nasby" <jim@nasby.net>)
Re: any hope for my big query?  (Shaun Thomas <sthomas@leapfrogonline.com>)
List: pgsql-performance
Hey guys, I've got a query that is inherently expensive because it has to
join against some large tables. But it's currently *very* expensive (at
least for a web app), and I've been struggling in vain all day to knock
the cost down. Annoyingly, the least costly version I've come up with
remains my first attempt, and it's also the most straightforward:

explain select
     distinct public.album.id
from
     public.album, public.albumjoin, public.track, umdb.node
where
     node.dir = 2811
     and albumjoin.album = public.album.id
     and public.albumjoin.track = public.track.id
     -- fuzzy match: track name within edit distance 10 of the node's file name
     and levenshtein(substring(public.track.name for 75),
         substring(node.file for 75)) <= 10
     -- only consider albums with between 15 and 25 tracks
     and public.album.id in
         (select album from albumjoin group by album having count(*) between 15 and 25)
group by public.album.id
having count(*) >= 5;


  Unique  (cost=991430.53..1013711.74 rows=425772 width=4)
    ->  GroupAggregate  (cost=991430.53..1012647.31 rows=425772 width=4)
          Filter: (count(*) >= 5)
          ->  Sort  (cost=991430.53..996373.93 rows=1977360 width=4)
                Sort Key: album.id
                ->  Nested Loop  (cost=513549.06..737866.68 rows=1977360 width=4)
                      Join Filter: (levenshtein("substring"(("inner".name)::text, 1, 75), "substring"("outer".file, 1, 75)) <= 10)
                      ->  Index Scan using node_dir on node  (cost=0.00..3.22 rows=16 width=40)
                            Index Cond: (dir = 2811)
                      ->  Materialize  (cost=513549.06..520153.61 rows=370755 width=25)
                            ->  Hash Join  (cost=271464.72..510281.31 rows=370755 width=25)
                                  Hash Cond: ("outer".id = "inner".track)
                                  ->  Seq Scan on track  (cost=0.00..127872.69 rows=5111469 width=25)
                                  ->  Hash  (cost=268726.83..268726.83 rows=370755 width=8)
                                        ->  Hash Join  (cost=150840.51..268726.83 rows=370755 width=8)
                                              Hash Cond: ("outer".album = "inner".id)
                                              ->  Seq Scan on albumjoin  (cost=0.00..88642.18 rows=5107318 width=8)
                                              ->  Hash  (cost=150763.24..150763.24 rows=30908 width=8)
                                                    ->  Hash Join  (cost=127951.57..150763.24 rows=30908 width=8)
                                                          Hash Cond: ("outer".id = "inner".album)
                                                          ->  Seq Scan on album  (cost=0.00..12922.72 rows=425772 width=4)
                                                          ->  Hash  (cost=127874.30..127874.30 rows=30908 width=4)
                                                                ->  HashAggregate  (cost=126947.06..127565.22 rows=30908 width=4)
                                                                      Filter: ((count(*) >= 15) AND (count(*) <= 25))
                                                                      ->  Seq Scan on albumjoin  (cost=0.00..88642.18 rows=5107318 width=4)


I've tried adding an index on length(public.track.name) and filtering
public.track to those rows where length(name) is within a few characters
of length(node.file), but that actually makes the plan more expensive.
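Concretely, that attempt looked roughly like this (the index name and the
3-character window are illustrative, not exact; note that since
levenshtein(a, b) is always at least abs(length(a) - length(b)), any
window wider than 10 would filter nothing against the <= 10 test):

create index track_name_length_idx on public.track (length(name));

...
     and length(public.track.name)
         between length(node.file) - 3 and length(node.file) + 3
...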

Is there any hope of making this much cheaper? Unfortunately, I can't
filter out anything from the album or albumjoin tables.
