Обсуждение: optimizing select ... not in (select ...)
I have this query : select distinct on (Pictures.PictureID) * from Pictures where Pictures.PictureID not in (select distinct PictureID from Keywords); and I find it a bit slow. Does anybody have suggestions to run this faster ? (I have indexes on PictureID on both Pictures and Keywords) -- Laurent Martelli laurent@aopsys.com http://www.bearteam.org/~laurent/
Laurent, > select distinct on (Pictures.PictureID) * from Pictures where > Pictures.PictureID not in (select distinct PictureID from Keywords); > > and I find it a bit slow. Does anybody have suggestions to run this > faster ? (I have indexes on PictureID on both Pictures and Keywords) NOT IN is always slow on all RDBMS that I know, unless the subselect has a very small (<100) return set. This is because the engine has to compare each row in the master query against every value returned by NOT IN, one row at a time. Try the WHERE NOT EXISTS construction instead: SELECT * FROM Pictures WHERE NOT EXISTS ( SELECT pictureID FROM keywords WHERE keywords.pictureID = Pictures.pictureID ); This uses the DB engine's JOIN functionality and thus runs considerably faster. BTW, all those "DISTINCT" in the query example you gave, assuming that PictureID is the unique index of Pictures, are completely superfluous and will only slow the query down. Particularly the use of DISTINCT in a subquery should only be used if the contents of the subquery will be displayed as part of the result set. -Josh Berkus ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete information technology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
Вложения
Laurent Martelli <laurent@aopsys.com> writes: > I have this query : > > select distinct on (Pictures.PictureID) * from Pictures where Pictures.PictureID not in (select distinct PictureID fromKeywords); > > and I find it a bit slow. Does anybody have suggestions to run this > faster ? (I have indexes on PictureID on both Pictures and Keywords) Try this instead: select distinct on (Pictures.PictureID) *from Pictureswhere not exists (select distinct PictureID from Keywords where Pictures.PictureID=Keywords.PictureID); I've found that it's usually faster, probably because it can use indices better. Tomas
Can it be redone using NOT EXISTS? > I have this query : > > select distinct on (Pictures.PictureID) * from Pictures where Pictures.PictureID not in (select distinct PictureID fromKeywords); > > and I find it a bit slow. Does anybody have suggestions to run this > faster ? (I have indexes on PictureID on both Pictures and Keywords) > > -- > Laurent Martelli > laurent@aopsys.com http://www.bearteam.org/~laurent/ > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
>>>>> "Tomas" == Tomas Berndtsson <tomas@nocrew.org> writes: Tomas> Laurent Martelli <laurent@aopsys.com> writes: >> I have this query : >> >> select distinct on (Pictures.PictureID)* from Pictures where >> Pictures.PictureID not in (select distinct PictureID from >> Keywords); >> >> and I find it a bit slow. Does anybody have suggestions to run >> this faster ? (I have indexes on PictureID on both Picturesand >> Keywords) Tomas> Try this instead: Tomas> select distinct on (Pictures.PictureID) * from Pictures where Tomas> not exists (select distinct PictureID from Keywordswhere Tomas> Pictures.PictureID=Keywords.PictureID); Tomas> I've found that it's usually faster, probably because it can Tomas> use indices better. It is indeed much faster. Thanks to all of you who answered so fast. -- Laurent Martelli laurent@aopsys.com http://www.bearteam.org/~laurent/