Discussion: More Performance Questions
Hi.

Speed-wise, is there a significant performance difference between doing complex queries in the following forms?

Form 1:

    ( SELECT Master.* FROM Master, MasterFTI
      WHERE MasterFTI.ID = Master.ID AND MasterFTI.String = 'string1' )
    UNION
    ( SELECT Master.* FROM Master, MasterFTI
      WHERE MasterFTI.ID = Master.ID AND MasterFTI.String = 'string2' )
    ...
    UNION
    ...;

Form 2:

    SELECT DISTINCT Master.* FROM Master, MasterFTI
    WHERE MasterFTI.ID = Master.ID
      AND ( MasterFTI.String = 'string1' OR MasterFTI.String = 'string2' OR ... );

The reason I am asking is that I don't know how the back end splits and executes these queries. Are the UNION/INTERSECT/EXCEPT queries each executed separately, in sequence? Or does the optimizer do some magic and transform them into a more efficient form that doesn't require multiple passes?

And is the overhead of running multiple UNION queries greater than the overhead of doing a DISTINCT? I need to sort the records anyway, so the fact that DISTINCT does a SORT is a bonus in this case.

In an extreme case my dynamically constructed queries (from a CGI) can have as many as 50 terms in them, which, using the UNION method, equates to 50 queries being run (if that is the way it all gets executed). Is there likely to be a sizeable improvement from using the other method?

The reason I am asking before trying is that I'd like to avoid re-writing my custom->SQL parser again. I was hoping that someone with a bit more background knowledge of how PostgreSQL works could shed some light on it...

Regards.

Gordan
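Before comparing speed, it may help to confirm that the two forms return the same rows. The sketch below loads a few made-up rows into an in-memory SQLite database and runs both forms. SQLite is only a stand-in so the example runs without a server; it is not PostgreSQL and says nothing about PostgreSQL timings. The Title column, the index name, and the sample rows are assumptions, and the parentheses around each UNION branch are dropped because SQLite does not accept them.

```python
import sqlite3

# In-memory stand-in database; schema details beyond ID and String are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Master (ID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE MasterFTI (ID INTEGER, String TEXT);
    CREATE INDEX MasterFTI_String ON MasterFTI (String);
    INSERT INTO Master VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO MasterFTI VALUES
        (1, 'string1'), (1, 'string2'), (2, 'string2'), (3, 'other');
""")

# Form 1: one SELECT per search term, combined with UNION.
form1 = """
    SELECT Master.* FROM Master, MasterFTI
    WHERE MasterFTI.ID = Master.ID AND MasterFTI.String = 'string1'
    UNION
    SELECT Master.* FROM Master, MasterFTI
    WHERE MasterFTI.ID = Master.ID AND MasterFTI.String = 'string2'
"""

# Form 2: a single SELECT DISTINCT with the terms ORed together.
form2 = """
    SELECT DISTINCT Master.* FROM Master, MasterFTI
    WHERE MasterFTI.ID = Master.ID
      AND ( MasterFTI.String = 'string1' OR MasterFTI.String = 'string2' )
"""

# Sort in Python so the comparison does not depend on result order.
rows_union = sorted(conn.execute(form1).fetchall())
rows_distinct = sorted(conn.execute(form2).fetchall())
print(rows_union)
print(rows_distinct)
```

Master row 1 matches both terms but comes back once from either form, so any difference between the forms lies in how the rows are found, not in what is returned.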
On Wednesday 07 Nov 2001 18:13, Tom Lane wrote:
> Gordan Bobic <gordan@bobich.net> writes:
> > And is the overhead of running multiple UNION queries greater than the
> > overhead of doing a DISTINCT? I need to sort the records anyway, so the
> > fact that DISTINCT does a SORT is a bonus in this case.
>
> UNION implies DISTINCT, so you're going to get sort and uniq steps in
> either case.

Yes, but I thought that if I have 50 UNION queries, that would do a sort + uniq for each "append" between them, whereas in the DISTINCT case it only gets done once, albeit on a bigger data set.

> What this is really going to boil down to is how the
> restriction and join steps are done, and you haven't given enough info
> to speculate about that.

Well, what I said is pretty much it. It's a case of doing a single FTI term search per query and then taking either the UNION (for an OR search) or the INTERSECT (for an AND search) of the queries. If the search is executed in this way, and each UNION segment is executed in sequence, then that means N queries, where N is the number of search terms.

In the SELECT DISTINCT case, where multiple terms are ORed in the WHERE clause, it is vaguely conceivable that the entire query (at least in the UNION case) could be executed in a single pass. Is that how it works? Or is each OR term located in a separate pass?

What I'm really trying to figure out is whether there is an advantage (in theory at least) in doing one slightly more complex query, or lots of simpler ones.

> Try some experimentation with EXPLAIN to see
> what kinds of plans you get.

Well, all the fields that are searched on are indexed, and for testing I usually set enable_seqscan=off.

What I am going to do is re-write my parser/SQL generator and give it a go - with a bit of luck, there will be a noticeable difference in performance.

Thanks.

Gordan
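For the 50-term case it may be easiest to generate both forms from the same list of terms and time them side by side. A minimal sketch of such a generator follows. The function names are hypothetical, and the %s placeholders assume a DB-API driver that uses that parameter style (psycopg2 does), so adjust for whatever the CGI actually uses.

```python
# One UNION branch. The %s is a driver placeholder (assumed parameter style),
# not Python string formatting; the search term is passed separately.
BRANCH = (
    "SELECT Master.* FROM Master, MasterFTI "
    "WHERE MasterFTI.ID = Master.ID AND MasterFTI.String = %s"
)


def build_union_query(n_terms):
    """Form 1: one parenthesised SELECT per term, joined with UNION."""
    if n_terms < 1:
        raise ValueError("need at least one search term")
    return " UNION ".join(f"( {BRANCH} )" for _ in range(n_terms))


def build_distinct_query(n_terms):
    """Form 2: a single SELECT DISTINCT with the terms ORed together."""
    if n_terms < 1:
        raise ValueError("need at least one search term")
    ors = " OR ".join("MasterFTI.String = %s" for _ in range(n_terms))
    return (
        "SELECT DISTINCT Master.* FROM Master, MasterFTI "
        f"WHERE MasterFTI.ID = Master.ID AND ( {ors} )"
    )


# Fifty made-up terms, matching the extreme case described above.
terms = [f"string{i}" for i in range(1, 51)]
union_sql = build_union_query(len(terms))
distinct_sql = build_distinct_query(len(terms))

print(union_sql.count("SELECT"), "SELECT statements in the UNION form")
print(distinct_sql.count("SELECT"), "SELECT statement in the DISTINCT form")
```

Both strings take the same parameter list, so with a hypothetical cursor either form could be run as cursor.execute(sql, terms) and timed without touching the rest of the parser.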
Gordan Bobic <gordan@bobich.net> writes:
> And is the overhead of running multiple UNION queries greater than the
> overhead of doing a DISTINCT? I need to sort the records anyway, so the fact
> that DISTINCT does a SORT is a bonus in this case.

UNION implies DISTINCT, so you're going to get sort and uniq steps in either case. What this is really going to boil down to is how the restriction and join steps are done, and you haven't given enough info to speculate about that. Try some experimentation with EXPLAIN to see what kinds of plans you get.

regards, tom lane
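The EXPLAIN comparison suggested here can be scripted. The sketch below uses SQLite's EXPLAIN QUERY PLAN purely as a stand-in so that it runs without a server. SQLite's planner is unrelated to PostgreSQL's, so the plans it prints are not evidence about PostgreSQL; against PostgreSQL the same idea is to prefix each query with EXPLAIN in psql and read the resulting plan. The schema, the index name, and the show_plan helper are assumptions.

```python
import sqlite3

# Empty stand-in schema; only the table shapes and the index matter for planning.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Master (ID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE MasterFTI (ID INTEGER, String TEXT);
    CREATE INDEX MasterFTI_String ON MasterFTI (String);
""")

union_form = """
    SELECT Master.* FROM Master, MasterFTI
    WHERE MasterFTI.ID = Master.ID AND MasterFTI.String = 'string1'
    UNION
    SELECT Master.* FROM Master, MasterFTI
    WHERE MasterFTI.ID = Master.ID AND MasterFTI.String = 'string2'
"""

distinct_form = """
    SELECT DISTINCT Master.* FROM Master, MasterFTI
    WHERE MasterFTI.ID = Master.ID
      AND ( MasterFTI.String = 'string1' OR MasterFTI.String = 'string2' )
"""


def show_plan(label, sql):
    """Print and return the plan rows SQLite reports for one query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    print(label)
    for row in rows:
        # The last column holds the human-readable plan step.
        print("   ", row[-1])
    return rows


plan_union = show_plan("UNION form:", union_form)
plan_distinct = show_plan("DISTINCT form:", distinct_form)
```

The point of reading the two plans side by side is to see how many scans each form needs and where the sort or duplicate-removal step sits, which is the question the thread keeps returning to.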
I don't know whether this would be faster or not, but you can have all your criteria in one WHERE clause using IN, i.e.:

    WHERE Value IN ('Criteria1', 'Criteria2', etc...)

It may also have some benefit in being shorter.

Peter Darley

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Gordan Bobic
Sent: Wednesday, November 07, 2001 10:26 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] More Performance Questions
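Peter Darley's IN suggestion, applied to the tables from the original question, might look like the sketch below. It is run against an in-memory SQLite database only so that it executes without a server; the Title column, the sample rows, and the build_in_query name are assumptions, and the "?" placeholder is SQLite's parameter style rather than anything PostgreSQL-specific.

```python
import sqlite3


def build_in_query(n_terms, placeholder="?"):
    """Single SELECT with every search term folded into one IN list."""
    if n_terms < 1:
        raise ValueError("need at least one search term")
    marks = ", ".join(placeholder for _ in range(n_terms))
    return (
        "SELECT DISTINCT Master.* FROM Master, MasterFTI "
        "WHERE MasterFTI.ID = Master.ID "
        f"AND MasterFTI.String IN ({marks})"
    )


# Made-up sample data in a stand-in database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Master (ID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE MasterFTI (ID INTEGER, String TEXT);
    INSERT INTO Master VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO MasterFTI VALUES
        (1, 'string1'), (1, 'string2'), (2, 'string2'), (3, 'other');
""")

terms = ["string1", "string2"]
in_sql = build_in_query(len(terms))
# Sort in Python so the result does not depend on row order.
rows_in = sorted(conn.execute(in_sql, terms).fetchall())
print(in_sql)
print(rows_in)
```

DISTINCT is still needed with IN: a Master row whose MasterFTI entries match two of the terms joins twice and would otherwise appear twice in the result.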