Re: feeding big script to psql
От | Peter Wilson |
---|---|
Тема | Re: feeding big script to psql |
Дата | |
Msg-id | dcr68o$29n5$1@news.hub.org обсуждение исходный текст |
Ответ на | Re: feeding big script to psql (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-general |
Tom Lane wrote: > Peter Wilson <petew@yellowhawk.co.uk> writes: >> Tom Lane wrote: >>>> Oh? Could you provide a test case for this? I can certainly believe >>>> that the planner might choose a bad plan if it has no statistics, but >>>> it shouldn't take a long time to do it. > >> On investigation the problems occurs on 'EXPLAIN ANALYZE' - which is >> what pgadminIII does when you press the explain button. > > Ah. Well, this is an ideal example of why you need statistics --- > without 'em, the planner is more or less flying blind about the number > of matching rows. The original plan had > >> -> Index Scan using ca_pk on contact_att subb (cost=0.00..6.01 rows=1 width=8) (actual time=0.207..234.423rows=3 loops=2791) >> Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text)) >> Filter: ((contact_id)::numeric = 3854.000000) > > while your "after a vacuum" (I suppose really a vacuum analyze) plan has > >> -> Index Scan using ca_pk on contact_att subb (cost=0.00..1433.95 rows=78 width=8) (actual time=0.367..259.617rows=3 loops=1) >> Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text)) >> Filter: ((contact_id)::numeric = 3854.000000) > > This is the identical scan plan ... but now that the planner realizes > it's going to be pretty expensive, it arranges the join in a way that > requires only one scan of contact_att and not 2791 of 'em. > > The key point here is that the index condition on instance/client_id > is not selective --- it'll pull out a lot of rows. All but 3 of 'em are > then discarded by the contact_id condition, but the damage in terms > of runtime was already done. With stats, the planner can realize this > --- without stats, it has no chance. > > Looking at your table definition, I suppose you were expecting the > contact_id condition to be used with the index, but since contact_id is > bigint, comparing it to a numeric-type constant is not considered indexable. > You want to lose the ".000000" in the query. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend > Thanks for that Tom - especially the bit about removing the .00000 from the numbers. I'm pretty new to some of this database stuff - even newer at trying to optimise queries and 'think like the planner'. Never occurred to me the number format would have that effect. Removing the zeroes actaully knocked a few ms of the execution times in real-life querries :-) Just out of interest - is there an opportunity for the planner to realise the sub-select is basically invariant for the outer-query and execute once, regardless of stats. Seems like the loop-invariant optimisation in a 'C' compiler. If you have to do something once v. doing it 2791 times then I'd plop for the once! Thanks again Tom, much appreciated for that little nugget Pete -- Peter Wilson. YellowHawk Ltd, http://www.yellowhawk.co.uk
В списке pgsql-general по дате отправления: