Обсуждение: Pruning useless tables for queries
[Please CC any responses] One optimisation is for the query planner to drop tables whose output do not affect the final result (where the WHERE clauses and the CHECK constraints prove that no rows can be returned). While this is not the case for simple queries, when involving views and inheritance it's very easy to do. One problem is that you cannot totally remove the table as while the data may not be used, meta-data such as field names and types are still relevant (except in the case of inherited tables). The simple solution would be to introduce a dummy Limit node with zero rows. Ideally, you could create a new node would has a RangeTable (I think that's the right term) but produces no output. This would allow it to be detected by join types and simplified away. It could also be used in the following case: kleptog=# explain select * from website where false; QUERY PLAN ---------------------------------------------------------------Result (cost=0.00..1.28 rows=28 width=63) One-Time Filter:false -> Seq Scan on website (cost=0.00..1.28 rows=28 width=63) (3 rows) I believe the way to approach it would be to add a test to path/allpaths.c:set_plain_rel_pathlist(). Get a list of all the CHECK constraints and then use code similar to path/indxpath.c:pred_test() to check if they are incompatible. It needs to be slightly different as indeterminate should be considered as true rather than false as is the case for partial indexes where the code is currently for. Note, pred_test type code can also be used to remove extraneous clauses from the restrictions, which may improve planning estimates. After this it depends on how this will be represented. If it's a new node, code will have to be added to many places to deal with this. The function set_inherited_rel_pathlist can optimise the subtables away without effecting the rest of the query planner. Initially it may be possible to do it only for inherited tables. Removing tables based on constant restrictions (always true or false) could also be dealt with. One issue is NULLs. In theory IS NULL nodes could be compared against the NOT NULL status of the column, though it's not clear whether this would be easy. From the point of view of pred_test, it fairly straight forward; if the issue is indeterminate, just return and get the current behaviour. Looks pretty simple. What are the chances of getting this accepted? Anyone see any issues I missed? Hace a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > "the West won the world not by the superiority of its ideas or values or > religion but rather by its superiority in applying organised violence. > Westerners often forget this fact, non-Westerners never do." > - Samuel P. Huntington
Martijn van Oosterhout <kleptog@svana.org> writes: > One optimisation is for the query planner to drop tables whose output do not > affect the final result (where the WHERE clauses and the CHECK constraints > prove that no rows can be returned). While this is not the case for simple > queries, when involving views and inheritance it's very easy to do. Under what conditions is this actually going to buy you anything? Indexscans with self-contradictory index conditions, for example, fall through quite quickly already (look at the scan startup logic in nbtree.c). I'm not sure that there's any gain in having the planner duplicate that effort. > Ideally, you could create a new node would has a RangeTable (I think that's > the right term) but produces no output. We already use Result nodes with resconstantqual qualifiers to handle gating of execution of entire subplans (see query_planner()). It might be worth thinking about whether that concept is useful to apply at lower levels of a plan tree. regards, tom lane
On Wed, May 21, 2003 at 12:42:35PM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > One optimisation is for the query planner to drop tables whose output do not > > affect the final result (where the WHERE clauses and the CHECK constraints > > prove that no rows can be returned). While this is not the case for simple > > queries, when involving views and inheritance it's very easy to do. > > Under what conditions is this actually going to buy you anything? Well, I guess it's in situations where your CHECK constraints are used to indicate range restrictions in subtables, for example partitioning. In fact, the principle users of this would be for cases where inheritance pulls in lots of tables that are actually quite separate. If you can use restrictions to reduce the number of tables I think that's worth it. In fact, I'd be happy if it *only* applied to inheritence trees. That would simplify the implementation and not affect the rest of the planner at all. The issue of optimising constant restrictions would be separate. > Index-scans with self-contradictory index conditions, for example, fall > through quite quickly already (look at the scan startup logic in nbtree.c). > I'm not sure that there's any gain in having the planner duplicate that > effort. That only applies in cases where an index-scan is used. If the restriction is "WHERE FALSE" then the planner produces a sequential scan. Other than a small amount of code to do with partial indexes, there doesn't appear to be a lot of work to simplify restrictions in cases where the result is always true or false. > > Ideally, you could create a new node would has a RangeTable (I think that's > > the right term) but produces no output. > > We already use Result nodes with resconstantqual qualifiers to handle > gating of execution of entire subplans (see query_planner()). It might > be worth thinking about whether that concept is useful to apply at lower > levels of a plan tree. Ok, that does make things easier. It obviously wouldn't be needed for inheritance trees, unless the entire tree was pruned, which seems so unlikely it's not worth bothering about. But I do think that using it cases where the restrictions are obviously self-contradictory would be beneficial. Thanks for the help, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > "the West won the world not by the superiority of its ideas or values or > religion but rather by its superiority in applying organised violence. > Westerners often forget this fact, non-Westerners never do." > - Samuel P. Huntington
> > One optimisation is for the query planner to drop tables whose output do not > > affect the final result (where the WHERE clauses and the CHECK constraints > > prove that no rows can be returned). While this is not the case for simple > > queries, when involving views and inheritance it's very easy to do. > > Under what conditions is this actually going to buy you anything? > > Indexscans with self-contradictory index conditions, for example, fall > through quite quickly already (look at the scan startup logic in nbtree.c). > I'm not sure that there's any gain in having the planner duplicate that > effort. It covers cases that have check constraints on columns without an index, or that have an index, but another index is chosen by the planner because it looks cheaper. Maybe part of the startup logic in nbtree.c can be moved to the planner, so it covers more cases and is not duplicated ? There is a question whether this is useful for normal table access, since most applications won't query on values that are not allowed. But for views (especially union all) and inheritance it would imho certainly be very valuable. Andreas