Обсуждение: Pruning useless tables for queries

Поиск
Список
Период
Сортировка

Pruning useless tables for queries

От
Martijn van Oosterhout
Дата:
[Please CC any responses]

One optimisation is for the query planner to drop tables whose output do not
affect the final result (where the WHERE clauses and the CHECK constraints
prove that no rows can be returned). While this is not the case for simple
queries, when involving views and inheritance it's very easy to do.

One problem is that you cannot totally remove the table as while the data
may not be used, meta-data such as field names and types are still relevant
(except in the case of inherited tables). The simple solution would be to
introduce a dummy Limit node with zero rows.

Ideally, you could create a new node would has a RangeTable (I think that's
the right term) but produces no output. This would allow it to be detected
by join types and simplified away. It could also be used in the following
case:

kleptog=# explain select * from website where false;                         QUERY PLAN
---------------------------------------------------------------Result  (cost=0.00..1.28 rows=28 width=63)  One-Time
Filter:false  ->  Seq Scan on website  (cost=0.00..1.28 rows=28 width=63) 
(3 rows)

I believe the way to approach it would be to add a test to
path/allpaths.c:set_plain_rel_pathlist(). Get a list of all the CHECK
constraints and then use code similar to path/indxpath.c:pred_test() to
check if they are incompatible. It needs to be slightly different as
indeterminate should be considered as true rather than false as is the case
for partial indexes where the code is currently for.

Note, pred_test type code can also be used to remove extraneous clauses from
the restrictions, which may improve planning estimates.

After this it depends on how this will be represented. If it's a new node,
code will have to be added to many places to deal with this. The function
set_inherited_rel_pathlist can optimise the subtables away without effecting
the rest of the query planner. Initially it may be possible to do it only
for inherited tables.

Removing tables based on constant restrictions (always true or false) could
also be dealt with.

One issue is NULLs. In theory IS NULL nodes could be compared against the
NOT NULL status of the column, though it's not clear whether this would be
easy. From the point of view of pred_test, it fairly straight forward; if
the issue is indeterminate, just return and get the current behaviour.

Looks pretty simple. What are the chances of getting this accepted? Anyone
see any issues I missed?

Hace a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "the West won the world not by the superiority of its ideas or values or
> religion but rather by its superiority in applying organised violence.
> Westerners often forget this fact, non-Westerners never do."
>   - Samuel P. Huntington

Re: Pruning useless tables for queries

От
Tom Lane
Дата:
Martijn van Oosterhout <kleptog@svana.org> writes:
> One optimisation is for the query planner to drop tables whose output do not
> affect the final result (where the WHERE clauses and the CHECK constraints
> prove that no rows can be returned). While this is not the case for simple
> queries, when involving views and inheritance it's very easy to do.

Under what conditions is this actually going to buy you anything?

Indexscans with self-contradictory index conditions, for example, fall
through quite quickly already (look at the scan startup logic in nbtree.c).
I'm not sure that there's any gain in having the planner duplicate that
effort.

> Ideally, you could create a new node would has a RangeTable (I think that's
> the right term) but produces no output.

We already use Result nodes with resconstantqual qualifiers to handle
gating of execution of entire subplans (see query_planner()).  It might
be worth thinking about whether that concept is useful to apply at lower
levels of a plan tree.
        regards, tom lane


Re: Pruning useless tables for queries

От
Martijn van Oosterhout
Дата:
On Wed, May 21, 2003 at 12:42:35PM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > One optimisation is for the query planner to drop tables whose output do not
> > affect the final result (where the WHERE clauses and the CHECK constraints
> > prove that no rows can be returned). While this is not the case for simple
> > queries, when involving views and inheritance it's very easy to do.
>
> Under what conditions is this actually going to buy you anything?

Well, I guess it's in situations where your CHECK constraints are used to
indicate range restrictions in subtables, for example partitioning. In fact,
the principle users of this would be for cases where inheritance pulls in
lots of tables that are actually quite separate. If you can use restrictions
to reduce the number of tables I think that's worth it.

In fact, I'd be happy if it *only* applied to inheritence trees. That would
simplify the implementation and not affect the rest of the planner at all.
The issue of optimising constant restrictions would be separate.

> Index-scans with self-contradictory index conditions, for example, fall
> through quite quickly already (look at the scan startup logic in nbtree.c).
> I'm not sure that there's any gain in having the planner duplicate that
> effort.

That only applies in cases where an index-scan is used. If the restriction is
"WHERE FALSE" then the planner produces a sequential scan. Other than a
small amount of code to do with partial indexes, there doesn't appear to be
a lot of work to simplify restrictions in cases where the result is always
true or false.

> > Ideally, you could create a new node would has a RangeTable (I think that's
> > the right term) but produces no output.
>
> We already use Result nodes with resconstantqual qualifiers to handle
> gating of execution of entire subplans (see query_planner()).  It might
> be worth thinking about whether that concept is useful to apply at lower
> levels of a plan tree.

Ok, that does make things easier. It obviously wouldn't be needed for
inheritance trees, unless the entire tree was pruned, which seems so
unlikely it's not worth bothering about. But I do think that using it cases
where the restrictions are obviously self-contradictory would be beneficial.

Thanks for the help,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "the West won the world not by the superiority of its ideas or values or
> religion but rather by its superiority in applying organised violence.
> Westerners often forget this fact, non-Westerners never do."
>   - Samuel P. Huntington

Re: Pruning useless tables for queries

От
"Zeugswetter Andreas SB SD"
Дата:
> > One optimisation is for the query planner to drop tables whose output do not
> > affect the final result (where the WHERE clauses and the CHECK constraints
> > prove that no rows can be returned). While this is not the case for simple
> > queries, when involving views and inheritance it's very easy to do.
>
> Under what conditions is this actually going to buy you anything?
>
> Indexscans with self-contradictory index conditions, for example, fall
> through quite quickly already (look at the scan startup logic in nbtree.c).
> I'm not sure that there's any gain in having the planner duplicate that
> effort.

It covers cases that have check constraints on columns without an index, or
that have an index, but another index is chosen by the planner because it
looks cheaper. Maybe part of the startup logic in nbtree.c can be moved to
the planner, so it covers more cases and is not duplicated ?

There is a question whether this is useful for normal table access, since
most applications won't query on values that are not allowed. But for views
(especially union all) and inheritance it would imho certainly be very valuable.

Andreas