Обсуждение: Use index to estimate expression selectivity
Dear hackers, After discussing the issue on irc, it looks like it could be possible for the planner to use a partial index matching an expression exactly to estimate its selectivity. Here is a simplified version (thanks ysch) of the issue I am facing: https://dbfiddle.uk/flPq8-pj I have tried using CREATE STATISTICS as well but haven't found a way to improve the planner estimation for that query. I have worked around the problem for my specific use case but that behavior is counter-intuitive, is there any interest in improving it? Thank you!
Bono Stebler <bono.stebler@gmail.com> writes: > After discussing the issue on irc, it looks like it could be possible > for the planner to use a partial index matching an expression exactly to > estimate its selectivity. I think going forward we're going to be more interested in extending CREATE STATISTICS than in adding special behaviors around indexes. An index is a pretty expensive thing to maintain if you really only want some statistics. Contrariwise, if you need the index for functional reasons (perhaps to enforce some strange uniqueness constraint) but you don't like some decision the planner takes because of the existence of that index, you're kind of stuck. So decoupling this stuff makes more sense from where I sit. Having said that ... > Here is a simplified version (thanks ysch) of the issue I am facing: > https://dbfiddle.uk/flPq8-pj > I have tried using CREATE STATISTICS as well but haven't found a way to > improve the planner estimation for that query. I assume what you did was try to make stats on "synchronized_at IS DISTINCT FROM updated_at"? Yeah, it does not surprise me that we fail to match that to this query. The trouble with expression statistics (and expression indexes) is that it's impractical to match every subexpression of the query to every subexpression that might be presented by CREATE STATISTICS: you soon get into exponential behavior. So there's a limited set of contexts where we look for a match. I experimented a bit and found that if you do have statistics on that, then "WHERE (synchronized_at IS DISTINCT FROM updated_at) IS TRUE" will consult the stats. Might do as a hacky workaround. regards, tom lane
On 11/23/23 18:30, Tom Lane wrote: > Bono Stebler <bono.stebler@gmail.com> writes: >> After discussing the issue on irc, it looks like it could be possible >> for the planner to use a partial index matching an expression exactly to >> estimate its selectivity. > > I think going forward we're going to be more interested in extending > CREATE STATISTICS than in adding special behaviors around indexes. > An index is a pretty expensive thing to maintain if you really only > want some statistics. Contrariwise, if you need the index for > functional reasons (perhaps to enforce some strange uniqueness > constraint) but you don't like some decision the planner takes because > of the existence of that index, you're kind of stuck. So decoupling > this stuff makes more sense from where I sit. > I agree adding indexes if you only really want the statistics part would be rather expensive, but I do think using indexes created for functional reasons as a source of statistics is worth consideration. Actually, I've been experimenting with using btree indexes to estimate certain conditions (e.g. the simplest var=const), and the estimates calculated from internal pages is quite useful. Maybe not as the primary estimate, but to set "safe" range for non-MCV items. For example if the traditional estimate says 1 row matches, but we see there's ~100 leaf pages for that key, maybe we should bump up the estimate ... But yeah, it may affect the query planning in undesirable ways. Perhaps we could have "use for statistics" reloption or something ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company