Обсуждение: [HACKERS] More optimization effort?
Currently following query does not use an index:

t-ishii@localhost: psql -p 5433 test
Pager usage is off.
psql (9.6.3)
Type "help" for help.
test=# explain select * from pgbench_accounts where aid*100 < 10000;
                              QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on pgbench_accounts  (cost=0.00..3319.00 rows=33333 width=97)
   Filter: ((aid * 100) < 10000)
(2 rows)

While following one does use the index.

test=# explain select * from pgbench_accounts where aid < 10000/100;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.29..11.08 rows=102 width=97)
   Index Cond: (aid < 100)
(2 rows)

Is it worth to make our optimizer a little bit smarter to convert
the first query into the second form?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
On 21 July 2017 at 07:11, Tatsuo Ishii <ishii@sraoss.co.jp> wrote:
Currently following query does not use an index:
t-ishii@localhost: psql -p 5433 test
Pager usage is off.
psql (9.6.3)
Type "help" for help.
test=# explain select * from pgbench_accounts where aid*100 < 10000;
QUERY PLAN
------------------------------------------------------------------------
Seq Scan on pgbench_accounts (cost=0.00..3319.00 rows=33333 width=97)
Filter: ((aid * 100) < 10000)
(2 rows)
While following one does use the index.
test=# explain select * from pgbench_accounts where aid < 10000/100;
QUERY PLAN
--------------------------------------------------------------------------------------------------
Index Scan using pgbench_accounts_pkey on pgbench_accounts (cost=0.29..11.08 rows=102 width=97)
Index Cond: (aid < 100)
(2 rows)
Is it worth to make our optimizer a little bit smarter to convert
the first query into the second form?
If I understand correctly, you're proposing that the optimiser should
attempt algebraic simplification in order to fold more constants, rather
than stopping constant pre-evaluation as soon as it hits a non-constant
node, as it does now.  Right?

I'm sure there are documented algorithms out there for algebraic
manipulations like that, taking precedence etc. into account.  But would
they be cheap enough to run in the optimiser?  And would they benefit
many queries?
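To make the kind of rewrite being discussed concrete, here is a toy sketch (Python, not PostgreSQL code, with invented names) of turning "var * c < k" into an index-friendly "var < bound", restricted to the case where the transformation provably preserves integer semantics; it deliberately ignores the overflow question, which is a separate problem:

```python
def rewrite_mul_lt(var, c, k):
    """Rewrite (var * c < k) to an index-friendly (var < bound) if safe.

    For integers with c > 0 and c dividing k evenly:
        var * c < k  <=>  var < k // c
    (Overflow behavior is NOT considered here; see the follow-ups in
    this thread for why that matters.)
    """
    if c > 0 and k % c == 0:
        return (var, '<', k // c)
    # Not provably equivalent under integer semantics; leave it alone.
    return None

print(rewrite_mul_lt('aid', 100, 10000))  # ('aid', '<', 100)
print(rewrite_mul_lt('aid', 3, 10))       # None: 3 does not divide 10
```

Even this trivial rule needs a side condition (exact divisibility) to be sound, which hints at how much per-operator reasoning a general facility would need.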
Tatsuo Ishii <ishii@sraoss.co.jp> writes:
> Currently following query does not use an index:
> test=# explain select * from pgbench_accounts where aid*100 < 10000;
> While following one does use the index.
> test=# explain select * from pgbench_accounts where aid < 10000/100;
> Is it worth to make our optimizer a little bit smarter to convert
> the first query into the second form?

That's a rather ambitious definition of "little bit" smarter.

For starters, I'm not sure those queries are even fully equivalent
w.r.t. issues like overflow and roundoff.  While a person might decide
that the transformation is OK anyway, I'm not sure that the optimizer
should be allowed to take such liberties.

But the bigger picture is that doing something that helps to any
useful extent would require a really substantial amount of new,
datatype- and operator-specific knowledge that doesn't exist in the
system today.  And as Craig noted, applying that knowledge would be
expensive, even in cases where it failed to help.

So, color me skeptical ...

			regards, tom lane
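The overflow point can be illustrated with a small Python mock of int4 arithmetic (PostgreSQL raises "integer out of range" on int4 overflow, which this imitates with OverflowError):

```python
INT4_MIN, INT4_MAX = -2**31, 2**31 - 1

def int4_mul(a, b):
    """Multiply two int4 values with PostgreSQL-style overflow checking."""
    r = a * b
    if not INT4_MIN <= r <= INT4_MAX:
        raise OverflowError("integer out of range")
    return r

aid = 50_000_000              # a perfectly valid int4 value
print(aid < 100)              # the rewritten predicate returns False...
try:
    print(int4_mul(aid, 100) < 10000)
except OverflowError:
    # ...but the original predicate errors out, because aid * 100
    # exceeds INT4_MAX.  The two forms are not fully equivalent.
    print("aid * 100 < 10000 raises instead of returning False")
```

So for rows with large aid values, "aid*100 < 10000" errors while "aid < 100" quietly returns false, which is exactly the kind of semantic difference that makes the rewrite questionable.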
On Fri, Jul 21, 2017 at 10:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> But the bigger picture is that doing something that helps to any
> useful extent would require a really substantial amount of new,
> datatype- and operator-specific knowledge that doesn't exist in the
> system today.  And as Craig noted, applying that knowledge would
> be expensive, even in cases where it failed to help.
>
> So, color me skeptical ...

I agree, but with a caveat.  If somebody felt like doing all of that
work, and either made it cheap enough to justify enabling it by default
or added a controlling GUC, it'd be fine with me.  We've talked before
about having knobs to adjust how hard the optimizer tries to optimize
things, and this would be a good candidate for such a thing.  The
bigger issue from my perspective is that I really doubt that anybody
wants to put the effort into doing something like this in a principled
way.

Another very similar (but possibly easier) case is:

select * from pgbench_accounts where aid = 1.0;

This will use a sequential scan rather than an index scan, because the
query optimizer doesn't know that the only integer for which =(int4,
numeric) will return true is 1.  Therefore it has to scan the whole
table one row at a time and check, for each one, whether the = operator
returns true.  It can't cast the constant to an integer because the
user might have written 1.1 rather than 1.0, in which case the cast
would fail; but the query should return 0 rows, not ERROR.  You can
imagine fixing this by having some kind of datatype-specific knowledge
that would replace "aid = 1.0" with "aid = 1" and "aid = 1.1" with
"false"; it would also have to know that "aid = 9999999999" should be
changed to "false" because 9999999999 isn't representable as int4.

I have, however, decided not to volunteer to be the one who works on
that project.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
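The simplification Robert describes for "int4col = <numeric constant>" is easy to sketch outside the planner.  This toy Python version (function name and return encoding invented for illustration) uses exact rational arithmetic in place of PostgreSQL's numeric type:

```python
from fractions import Fraction

INT4_MIN, INT4_MAX = -2**31, 2**31 - 1

def simplify_int_eq_numeric(const_text):
    """Sketch of simplifying "int4col = <numeric const>".

    Returns ('eq', n) when the predicate can become an int4 equality,
    or ('const', False) when it can never be true.
    """
    f = Fraction(const_text)          # stand-in for an exact numeric
    if f.denominator != 1:
        return ('const', False)       # aid = 1.1 can never be true
    n = f.numerator
    if not INT4_MIN <= n <= INT4_MAX:
        return ('const', False)       # 9999999999 isn't an int4
    return ('eq', n)                  # aid = 1.0  ->  aid = 1

print(simplify_int_eq_numeric('1.0'))         # ('eq', 1)
print(simplify_int_eq_numeric('1.1'))         # ('const', False)
print(simplify_int_eq_numeric('9999999999'))  # ('const', False)
```

Note that this sidesteps the cast-failure problem Robert mentions: the non-integral and out-of-range cases fold to false rather than erroring.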
Robert Haas <robertmhaas@gmail.com> writes:
> Another very similar (but possibly easier) case is:
> select * from pgbench_accounts where aid = 1.0;
> This will use a sequential scan rather than an index scan, because the
> query optimizer doesn't know that the only integer for which =(int4,
> numeric) will return true is 1.  Therefore it has to scan the whole
> table one row at a time and check, for each one, whether the =
> operator returns true.  It can't cast the constant to an integer
> because the user might have written 1.1 rather than 1.0, in which case
> the cast would fail; but the query should return 0 rows, not ERROR.
> You can imagine fixing this by having some kind of datatype-specific
> knowledge that would replace "aid = 1.0" with "aid = 1" and "aid =
> 1.1" with "false"; it would also have to know that "aid = 9999999999"
> should be changed to "false" because 9999999999 isn't representable as
> int4.

Couple thoughts about that:

* Another conceivable route to a solution is to add "int = numeric"
and friends to the btree opclasses for integers.  I'm not sure if there
is any fundamental reason we've not done that (it's possible it would
fall foul of the requirements about transitivity, not sure).  However,
this would only fix things to the extent of allowing an index scan to
occur; it wouldn't help in non-indexed cases, nor would it know
anything about simplifying say "aid = 1.1" to "false".

* The already-existing protransform infrastructure could be used to
address this type of problem; that is, you could imagine attaching a
transform function to numeric_eq that would look for cases like
"integer::numeric = nonintegralconstant" and simplify them accordingly.
When that feature went in, there was talk of using transforms to
simplify e.g. "variable + 0" or "variable * 1", but nobody's got round
to anything of the sort yet.

* On the other hand, protransform doesn't offer any easy way to apply
similar optimizations to a bunch of different functions/operators.
For instance, if your goal is to simplify "variable + 0", there are a
depressingly large number of "+" operators to write transform functions
for.  Maybe there's no way around duplicate coding for that, but it
looks tedious and bulky.

> I have, however, decided not to volunteer to be the one who works on
> that project.

Me either.  Any one of these things would require a *lot* of work in
order to have a coherent feature that provided useful behavior across
a bunch of different datatypes.

			regards, tom lane
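The per-operator tedium Tom describes can be pictured as a dispatch table of transform rules, one per operator function.  A hedged Python sketch (int4pl/int4mul are real PostgreSQL operator-function names, but the table and rule signatures here are invented, not the actual protransform API):

```python
def simplify_add_zero(var, const):
    """Transform rule for "+": var + 0  ->  var."""
    return var if const == 0 else None

def simplify_mul_one(var, const):
    """Transform rule for "*": var * 1  ->  var."""
    return var if const == 1 else None

# Each operator function needs its own entry -- and this is only
# int4; int2/int8/float4/float8/numeric/interval "+" all need their
# own rules too, which is the "tedious and bulky" part.
TRANSFORMS = {
    'int4pl':  simplify_add_zero,
    'int4mul': simplify_mul_one,
}

def try_simplify(opfunc, var, const):
    rule = TRANSFORMS.get(opfunc)
    result = rule(var, const) if rule else None
    return result if result is not None else (opfunc, var, const)

print(try_simplify('int4pl', 'aid', 0))   # 'aid'
print(try_simplify('int4mul', 'aid', 1))  # 'aid'
print(try_simplify('int4pl', 'aid', 5))   # ('int4pl', 'aid', 5)
```

The table makes the scaling problem visible: the rule logic is two lines, but it has to be registered (and verified safe) once per operator across every datatype family.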
On 21 July 2017 at 20:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I have, however, decided not to volunteer to be the one who works on
>> that project.
>
> Me either.  Any one of these things would require a *lot* of work in
> order to have a coherent feature that provided useful behavior across
> a bunch of different datatypes.

I had in the past idly thought about whether it would be possible to
link in one of the various general-purpose theorem-proving libraries
and use it to simplify the expressions.  But I really have no idea how
much work it would be to teach one about all the properties and
constraints of our existing data types and operators, or, for that
matter, how easy it would be to figure out which theorems we would want
proven in order to be able to use an index.

-- 
greg