Discussion: Bad query plan when the wrong data type is used
This query:
select p.id, p.producer_id, visa.variation_item_id, vi.qtyavail
from variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id = visa.variation_item_id
where visa.id = 4
runs in 43 msec. The "visa.id" column has the int4 data type. The query plan uses an index condition:
"Nested Loop (cost=0.00..26.19 rows=1 width=28)"
" -> Nested Loop (cost=0.00..17.75 rows=1 width=24)"
" -> Index Scan using variation_item_sellingsite_asin_pkey on variation_item_sellingsite_asin visa (cost=0.00..8.58 rows=1 width=16)"
" Index Cond: (id = 4)"
" -> Index Scan using pk_product_id on product p (cost=0.00..9.16 rows=1 width=16)"
" Index Cond: (p.id = visa.product_id)"
" -> Index Scan using pk_variation_item_id on variation_item vi (cost=0.00..8.43 rows=1 width=12)"
" Index Cond: (vi.id = visa.variation_item_id)"
This query:
select p.id, p.producer_id, visa.variation_item_id, vi.qtyavail
from variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id = visa.variation_item_id
where visa.id = 4.0
runs for 1144 msec! The query plan uses a seq scan + filter:
"Nested Loop (cost=33957.27..226162.68 rows=14374 width=28)"
" -> Hash Join (cost=33957.27..106190.76 rows=14374 width=20)"
" Hash Cond: (visa.variation_item_id = vi.id)"
" -> Seq Scan on variation_item_sellingsite_asin visa (cost=0.00..71928.04 rows=14374 width=16)"
" Filter: ((id)::numeric = 4.0)"
" -> Hash (cost=22026.01..22026.01 rows=954501 width=12)"
" -> Seq Scan on variation_item vi (cost=0.00..22026.01 rows=954501 width=12)"
" -> Index Scan using pk_product_id on product p (cost=0.00..8.33 rows=1 width=16)"
" Index Cond: (p.id = visa.product_id)"
Which is silly. I think that PostgreSQL converts the int side to a float, and then compares them.
It would be better to do this, for each item in the loop:
- evaluate the right side (which is a float)
- tell whether it is an integer or not
- if it is not an integer, discard the row immediately
- otherwise use its integer value for the index scan
(I spent an hour figuring out what was wrong with my program. In some cases it was slow, in other cases it was really fast, and I never got an error message.)
What do you think?
Laszlo
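[Editorial note: the per-row strategy proposed above can be sketched as a toy model in Python. The dict stands in for a btree index on visa.id; this only illustrates the idea of "discard fractional literals, probe the index with the integer value", not how PostgreSQL's executor actually works.]

```python
# Toy model of the proposed per-row strategy: instead of coercing the
# indexed integer column to the literal's wider type, check whether the
# float literal is integral and, if so, probe the integer index directly.

def probe_int_index(index: dict, literal: float):
    """Return matching rows for `int_column = literal`.

    If the literal has a fractional part, no integer can equal it,
    so nothing matches; otherwise probe the index with its int value.
    """
    if literal != int(literal):      # e.g. 3.5 can never equal an int
        return []
    return index.get(int(literal), [])

# Toy "index" on visa.id -> rows
visa_by_id = {4: [("row-for-id-4",)], 7: [("row-for-id-7",)]}

print(probe_int_index(visa_by_id, 4.0))   # matches id = 4
print(probe_int_index(visa_by_id, 3.5))   # no integer equals 3.5
```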
Laszlo,

> Which is silly. I think that PostgreSQL converts the int side to a
> float, and then compares them.
>
> It would be better to do this, for each item in the loop:
>
> * evaluate the right side (which is float)
> * tell if it is an integer or not
> * if not an integer, then discard the row immediately
> * otherwise use its integer value for the index scan

Not terribly likely, I'm afraid. Data type coercion is *way* more complex than you realize (consider the number of data types we have, and the ability to add UDTs, and then square it). And the functionality you propose would break backwards compatibility; many people currently use ".0" in order to force a coercion to Float or Numeric.

I'm not saying that PostgreSQL couldn't do better on this kind of case, but that doing better is a major project, not a minor one.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
You will get the same behaviour from any database product where the query as written requires type coercion - the coercion has to go in the direction of the "wider" type. I have seen the exact same scenario with Oracle, and I view it as a problem with the way the query is written, not with the database server.
Whoever coded the application which is making this query presumably knows that the visa.id field is an integer type in the schema they designed, so why are they passing a float? Convert the 4.0 to 4 on the application side instead, it's one function call or cast.
It's not reasonable to expect the query compiler to pick up the slack for poorly written SQL.
Cheers
Dave
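[Editorial note: Dave's application-side fix can be sketched in Python. `coerce_id` is a hypothetical helper, not part of any driver API; the point is simply to normalize the parameter to int before it is bound, so the planner sees an integer literal.]

```python
# Sketch of coercing the parameter on the application side, as suggested
# above, so the bound value matches the column's int4 type.

def coerce_id(value) -> int:
    """Coerce an application-supplied id to int, rejecting values
    that could never match an integer column (e.g. 4.5)."""
    as_int = int(value)
    if as_int != value:
        raise ValueError(f"id {value!r} is not an integral value")
    return as_int

# The parameter would then be bound as an integer, e.g. (hypothetical):
# cur.execute("select ... where visa.id = %s", (coerce_id(4.0),))
print(coerce_id(4.0))   # 4
```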
On 09.02.11 01:14, Dave Crooke wrote:
> You will get the same behaviour from any database product where the query as written requires type coercion - the coercion has to go in the direction of the "wider" type. I have seen the exact same scenario with Oracle, and I view it as a problem with the way the query is written, not with the database server.

Actually, the problem may be in the layers, and it may not even be noticed until it is too late. As far as I remember from this list, there are problems with a column being integer and a parameter prepared as bigint or numeric. The same goes for numeric vs. double vs. float.

> Whoever coded the application which is making this query presumably knows that the visa.id field is an integer type in the schema they designed, so why are they passing a float? Convert the 4.0 to 4 on the application side instead, it's one function call or cast.
In my opinion, it would be great for the optimizer to consider the following rewrites:
1) val1::narrow = val2::wide as (val1::narrow = val2::narrow and val2::narrow = val2::wide)
2) val1::narrow < val2::wide as (val1::narrow < val2::narrow and val1::wide < val2::wide)
3) val1::narrow > val2::wide as (val1::narrow + 1 > val2::narrow and val1::wide > val2::wide)
Of course, it should apply the additional check only if this allows an index to be used.
Surely, this is not an easy thing to implement, but similar questions are raised quite often on this list.
Best regards, Vitalii Tymchyshyn
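[Editorial note: rewrite rule 1 above can be sketched in Python, with int as the narrow type and Decimal standing in for the wide one. The second conjunct checks that narrowing the wide value lost nothing; the first conjunct then becomes a same-type comparison that an index could serve. This is a toy model of the proposal, not planner code.]

```python
# Sketch of rule 1: narrow = wide  rewritten as
#   narrow == narrowed(wide)  AND  narrowed(wide) == wide
from decimal import Decimal

def eq_narrow_wide(narrow_val: int, wide_val: Decimal) -> bool:
    narrowed = int(wide_val)        # val2::narrow (truncating cast)
    # First conjunct is indexable; second verifies lossless narrowing.
    return narrow_val == narrowed and Decimal(narrowed) == wide_val

print(eq_narrow_wide(4, Decimal("4.0")))   # True: 4.0 narrows losslessly
print(eq_narrow_wide(4, Decimal("4.5")))   # False: narrowing drops 0.5
```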
On Tue, Feb 8, 2011 at 5:04 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Laszlo,
>
>> Which is silly. I think that PostgreSQL converts the int side to a
>> float, and then compares them.
>>
>> It would be better to do this, for each item in the loop:
>>
>> * evaluate the right side (which is float)
>> * tell if it is an integer or not
>> * if not an integer, then discard the row immediately
>> * otherwise use its integer value for the index scan
>
> Not terribly likely, I'm afraid. Data type coercion is *way* more
> complex than you realize (consider the number of data types we have, and
> the ability to add UDTs, and then square it). And the functionality you
> propose would break backwards compatibility; many people currently use
> ".0" in order to force a coercion to Float or Numeric.
>
> I'm not saying that PostgreSQL couldn't do better on this kind of case,
> but that doing better is a major project, not a minor one.

Specifically, the problem is that x = 4.0, where x is an integer, is defined to mean x::numeric = 4.0, not x = 4.0::integer. If it meant the latter, then testing x = 3.5 would throw an error, whereas what actually happens is that it just returns false.

Now, in this particular case, we all know that the only way x::numeric = 4.0 can be true is if x = 4::int. But that's a property of the numeric and integer data types that doesn't hold in general. Consider t = 'foo'::citext, where t has type text. That could be true if t = 'Foo' or t = 'foO' or t = 'FOO', etc.

We could fix this by adding some special-case logic that understands properties of integer and numeric values and optimizes x = 4.0::numeric to x = 4::int and x = 3.5::numeric to constant false. That would be cool, in a way, but I'm not sure it's really worth the code it would take, unless it falls naturally out of some larger project in that area.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
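[Editorial note: the semantics Robert describes can be illustrated in Python, with Decimal standing in for PostgreSQL's numeric: the comparison is defined by widening the integer side, so a fractional literal quietly compares unequal rather than raising an error.]

```python
# x = 4.0 means x::numeric = 4.0: widen the int side, then compare.
from decimal import Decimal

x = 4                                  # an integer column value
print(Decimal(x) == Decimal("4.0"))    # True:  x::numeric = 4.0
print(Decimal(x) == Decimal("3.5"))    # False, not an error
```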
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Feb 8, 2011 at 5:04 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> I'm not saying that PostgreSQL couldn't do better on this kind of case,
>> but that doing better is a major project, not a minor one.

> Specifically, the problem is that x = 4.0, where x is an integer, is
> defined to mean x::numeric = 4.0, not x = 4.0::integer. If it meant
> the latter, then testing x = 3.5 would throw an error, whereas what
> actually happens is it just returns false.

> We could fix this by adding some special case logic that understands
> properties of integers and numeric values and optimizes x =
> 4.0::numeric to x = 4::int and x = 3.5::numeric to constant false.
> That would be cool, in a way, but I'm not sure it's really worth the
> code it would take, unless it falls naturally out of some larger
> project in that area.

I think that most of the practical problems around this case could be solved without such a hack. What we should do instead is invent cross-type operators "int = numeric" etc. and make them members of both the integer and numeric index opclasses. There are reasons why that wouldn't work for integer versus float (read the last section of src/backend/access/nbtree/README), but right offhand it seems like it ought to be safe enough for numeric. Now, it wouldn't be quite as fast as if we somehow downconverted numeric to integer beforehand, but at least you'd only be talking about a slow comparison operator and not a fundamentally stupider plan. That's close enough for me, for what is in the end a stupidly written query.

Of course, the above is still not exactly a small project, since you'd be talking about something like 36 new operators to cover all of int2, int4, int8. But it's a straightforward extension.

regards, tom lane
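[Editorial note: the numeric-versus-float distinction Tom draws can be illustrated in Python, with Decimal standing in for numeric and Python's float for float8 (both are IEEE-754 doubles). Numeric represents every integer exactly, so an "int = numeric" comparison stays consistent, while float rounds large integers, so two distinct int8 values can compare equal to the same float.]

```python
# Why cross-type btree operators are safe for numeric but not for float:
from decimal import Decimal

a = 2**53        # exactly representable as an IEEE-754 double
b = 2**53 + 1    # rounds to the same double as a

print(float(a) == float(b))       # True: float cannot tell them apart
print(Decimal(a) == Decimal(b))   # False: numeric is exact
```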
On Sun, Feb 27, 2011 at 1:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Feb 8, 2011 at 5:04 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> I'm not saying that PostgreSQL couldn't do better on this kind of case,
>>> but that doing better is a major project, not a minor one.
>
>> Specifically, the problem is that x = 4.0, where x is an integer, is
>> defined to mean x::numeric = 4.0, not x = 4.0::integer. If it meant
>> the latter, then testing x = 3.5 would throw an error, whereas what
>> actually happens is it just returns false.
>
>> We could fix this by adding some special case logic that understands
>> properties of integers and numeric values and optimizes x =
>> 4.0::numeric to x = 4::int and x = 3.5::numeric to constant false.
>> That would be cool, in a way, but I'm not sure it's really worth the
>> code it would take, unless it falls naturally out of some larger
>> project in that area.
>
> I think that most of the practical problems around this case could be
> solved without such a hack. What we should do instead is invent
> cross-type operators "int = numeric" etc and make them members of both
> the integer and numeric index opclasses. There are reasons why that
> wouldn't work for integer versus float (read the last section of
> src/backend/access/nbtree/README) but right offhand it seems like it
> ought to be safe enough for numeric. Now, it wouldn't be quite as fast
> as if we somehow downconverted numeric to integer beforehand, but at
> least you'd only be talking about a slow comparison operator and not a
> fundamentally stupider plan. That's close enough for me, for what is
> in the end a stupidly written query.
>
> Of course, the above is still not exactly a small project, since you'd
> be talking about something like 36 new operators to cover all of int2,
> int4, int8. But it's a straightforward extension.

Interesting. Worth a TODO?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Feb 28, 2011 at 02:04:53PM -0500, Robert Haas wrote:
> On Sun, Feb 27, 2011 at 1:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Tue, Feb 8, 2011 at 5:04 PM, Josh Berkus <josh@agliodbs.com> wrote:
> >>> I'm not saying that PostgreSQL couldn't do better on this kind of case,
> >>> but that doing better is a major project, not a minor one.
> >
> >> Specifically, the problem is that x = 4.0, where x is an integer, is
> >> defined to mean x::numeric = 4.0, not x = 4.0::integer. If it meant
> >> the latter, then testing x = 3.5 would throw an error, whereas what
> >> actually happens is it just returns false.
> >
> >> We could fix this by adding some special case logic that understands
> >> properties of integers and numeric values and optimizes x =
> >> 4.0::numeric to x = 4::int and x = 3.5::numeric to constant false.
> >> That would be cool, in a way, but I'm not sure it's really worth the
> >> code it would take, unless it falls naturally out of some larger
> >> project in that area.
> >
> > I think that most of the practical problems around this case could be
> > solved without such a hack. What we should do instead is invent
> > cross-type operators "int = numeric" etc and make them members of both
> > the integer and numeric index opclasses. There are reasons why that
> > wouldn't work for integer versus float (read the last section of
> > src/backend/access/nbtree/README) but right offhand it seems like it
> > ought to be safe enough for numeric. Now, it wouldn't be quite as fast
> > as if we somehow downconverted numeric to integer beforehand, but at
> > least you'd only be talking about a slow comparison operator and not a
> > fundamentally stupider plan. That's close enough for me, for what is
> > in the end a stupidly written query.
> >
> > Of course, the above is still not exactly a small project, since you'd
> > be talking about something like 36 new operators to cover all of int2,
> > int4, int8. But it's a straightforward extension.
>
> Interesting. Worth a TODO?

Since we are discussing int2 casting, I wanted to bring up this other casting issue from 2011, in case it helped the discussion.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +