Обсуждение: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"
The following bug has been logged on the website: Bug reference: 14546 Logged by: Casey Witt Email address: kcwitt@gmail.com PostgreSQL version: 9.6.1 Operating system: N/A Description: The point type is a built-in type, yet does not work with the "IS DISTINCT" clause and throws the error "42883: operator does not exist: point = point". This means that generated SQL template code such as the following does not work: UPDATE dcc.tbl_document_coordinates SET lower_left=@newValue WHERE (lower_left IS NOT DISTINCT FROM @oldValue OR lower_left IS NOT DISTINCT FROM @newValue) AND id=@primary_key; The point of the above statement is to update the "lower_left" point unless somebody has changed it already (IS NOT DISTINCT FROM @original_value) or somebody else has already changed it to the desired new value (IS NOT DISTINCT FROM @new_value). To update this single statement is easy, but the issue is that this statement is generated automatically by a function and includes all fields that have changed. Whatever issue there is with comparing points should be dealt with in postgres and not pushed to client applications. Conceptually, there should not be any difference between updating/comparing an int4 and a point (considering that they are both built-in types). This is a bug because https://www.postgresql.org/docs/9.0/static/functions-comparison.html states that "Comparison operators are available for all relevant data types." -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
kcwitt@gmail.com writes: > Whatever issue there is with comparing points should be dealt with in > postgres and not pushed to client applications. The problem with this is that it's not entirely clear what definition of "equality" IS [NOT] DISTINCT FROM should use. The current definition is "use the operator named =", which is rather ad hoc, but you can point to chapter and verse in the SQL standard where X IS [NOT] DISTINCT FROM Y is defined in terms of "X = Y", so it's not completely nuts either. Personally I'd prefer it to be defined as "use the equality operator of the data type's default btree opclass, or hash opclass if no btree opclass"; but it's likely that that would break some cross-type cases that work today, so it wouldn't be a panacea. But in either case, "point" loses because it has neither an operator named "=" nor a btree or hash opclass. This is not just an oversight. There is a "point ~= point" operator but it implements fuzzy equality, making it unsuitable as a basis for btree or hash behavior, even assuming that you could invent the linear ordering of points that would also be needed for a btree opclass. It would surely not be terribly hard to invent an exact-equality "=" operator for points, but it's not clear how useful that would be. Almost all the existing operators for points are fuzzy. The same problems exist for other geometric types, usually worse, because many of the others do have "=" operators but they compare areas :-(. Nobody's really wanted to break backwards compatibility enough to bring some sanity to that mess. Short answer is I don't think this is likely to change in the near future. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
> Nobody's really wanted to break backwards compatibility enough to bring > some sanity to that mess. I am trying to bring some sanity to that mess: https://www.postgresql.org/message-id/flat/CAE2gYzwwxPWbzxY3mtN4WL7W0DCkWo8gnB2ThUHU2XQ9XwgHMg%40mail.gmail.com Any comment helps to keep the discussion moving. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Tom, Thanks for the explanation. It looks like there are three separate issues here for the point type: 1) how to know if two points are equal (really equal, not fuzzy) 2) how to know if two points are "close enough" (fuzzy equal, which I assume must be very useful for GIS stuff but haven't used myself yet) 3) how to order the points in a btree (ie. which is bigger (0,1) or (1,0)) Does the following make sense: 1) implement an equality operator for a point (x1=x2 AND y1=y2). This would allow "IS [NOT] DISTINCT" to work out of the box, and most importantly (from my perspective at least) would allow "UPDATE table SET (p1 = @p1) WHERE p1 IS NOT DISTINCT FROM @p1_original;" to just work. 2) points still have the "~=" operator, so no impact there 3) couldn't a general ordering operator be defined as: If x1 < x2 then point1 < point 2 If x1 > x2 then point 1 > point 2 If x1 = x2 then If y1 < y2 then point1 < point 2 If y1 > y2 then point1 > point 2 If y1 = y2 then point1 = point 2 For my use case, the reason it is so important to get the "IS [NOT] DISTINCT" working for points is because my update queries are generated automatically based on a list of "dirty fields" in my application front-end. The application front-end creates a SQL update statement saying "update all these values which have changed AS LONG AS they haven't been updated by somebody else in the meantime". The point field is just one of many fields (and data types) subject to change. So without having "=" defined for point I only have two [very ugly] options: 1) store the point as two separate numeric columns (but I want to ensure that I either have a complete point or no point, so then I would have to also add check constraints), and when there are lots of points in the table this just "feels icky" 2) update my SQL generator to make a special exception for points (which is what I am doing now), but again, this just "feels icky". Regarding the ordering operator, somebody could say that for their case the y value should be compared before the x value, but does it really matter for the purposes of implementing a general comparison btree operator? Isn't the only thing that is truly required is that it is deterministic and guaranteed to produce the same ordering every time, with minimizing computation being a distant second place consideration? I say that minimizing computation should be a "distant" second place to imply that it is better to have an inefficient implementation of something than no implementation at all. I know that this discussion of "what is equals" comes up a lot, and in a lot of different programming languages, and the answer is often "we leave it undefined because we don't know the exact use case". I have never embraced this approach, and much prefer the "implement an equality operator that is rational for the simplest general case". As an exaggerated example, one could say, we can't implement an equality operator for "numeric" because somebody may be storing a "length" in that field and without knowing whether each value is in feet or meters it is not possible to compare two of these values. But in practice, nobody gets too excited about this because for most use cases the numeric type works as expected and when there is a special use case such as storing length we are accustomed to the design pattern of storing the length value in one column, the units of measure in another column, write a function to convert the length to a common base, and then use that function to create an index. Extending this general idea beyond points then, two "boxes" could be considered equal if all four points are equal, and could be ordered by the length of the perimeter (I proposed perimeter because I assume it would be the cheapest to calculate, but it could be area or anything else as long as it is the cheapest to calculate and deterministic). Anybody needing more than that (ie. geometric transforms) would still be able to write functions to do their transforms which could be used in the index. The key point here is that if the extra information placing doubt on the equality is not actually embedded in the data type, then it should not be considered when determining "built-in" data type equality and ordering operations. If there is currently no "=" defined for points, then no existing code base uses the "=" operator for points, so how could defining this operator break backwards compatibility? I know that this is a bit of a simplistic view of the situation, and I appreciate your time to help me understand the real complexities of this issue. Thanks, Casey -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Thursday, February 16, 2017 02:29 To: kcwitt@gmail.com Cc: pgsql-bugs@postgresql.org Subject: Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT" kcwitt@gmail.com writes: > Whatever issue there is with comparing points should be dealt with in > postgres and not pushed to client applications. The problem with this is that it's not entirely clear what definition of "equality" IS [NOT] DISTINCT FROM should use. The current definition is "use the operator named =", which is rather ad hoc, but you can point to chapter and verse in the SQL standard where X IS [NOT] DISTINCT FROM Y is defined in terms of "X = Y", so it's not completely nuts either. Personally I'd prefer it to be defined as "use the equality operator of the data type's default btree opclass, or hash opclass if no btree opclass"; but it's likely that that would break some cross-type cases that work today, so it wouldn't be a panacea. But in either case, "point" loses because it has neither an operator named "=" nor a btree or hash opclass. This is not just an oversight. There is a "point ~= point" operator but it implements fuzzy equality, making it unsuitable as a basis for btree or hash behavior, even assuming that you could invent the linear ordering of points that would also be needed for a btree opclass. It would surely not be terribly hard to invent an exact-equality "=" operator for points, but it's not clear how useful that would be. Almost all the existing operators for points are fuzzy. The same problems exist for other geometric types, usually worse, because many of the others do have "=" operators but they compare areas :-(. Nobody's really wanted to break backwards compatibility enough to bring some sanity to that mess. Short answer is I don't think this is likely to change in the near future. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Emre, Yeah, changing existing Macros is tough, because it can break backwards compatibility. I had in mind that fixing this for a point would be easy since the "=" and btree operators are not defined, no existing codebases would be impacted by implementing them, but when I re-read Tom Lanes email I noticed the line that says: "The same problems exist for other geometric types, usually worse, because many of the others do have "=" operators butthey compare areas :-(. I feel like it is tragic that "=" got implemented for geometric types in PostgreSQL using area, because my gut sense is thatequality in PostgreSQL should be based on the raw values stored, and not some arbitrary transformation of them. In thiscase I am specifically making a distinction between how PostgreSQL should deal with points (and other geometric types)vs. how other things (ie. PostGIS) should deal with them. If PostgreSQL can deal with geometry simply and deterministically(ie. "=" means the raw values are equal, and ordering is based on the cheapest determistic function available),then that would provide a good foundation for everything else (ie. PostGIS) to define whatever makes sense inthat environment based on the fundamental definition in PostgreSQL. I don't really even like the fact that "~=" is defined for points for three reasons: 1) it is fuzzy, but no way to control the error. If you think of "~=" as being the same as "close to", coordinates on inIC chip will want to consider "close to" in the nanometer range, whereas coordinates for a house will consider "close to"in the meter range (if anybody disagrees that coordinates for a house would be in the meter range, that demonstrates thepoint that the definition of "close to" should not be baked into PostgreSQL because it is use case specific). AlthoughPostgreSQL doesn't innately understand the difference between nanometers and meters, the point is for the IC chipcase you may be comparing at the 6th decimal place, whereas for the house case you may want to compare at the first decimalplace. 2) there is no expectation that "~=" is defined for every type (unlike "=", which is documented (incorrectly) to be definedfor every type) 3) "~=" could be handled by a function which takes two points and an error value (thus allowing the user to use the errorvalue suitable for their use case) The idea here being that PostgreSQL doesn't even know what it is storing or in what context it is being used (as far as PostgreSQLis concerned, it is just storing bytes that fit the format of some defined type), so it shouldn't be making assumptionsabout what "close" means. Whereas PostGIS does know what it is storing and in what context it is being used, soPostGIS can make assumptions about what is close (but note that I have never actually used PostGIS, so this is just anassumption). I am NOT suggesting any change to "~=" because that clearly would break backward compatibility, but I can't help feelingthat PostgreSQL is building a pretty big "technical debt" by not sorting this out sooner rather than later. Casey -----Original Message----- From: Emre Hasegeli [mailto:emre@hasegeli.com] Sent: Friday, February 17, 2017 17:21 To: Tom Lane Cc: kcwitt@gmail.com; PostgreSQL Bugs Subject: Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT" > Nobody's really wanted to break backwards compatibility enough to > bring some sanity to that mess. I am trying to bring some sanity to that mess: https://www.postgresql.org/message-id/flat/CAE2gYzwwxPWbzxY3mtN4WL7W0DCkWo8gnB2ThUHU2XQ9XwgHMg%40mail.gmail.com Any comment helps to keep the discussion moving. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs