Thread: Strange query behaviour
I'm finding a massive difference in execution time between two queries that I would expect to behave identically:
Very slow:
select ... from x natural left join y where y is null
Fast:
select ... from x natural left join y where y.primary_key_column is null
A fact that I suspect is important is that y has columns whose contents are PDFs, with a total size of 35,608,033,659 bytes (about 33 GB). However, I can query that size using a query that looks like this:
select sum (length (pdf_field_1) + length (pdf_field_2)) from y
This runs very fast (2.8ms for 2324 rows).
So it is as if checking the whole tuple for NULL requires reading the PDF bytea columns, but checking just the primary key for NULL or even reading the lengths of the PDFs does not.
For the moment I'm going to fix it by just using "y.primary_key_column IS NULL" instead of "y IS NULL" where I want to check whether I have a row from y corresponding to a given row in x. But this seems like strange behaviour. I can think of a couple of potential enhancements that this suggests:
1) when checking an entire row for null, start with a primary key field or other NOT NULL field. In the common case of checking what happened with a left join, this is all that needs to be done - either there is a row, in which case the field cannot be NULL, or there is no row and all the other fields must also be NULL.
2) when checking a field for NULL, is it really necessary to load the field contents? It feels like whether or not a value is NULL should be possible to determine without de-toasting (if I have the right terminology).
Any ideas anybody might have would be much appreciated.
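In case it's useful for anyone reproducing this, the two shapes I'm comparing look roughly like the following (table and column names are simplified stand-ins for my real schema):

```sql
-- Very slow: the whole-row NULL test appears to read the PDF columns
EXPLAIN ANALYZE
SELECT count(*) FROM x NATURAL LEFT JOIN y WHERE y IS NULL;

-- Fast: testing just the primary key never touches the PDF columns
EXPLAIN ANALYZE
SELECT count(*) FROM x NATURAL LEFT JOIN y WHERE y.primary_key_column IS NULL;
```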
>>>>> "Isaac" == Isaac Morland <isaac.morland@gmail.com> writes:

 Isaac> So it is as if checking the whole tuple for NULL requires
 Isaac> reading the PDF bytea columns, but checking just the primary key
 Isaac> for NULL or even reading the lengths of the PDFs does not.

That is almost certainly exactly what happens. If the PDF columns are of
type bytea, or if they are of type text and the database encoding is
single-byte, then length() does not need to detoast the column in order
to get the size (the byte size of the toasted datum is stored in the
toast pointer). However, constructing a whole-row datum does require
detoasting any externally-stored columns (this used not to be the case,
but that caused a lot of bugs).

 Isaac> For the moment I'm going to fix it by just using
 Isaac> "y.primary_key_column IS NULL" instead of "y IS NULL" where I
 Isaac> want to check whether I have a row from y corresponding to a
 Isaac> given row in x.

What you should actually use in these cases for your IS NULL check is
one of the columns of the join condition. That allows the planner to
detect that the query is in fact an anti-join, and optimize accordingly.
The other, and IMO better, way to write anti-join queries is to use an
explicit NOT EXISTS. (Note, do _not_ use NOT IN, since that has its own
issues with NULL handling.)

 Isaac> 1) when checking an entire row for null,

This isn't a very common operation and the SQL-standard semantics for it
are actually quite weird (for example, x IS NULL is not the opposite
condition to x IS NOT NULL). So I don't think we need to encourage it.

 Isaac> start with a primary key field or other NOT NULL field. In the
 Isaac> common case of checking what happened with a left join, this is
 Isaac> all that needs to be done - either there is a row, in which case
 Isaac> the field cannot be NULL, or there is no row and all the other
 Isaac> fields must also be NULL.

The planner can do even better than this if you apply the IS NULL test
_specifically to one of the join columns_. When a join condition is
strict (which it almost always is), then testing the column from the
nullable side is provably (to the planner) equivalent to testing the
existence of the matching row, which allows it to transform the join
from an outer join to an anti-join.

--
Andrew (irc:RhodiumToad)
On Tue, 22 Jan 2019 at 15:32, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Isaac" == Isaac Morland <isaac.morland@gmail.com> writes:
Isaac> So it is as if checking the whole tuple for NULL requires
Isaac> reading the PDF bytea columns, but checking just the primary key
Isaac> for NULL or even reading the lengths of the PDFs does not.
That is almost certainly exactly what happens. If the PDF columns are of
type bytea, or if they are of type text and the database encoding is
single-byte, then length() does not need to detoast the column in order
to get the size (the byte size of the toasted datum is stored in the
toast pointer).
Thanks very much for the detailed response! I just tested and indeed query performance has gone back to something like what I would expect. I feel more confident, however, with your confirmation and elaboration on the underlying details.
What you should actually use in these cases for your IS NULL check is
one of the columns of the join condition. That allows the planner to
detect that the query is in fact an anti-join, and optimize accordingly.
The other, and IMO better, way to write anti-join queries is to use an
explicit NOT EXISTS. (Note, do _not_ use NOT IN, since that has its own
issues with NULL handling.)
Thanks for the hint. As it happens, in the actual situation, I had a view which is defined something like:
CREATE ... AS
SELECT ..., NOT y IS NULL AS has_y
FROM x LEFT JOIN y USING (primary_key_field);
I used the simpler example once I found it exhibited the same symptoms. In any case, using a join key field as you suggest does fix my problem in the real view as well.
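For the record, the NOT EXISTS version of that view would presumably look something like this (the view name here is a placeholder, not my real schema):

```sql
CREATE VIEW x_with_has_y AS  -- hypothetical name
SELECT x.*,
       EXISTS (SELECT 1
               FROM y
               WHERE y.primary_key_field = x.primary_key_field) AS has_y
FROM x;
```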
Isaac> 1) when checking an entire row for null,
This isn't a very common operation and the SQL-standard semantics for it
are actually quite weird (for example, x IS NULL is not the opposite
condition to x IS NOT NULL). So I don't think we need to encourage it.
In my use case, I think "y IS NULL" is better code than "y.primary_key_field IS NULL". The second snippet raises the question, "why primary_key_field?", and the answer is, "because otherwise the query planner gets confused". The first snippet says what I actually mean, and also happens to be shorter.
This does require people reading the code to understand that "IS NULL" and "IS NOT NULL" are not logical negations of each other.
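A quick way to see that asymmetry (row-level IS NULL is true only when every field is NULL, and IS NOT NULL only when every field is non-NULL):

```sql
-- A row with a mix of NULL and non-NULL fields satisfies neither predicate:
SELECT ROW(1, NULL) IS NULL     AS is_null,      -- false: one field is non-NULL
       ROW(1, NULL) IS NOT NULL AS is_not_null;  -- false: one field is NULL
```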
The planner can do even better than this if you apply the IS NULL test
_specifically to one of the join columns_. When a join condition is
strict (which it almost always is), then testing the column from the
nullable side is provably (to the planner) equivalent to testing the
existence of the matching row, which allows it to transform the join
from an outer join to an anti-join.
What is confusing me is why the planner can't convert "[entire row] IS NULL" into a test for existence of the matching row (assuming there is at least one NOT NULL column).
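To make the comparison concrete, these two queries ask the same question, but (as I now understand it) only the first is recognized as an anti-join today:

```sql
-- Recognized as an anti-join: the join column itself is tested
SELECT x.* FROM x LEFT JOIN y USING (primary_key_field)
WHERE y.primary_key_field IS NULL;

-- Not recognized, even though y's NOT NULL primary key means
-- "y IS NULL" can only hold when there is no matching row
SELECT x.* FROM x LEFT JOIN y USING (primary_key_field)
WHERE y IS NULL;
```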
OK, let's put it this way: if I manage to dig into the planner internals enough to figure out how to make the planner understand this, and write a decent patch, would I have a good chance of getting it accepted? From here the figuring-out part seems like a long shot: messing with planner code scares me a little, and I already have another feature request I want to work on, plus a reasonable workaround for this one. But I'd like an assessment nevertheless. Maybe I'll have more energy this year!
Thanks again for taking the time to provide a detailed response. I very much appreciate it.
Isaac Morland <isaac.morland@gmail.com> writes:
> What is confusing me is why the planner can't convert "[entire row] IS
> NULL" into a test for existence of the matching row (assuming there is at
> least one NOT NULL column).

The reasons why the planner does very little with row-level IS [NOT]
NULL conditions are (1) so few people use them that it doesn't really
seem worth expending cycles or development effort on such cases, and
(2) the SQL spec is vague enough about the semantics of these
predicates that we've never been entirely sure whether we implement
them correctly. Thus it didn't seem worth expending a lot of effort
developing deduction logic that might turn out to be completely wrong.
I suspect (1) is not unrelated to (2) ...

The semantic vaguenesses are also twofold:

(a) It's not quite clear whether the spec intends to draw a distinction
between a composite value that is in itself NULL and one that is a
tuple of all NULL fields. There is certainly a physical difference, but
it looks like the IS [NOT] NULL predicates are designed not to be able
to tell the difference.

(b) It's not at all clear whether these predicates are meant to be
recursive for nested composite types. Depending on how you read it, it
could be that a composite field that is NULL satisfies an IS NULL
predicate on the parent row, but a composite field that is
ROW(NULL, NULL, ...) does not. That is in fact how we implement it, but
it sure seems weird given (a).

So personally, I've got zero confidence in these predicates and don't
especially wish to sink development effort into something that
critically depends on having the right semantics for them. The shortage
of field demand for doing better doesn't help.
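(A quick way to see the non-recursive behaviour described in (b), as we implement it:)

```sql
SELECT ROW(NULL, NULL) IS NULL            AS flat,    -- true: all fields are NULL
       ROW(NULL, ROW(NULL, NULL)) IS NULL AS nested;  -- false: the inner row value
                                                      -- is itself not NULL
```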
Circling back to your interest in using a "row IS NULL" predicate to
conclude that a left join is actually an antijoin, these questions are
really critical, because if the join column is itself composite, the
deduction would hold *only* if our theory that "row IS NULL" is
non-recursive is correct. If our theory is wrong, and the spec intends
that ROW(NULL, ROW(NULL, NULL), NULL) IS NULL should be TRUE, then it
wouldn't be correct to draw the inference that the join has to be an
antijoin.

And that is also connected to the fact that record_cmp considers
ROW(NULL, NULL) to be equal to ROW(NULL, NULL), which is (or at least
seems to be) wrong per spec, but tough: we have to have a total order
for composite values, or they wouldn't be indexable or sortable.

If your head's not hurting yet, you just need to think harder about
these issues. It's all a mess, and IMO we're best off not adding any
more dependencies on these semantics than we have to.

			regards, tom lane