Discussion: Strange query behaviour

Strange query behaviour

From
Isaac Morland
Date:
I'm finding a massive difference in query execution time between two queries that should be identical:

Very slow:
select ... from x natural left join y where y is null

Fast:
select ... from x natural left join y where y.primary_key_column is null

A fact that I suspect is important is that y has columns whose contents are PDFs, with a total size of 35608033659. However, I can query that size using a query that looks like this:

select sum (length (pdf_field_1) + length (pdf_field_2)) from y

This runs very fast (2.8ms for 2324 rows).

So it is as if checking the whole tuple for NULL requires reading the PDF bytea columns, but checking just the primary key for NULL or even reading the lengths of the PDFs does not.

For the moment I'm going to fix it by just using "y.primary_key_column IS NULL" instead of "y IS NULL" where I want to check whether I have a row from y corresponding to a given row in x. But this seems like strange behaviour. I can think of a couple of potential enhancements that this suggests:

1) when checking an entire row for null, start with a primary key field or other NOT NULL field. In the common case of checking what happened with a left join, this is all that needs to be done - either there is a row, in which case the field cannot be NULL, or there is no row and all the other fields must also be NULL.

2) when checking a field for NULL, is it really necessary to load the field contents? It feels like it should be possible to determine whether or not a value is NULL without de-toasting (if I have the right terminology).

Any ideas anybody might have would be much appreciated.

Re: Strange query behaviour

From
Andrew Gierth
Date:
>>>>> "Isaac" == Isaac Morland <isaac.morland@gmail.com> writes:

 Isaac> So it is as if checking the whole tuple for NULL requires
 Isaac> reading the PDF bytea columns, but checking just the primary key
 Isaac> for NULL or even reading the lengths of the PDFs does not.

That is almost certainly exactly what happens. If the PDF columns are of
type bytea, or if they are of type text and the database encoding is
single-byte, then length() does not need to detoast the column in order
to get the size (the byte size of the toasted datum is stored in the
toast pointer).

However, constructing a whole-row datum does require detoasting any
externally-stored columns (this used not to be the case, but that caused
a lot of bugs).
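
A quick way to see that cost in isolation (column name taken from your earlier example, so adjust to the real schema) is to compare something like:

    explain analyze select (y is null) from y;
    explain analyze select (y.primary_key_column is null) from y;

The first has to build the whole-row datum for every row, detoasting the externally-stored PDF columns, so it should be dramatically slower on this table.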

 Isaac> For the moment I'm going to fix it by just using
 Isaac> "y.primary_key_column IS NULL" instead of "y IS NULL" where I
 Isaac> want to check whether I have a row from y corresponding to a
 Isaac> given row in x.

What you should actually use in these cases for your IS NULL check is
one of the columns of the join condition. That allows the planner to
detect that the query is in fact an anti-join, and optimize accordingly.

The other, and IMO better, way to write anti-join queries is to use an
explicit NOT EXISTS. (Note, do _not_ use NOT IN, since that has its own
issues with NULL handling.)
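
Roughly, using the column name from your example (substitute the real join key):

    select ... from x
    where not exists (select 1 from y
                      where y.primary_key_column = x.primary_key_column);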

 Isaac> 1) when checking an entire row for null,

This isn't a very common operation and the SQL-standard semantics for it
are actually quite weird (for example, x IS NULL is not the opposite
condition to x IS NOT NULL). So I don't think we need to encourage it.
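
For example, a partly-null row satisfies neither predicate:

    select (row(1, null) is null)     as is_null,
           (row(1, null) is not null) as is_not_null;
    -- both come out false: the row is neither all-null nor all-non-null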

 Isaac> start with a primary key field or other NOT NULL field. In the
 Isaac> common case of checking what happened with a left join, this is
 Isaac> all that needs to be done - either there is a row, in which case
 Isaac> the field cannot be NULL, or there is no row and all the other
 Isaac> fields must also be NULL.

The planner can do even better than this if you apply the IS NULL test
_specifically to one of the join columns_. When a join condition is
strict (which it almost always is), then testing the column from the
nullable side is provably (to the planner) equivalent to testing the
existence of the matching row, which allows it to transform the join
from an outer join to an anti-join.
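
You can check whether the transformation happened by looking at the plan; assuming nothing in the select list still needs columns from y, something like

    explain
    select ... from x natural left join y
    where y.primary_key_column is null;

should normally show an Anti Join node (hash, merge or nestloop) rather than a plain Left Join.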

-- 
Andrew (irc:RhodiumToad)


Re: Strange query behaviour

From
Isaac Morland
Date:
On Tue, 22 Jan 2019 at 15:32, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Isaac" == Isaac Morland <isaac.morland@gmail.com> writes:

 Isaac> So it is as if checking the whole tuple for NULL requires
 Isaac> reading the PDF bytea columns, but checking just the primary key
 Isaac> for NULL or even reading the lengths of the PDFs does not.

> That is almost certainly exactly what happens. If the PDF columns are of
> type bytea, or if they are of type text and the database encoding is
> single-byte, then length() does not need to detoast the column in order
> to get the size (the byte size of the toasted datum is stored in the
> toast pointer).

Thanks very much for the detailed response! I just tested and indeed query performance has gone back to something like what I would expect. I feel more confident, however, with your confirmation and elaboration on the underlying details.

> What you should actually use in these cases for your IS NULL check is
> one of the columns of the join condition. That allows the planner to
> detect that the query is in fact an anti-join, and optimize accordingly.

> The other, and IMO better, way to write anti-join queries is to use an
> explicit NOT EXISTS. (Note, do _not_ use NOT IN, since that has its own
> issues with NULL handling.)

Thanks for the hint. As it happens, in the actual situation, I had a view which is defined something like:

CREATE ... AS
    SELECT ..., NOT y IS NULL AS has_y
        FROM x LEFT JOIN y USING (primary_key_field);

I used the simpler example when I found it would exhibit the same symptoms. However, using a join key field as you suggest still seems to be working to fix my problem.
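
Concretely, the view now looks more or less like this (still a sketch, with the same placeholder names as above):

CREATE ... AS
    SELECT ..., y.primary_key_field IS NOT NULL AS has_y
        FROM x LEFT JOIN y USING (primary_key_field);

Since has_y lives in the select list rather than a WHERE clause, the win here is just avoiding construction (and detoasting) of the whole-row datum, not the anti-join transformation.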

 Isaac> 1) when checking an entire row for null,

> This isn't a very common operation and the SQL-standard semantics for it
> are actually quite weird (for example, x IS NULL is not the opposite
> condition to x IS NOT NULL). So I don't think we need to encourage it.

In my use case, I think "y IS NULL" is better code than "y.primary_key_field IS NULL". The second snippet raises the question, "why primary_key_field?", and the answer is, "because otherwise the query planner will get confused". The first snippet is what I really mean, and it also happens to be shorter.

This does require people reading the code to understand that "IS NULL" and "IS NOT NULL" are not logical negations of each other.
 
> The planner can do even better than this if you apply the IS NULL test
> _specifically to one of the join columns_. When a join condition is
> strict (which it almost always is), then testing the column from the
> nullable side is provably (to the planner) equivalent to testing the
> existence of the matching row, which allows it to transform the join
> from an outer join to an anti-join.

What is confusing me is why the planner can't convert "[entire row] IS NULL" into a test for existence of the matching row (assuming there is at least one NOT NULL column).

OK, let's put it this way: if I manage to dig into the planner internals enough to figure out how to make the planner understand this, and write a decent patch, would I have a good chance of getting it accepted? From here the figuring-out part seems like a long shot: messing with planner code scares me a little, and I already have a feature request I want to work on, plus a reasonable workaround for this one. But I'd like an assessment nevertheless. Maybe I'll have more energy this year!

Thanks again for taking the time to provide a detailed response. I very much appreciate it.

Re: Strange query behaviour

From
Tom Lane
Date:
Isaac Morland <isaac.morland@gmail.com> writes:
> What is confusing me is why the planner can't convert "[entire row] IS
> NULL" into a test for existence of the matching row (assuming there is at
> least one NOT NULL column).

The reasons why the planner does very little with row-level IS [NOT] NULL
conditions are (1) so few people use them that it doesn't really seem
worth expending cycles or development effort on such cases, and (2) the
SQL spec is vague enough about the semantics of these predicates that
we've never been entirely sure whether we implement them correctly.  Thus
it didn't seem worth expending a lot of effort developing deduction logic
that might turn out to be completely wrong.

I suspect (1) is not unrelated to (2) ...

The semantic vaguenesses are also twofold:

(a) It's not quite clear whether the spec intends to draw a distinction
between a composite value that is in itself NULL and one that is a tuple
of all NULL fields.  There is certainly a physical difference, but it
looks like the IS [NOT] NULL predicates are designed not to be able to
tell the difference.

(b) It's not at all clear whether these predicates are meant to be
recursive for nested composite types.  Depending on how you read it,
it could be that a composite field that is NULL satisfies an IS NULL
predicate on the parent row, but a composite field that is ROW(NULL,
NULL, ...) does not.  That is in fact how we implement it, but it
sure seems weird given (a).
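
To spell that out with a throwaway composite type, per the non-recursive implementation just described:

    create type pair as (a int, b int);

    select (null::pair) is null             as null_composite,
           (row(null, null)::pair) is null  as all_null_tuple;
    -- both true: the predicate can't tell (a)'s two cases apart

    select row(null, row(null, null)::pair) is null as nested;
    -- false: the inner field is ROW(NULL, NULL), which is not itself NULL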

So personally, I've got zero confidence in these predicates and don't
especially wish to sink development effort into something that
critically depends on having the right semantics for them.  The shortage
of field demand for doing better doesn't help.

Circling back to your interest in using a "row IS NULL" predicate to
conclude that a left join is actually an antijoin, these questions are
really critical, because if the join column is itself composite, the
deduction would hold *only* if our theory that "row IS NULL" is
non-recursive is correct.  If our theory is wrong, and the spec
intends that ROW(NULL, ROW(NULL, NULL), NULL) IS NULL should be TRUE,
then it wouldn't be correct to draw the inference that the join has
to be an antijoin.  And that is also connected to the fact that
record_cmp considers ROW(NULL, NULL) to be equal to ROW(NULL, NULL),
which is (or at least seems to be) wrong per spec, but tough: we have
to have a total order for composite values, or they wouldn't be
indexable or sortable.

If your head's not hurting yet, you just need to think harder
about these issues.  It's all a mess, and IMO we're best off
not adding any more dependencies on these semantics than we
have to.

            regards, tom lane