Обсуждение: how to tell the difference between empty field and null field

Поиск
Список
Период
Сортировка

how to tell the difference between empty field and null field

От
Alex Howansky
Дата:
Assuming a table such as this:

create table users ( user text, password text, name text, domain text
);

...and data such as this:

insert into users values ('frank','zCeZ6f2f.NUKU','Frank Farley','domain.com');

insert into users values ('joe','QJixz/XLXvio2','Joe Blogg','');

insert into users values ('sam','kAdhVr3URa4Y.','Sam Stooge');

Note that joe has a blank domain field, while sam has none.

I want to know what users don't have a domain specified in their domain field.
But the query:

select * from users where domain = '';

only shows me joe, and the query:

select * from users where domain = null;

only shows me sam.

So, I use:

select * from users where domain = '' or domain = null;

Here's my question: if I have a zillion records in this table, and it's indexed
by user+domain, how can I run this query without losing the benefit of the
index?

--
Alex Howansky
alex@wankwood.com
http://www.wankwood.com/



Re: [SQL] how to tell the difference between empty field and null field

От
Tom Lane
Дата:
Alex Howansky <alex@wankwood.com> writes:
> select * from users where domain = '' or domain = null;

OK, that'll work, but if you'll pardon a nitpick: "= NULL" is not
standard, it is Microsoft brain damage.  "IS NULL" is standard.

> Here's my question: if I have a zillion records in this table, and
> it's indexed by user+domain, how can I run this query without losing
> the benefit of the index?

An index on (user, domain) is perfectly useless for the above query,
because the user field isn't mentioned anywhere in the query.  An index
on domain alone could be used, though, and should be pretty effective.
(We do have some performance problems if you get into dozens of OR
terms, but for just a couple, no sweat.)

As a rule of thumb, multi-column indexes are quite inflexible, and
you will not find that they are worth their cost of upkeep unless
you know that you have a specific kind of query you use a *lot* that
can exploit the multi-column index.  Not only that, but that the
heavily used query is significantly faster than it'd be if it only
had an index on the first column named in the multi-column index.
In most scenarios, the first column gives you most of the gain and
any extra columns are marginal.

In short, unless you've done careful analysis and testing, you should
not make an index on (user, domain) but two indexes on user and domain
separately.  The latter setup will be a lot more flexible in terms of
being reasonably quick for a variety of queries.
        regards, tom lane


Re: [SQL] how to tell the difference between empty field and null field

От
Alex Howansky
Дата:
> An index on (user, domain) is perfectly useless for the above query,
> because the user field isn't mentioned anywhere in the query.  An index
> on domain alone could be used, though, and should be pretty effective.
> (We do have some performance problems if you get into dozens of OR
> terms, but for just a couple, no sweat.)

Heh heh, oops. Index on domain alone is what I meant, sorry, I was trying to
mix two different examples in my head without actually running them. :)

My main concern is with the use of the 'or' in the query. My experience is
mostly with Progress -- it (at least the ancient version that I'm used to)
can't utilize the index on a field _at all_ if your query uses an 'or' on that
field. As a result, I've become extremely cautious about doing this with
Postgres. You seem to be saying that it's no a big deal -- that the index will
still be utilized and that performance will not suffer significantly. Is this
something that I can finally forget worrying about?

Thanks for the help and the speedy reply.

-- 
Alex Howansky
alex@wankwood.com
http://www.wankwood.com/




Re: [SQL] how to tell the difference between empty field and null field

От
Tom Lane
Дата:
Alex Howansky <alex@wankwood.com> writes:
> My main concern is with the use of the 'or' in the query. My
> experience is mostly with Progress -- it (at least the ancient version
> that I'm used to) can't utilize the index on a field _at all_ if your
> query uses an 'or' on that field. As a result, I've become extremely
> cautious about doing this with Postgres. You seem to be saying that
> it's no a big deal -- that the index will still be utilized and that
> performance will not suffer significantly. Is this something that I
> can finally forget worrying about?

In current sources it definitely works.  For example,

regression=> explain select * from tenk1 where unique1 = 33;
NOTICE:  QUERY PLAN:

Index Scan using tenk1_unique1 on tenk1  (cost=9.00 rows=100 width=148)

EXPLAIN

regression=> explain select * from tenk1 where unique1 = 33 or unique1 = 44;
NOTICE:  QUERY PLAN:

Index Scan using tenk1_unique1, tenk1_unique1 on tenk1  (cost=18.00 rows=200 width=148)

EXPLAIN

Notice that the index is mentioned twice in the second EXPLAIN.  That
means there are actually two index scans being done: the first pulls out
the entries matching the first OR subclause, and the second gets the
entries matching the other clause.  (Yes, the right thing happens for
tuples that match both, although this isn't possible in the above query.)

Version 6.5.* is a little flakier about whether it will apply multiple
index scans for an OR where-clause; the capability is in there but I
don't trust the optimizer to recognize it's a good strategy all the
times it should.  Check and see what you get from EXPLAIN.

BTW, I think I spoke too soon in claiming that IS NULL would work as
an index OR clause; it doesn't seem to, in some quick tests.  I'll have
to see if anything can be done about that...
        regards, tom lane


Re: [SQL] how to tell the difference between empty field and null field

От
"Moray McConnachie"
Дата:
Far be it from me to defend Microsoft, but not all sloppiness is down
to Microsoft. All Microsoft databases support IS NULL, and indeed
advocate it: from the Access help file...

Searching for Null values or zero-length strings
----------------------------------------------------------------
If you're using a query to search for Null values or zero-length
strings, type Is Null into the Criteria cell to search for Null
values, or type two double quotation marks (" ") into the Criteria
cell to search for zero-length strings (don't type a space between the
quotation marks).

Yours,
Moray
----------------------------------------------------------------------
----------------
Moray.McConnachie@computing-services.oxford.ac.uk
----- Original Message -----
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Alex Howansky <alex@wankwood.com>
Cc: <pgsql-sql@postgreSQL.org>
Sent: Monday, December 13, 1999 5:14 AM
Subject: Re: [SQL] how to tell the difference between empty field and
null field


> Alex Howansky <alex@wankwood.com> writes:
> > select * from users where domain = '' or domain = null;
>
> OK, that'll work, but if you'll pardon a nitpick: "= NULL" is not
> standard, it is Microsoft brain damage.  "IS NULL" is standard.
>
> > Here's my question: if I have a zillion records in this table, and
> > it's indexed by user+domain, how can I run this query without
losing
> > the benefit of the index?
>
> An index on (user, domain) is perfectly useless for the above query,
> because the user field isn't mentioned anywhere in the query.  An
index
> on domain alone could be used, though, and should be pretty
effective.
> (We do have some performance problems if you get into dozens of OR
> terms, but for just a couple, no sweat.)
>
> As a rule of thumb, multi-column indexes are quite inflexible, and
> you will not find that they are worth their cost of upkeep unless
> you know that you have a specific kind of query you use a *lot* that
> can exploit the multi-column index.  Not only that, but that the
> heavily used query is significantly faster than it'd be if it only
> had an index on the first column named in the multi-column index.
> In most scenarios, the first column gives you most of the gain and
> any extra columns are marginal.
>
> In short, unless you've done careful analysis and testing, you
should
> not make an index on (user, domain) but two indexes on user and
domain
> separately.  The latter setup will be a lot more flexible in terms
of
> being reasonably quick for a variety of queries.
>
> regards, tom lane
>
> ************
>
>