Обсуждение: how to tell the difference between empty field and null field
Assuming a table such as this: create table users ( user text, password text, name text, domain text ); ...and data such as this: insert into users values ('frank','zCeZ6f2f.NUKU','Frank Farley','domain.com'); insert into users values ('joe','QJixz/XLXvio2','Joe Blogg',''); insert into users values ('sam','kAdhVr3URa4Y.','Sam Stooge'); Note that joe has a blank domain field, while sam has none. I want to know what users don't have a domain specified in their domain field. But the query: select * from users where domain = ''; only shows me joe, and the query: select * from users where domain = null; only shows me sam. So, I use: select * from users where domain = '' or domain = null; Here's my question: if I have a zillion records in this table, and it's indexed by user+domain, how can I run this query without losing the benefit of the index? -- Alex Howansky alex@wankwood.com http://www.wankwood.com/
Alex Howansky <alex@wankwood.com> writes: > select * from users where domain = '' or domain = null; OK, that'll work, but if you'll pardon a nitpick: "= NULL" is not standard, it is Microsoft brain damage. "IS NULL" is standard. > Here's my question: if I have a zillion records in this table, and > it's indexed by user+domain, how can I run this query without losing > the benefit of the index? An index on (user, domain) is perfectly useless for the above query, because the user field isn't mentioned anywhere in the query. An index on domain alone could be used, though, and should be pretty effective. (We do have some performance problems if you get into dozens of OR terms, but for just a couple, no sweat.) As a rule of thumb, multi-column indexes are quite inflexible, and you will not find that they are worth their cost of upkeep unless you know that you have a specific kind of query you use a *lot* that can exploit the multi-column index. Not only that, but that the heavily used query is significantly faster than it'd be if it only had an index on the first column named in the multi-column index. In most scenarios, the first column gives you most of the gain and any extra columns are marginal. In short, unless you've done careful analysis and testing, you should not make an index on (user, domain) but two indexes on user and domain separately. The latter setup will be a lot more flexible in terms of being reasonably quick for a variety of queries. regards, tom lane
> An index on (user, domain) is perfectly useless for the above query, > because the user field isn't mentioned anywhere in the query. An index > on domain alone could be used, though, and should be pretty effective. > (We do have some performance problems if you get into dozens of OR > terms, but for just a couple, no sweat.) Heh heh, oops. Index on domain alone is what I meant, sorry, I was trying to mix two different examples in my head without actually running them. :) My main concern is with the use of the 'or' in the query. My experience is mostly with Progress -- it (at least the ancient version that I'm used to) can't utilize the index on a field _at all_ if your query uses an 'or' on that field. As a result, I've become extremely cautious about doing this with Postgres. You seem to be saying that it's no a big deal -- that the index will still be utilized and that performance will not suffer significantly. Is this something that I can finally forget worrying about? Thanks for the help and the speedy reply. -- Alex Howansky alex@wankwood.com http://www.wankwood.com/
Alex Howansky <alex@wankwood.com> writes: > My main concern is with the use of the 'or' in the query. My > experience is mostly with Progress -- it (at least the ancient version > that I'm used to) can't utilize the index on a field _at all_ if your > query uses an 'or' on that field. As a result, I've become extremely > cautious about doing this with Postgres. You seem to be saying that > it's no a big deal -- that the index will still be utilized and that > performance will not suffer significantly. Is this something that I > can finally forget worrying about? In current sources it definitely works. For example, regression=> explain select * from tenk1 where unique1 = 33; NOTICE: QUERY PLAN: Index Scan using tenk1_unique1 on tenk1 (cost=9.00 rows=100 width=148) EXPLAIN regression=> explain select * from tenk1 where unique1 = 33 or unique1 = 44; NOTICE: QUERY PLAN: Index Scan using tenk1_unique1, tenk1_unique1 on tenk1 (cost=18.00 rows=200 width=148) EXPLAIN Notice that the index is mentioned twice in the second EXPLAIN. That means there are actually two index scans being done: the first pulls out the entries matching the first OR subclause, and the second gets the entries matching the other clause. (Yes, the right thing happens for tuples that match both, although this isn't possible in the above query.) Version 6.5.* is a little flakier about whether it will apply multiple index scans for an OR where-clause; the capability is in there but I don't trust the optimizer to recognize it's a good strategy all the times it should. Check and see what you get from EXPLAIN. BTW, I think I spoke too soon in claiming that IS NULL would work as an index OR clause; it doesn't seem to, in some quick tests. I'll have to see if anything can be done about that... regards, tom lane
Re: [SQL] how to tell the difference between empty field and null field
От
"Moray McConnachie"
Дата:
Far be it from me to defend Microsoft, but not all sloppiness is down to Microsoft. All Microsoft databases support IS NULL, and indeed advocate it: from the Access help file... Searching for Null values or zero-length strings ---------------------------------------------------------------- If you're using a query to search for Null values or zero-length strings, type Is Null into the Criteria cell to search for Null values, or type two double quotation marks (" ") into the Criteria cell to search for zero-length strings (don't type a space between the quotation marks). Yours, Moray ---------------------------------------------------------------------- ---------------- Moray.McConnachie@computing-services.oxford.ac.uk ----- Original Message ----- From: Tom Lane <tgl@sss.pgh.pa.us> To: Alex Howansky <alex@wankwood.com> Cc: <pgsql-sql@postgreSQL.org> Sent: Monday, December 13, 1999 5:14 AM Subject: Re: [SQL] how to tell the difference between empty field and null field > Alex Howansky <alex@wankwood.com> writes: > > select * from users where domain = '' or domain = null; > > OK, that'll work, but if you'll pardon a nitpick: "= NULL" is not > standard, it is Microsoft brain damage. "IS NULL" is standard. > > > Here's my question: if I have a zillion records in this table, and > > it's indexed by user+domain, how can I run this query without losing > > the benefit of the index? > > An index on (user, domain) is perfectly useless for the above query, > because the user field isn't mentioned anywhere in the query. An index > on domain alone could be used, though, and should be pretty effective. > (We do have some performance problems if you get into dozens of OR > terms, but for just a couple, no sweat.) > > As a rule of thumb, multi-column indexes are quite inflexible, and > you will not find that they are worth their cost of upkeep unless > you know that you have a specific kind of query you use a *lot* that > can exploit the multi-column index. Not only that, but that the > heavily used query is significantly faster than it'd be if it only > had an index on the first column named in the multi-column index. > In most scenarios, the first column gives you most of the gain and > any extra columns are marginal. > > In short, unless you've done careful analysis and testing, you should > not make an index on (user, domain) but two indexes on user and domain > separately. The latter setup will be a lot more flexible in terms of > being reasonably quick for a variety of queries. > > regards, tom lane > > ************ > >