Обсуждение: Indexing null dates
Hi I'm building a web app where changes to customer orders are logged in the following table, and I worried about the time that it will take to locate records that need further processing/actioning. Here's the table: create table CUSTOMER.WCCustOrderStatusLog ( WCCustOrderID integer, WCOrderStatusID integer, -- date/time at which some process acknowledged the status -- change, and took the appropriate action, like raising -- an e-mail confirmation Acknowledged timestamp, -- stamped with now() when the e-mailer process begins to process -- the order/email message. When complete Processing goes back -- to null and Acknowledged is stamped with now() Processing timestamp, -- date status change occured LastUpdated timestamp, primary key (WCCustOrderID, WCOrderStatusID) ) without oids; I need a separate e-mailing process to locate orders (using the above table) that have WCOrderStatusID = (4,5 or 99) AND a null value for "Acknowledged" and "Processing" I have a function which provides the next order number for processing: CREATE OR REPLACE FUNCTION CUSTOMER.GetNextCustEmailAck(integer) RETURNS integer AS ' -- locates the next order number that requires an e-mail confirmation -- to be sent - this generally occurs as the order passes from -- one status to the next. These status charges are recorded -- within the WCCustOrderStatusLog table DECLARE pStatusID ALIAS FOR $1; vCustOrderID integer := null; BEGIN -- locate the next order select WCCustOrderID into vCustOrderID from CUSTOMER.WCCustOrderStatusLog where WCOrderStatusID = pStatusID and Acknowledged is null and Processing is null for update limit 1; if (vCustOrderID is not null) then -- mark the record as being processed update CUSTOMER.WCCustOrderStatusLog set Processing = now() where WCCustOrderID = vCustOrderID and WCOrderStatusID = pStatusID; end if; -- return -1 to indicate failure to locate order if (vCustOrderID is null) then return -1; end if; -- return the order number return vCustOrderID; END; ' LANGUAGE 'plpgsql'; My question is whether postgres can index null values, and if not, do I have to accept a full table scan when locating records. Or can I mitigate this by the use either of date values that signify null, but are non-null (eg 1-Jan-1970). Or am I better off adding extra flag fields (integer) which always have a Y/N (1,0) value corresponding to whether the appropriate date field is null or not, and then use these to locate the records. eg create table CUSTOMER.WCCustOrderStatusLog ( WCCustOrderID integer, WCOrderStatusID integer, -- New flag field AcknowledgedIsNull integer, Acknowledged timestamp, -- New flag field ProcessingIsNull integer, Processing timestamp, -- date status change occured LastUpdated timestamp, primary key (WCCustOrderID, WCOrderStatusID) ) without oids; Also will adding an index to WCOrderStatusID reduce the cost of the sequential scan? Or is there some other strategy that would be better? Thanks. John Sidney-Woollett
"John Sidney-Woollett" <johnsw@wardbrook.com> writes: > [ needs to make this fast: ] > -- locate the next order > select WCCustOrderID into vCustOrderID > from CUSTOMER.WCCustOrderStatusLog > where WCOrderStatusID = pStatusID > and Acknowledged is null > and Processing is null > for update > limit 1; > My question is whether postgres can index null values, and if not, do I > have to accept a full table scan when locating records. It indexes them, but "is null" is not an indexable operator, so you can't directly solve the above with a 3-column index. What you can do instead is use a partial index, for instance create index i on CUSTOMER.WCCustOrderStatusLog (WCOrderStatusID) where Acknowledged is null and Processing is null; regards, tom lane
Tom Lane said: > "John Sidney-Woollett" <johnsw@wardbrook.com> writes: >> [ needs to make this fast: ] >> -- locate the next order >> select WCCustOrderID into vCustOrderID >> from CUSTOMER.WCCustOrderStatusLog >> where WCOrderStatusID = pStatusID >> and Acknowledged is null >> and Processing is null >> for update >> limit 1; > >> My question is whether postgres can index null values, and if not, do I >> have to accept a full table scan when locating records. > > It indexes them, but "is null" is not an indexable operator, so you > can't directly solve the above with a 3-column index. What you can do > instead is use a partial index, for instance > > create index i on CUSTOMER.WCCustOrderStatusLog (WCOrderStatusID) > where Acknowledged is null and Processing is null; That's a very nifty trick and exactly the sort of answer I was after! Many thanks John Sidney-Woollett
John Sidney-Woollett wrote: > Tom Lane said: > > "John Sidney-Woollett" <johnsw@wardbrook.com> writes: > >> [ needs to make this fast: ] > >> -- locate the next order > >> select WCCustOrderID into vCustOrderID > >> from CUSTOMER.WCCustOrderStatusLog > >> where WCOrderStatusID = pStatusID > >> and Acknowledged is null > >> and Processing is null > >> for update > >> limit 1; > > > >> My question is whether postgres can index null values, and if not, do I > >> have to accept a full table scan when locating records. > > > > It indexes them, but "is null" is not an indexable operator, so you > > can't directly solve the above with a 3-column index. What you can do > > instead is use a partial index, for instance > > > > create index i on CUSTOMER.WCCustOrderStatusLog (WCOrderStatusID) > > where Acknowledged is null and Processing is null; > > That's a very nifty trick and exactly the sort of answer I was after! Yes, nifty. CREATE INDEX docs updated with: + <literal>NULL</> values are not indexed by default. The best way + to index <literal>NULL</> values is to create a partial index using + an <literal>IS NULL</> comparison. <literal>IS NULL</> is more + of a function call than a value comparison, and this is why a partial + index works. Full patch attached. Thanks. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 Index: doc/src/sgml/ref/create_index.sgml =================================================================== RCS file: /cvsroot/pgsql-server/doc/src/sgml/ref/create_index.sgml,v retrieving revision 1.44 diff -c -c -r1.44 create_index.sgml *** doc/src/sgml/ref/create_index.sgml 29 Nov 2003 19:51:38 -0000 1.44 --- doc/src/sgml/ref/create_index.sgml 20 Apr 2004 00:54:29 -0000 *************** *** 66,72 **** When the <literal>WHERE</literal> clause is present, a <firstterm>partial index</firstterm> is created. A partial index is an index that contains entries for only a portion of ! a table, usually a portion that is somehow more interesting than the rest of the table. For example, if you have a table that contains both billed and unbilled orders where the unbilled orders take up a small fraction of the total table and yet that is an often used section, you --- 66,72 ---- When the <literal>WHERE</literal> clause is present, a <firstterm>partial index</firstterm> is created. A partial index is an index that contains entries for only a portion of ! a table, usually a portion that is more useful for indexing than the rest of the table. For example, if you have a table that contains both billed and unbilled orders where the unbilled orders take up a small fraction of the total table and yet that is an often used section, you *************** *** 77,85 **** </para> <para> The expression used in the <literal>WHERE</literal> clause may refer ! only to columns of the underlying table (but it can use all columns, ! not only the one(s) being indexed). Presently, subqueries and aggregate expressions are also forbidden in <literal>WHERE</literal>. The same restrictions apply to index fields that are expressions. </para> --- 77,93 ---- </para> <para> + <literal>NULL</> values are not indexed by default. The best way + to index <literal>NULL</> values is to create a partial index using + an <literal>IS NULL</> comparison. <literal>IS NULL</> is more + of a function call than a value comparison, and this is why a partial + index works. + </para> + + <para> The expression used in the <literal>WHERE</literal> clause may refer ! only to columns of the underlying table, but it can use all columns, ! not just the ones being indexed. Presently, subqueries and aggregate expressions are also forbidden in <literal>WHERE</literal>. The same restrictions apply to index fields that are expressions. </para>
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Yes, nifty. CREATE INDEX docs updated with: > > + <literal>NULL</> values are not indexed by default. The best way > + to index <literal>NULL</> values is to create a partial index using > + an <literal>IS NULL</> comparison. <literal>IS NULL</> is more > + of a function call than a value comparison, and this is why a partial > + index works. Uh, this is wrong. NULLs are indexed. It's just that IS NULL cannot take advantage of it due to technical details. These are NOT the same thing. Saying "NULLs are not indexed" will confuse people because it will make them think that they're not present in the index at all which is what Oracle does. That has real consequences on queries. The most obvious being that SELECT * FROM foo ORDER BY bar cannot take advantage of an index on bar. Oracle programmers are accustomed to having to had a "WHERE bar IS NOT NULL" or else live with the full table scan and sort. -- greg
Bruce Momjian <pgman@candle.pha.pa.us> writes: > + <literal>NULL</> values are not indexed by default. This is quite incorrect. The nulls *are* indexed (at least in btree indexes); the issue is whether there is any way to use the index to search for them. I do not think it helps anyone for the documentation to get this basic point wrong, even if the distinction is subtle. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > + <literal>NULL</> values are not indexed by default. > > This is quite incorrect. The nulls *are* indexed (at least in btree > indexes); the issue is whether there is any way to use the index to > search for them. I do not think it helps anyone for the documentation > to get this basic point wrong, even if the distinction is subtle. OK, docs updated with: Indexes can not be used with <literal>IS NULL</> clauses by default. The best way to use indexes in such cases is to create a partial index using an <literal>IS NULL</> comparison. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073