Обсуждение: Weird ..... (a=1 or a=2) <> (a=2 or a=1)
Hello Today, one user complained that one of the tickets in our system had disappeared some places but could be accessed other places. I thought this was weird and startet debugging. I have found out the sql statement with 'problems'. Can anybody explain me why A) returns 12 rows and B) returns 13 rows?. The only different is the "open OR new" / "new OR open" part. Should not they return the samme result? PostgreSQL ver.: 7.4.12 -------------------------------------------------- A) -------------------------------------------------- rtprod=# SELECT id,effectiveid,status,type,queue FROM Tickets main WHERE ((main.EffectiveId = main.id)) AND ((main.Status != 'deleted')) AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status = 'open') OR (main.Status = 'new') ) ) ORDER BY main.id ASC LIMIT 50; id | effectiveid | status | type | queue -------+-------------+--------+--------+------- 37968 | 37968 | open | ticket | 29 38052 | 38052 | open | ticket | 29 38176 | 38176 | open | ticket | 29 38185 | 38185 | open | ticket | 29 38386 | 38386 | open | ticket | 29 38394 | 38394 | open | ticket | 29 38403 | 38403 | open | ticket | 29 38406 | 38406 | open | ticket | 29 38422 | 38422 | open | ticket | 29 38467 | 38467 | open | ticket | 29 38474 | 38474 | open | ticket | 29 38530 | 38530 | open | ticket | 29 (12 rows) -------------------------------------------------- B) -------------------------------------------------- rtprod=# SELECT id,effectiveid,status,type,queue FROM Tickets main WHERE ((main.EffectiveId = main.id)) AND ((main.Status != 'deleted')) AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status = 'new') OR (main.Status = 'open') ) ) ORDER BY main.id ASC LIMIT 50; id | effectiveid | status | type | queue -------+-------------+--------+--------+------- 37775 | 37775 | new | ticket | 29 37968 | 37968 | open | ticket | 29 38052 | 38052 | open | ticket | 29 38176 | 38176 | open | ticket | 29 38185 | 38185 | open | ticket | 29 38386 | 38386 | open | ticket | 29 38394 | 38394 | open | ticket | 29 38403 | 38403 | open | ticket | 29 38406 | 38406 | open | ticket | 29 38422 | 38422 | open | ticket | 29 38467 | 38467 | open | ticket | 29 38474 | 38474 | open | ticket | 29 38530 | 38530 | open | ticket | 29 (13 rows) -------------------------------------------------- The ticket with id=37775 is the one that disappear some places in the application. Here is the explain analyze of these statements: -------------------------------------------------- rtprod=# explain analyze SELECT id,effectiveid,status,type,queue FROM Tickets main WHERE ((main.EffectiveId = main.id)) AND ((main.Status != 'deleted')) AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status = 'open') OR (main.Status = 'new') ) ) ORDER BY main.id ASC LIMIT 50; QUERY PLAN -------------------------------------------------------------------------------------- Limit (cost=1658.83..1658.84 rows=1 width=33) (actual time=4.003..4.028 rows=12 loops=1) -> Sort (cost=1658.83..1658.84 rows=1 width=33) (actual time=3.999..4.008 rows=12 loops=1) Sort Key: id -> Index Scan using tickets6, tickets6 on tickets main (cost=0.00..1658.82 rows=1 width=33) (actual time=1.001..3.969 rows=12 loops=1) Index Cond: (((status)::text = 'open'::text) OR ((status)::text = 'new'::text)) Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND (("type")::text = 'ticket'::text) AND (queue = 29)) Total runtime: 4.104 ms (7 rows) rtprod=# explain analyze SELECT id,effectiveid,status,type,queue FROM Tickets main WHERE ((main.EffectiveId = main.id)) AND ((main.Status != 'deleted')) AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status = 'new') OR (main.Status = 'open') ) ) ORDER BY main.id ASC LIMIT 50; QUERY PLAN ---------------------------------------------------------------------------------------- Limit (cost=1658.83..1658.84 rows=1 width=33) (actual time=3.946..3.974 rows=13 loops=1) -> Sort (cost=1658.83..1658.84 rows=1 width=33) (actual time=3.944..3.954 rows=13 loops=1) Sort Key: id -> Index Scan using tickets6, tickets6 on tickets main (cost=0.00..1658.82 rows=1 width=33) (actual time=0.790..3.909 rows=13 loops=1) Index Cond: (((status)::text = 'new'::text) OR ((status)::text = 'open'::text)) Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND (("type")::text = 'ticket'::text) AND (queue = 29)) Total runtime: 4.052 ms (7 rows) -------------------------------------------------- Thanks in advance -- Rafael Martinez, <r.m.guerrero@usit.uio.no> Center for Information Technology Services University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/
"Rafael Martinez, Guerrero" <r.m.guerrero@usit.uio.no> writes: > I have found out the sql statement with 'problems'. Can anybody explain > me why A) returns 12 rows and B) returns 13 rows?. The only different is > the "open OR new" / "new OR open" part. Should not they return the samme > result? That is ... simply bizarre. Could we see all the system columns for the rows in question, ie select ctid,oid,xmin,xmax,cmin,cmax, id,effectiveid, ... (the rest as in your second query that gets all the rows) Leave out "oid" if you have the table made without oids. I suspect this may be a question of a corrupt index, in which case REINDEXing the index being used would fix it. But before you try that, please save a physical copy of the index file (immediately after doing a CHECKPOINT, if the database is being actively modified). I would like to dig through it and try to understand the nature of the corruption, if that's what the problem is. regards, tom lane
On Tue, 2006-05-16 at 11:14 -0400, Tom Lane wrote: > "Rafael Martinez, Guerrero" <r.m.guerrero@usit.uio.no> writes: > > I have found out the sql statement with 'problems'. Can anybody explain > > me why A) returns 12 rows and B) returns 13 rows?. The only different is > > the "open OR new" / "new OR open" part. Should not they return the samme > > result? > > That is ... simply bizarre. Could we see all the system columns for the > rows in question, ie > select ctid,oid,xmin,xmax,cmin,cmax, id,effectiveid, ... > (the rest as in your second query that gets all the rows) > Leave out "oid" if you have the table made without oids. > rtprod=# SELECT ctid,oid,xmin,xmax,cmin,cmax,id,effectiveid,status,type,queue FROM Tickets main WHERE ((main.EffectiveId = main.id)) AND ((main.Status != 'deleted'))AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status= 'new') OR (main.Status = 'open') ) ) ORDER BY main.id ASC LIMIT 50; ctid | oid | xmin | xmax | cmin | cmax | id | effectiveid | status | type | queue -----------+---------+-----------+------+------+------+-------+-------------+--------+--------+------- (880,18) | 5080015 | 431831923 | 149 | 149 | 0 | 37775 | 37775 | new | ticket | 29 (1004,13) | 5103190 | 440233858 | 0 | 0 | 0 | 37968 | 37968 | open | ticket | 29 (1004,10) | 5112604 | 440233792 | 0 | 0 | 0 | 38052 | 38052 | open | ticket | 29 (995,13) | 5130149 | 440233870 | 0 | 0 | 0 | 38176 | 38176 | open | ticket | 29 (1020,2) | 5132134 | 441184224 | 0 | 0 | 0 | 38185 | 38185 | open | ticket | 29 (1004,5) | 5160459 | 440828297 | 38 | 38 | 0 | 38386 | 38386 | open | ticket | 29 (1004,3) | 5161571 | 440233745 | 0 | 0 | 0 | 38394 | 38394 | open | ticket | 29 (1020,5) | 5163792 | 441195836 | 38 | 38 | 0 | 38403 | 38403 | open | ticket | 29 (1019,3) | 5164449 | 441183696 | 38 | 38 | 0 | 38406 | 38406 | open | ticket | 29 (1015,14) | 5167225 | 441188439 | 38 | 38 | 0 | 38422 | 38422 | open | ticket | 29 (1021,3) | 5172082 | 441185101 | 38 | 38 | 0 | 38474 | 38474 | open | ticket | 29 (968,37) | 5176170 | 440990670 | 0 | 0 | 0 | 38530 | 38530 | open | ticket | 29 (1015,11) | 5177554 | 441183605 | 0 | 0 | 0 | 38539 | 38539 | open | ticket | 29 (13 rows) > I suspect this may be a question of a corrupt index, in which case > REINDEXing the index being used would fix it. This is what I thought when I found out the problem. So before I sent my first e-mail I executed a "reindex index tickets6" but it did not help. > But before you try that, > please save a physical copy of the index file (immediately after doing a > CHECKPOINT, if the database is being actively modified). I would like > to dig through it and try to understand the nature of the corruption, > if that's what the problem is. > -------------------------------------------------- rtprod=# SELECT relname,relfilenode from pg_class where relname = 'tickets6'; relname | relfilenode ----------+------------- tickets6 | 5177103 rtprod=# checkpoint; CHECKPOINT -bash-2.05b$ ls -l 5177103 -rw------- 1 postgres pgdba 1056768 May 16 18:10 5177103 -------------------------------------------------- The index file after a CHECKPOINT can be downloaded from http://folk.uio.no/rafael/5177103 A new reindex does not help: -------------------------------------------------- rtprod=# reindex index tickets6; REINDEX SELECT id,effectiveid,status,type,queue FROM Tickets main WHERE ((main.EffectiveId = main.id)) AND ((main.Status != 'deleted'))AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status= 'open') OR (main.Status = 'new') ) ) ORDER BY main.id ASC LIMIT 50; id | effectiveid | status | type | queue -------+-------------+--------+--------+------- 37968 | 37968 | open | ticket | 29 38052 | 38052 | open | ticket | 29 38176 | 38176 | open | ticket | 29 38185 | 38185 | open | ticket | 29 38386 | 38386 | open | ticket | 29 38394 | 38394 | open | ticket | 29 38403 | 38403 | open | ticket | 29 38406 | 38406 | open | ticket | 29 38422 | 38422 | open | ticket | 29 38474 | 38474 | open | ticket | 29 38530 | 38530 | open | ticket | 29 38539 | 38539 | open | ticket | 29 (12 rows) rtprod=# SELECT id,effectiveid,status,type,queue FROM Tickets main WHERE ((main.EffectiveId = main.id)) AND ((main.Status != 'deleted'))AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status= 'new') OR (main.Status = 'open') ) ) ORDER BY main.id ASC LIMIT 50; id | effectiveid | status | type | queue -------+-------------+--------+--------+------- 37775 | 37775 | new | ticket | 29 37968 | 37968 | open | ticket | 29 38052 | 38052 | open | ticket | 29 38176 | 38176 | open | ticket | 29 38185 | 38185 | open | ticket | 29 38386 | 38386 | open | ticket | 29 38394 | 38394 | open | ticket | 29 38403 | 38403 | open | ticket | 29 38406 | 38406 | open | ticket | 29 38422 | 38422 | open | ticket | 29 38474 | 38474 | open | ticket | 29 38530 | 38530 | open | ticket | 29 38539 | 38539 | open | ticket | 29 (13 rows) -------------------------------------------------- -- Rafael Martinez, <r.m.guerrero@usit.uio.no> Center for Information Technology Services University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/
Rafael Martinez <r.m.guerrero@usit.uio.no> writes: > On Tue, 2006-05-16 at 11:14 -0400, Tom Lane wrote: >> I suspect this may be a question of a corrupt index, in which case >> REINDEXing the index being used would fix it. > This is what I thought when I found out the problem. So before I sent my > first e-mail I executed a "reindex index tickets6" but it did not help. So much for that theory. If you copy the table (create table foo as select * from tickets) and build a similar index on the copy, does the behavior persist in the copy? I'm wondering a little bit about encoding issues. What encoding does the database have (see \l) and what do SHOW LC_COLLATE and SHOW LC_CTYPE show? regards, tom lane
On Tue, 2006-05-16 at 12:32 -0400, Tom Lane wrote: > Rafael Martinez <r.m.guerrero@usit.uio.no> writes: > > On Tue, 2006-05-16 at 11:14 -0400, Tom Lane wrote: > >> I suspect this may be a question of a corrupt index, in which case > >> REINDEXing the index being used would fix it. > > > This is what I thought when I found out the problem. So before I sent my > > first e-mail I executed a "reindex index tickets6" but it did not help. > > So much for that theory. If you copy the table (create table foo as > select * from tickets) and build a similar index on the copy, does the > behavior persist in the copy? > The new table behaves well: ---------------------------------------------- rtprod=# CREATE TABLE foo as select * from tickets; SELECT rtprod=# CREATE INDEX foo6 on foo (status); CREATE INDEX rtprod=# SELECT id,effectiveid,status,type,queue FROM foo main WHERE ((main.EffectiveId = main.id)) AND ((main.Status != 'deleted'))AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status= 'open') OR (main.Status = 'new') ) ) ORDER BY main.id ASC LIMIT 50; id | effectiveid | status | type | queue -------+-------------+--------+--------+------- 37775 | 37775 | new | ticket | 29 37968 | 37968 | open | ticket | 29 38052 | 38052 | open | ticket | 29 38176 | 38176 | open | ticket | 29 38185 | 38185 | open | ticket | 29 38386 | 38386 | open | ticket | 29 38394 | 38394 | open | ticket | 29 38403 | 38403 | open | ticket | 29 38406 | 38406 | open | ticket | 29 38422 | 38422 | open | ticket | 29 38474 | 38474 | open | ticket | 29 38530 | 38530 | open | ticket | 29 38539 | 38539 | open | ticket | 29 (13 rows) rtprod=# SELECT id,effectiveid,status,type,queue FROM foo main WHERE ((main.EffectiveId = main.id)) AND ((main.Status != 'deleted'))AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status= 'new') OR (main.Status = 'open') ) ) ORDER BY main.id ASC LIMIT 50; id | effectiveid | status | type | queue -------+-------------+--------+--------+------- 37775 | 37775 | new | ticket | 29 37968 | 37968 | open | ticket | 29 38052 | 38052 | open | ticket | 29 38176 | 38176 | open | ticket | 29 38185 | 38185 | open | ticket | 29 38386 | 38386 | open | ticket | 29 38394 | 38394 | open | ticket | 29 38403 | 38403 | open | ticket | 29 38406 | 38406 | open | ticket | 29 38422 | 38422 | open | ticket | 29 38474 | 38474 | open | ticket | 29 38530 | 38530 | open | ticket | 29 38539 | 38539 | open | ticket | 29 (13 rows) ---------------------------------------------- > I'm wondering a little bit about encoding issues. What encoding does > the database have (see \l) SQL_ASCII > and what do SHOW LC_COLLATE and SHOW LC_CTYPE > show? > ---------------------------------------------- rtprod=# SHOW LC_COLLATE; lc_collate ------------ C (1 row) rtprod=# SHOW LC_CTYPE; lc_ctype ---------- C (1 row) ---------------------------------------------- -- Rafael Martinez, <r.m.guerrero@usit.uio.no> Center for Information Technology Services University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/
Rafael Martinez <r.m.guerrero@usit.uio.no> writes: > On Tue, 2006-05-16 at 11:14 -0400, Tom Lane wrote: >> So much for that theory. If you copy the table (create table foo as >> select * from tickets) and build a similar index on the copy, does the >> behavior persist in the copy? > The new table behaves well: Did you check that you were getting the same indexscan plans there? (If not, try ANALYZEing the copied table.) You probably were, but just in case. This is definitely pretty baffling. I'm getting to the point where I want to step through the code with a debugger. I assume that's not very practical on your live server. Would it be feasible at all to get a physical copy of the database for testing? Alternatively, do you know C and gdb well enough to try to debug it for yourself? regards, tom lane
On Tue, 2006-05-16 at 13:01 -0400, Tom Lane wrote: > Rafael Martinez <r.m.guerrero@usit.uio.no> writes: > > On Tue, 2006-05-16 at 11:14 -0400, Tom Lane wrote: > >> So much for that theory. If you copy the table (create table foo as > >> select * from tickets) and build a similar index on the copy, does the > >> behavior persist in the copy? > > > The new table behaves well: > > Did you check that you were getting the same indexscan plans there? > (If not, try ANALYZEing the copied table.) You probably were, but > just in case. > Yes I did, it looks good: ------------------------------------------------------- rtprod=# explain analyze SELECT id,effectiveid,status,type,queue FROM foo main WHERE ((main.EffectiveId = main.id)) AND ((main.Status != 'deleted'))AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status= 'open') OR (main.Status = 'new') ) ) ORDER BY main.id ASC LIMIT 50; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=1842.29..1842.30 rows=1 width=33) (actual time=4.112..4.142 rows=13 loops=1) -> Sort (cost=1842.29..1842.30 rows=1 width=33) (actual time=4.109..4.119 rows=13 loops=1) Sort Key: id -> Index Scan using foo6, foo6 on foo main (cost=0.00..1842.28 rows=1 width=33) (actual time=1.895..4.072 rows=13 loops=1) Index Cond: (((status)::text = 'open'::text) OR ((status)::text = 'new'::text)) Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND (("type")::text = 'ticket'::text) AND (queue = 29)) Total runtime: 4.216 ms (7 rows) rtprod=# explain analyze SELECT id,effectiveid,status,type,queue FROM foo main WHERE ((main.EffectiveId = main.id)) AND ((main.Status != 'deleted'))AND ((main.Type = 'ticket')) AND ((main.Queue = '29')AND ( (main.Status= 'new') OR (main.Status = 'open') ) ) ORDER BY main.id ASC LIMIT 50; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=1842.29..1842.30 rows=1 width=33) (actual time=4.121..4.149 rows=13 loops=1) -> Sort (cost=1842.29..1842.30 rows=1 width=33) (actual time=4.117..4.128 rows=13 loops=1) Sort Key: id -> Index Scan using foo6, foo6 on foo main (cost=0.00..1842.28 rows=1 width=33) (actual time=0.800..4.084 rows=13 loops=1) Index Cond: (((status)::text = 'new'::text) OR ((status)::text = 'open'::text)) Filter: ((effectiveid = id) AND ((status)::text <> 'deleted'::text) AND (("type")::text = 'ticket'::text) AND (queue = 29)) Total runtime: 4.228 ms (7 rows) ------------------------------------------------------- > This is definitely pretty baffling. I'm getting to the point where > I want to step through the code with a debugger. I assume that's not > very practical on your live server. Would it be feasible at all to > get a physical copy of the database for testing? Alternatively, do > you know C and gdb well enough to try to debug it for yourself? > I am going to make a physical copy of the database and install it in a test server (it is ca.3.6GB). I can C and some gdb but I do not work with them on a daily basis and it is a long time since I battled with them. I think I don't have the level needed to find the reason of this problem in this case. I have to get the approval from the system owner before an external person can get access to the test server (tomorrow is the national day here in Norway, so I can not do anything until thursday). I don't think this will be a problem, the best thing will be to get your public ssh key so you can login without a password. -- Rafael Martinez, <r.m.guerrero@usit.uio.no> Center for Information Technology Services University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/