Discussion: Fwd: Postgres update
Hello,

What does this mean?????

---------- Forwarded Message ----------
Subject: Postgres update
Date: Fri, 28 Jul 2000 04:00:09 -0500 (EST)
From: webmail@webmailstation.com (WebmailStation User)

CREATE
DROP
ALTER
psql:/home/www/www.webmailstation.com/sql/reindex.lo.sql:5: NOTICE:  --Relation pg_largeobject--
psql:/home/www/www.webmailstation.com/sql/reindex.lo.sql:5: NOTICE:  Pages 577: Changed 0, reaped 549, Empty 0, New 0; Tup 105038: Vac 12554, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 36, MaxLen 36; Re-using: Free/Avail. Space 469760/465060; EndEmpty/Avail. Pages 0/548. CPU 0.00s/0.38u sec.
psql:/home/www/www.webmailstation.com/sql/reindex.lo.sql:5: NOTICE:  Index pg_largeobject_loid_index: Pages 209; Tuples 105038: Deleted 0. CPU 0.01s/0.50u sec.
psql:/home/www/www.webmailstation.com/sql/reindex.lo.sql:5: NOTICE:  Rel pg_largeobject: Pages: 577 --> 515; Tuple(s) moved: 10696. CPU 0.11s/1.08u sec.
psql:/home/www/www.webmailstation.com/sql/reindex.lo.sql:5: NOTICE:  Index pg_largeobject_loid_index: Pages 250; Tuples 105038: Deleted 10696. CPU 0.01s/0.55u sec.
psql:/home/www/www.webmailstation.com/sql/reindex.lo.sql:5: NOTICE:  FlushRelationBuffers(pg_largeobject, 515): block 504 is referenced (private 0, global 1)
psql:/home/www/www.webmailstation.com/sql/reindex.lo.sql:5: FATAL 1:  VACUUM (repair_frag): FlushRelationBuffers returned -2
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
psql:/home/www/www.webmailstation.com/sql/reindex.lo.sql:5: connection to server was lost
-------------------------------------------------------

--
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------
Denis Perchine <dyp@perchine.com> writes:
> NOTICE:  FlushRelationBuffers(pg_largeobject, 515): block 504 is referenced (private 0, global 1)
> FATAL 1:  VACUUM (repair_frag): FlushRelationBuffers returned -2

Hmm, there's another report of that in the archives.  You've got a
buffer that has a positive reference count even though (presumably)
no one is using it.  VACUUM is quitting out of paranoia that maybe
some other backend is accessing the table --- there's unlikely to be
any actual data corruption here, just (over?) caution.

You can get back to a state where VACUUM will work on the table by
restarting the postmaster, but to fix the real problem we need to figure
out how the system got into this state in the first place.  Can you
produce a repeatable example of a series of queries that gets you into
this state?

            regards, tom lane
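For illustration, the check Tom describes boils down to roughly the following. This is a minimal, self-contained model, not the 7.0.2 sources: the struct, field, and function names are invented, and the real code works on the shared buffer pool rather than a plain array.

    /*
     * Simplified model: a FlushRelationBuffers-style pass walks the buffer
     * pool and refuses to continue if any buffer belonging to the target
     * relation still has a positive reference count.
     */
    #include <stdio.h>

    typedef struct {
        int rel_id;           /* which relation the page belongs to */
        int block_num;        /* block number within the relation */
        int local_refcount;   /* pins held by this backend ("private") */
        int global_refcount;  /* pins recorded in shared memory ("global") */
        int is_dirty;
    } BufferModel;

    /* Return 0 on success, -2 if a buffer of the relation is still pinned. */
    static int flush_relation_buffers(BufferModel *pool, int nbuffers, int rel_id)
    {
        for (int i = 0; i < nbuffers; i++) {
            BufferModel *buf = &pool[i];

            if (buf->rel_id != rel_id)
                continue;

            if (buf->local_refcount > 0 || buf->global_refcount > 0) {
                /* Someone still appears to hold a pin: VACUUM must not move
                 * tuples out from under them, so it gives up here. */
                printf("NOTICE:  block %d is referenced (private %d, global %d)\n",
                       buf->block_num, buf->local_refcount, buf->global_refcount);
                return -2;
            }

            if (buf->is_dirty) {
                /* The real code would write the page to disk here. */
                buf->is_dirty = 0;
            }
        }
        return 0;
    }

    int main(void)
    {
        BufferModel pool[] = {
            { 1, 503, 0, 0, 1 },
            { 1, 504, 0, 1, 0 },   /* leaked pin: global refcount stuck at 1 */
        };

        if (flush_relation_buffers(pool, 2, 1) == -2)
            printf("FATAL 1:  VACUUM (repair_frag): FlushRelationBuffers returned -2\n");
        return 0;
    }

In the reported case the "global 1" count is exactly such a leftover pin that nobody actually holds anymore, which is why a postmaster restart clears the condition.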
> > NOTICE:  FlushRelationBuffers(pg_largeobject, 515): block 504 is referenced (private 0, global 1)
> > FATAL 1:  VACUUM (repair_frag): FlushRelationBuffers returned -2
>
> Hmm, there's another report of that in the archives.  You've got a
> buffer that has a positive reference count even though (presumably)
> no one is using it.  VACUUM is quitting out of paranoia that maybe
> some other backend is accessing the table --- there's unlikely to be
> any actual data corruption here, just (over?) caution.
>
> You can get back to a state where VACUUM will work on the table by
> restarting the postmaster, but to fix the real problem we need to figure
> out how the system got into this state in the first place.  Can you
> produce a repeatable example of a series of queries that gets you into
> this state?

I get this after the following:

psql:/home/www/www.webmailstation.com/sql/reindex.sql:75: NOTICE:  !!! write error seems permanent !!!
psql:/home/www/www.webmailstation.com/sql/reindex.sql:75: NOTICE:  !!! now kill all backends and reset postmaster !!!
psql:/home/www/www.webmailstation.com/sql/reindex.sql:75: ERROR:  cannot write block 175 of ix_q_b_1 [webmailstation] blind
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
psql:/home/www/www.webmailstation.com/sql/reindex.sql:75: connection to server was lost

This was the command which should have created a unique index. Something
happened and the index became corrupted. After that postgres started to eat
up memory and I killed it. I realized that this happened on an update of the
table on which the index was built, and that the update used the index.

It is hard to reproduce this... I would like to give you the binary data,
but unfortunately I was forced to rebuild the index ASAP and finish the
investigation later...

--
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------
Denis Perchine <dyp@perchine.com> writes:
>>>> NOTICE:  FlushRelationBuffers(pg_largeobject, 515): block 504 is referenced (private 0, global 1)
>>>> FATAL 1:  VACUUM (repair_frag): FlushRelationBuffers returned -2

> I get this after the following:

> NOTICE:  !!! write error seems permanent !!!
> NOTICE:  !!! now kill all backends and reset postmaster !!!
> ERROR:  cannot write block 175 of ix_q_b_1 [webmailstation] blind
> pqReadData() -- backend closed the channel unexpectedly.

Oh, that's interesting.  The NOTICEs are coming out of AbortBufferIO()
which is invoked during error processing (in other words, I bet the
ERROR actually happened first.  It's a libpq artifact that the NOTICEs
are presented first on the client side.  If you are keeping the
postmaster log output you could confirm the sequence of events by
looking in the log).  The backend shutdown is then forced by
AbortBufferIO().

AbortBufferIO() seems rather badly designed, but given that it forces
a database-wide restart, I'm not sure how this could relate to the
later FlushRelationBuffers problem.  The restart should get rid of the
old buffers anyway.

> This was the command which should create unique index.

Was the index on the same table that FlushRelationBuffers later had
trouble with (ie, "pg_largeobject")?

What version are you running, anyway?  There is no "pg_largeobject"
in either 6.5 or current AFAIK.

            regards, tom lane
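To make the ordering point concrete: messages emitted while cleaning up after an error can reach the client ahead of the error message itself, so the order in which psql shows them does not reflect the order of events. The toy program below models that; it is purely illustrative, none of it is libpq or backend code, and all names are invented.

    /*
     * Toy model: the error is raised first, but the cleanup triggered by it
     * emits its NOTICEs into the outgoing stream before the error text is
     * delivered, so the client sees NOTICE, NOTICE, ERROR in that order.
     */
    #include <stdio.h>

    static void abort_buffer_io(void)
    {
        /* Error-cleanup path: report what is being done about the broken buffer. */
        printf("NOTICE:  !!! write error seems permanent !!!\n");
        printf("NOTICE:  !!! now kill all backends and reset postmaster !!!\n");
    }

    static void raise_error(const char *msg)
    {
        /* Chronologically the error comes first... */
        abort_buffer_io();              /* ...but the cleanup output is sent first, */
        printf("ERROR:  %s\n", msg);    /* and the error text only after it.        */
    }

    int main(void)
    {
        /* Simulate a write failure while flushing an index block. */
        raise_error("cannot write block 175 of ix_q_b_1 [webmailstation] blind");
        return 0;
    }

This is why checking the postmaster log, where events appear in the order they were logged, is the reliable way to reconstruct what happened first.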
> -----Original Message-----
> From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]
> On Behalf Of Tom Lane
>
> Denis Perchine <dyp@perchine.com> writes:
> >>>> NOTICE:  FlushRelationBuffers(pg_largeobject, 515): block 504 is referenced (private 0, global 1)
> >>>> FATAL 1:  VACUUM (repair_frag): FlushRelationBuffers returned -2
>
> > I get this after the following:
>
> > NOTICE:  !!! write error seems permanent !!!
> > NOTICE:  !!! now kill all backends and reset postmaster !!!
> > ERROR:  cannot write block 175 of ix_q_b_1 [webmailstation] blind
> > pqReadData() -- backend closed the channel unexpectedly.
>
> Oh, that's interesting.  The NOTICEs are coming out of AbortBufferIO()
> which is invoked during error processing (in other words, I bet the
> ERROR actually happened first.  It's a libpq artifact that the NOTICEs
> are presented first on the client side.  If you are keeping the
> postmaster log output you could confirm the sequence of events by
> looking in the log).  The backend shutdown is then forced by
> AbortBufferIO().
>
> AbortBufferIO() seems rather badly designed, but given that it forces
> a database-wide restart, I'm not sure how this could relate to the

It was me who introduced the xxxxBufferIO routines, to avoid the
io_in_progress spinlock freezing. Unfortunately I couldn't think of any
elegant way to recover from a permanent write error at the time. I judge
the error to be permanent when a write error for the buffer occurs twice.

Because you changed bufmgr before 7.0, we may be able to avoid a
database-wide restart. However, don't we have to leave the buffer still
dirty even after your change? If I understand correctly, we can't know
which backends dirtied the buffer. Even if we could know it, we couldn't
know whether the transactions which dirtied the buffer are still running.

Regards.

Hiroshi Inoue
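A rough model of the policy Hiroshi describes, treating a write error as permanent only on the second failure for the same buffer, might look like the following. The names and layout are invented for illustration; this is not the bufmgr source.

    /*
     * Simplified model of the "permanent after two failures" policy:
     * the first write error on a buffer is retried later; a second failure
     * on the same buffer is declared permanent and forces a restart.
     */
    #include <stdio.h>

    typedef struct {
        int block_num;
        int io_error_count;   /* how many write attempts have failed so far */
    } BufferModel;

    /* Called when a write of `buf` fails.  Returns 1 if a restart is forced. */
    static int record_write_failure(BufferModel *buf)
    {
        buf->io_error_count++;

        if (buf->io_error_count < 2) {
            printf("NOTICE:  write error on block %d, will retry later\n",
                   buf->block_num);
            return 0;
        }

        /* Second failure on the same buffer: give up. */
        printf("NOTICE:  !!! write error seems permanent !!!\n");
        printf("NOTICE:  !!! now kill all backends and reset postmaster !!!\n");
        return 1;
    }

    int main(void)
    {
        BufferModel buf = { 175, 0 };

        record_write_failure(&buf);          /* first failure: retry */
        if (record_write_failure(&buf))      /* second failure: permanent */
            printf("restarting...\n");
        return 0;
    }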
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > It was me who introduced xxxxBufferIO routines to avoid io_in_progress > spinlock freezing. Unfortunately I didn't think of any elegant way to > recover parmanent write error then. I'm judging the error is parmanent > when write error for the buffer occurs twice. > Because you changed bufmgr before 7.0,we may be able to avoid a > database-wide restart. However don't we have to leave the buffer > still dirty even after your change ? If I recognize correctly,we couldn't > know which backends dirtied the buffer. Even though we could know > it,we couldn't know if the transactions which dirtied the buffer are still > running. Right, that's why we can't just mark the buffer broken and recycle it. If there is another transaction in progress that's dirtied that same buffer, it would come back, find the disk page no longer present in that buffer, conclude that its changes were successfully written, and go ahead and commit. No good, it has to fail. So, leave the buffer sitting there marked BM_DIRTY + BM_IO_ERROR. When the other xact wants to commit, it will try to write the buffer, fail, and abort. (Or, if perchance it doesn't fail, that's OK too.) The only way to discard the buffer short of a postmaster reset would be if we knew no current transaction could have dirtied the buffer. Hmm, I wonder if we could have VACUUM do that before it starts the main vacuuming process? If we have exclusive lock on the table, then presumably there can be no other xacts with pending changes. So maybe the first step of VACUUM could be to sweep the buffer table for buffers of that relation that are marked BM_DIRTY + BM_IO_ERROR, and discard them? Not sure about this. Also, I think deleting the relation that owns the broken buffer will clean it up correctly in current sources --- ReleaseRelationBuffers doesn't care if the buffer is dirty or not. regards, tom lane
Hello Tom,

> >>>> NOTICE:  FlushRelationBuffers(pg_largeobject, 515): block 504 is referenced (private 0, global 1)
> >>>> FATAL 1:  VACUUM (repair_frag): FlushRelationBuffers returned -2
>
> > I get this after the following:
>
> > NOTICE:  !!! write error seems permanent !!!
> > NOTICE:  !!! now kill all backends and reset postmaster !!!
> > ERROR:  cannot write block 175 of ix_q_b_1 [webmailstation] blind
> > pqReadData() -- backend closed the channel unexpectedly.
>
> Oh, that's interesting.  The NOTICEs are coming out of AbortBufferIO()
> which is invoked during error processing (in other words, I bet the
> ERROR actually happened first.  It's a libpq artifact that the NOTICEs
> are presented first on the client side.  If you are keeping the
> postmaster log output you could confirm the sequence of events by
> looking in the log).  The backend shutdown is then forced by
> AbortBufferIO().
>
> AbortBufferIO() seems rather badly designed, but given that it forces
> a database-wide restart, I'm not sure how this could relate to the
> later FlushRelationBuffers problem.  The restart should get rid of the
> old buffers anyway.
>
> > This was the command which should create unique index.
>
> Was the index on the same table that FlushRelationBuffers later had
> trouble with (ie, "pg_largeobject")?
>
> What version are you running, anyway?  There is no "pg_largeobject"
> in either 6.5 or current AFAIK.

:-))) Sorry. Just to put the pieces together... I use a modified 7.0.2.
I applied my patch for large objects (the one with files in hash dirs);
that's why you can see pg_largeobject. But that is not the issue here:
the patch modifies only large-object-related stuff.

I got the vacuum error first on pg_largeobject. Later the index was
automatically recreated (I have a cron job) and everything became fine
again.

And while I was replying to your mail I got an error on the queue table.
It started when I noticed that the postmaster was beginning to eat up
memory. I shut it down and looked at the log. The last query was an
update on the queue table. I tried to vacuum the table and got the same
error as last time. Then I dropped the index and recreated it, and
everything became fine again. When I later went through the cron reports
(I drop and recreate the indices each day), I found the error message
from recreating the index for this table.

That is all.

--
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> > Because you changed bufmgr before 7.0, we may be able to avoid a
> > database-wide restart. However, don't we have to leave the buffer still
> > dirty even after your change? If I understand correctly, we can't know
> > which backends dirtied the buffer. Even if we could know it, we couldn't
> > know whether the transactions which dirtied the buffer are still running.
>
> Right, that's why we can't just mark the buffer broken and recycle it.
> If there is another transaction in progress that's dirtied that same
> buffer, it would come back, find the disk page no longer present in that
> buffer, conclude that its changes were successfully written, and go
> ahead and commit.  No good, it has to fail.  So, leave the buffer
> sitting there marked BM_DIRTY + BM_IO_ERROR.  When the other xact wants
> to commit, it will try to write the buffer, fail, and abort.  (Or,
> if perchance it doesn't fail, that's OK too.)

Agreed. However, BM_DIRTY + BM_IO_ERROR buffers seem to have to be
excluded from the freelist of buffers. Then IO errors would be limited
to the backends which really need to update those buffers.

> The only way to discard the buffer short of a postmaster reset would be
> if we knew no current transaction could have dirtied the buffer.  Hmm,
> I wonder if we could have VACUUM do that before it starts the main
> vacuuming process?  If we have exclusive lock on the table, then
> presumably there can be no other xacts with pending changes.  So maybe

You are right.

> the first step of VACUUM could be to sweep the buffer table for buffers
> of that relation that are marked BM_DIRTY + BM_IO_ERROR, and discard
> them?  Not sure about this.

Sounds reasonable.

Regards.

Hiroshi Inoue
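A minimal sketch of the freelist exclusion Hiroshi is proposing, again with invented names rather than the real freelist structures: the allocator simply skips any buffer carrying the IO-error flag, so only a backend that actually needs that particular page ever hits the write error.

    /*
     * Sketch: a buffer marked BM_IO_ERROR is never handed out for reuse,
     * so unrelated backends looking for a free buffer never touch it.
     */
    #include <stdio.h>

    #define BM_DIRTY     0x02
    #define BM_IO_ERROR  0x04

    typedef struct {
        int id;
        int flags;
    } BufferModel;

    /* Return the index of a reusable buffer, or -1 if none is available. */
    static int get_free_buffer(const BufferModel *freelist, int n)
    {
        for (int i = 0; i < n; i++) {
            /* Skip buffers whose contents could not be written out. */
            if (freelist[i].flags & BM_IO_ERROR)
                continue;
            return i;
        }
        return -1;
    }

    int main(void)
    {
        BufferModel freelist[] = {
            { 0, BM_DIRTY | BM_IO_ERROR },  /* broken: must not be recycled */
            { 1, 0 },                       /* clean: safe to hand out */
        };

        printf("allocated buffer %d\n", get_free_buffer(freelist, 2));
        return 0;
    }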
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > However,BM_DIRTY+BM_IO_ERROR buffers seems to have to be > excluded from freelist of buffers. Then IO errors would be limited > to backends which really need to update the buffers. Good idea, that will prevent unrelated transactions from getting in trouble by trying to flush and re-use the buffer. regards, tom lane