Re: database corruption
From | Chris Travers |
---|---|
Subject | Re: database corruption |
Date | |
Msg-id | 42606A69.9010102@travelamericas.com |
In response to | database corruption (Ian Westmacott <ianw@intellivid.com>) |
Responses | Re: database corruption ("Ian Westmacott" <ianw@intellivid.com>) |
List | pgsql-admin |
Hi Ian;

I think it is important to figure out why this is happening. I would not want to run any production databases on systems that were failing like this.

I am trying to figure out the likely causes of the errors...

1) Do any other computers in your building suffer random application crashes, power downs, etc.?
2) I take it there are no RAID controllers involved?
3) Is the RAM non-ECC?
4) Are the systems on UPSs?

If I could make a wild (and probably wrong) guess, I would wonder if something external to the system (like the electrical supply) was introducing glitches into memory, causing bad data to be written. I am only mentioning it because I have implicated the electrical supply in other cases where rare computer failures were affecting many systems...

Ian Westmacott wrote:

>For several weeks now we have been experiencing fairly
>severe database corruption upon clean reboot. It is very
>repeatable, and the corruption is of the following forms:
>
>ERROR: could not access status of transaction foo
>DETAIL: could not open file "bar": No such file or directory
>
>ERROR: invalid page header in block foo of relation "bar"
>
>ERROR: uninitialized page in block foo of relation "bar"
>
>
>At first, we believed this was related to XFS, and have
>been pursuing investigations along those lines. However,
>we have now experienced the exact same problem with JFS.
>
>Here are some details:
>
>- Postgres 7.4.2
>- 2.6.6 kernel.org kernel
>- dedicated database partition
>- repeatable with XFS and JFS (have not seen on ext3)
>- repeatable with and without Linux software RAID 0
>- repeatable with IDE and SATA
>- repeatable with and without fsync, and with fdatasync
>- repeatable on multiple systems
>
>
>I have two questions:
>
>- any known reason why this might be occurring? (we must
>  have something wrong, for this high rate of severe
>  error).
>
>- if I don't care about losing data, and am not interested
>  in trying to recover anything, how can I arrange for
>  Postgres to proceed normally? I know about
>  zero_damaged_pages, but this doesn't help with missing
>  transaction files and such. Is there any way to get
>  Postgres to chuck anything bad and proceed?
>
>Thanks,
>
> --Ian