Detecting corruption

Поиск
Список
Период
Сортировка
От Anthony Nowocien
Тема Detecting corruption
Дата
Msg-id CAH5RRoM1MzAeDMersv92U-=aOQaJUf61v6vpr+ZpXCZG5eps-A@mail.gmail.com
обсуждение исходный текст
Список pgsql-general
Hi everyone,

articles from the wiki about corruption, Christophe Petus' talk in PgCon Eu2014 were of great interest but I'm still slightly at a loss in an issue I've faced recently.

Please assume the following:
- Version 9.0.18 on Debian boxes;
- Master/slave integrated replication and a back and forth failover was executed;
- Fine recent hardware;
- Backups, restores all working fine and no corruption that I could detect;
- fsync was never touched, no power loss, no funny issue in /var/log/syslog, no recent file system check

I have found two issues that must have been caused by our custom failover procedure, as both instances gave similar results for the same objects

1) Corrupted index
select a from t1 where a = 10; gave the rather puzzling
a
1
2
10

For every other value I could test, I had correct results. The previous query was using an index, and rebuilding it finally gave me expected results. Data corruption on indexes is annoying, even more as it can give rather puzzling results, but rebuilding is enough in order to recover.

2) Corrupted table
select b from t2 worked fine on this rather write intensive table, with a being a  timestamp.
select max(b) from t2 gave me the unfortunate error "Could not read Block X of relation base/Y/Z". In retrospect, this might have also been an index issue and I still have some analyze to do to pinpoint it.

I know that PostgreSQL is very trusting regarding the underlying file system, but I would also like to diagnose those corruption issues as early as possible. Only very specific queries gave errors. As said above, pg_dump (database wise or table wise) worked fine, restores went fine and the main log did not contain any glaring errors before the unfortunate "Could not read Block X". I thought that pg_dump, reading every single row in the database, would help in finding corruption... But such was not the case.

3rd party projects exists, 9.3 allows checksums (but at a rather hefty price it seems - how usable is it? ), but is there any way I could have detected  this corruption issue with 9.0? Do you also have advice on such (semi) silent corruption?

Thanks,

Anthony

В списке pgsql-general по дате отправления:

Предыдущее
От: David Johnston
Дата:
Сообщение: Re: FW: SQL rolling window without aggregation
Следующее
От: Albe Laurenz
Дата:
Сообщение: Re: Use cases for lateral that do not involve a set returning function