Обсуждение: BUG #16352: Data corruption in few tables

Поиск
Список
Период
Сортировка

BUG #16352: Data corruption in few tables

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      16352
Logged by:          KrishnaKanth BN
Email address:      Krishnakanth_bn@infosys.com
PostgreSQL version: 9.6.13
Operating system:   Docker image postgres:9.6.13
Description:

Our Application is using Postgres Version (Image- postgres:9.6.13) and we
are running it as Docker Container with Volume mounted to a NFS storage. We
came across a peculiar issue.   

We had enabled unique key constraint on a particular table, it was violated
and had duplicate data. When this table was queried in the application, it
was resulting in chunk errors. This had major outage in the application.

ERROR:  missing chunk number 0 for toast value 241012 in pg_toast_2619
STATEMENT:  SELECT distinct role_name FROM tapplication_roles INNER JOIN
troles ON tapplication_roles.role_id = troles.role_id where user_id in
('user1','value1')

We followed several posts and somehow managed to remove the duplicate rows
and reindexed and vacuumed the table. We also had to manually remove
duplicate entries in pg_statistic table and reindex and vaccum it.

There are couple of other tables in which few rows are corrupted. It is
because of a corrupt value in one of the columns.

We did not see any error in the container logs to isolate this issue when it
first occurred.

Can you please throw some light what might have caused the corruption and
how can we prevent it?


Re: BUG #16352: Data corruption in few tables

От
KrishnaKanth BN
Дата:
Hi All,

Can you please help us on this issue?

On Wed 8 Apr, 2020, 21:30 PG Bug reporting form, <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      16352
Logged by:          KrishnaKanth BN
Email address:      Krishnakanth_bn@infosys.com
PostgreSQL version: 9.6.13
Operating system:   Docker image postgres:9.6.13
Description:       

Our Application is using Postgres Version (Image- postgres:9.6.13) and we
are running it as Docker Container with Volume mounted to a NFS storage. We
came across a peculiar issue.   

We had enabled unique key constraint on a particular table, it was violated
and had duplicate data. When this table was queried in the application, it
was resulting in chunk errors. This had major outage in the application.

ERROR:  missing chunk number 0 for toast value 241012 in pg_toast_2619
STATEMENT:  SELECT distinct role_name FROM tapplication_roles INNER JOIN
troles ON tapplication_roles.role_id = troles.role_id where user_id in
('user1','value1')

We followed several posts and somehow managed to remove the duplicate rows
and reindexed and vacuumed the table. We also had to manually remove
duplicate entries in pg_statistic table and reindex and vaccum it.

There are couple of other tables in which few rows are corrupted. It is
because of a corrupt value in one of the columns.

We did not see any error in the container logs to isolate this issue when it
first occurred.

Can you please throw some light what might have caused the corruption and
how can we prevent it?

Re: BUG #16352: Data corruption in few tables

От
Tomas Vondra
Дата:
On Wed, Apr 15, 2020 at 10:39:19PM +0530, KrishnaKanth BN wrote:
>Hi All,
>
>Can you please help us on this issue?
>

Well, this very much sounds like a case of data corruption caused by
storage - either at OS or hardware level. You'll have to dig in dmesg
and other system logs I guess.

>On Wed 8 Apr, 2020, 21:30 PG Bug reporting form, <noreply@postgresql.org>
>wrote:
>
>> The following bug has been logged on the website:
>>
>> Bug reference:      16352
>> Logged by:          KrishnaKanth BN
>> Email address:      Krishnakanth_bn@infosys.com
>> PostgreSQL version: 9.6.13
>> Operating system:   Docker image postgres:9.6.13
>> Description:
>>
>> Our Application is using Postgres Version (Image- postgres:9.6.13) and we
>> are running it as Docker Container with Volume mounted to a NFS storage. We
>> came across a peculiar issue.
>>

Well, I wouldn't be surprised if this was related to NFS.

>> We had enabled unique key constraint on a particular table, it was violated
>> and had duplicate data. When this table was queried in the application, it
>> was resulting in chunk errors. This had major outage in the application.
>>

I very much doubt creating a unique constraint would cause this, even if
the creation failed because of duplicate values. It might have made the
issue visible, but that's a different thing.

>> ERROR:  missing chunk number 0 for toast value 241012 in pg_toast_2619
>> STATEMENT:  SELECT distinct role_name FROM tapplication_roles INNER JOIN
>> troles ON tapplication_roles.role_id = troles.role_id where user_id in
>> ('user1','value1')
>>

This more of less confirms it has nothing to do with the constaint,
because "missing chunk" refers to TOAST table (we don't have chunks
anywhere else) and that's only for tables, not for indexes.


>> We followed several posts and somehow managed to remove the duplicate rows
>> and reindexed and vacuumed the table. We also had to manually remove
>> duplicate entries in pg_statistic table and reindex and vaccum it.
>>
>> There are couple of other tables in which few rows are corrupted. It is
>> because of a corrupt value in one of the columns.
>>

Well, it'd be useful to actually show us what you did etc.

>> We did not see any error in the container logs to isolate this issue when
>> it
>> first occurred.
>>
>> Can you please throw some light what might have caused the corruption and
>> how can we prevent it?
>>

Most likely some sort of storage issue - not necessarily at hardware
level, but it might be at OS level. Not sure if docker, or maybe it's a
networking / NFS issue.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services