Обсуждение: What could cause CREATE TEMP... "could not read block" error?

Поиск
Список
Период
Сортировка

What could cause CREATE TEMP... "could not read block" error?

От
Chris Richards
Дата:
Howdy. I have two tables that I'm joining together and using the result to create a temporary table. Performing the join operation works fine; it produces 0 rows (as expected). If I prepend "CREATE TEMP tmp_policyqueue AS" to the SELECT then it generates this error:

ERROR:  could not read block 39 in file "base/16391/11937": read only 0 of 8192 bytes

$ psql
psql (9.3.9)
mdb=> CREATE TEMP TABLE tmp_policyqueue AS
mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id=sp.polidx;
ERROR:  could not read block 40 in file "base/16391/11937": read only 0 of 8192 bytes

You'll also observe that the block number is changing each time I execute the command. I know very little about postgres internal structure so it may be irrelevant. I've left my database in this state should extra information be needed.

Thanks for any assist.

Chris

Re: What could cause CREATE TEMP... "could not read block" error?

От
Albe Laurenz
Дата:
Chris Richards wrote:
> Howdy. I have two tables that I'm joining together and using the result to create a temporary table.
> Performing the join operation works fine; it produces 0 rows (as expected). If I prepend "CREATE TEMP
> tmp_policyqueue AS" to the SELECT then it generates this error:
> 
> ERROR:  could not read block 39 in file "base/16391/11937": read only 0 of 8192 bytes
> 
> 
> $ psql
> psql (9.3.9)
> 
> mdb=> CREATE TEMP TABLE tmp_policyqueue AS
> 
> mdb-> SELECT pq.* FROM policyqueue AS pq
> mdb-> JOIN seed_progress AS sp ON pq.id=sp.polidx;
> ERROR:  could not read block 40 in file "base/16391/11937": read only 0 of 8192 bytes
> 
> You'll also observe that the block number is changing each time I execute the command. I know very
> little about postgres internal structure so it may be irrelevant. I've left my database in this state
> should extra information be needed.

It would be interesting to know what object is affected:

SELECT s.nspname AS schemaname, t.relname AS objectname, t.relkind
FROM pg_class t JOIN
     pg_namespace s ON t.relnamespace = s.oid
WHERE t.relfilenode = 11937;

If it is an index, REINDEX should help.

What is the statement that performs the join operation and works just fine?

Yours,
Laurenz Albe

Re: What could cause CREATE TEMP... "could not read block" error?

От
Adrian Klaver
Дата:
On 11/20/2015 06:18 AM, Albe Laurenz wrote:
> Chris Richards wrote:
>> Howdy. I have two tables that I'm joining together and using the result to create a temporary table.
>> Performing the join operation works fine; it produces 0 rows (as expected). If I prepend "CREATE TEMP
>> tmp_policyqueue AS" to the SELECT then it generates this error:
>>
>> ERROR:  could not read block 39 in file "base/16391/11937": read only 0 of 8192 bytes
>>
>>
>> $ psql
>> psql (9.3.9)
>>
>> mdb=> CREATE TEMP TABLE tmp_policyqueue AS
>>
>> mdb-> SELECT pq.* FROM policyqueue AS pq
>> mdb-> JOIN seed_progress AS sp ON pq.id=sp.polidx;
>> ERROR:  could not read block 40 in file "base/16391/11937": read only 0 of 8192 bytes
>>
>> You'll also observe that the block number is changing each time I execute the command. I know very
>> little about postgres internal structure so it may be irrelevant. I've left my database in this state
>> should extra information be needed.
>
> It would be interesting to know what object is affected:
>
> SELECT s.nspname AS schemaname, t.relname AS objectname, t.relkind
> FROM pg_class t JOIN
>       pg_namespace s ON t.relnamespace = s.oid
> WHERE t.relfilenode = 11937;
>
> If it is an index, REINDEX should help.
>
> What is the statement that performs the join operation and works just fine?

If I am following correctly it is:

mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id=sp.polidx;

>
> Yours,
> Laurenz Albe
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: What could cause CREATE TEMP... "could not read block" error?

От
Adrian Klaver
Дата:
On 11/18/2015 04:31 PM, Chris Richards wrote:
> Howdy. I have two tables that I'm joining together and using the result
> to create a temporary table. Performing the join operation works fine;
> it produces 0 rows (as expected). If I prepend "CREATE TEMP
> tmp_policyqueue AS" to the SELECT then it generates this error:
>
> ERROR:  could not read block 39 in file "base/16391/11937": read only 0
> of 8192 bytes
>
> $ psql
> psql (9.3.9)
> mdb=> CREATE TEMP TABLE tmp_policyqueue AS
> mdb-> SELECT pq.* FROM policyqueue AS pq
> mdb-> JOIN seed_progress AS sp ON pq.id <http://pq.id>=sp.polidx;
> ERROR:  could not read block 40 in file "base/16391/11937": read only 0
> of 8192 bytes
>
> You'll also observe that the block number is changing each time I
> execute the command. I know very little about postgres internal
> structure so it may be irrelevant. I've left my database in this state
> should extra information be needed.
>
> Thanks for any assist.

What happens if you do?:

SELECT pq.* INTO TEMP TABLE tmp_policyqueue FROM policyqueue AS pq
JOIN seed_progress AS sp ON pq.id=sp.polidx;

What does \d show for:

policyqueue

seed_progress


>
> Chris


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: What could cause CREATE TEMP... "could not read block" error?

От
Chris Richards
Дата:
Adrian is correct. This worked by itself whereas using it in the creation of the temporary table failed.

mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id=sp.polidx;

I checked the query Albe suggested; there were two `relfilenode`s (11936 and 11937) that exhibited the error. Respectively, they were pg_depend_depender_index and pg_depend_reference_index.

Unfortunately, I didn't disable the nightly processes and something must  have(?) fixed the glitch; at midnight GMT the query ran successfully. Ugh.

If it crops up again, I have some tools to try and capture data immediately, and the suggested REINDEX since both appear to be indices.

Thanks for the help. It's appreciated.

Chris

On Fri, Nov 20, 2015 at 8:28 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/20/2015 06:18 AM, Albe Laurenz wrote:
Chris Richards wrote:
Howdy. I have two tables that I'm joining together and using the result to create a temporary table.
Performing the join operation works fine; it produces 0 rows (as expected). If I prepend "CREATE TEMP
tmp_policyqueue AS" to the SELECT then it generates this error:

ERROR:  could not read block 39 in file "base/16391/11937": read only 0 of 8192 bytes


$ psql
psql (9.3.9)

mdb=> CREATE TEMP TABLE tmp_policyqueue AS

mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id=sp.polidx;
ERROR:  could not read block 40 in file "base/16391/11937": read only 0 of 8192 bytes

You'll also observe that the block number is changing each time I execute the command. I know very
little about postgres internal structure so it may be irrelevant. I've left my database in this state
should extra information be needed.

It would be interesting to know what object is affected:

SELECT s.nspname AS schemaname, t.relname AS objectname, t.relkind
FROM pg_class t JOIN
      pg_namespace s ON t.relnamespace = s.oid
WHERE t.relfilenode = 11937;

If it is an index, REINDEX should help.

What is the statement that performs the join operation and works just fine?

If I am following correctly it is:

mdb-> SELECT pq.* FROM policyqueue AS pq
mdb-> JOIN seed_progress AS sp ON pq.id=sp.polidx;


Yours,
Laurenz Albe



--
Adrian Klaver
adrian.klaver@aklaver.com

Re: What could cause CREATE TEMP... "could not read block" error?

От
Adrian Klaver
Дата:
On 11/20/2015 07:26 AM, Chris Richards wrote:
> Adrian is correct. This worked by itself whereas using it in the
> creation of the temporary table failed.
>
> mdb-> SELECT pq.* FROM policyqueue AS pq
> mdb-> JOIN seed_progress AS sp ON pq.id <http://pq.id/>=sp.polidx;
>
> I checked the query Albe suggested; there were two `relfilenode`s (11936
> and 11937) that exhibited the error. Respectively, they were
> pg_depend_depender_index and pg_depend_reference_index.
>
> Unfortunately, I didn't disable the nightly processes and something must
>   have(?) fixed the glitch; at midnight GMT the query ran successfully. Ugh.

What are the nightly processes, or at least those that might have
touched the tables in the join?

If this pops up again, could be another clue.

>
> If it crops up again, I have some tools to try and capture data
> immediately, and the suggested REINDEX since both appear to be indices.
>
> Thanks for the help. It's appreciated.
>
> Chris
>
> On Fri, Nov 20, 2015 at 8:28 AM, Adrian Klaver
> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     On 11/20/2015 06:18 AM, Albe Laurenz wrote:
>
>         Chris Richards wrote:
>
>             Howdy. I have two tables that I'm joining together and using
>             the result to create a temporary table.
>             Performing the join operation works fine; it produces 0 rows
>             (as expected). If I prepend "CREATE TEMP
>             tmp_policyqueue AS" to the SELECT then it generates this error:
>
>             ERROR:  could not read block 39 in file "base/16391/11937":
>             read only 0 of 8192 bytes
>
>
>             $ psql
>             psql (9.3.9)
>
>             mdb=> CREATE TEMP TABLE tmp_policyqueue AS
>
>             mdb-> SELECT pq.* FROM policyqueue AS pq
>             mdb-> JOIN seed_progress AS sp ON pq.id
>             <http://pq.id>=sp.polidx;
>             ERROR:  could not read block 40 in file "base/16391/11937":
>             read only 0 of 8192 bytes
>
>             You'll also observe that the block number is changing each
>             time I execute the command. I know very
>             little about postgres internal structure so it may be
>             irrelevant. I've left my database in this state
>             should extra information be needed.
>
>
>         It would be interesting to know what object is affected:
>
>         SELECT s.nspname AS schemaname, t.relname AS objectname, t.relkind
>         FROM pg_class t JOIN
>                pg_namespace s ON t.relnamespace = s.oid
>         WHERE t.relfilenode = 11937;
>
>         If it is an index, REINDEX should help.
>
>         What is the statement that performs the join operation and works
>         just fine?
>
>
>     If I am following correctly it is:
>
>     mdb-> SELECT pq.* FROM policyqueue AS pq
>     mdb-> JOIN seed_progress AS sp ON pq.id <http://pq.id>=sp.polidx;
>
>
>         Yours,
>         Laurenz Albe
>
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: What could cause CREATE TEMP... "could not read block" error?

От
Albe Laurenz
Дата:
Chris Richards wrote:
> Adrian is correct. This worked by itself whereas using it in the creation of the temporary table
> failed.
> mdb-> SELECT pq.* FROM policyqueue AS pq
> mdb-> JOIN seed_progress AS sp ON pq.id <http://pq.id/> =sp.polidx;
> 
> I checked the query Albe suggested; there were two `relfilenode`s (11936 and 11937) that exhibited the
> error. Respectively, they were pg_depend_depender_index and pg_depend_reference_index.
> 
> Unfortunately, I didn't disable the nightly processes and something must  have(?) fixed the glitch; at
> midnight GMT the query ran successfully. Ugh.
> 
> If it crops up again, I have some tools to try and capture data immediately, and the suggested REINDEX
> since both appear to be indices.

These indexes are both on the system table "pg_depend".

That explains why the query ran without problems but the CREATE TABLE didn't:
Creating an object creates dependencies, and you have data corruption in the
system table that tracks dependencies.

I would be pretty worried in your place if I had a corrupted catalog, even if
it seems to have "fixed itself".  There might be other tables with corruption.

I would do two things:
1) Wait for a moment when there is little database traffic and
   run "REINDEX TABLE pg_depend;" to rebuild these indexes.

2) As soon as you can schedule some downtime, pg_dumpall the database
   cluster, stop the server, throw away the database cluster, create a new
   one with "initdb" and restore the dump into that.
   That will get rid of any lurking data corruption.
   Watch out for error messages during the pg_dumpall!


The main question is of course how you got the corruption in the first place.
Are you running the latest minor release for your PostgreSQL version?
Were there any crashes recently?
Do you have "fsync = on"?
Did you make sure that you have a reliable storage system?

Yours,
Laurenz Albe

Re: What could cause CREATE TEMP... "could not read block" error?

От
Chris Richards
Дата:
> Are you running the latest minor release for your PostgreSQL version?

No. 9.3.9 whereas the lastest is 9.3.10

> Were there any crashes recently?

No[*]. See comments below about LVM snapshots.

> Do you have "fsync = on"?

Yes.

> Did you make sure that you have a reliable storage system?

RAID-6 SSDs on ext4


I've observed the problem on other than the "pg_depend" table. A normal table (called "blocks") had the problem. A "REINDEX TABLE blocks" did not fix the problem, however a "VACUUM FULL" did "fix" it.

We had thought we figured out what might have caused the problem. We were taking LVM snapshots while the database was running and then at sometime later we reverted the snapshot (while postgres was stopped). Our theory was that since postgres was running and therefore we captured its backing store in an indeterminate state. We changed our snapshot process to shutdown postgres, take the LVM snapshot, and then start postgres. 

Unfortunately, the problem cropped up again.

We had an empty database except for our created tables, took the snapshot, and then populated the tables using normal INSERT/UPDATE/DELETEs while the system ran. Then, we reboot and revert the LVM snapshot--back to the empty database. We then populated the tables by using a series "COPY <table> FROM <file>". The only things special about this is we dropped one foreign key constraint and that with our data files is that we may issue back-to-back COPY's to the same table (each COPY pulling in different data as we pre-process the data before COPY'ing it). The amount of data is relatively small from our COPY's:

dirtable - 1 row
dirpath - 1 row
cloud - 940 rows
blocks - 176 rows (before it died)

Indexes were not disabled during the bulk import.

Chris


On Mon, Nov 23, 2015 at 2:59 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Chris Richards wrote:
> Adrian is correct. This worked by itself whereas using it in the creation of the temporary table
> failed.
> mdb-> SELECT pq.* FROM policyqueue AS pq
> mdb-> JOIN seed_progress AS sp ON pq.id <http://pq.id/> =sp.polidx;
>
> I checked the query Albe suggested; there were two `relfilenode`s (11936 and 11937) that exhibited the
> error. Respectively, they were pg_depend_depender_index and pg_depend_reference_index.
>
> Unfortunately, I didn't disable the nightly processes and something must  have(?) fixed the glitch; at
> midnight GMT the query ran successfully. Ugh.
>
> If it crops up again, I have some tools to try and capture data immediately, and the suggested REINDEX
> since both appear to be indices.

These indexes are both on the system table "pg_depend".

That explains why the query ran without problems but the CREATE TABLE didn't:
Creating an object creates dependencies, and you have data corruption in the
system table that tracks dependencies.

I would be pretty worried in your place if I had a corrupted catalog, even if
it seems to have "fixed itself".  There might be other tables with corruption.

I would do two things:
1) Wait for a moment when there is little database traffic and
   run "REINDEX TABLE pg_depend;" to rebuild these indexes.

2) As soon as you can schedule some downtime, pg_dumpall the database
   cluster, stop the server, throw away the database cluster, create a new
   one with "initdb" and restore the dump into that.
   That will get rid of any lurking data corruption.
   Watch out for error messages during the pg_dumpall!


The main question is of course how you got the corruption in the first place.
Are you running the latest minor release for your PostgreSQL version?
Were there any crashes recently?
Do you have "fsync = on"?
Did you make sure that you have a reliable storage system?

Yours,
Laurenz Albe

Re: What could cause CREATE TEMP... "could not read block" error?

От
Albe Laurenz
Дата:
Chris Richards wrote:
[had corruption in pg_depend indexes]
> I've observed the problem on other than the "pg_depend" table. A normal table (called "blocks") had
> the problem. A "REINDEX TABLE blocks" did not fix the problem, however a "VACUUM FULL" did "fix" it.

Did you dump/remove cluster/initdb/restore?
That's the only reliable way to get rid of corruption.

>> Are you running the latest minor release for your PostgreSQL version?
> 
> No. 9.3.9 whereas the lastest is 9.3.10

I think no data corruption bugs have been discovered since 9.3.9.

>> Did you make sure that you have a reliable storage system?
> 
> RAID-6 SSDs on ext4

Did you test if it is reliable?
Pulled the plug on the server during a busy checkpoint?
Or read http://brad.livejournal.com/2116715.html

But that shouldn't apply to your case if there was no crash.

>> Were there any crashes recently?
> 
> No[*]. See comments below about LVM snapshots.

> We had thought we figured out what might have caused the problem. We were taking LVM snapshots while
> the database was running and then at sometime later we reverted the snapshot (while postgres was
> stopped). Our theory was that since postgres was running and therefore we captured its backing store
> in an indeterminate state. We changed our snapshot process to shutdown postgres, take the LVM
> snapshot, and then start postgres.

If the snapshot is truly atomic, it should work - that would be just like restarting
PostgreSQL after a server crash.  PostgreSQL is designed to handle such a case well,
provided that the storage does what it promises, i.e. make sure that data are really on disk
when an fsync() is issued.

Is the whole database, including pg_xlog, on one logical volume?

> Unfortunately, the problem cropped up again.
> 
> We had an empty database except for our created tables, took the snapshot, and then populated the
> tables using normal INSERT/UPDATE/DELETEs while the system ran. Then, we reboot and revert the LVM
> snapshot--back to the empty database. We then populated the tables by using a series "COPY <table>
> FROM <file>". The only things special about this is we dropped one foreign key constraint and that
> with our data files is that we may issue back-to-back COPY's to the same table (each COPY pulling in
> different data as we pre-process the data before COPY'ing it). The amount of data is relatively small
> from our COPY's:
> 
> dirtable - 1 row
> dirpath - 1 row
> cloud - 940 rows
> blocks - 176 rows (before it died)
> 
> Indexes were not disabled during the bulk import.

It "died" because of data curruption?
And you had PostgreSQL shut down when you took the snapshot?
Was this a new database cluster (not just a new database) created
with initdb just before the test?

Now that should work even if your storage lies to you.
Unless there is something quite wrong with your storage - RAID checksums
should prevent the effects of a failing disk from affecting you.

Yours,
Laurenz Albe