Обсуждение: Orphan files filling root partition after crash

Поиск
Список
Период
Сортировка

Orphan files filling root partition after crash

От
Dimitrios Apostolou
Дата:
Hello list,

yesterday I was doing:

ALTER TABLE x ADD UNIQUE ... , ADD PRIMARY KEY ...;

The table is almost a billion rows long but lies in its own TABLESPACE
that has plenty of space.  But apparently the ALTER TABLE command is
writing a lot to the default tablespace (not the temp_tablespace, that is
already moved to a different partition).

That quickly filled up the 50GB free space in my root partition:


20:18:04.222 UTC [94144] PANIC:  could not write to file "pg_wal/xlogtemp.94144": No space left on device
[...]
20:19:11.578 UTC [94140] LOG:  WAL writer process (PID 94144) was terminated by signal 6: Aborted
20:19:11.578 UTC [94140] LOG:  terminating any other active server processes


After postgresql crashed and restarted, the disk space in the root
partition was still not freed! I believe this is because of "orphaned
files" as discussed in mailing list thread [1].

[1] https://www.postgresql.org/message-id/CAN-RpxDBA7HbTsJPq4t4VznmRFJkssP2SNEMuG%3DoNJ%2B%3DsxLQew%40mail.gmail.com

I ended up doing some risky actions to remediate the problem: Find the
filenames that have no identically named "oid" in pg_class, and delete
(move to backup) the biggest ones while the database is stopped.
Fortunately the database started up fine after that!

So what is the moral of the story? How to guard against this?

Why did the database write so much to the default tablespace, even when
both the table and the temp tablespace are elsewhere?  Also should one
always keep the default tablespace away from the wal partition? (I guess
it would have helped to avoid the crash, but the ALTER TABLE command would
have still run out of space, so I'm not sure if the orphan files would
have been avoided)?

Needless to say, I would have hoped the database cleaned-up after itself
even after an uncontrolled crash, or that it provided some tools for the
job. (I tried VACUUM FULL on the table, but the orphaned files did not go
away).

My postgresql version is 16.2 installed on Ubuntu.

Thank you,
Dimitris




Re: Orphan files filling root partition after crash

От
Sergey Fukanchik
Дата:
Hi Dimitrios,
Do you have wal archiving enabled?
$PGDATA/pg_wal/ is a bit different from tablespaces (including
"default" one). It stores transaction journal.
Instructions are here:
https://www.postgresql.org/docs/current/continuous-archiving.html#BACKUP-ARCHIVING-WAL
Some more info here:
https://www.postgresql.org/docs/current/wal-intro.html and here
https://www.postgresql.org/docs/current/wal-configuration.html
---
Sergey

On Wed, 28 Feb 2024 at 14:18, Dimitrios Apostolou <jimis@gmx.net> wrote:
>
> Hello list,
>
> yesterday I was doing:
>
> ALTER TABLE x ADD UNIQUE ... , ADD PRIMARY KEY ...;
>
> The table is almost a billion rows long but lies in its own TABLESPACE
> that has plenty of space.  But apparently the ALTER TABLE command is
> writing a lot to the default tablespace (not the temp_tablespace, that is
> already moved to a different partition).
>
> That quickly filled up the 50GB free space in my root partition:
>
>
> 20:18:04.222 UTC [94144] PANIC:  could not write to file "pg_wal/xlogtemp.94144": No space left on device
> [...]
> 20:19:11.578 UTC [94140] LOG:  WAL writer process (PID 94144) was terminated by signal 6: Aborted
> 20:19:11.578 UTC [94140] LOG:  terminating any other active server processes
>
>
> After postgresql crashed and restarted, the disk space in the root
> partition was still not freed! I believe this is because of "orphaned
> files" as discussed in mailing list thread [1].
>
> [1] https://www.postgresql.org/message-id/CAN-RpxDBA7HbTsJPq4t4VznmRFJkssP2SNEMuG%3DoNJ%2B%3DsxLQew%40mail.gmail.com
>
> I ended up doing some risky actions to remediate the problem: Find the
> filenames that have no identically named "oid" in pg_class, and delete
> (move to backup) the biggest ones while the database is stopped.
> Fortunately the database started up fine after that!
>
> So what is the moral of the story? How to guard against this?
>
> Why did the database write so much to the default tablespace, even when
> both the table and the temp tablespace are elsewhere?  Also should one
> always keep the default tablespace away from the wal partition? (I guess
> it would have helped to avoid the crash, but the ALTER TABLE command would
> have still run out of space, so I'm not sure if the orphan files would
> have been avoided)?
>
> Needless to say, I would have hoped the database cleaned-up after itself
> even after an uncontrolled crash, or that it provided some tools for the
> job. (I tried VACUUM FULL on the table, but the orphaned files did not go
> away).
>
> My postgresql version is 16.2 installed on Ubuntu.
>
> Thank you,
> Dimitris
>
>
>


-- 
Sergey



Re: Orphan files filling root partition after crash

От
Dimitrios Apostolou
Дата:
Hi Sergey,

On Wed, 28 Feb 2024, Sergey Fukanchik wrote:

> Hi Dimitrios,
> Do you have wal archiving enabled?
> $PGDATA/pg_wal/ is a bit different from tablespaces (including
> "default" one). It stores transaction journal.

I don't think I have WAL archiving enabled. Here are the relevant WAL
options in my config:

wal_compression = zstd
max_wal_size = 8GB
min_wal_size = 1GB
track_wal_io_timing = on
wal_writer_flush_after = 8MB

The issue happens because the WAL directory is by default in the same
filesystem with the default tablespace (root partition for Ubuntu). So
when the default tablespace filled up because of my ALTER TABLE operation,
there was no space for WAL either.


Thanks,
Dimitris




Re: Orphan files filling root partition after crash

От
Laurenz Albe
Дата:
On Wed, 2024-02-28 at 12:16 +0100, Dimitrios Apostolou wrote:
> yesterday I was doing:
>
> ALTER TABLE x ADD UNIQUE ... , ADD PRIMARY KEY ...;
>
> The table is almost a billion rows long but lies in its own TABLESPACE
> that has plenty of space.  But apparently the ALTER TABLE command is
> writing a lot to the default tablespace (not the temp_tablespace, that is
> already moved to a different partition).
>
> That quickly filled up the 50GB free space in my root partition:
>
>
> 20:18:04.222 UTC [94144] PANIC:  could not write to file "pg_wal/xlogtemp.94144": No space left on device
> [...]
> 20:19:11.578 UTC [94140] LOG:  WAL writer process (PID 94144) was terminated by signal 6: Aborted
> 20:19:11.578 UTC [94140] LOG:  terminating any other active server processes
>
>
> After postgresql crashed and restarted, the disk space in the root
> partition was still not freed! I believe this is because of "orphaned
> files" as discussed in mailing list thread [1].
>
> [1] https://www.postgresql.org/message-id/CAN-RpxDBA7HbTsJPq4t4VznmRFJkssP2SNEMuG%3DoNJ%2B%3DsxLQew%40mail.gmail.com
>
> I ended up doing some risky actions to remediate the problem: Find the
> filenames that have no identically named "oid" in pg_class, and delete
> (move to backup) the biggest ones while the database is stopped.
> Fortunately the database started up fine after that!

Lucky you.  It should have been "relfilenode" rather than "oid",
and some catalog tables don't have their files listed in the catalog,
because they are needed *before* the database can access tables.

> So what is the moral of the story? How to guard against this?

Monitor disk usage ...

The root of the problem is that you created the index in the default
tablespace.  You should have

   ALTER TABLE x ADD UNIQUE ... USING INDEX TABLESPACE bigtblspc;

> Needless to say, I would have hoped the database cleaned-up after itself
> even after an uncontrolled crash, or that it provided some tools for the
> job. (I tried VACUUM FULL on the table, but the orphaned files did not go
> away).

That is not so simple... Also, it would slow down crash recovery.

But I agree that it would be nice to have a tool that reports or
cleans up orphaned files.

Yours,
Laurenz Albe



Re: Orphan files filling root partition after crash

От
Dimitrios Apostolou
Дата:
Thanks for the feedback Laurenz,

On Wed, 28 Feb 2024, Laurenz Albe wrote:

> On Wed, 2024-02-28 at 12:16 +0100, Dimitrios Apostolou wrote:
>>
>> I ended up doing some risky actions to remediate the problem: Find the
>> filenames that have no identically named "oid" in pg_class, and delete
>> (move to backup) the biggest ones while the database is stopped.
>> Fortunately the database started up fine after that!
>
> Lucky you.  It should have been "relfilenode" rather than "oid",
> and some catalog tables don't have their files listed in the catalog,
> because they are needed *before* the database can access tables.

I actually double checked that the filenames don't appear anywhere in
SELECT * FROM pg_class
and that the files were multi-GB in size including all the
1GB-pieces. But luck was definitely a part of the equation, I didn't know
that the files might be accessed before tables (at db startup?) or that
"relfilenode" would be more appropriate. Why is that, where can I read
more? I see that many (but not all) rows in pg_class have oid=relfilenode
but for many rows relfilenode=0 which is meaningless as filename.

>
>> So what is the moral of the story? How to guard against this?
>
> Monitor disk usage ...

It happened *fast*. And it was quite a big suprise coming
from "just" a disk-full situation.

A couple of suggestions; wouldn't it make sense:

+ for the index to be written by default to the table's tablespace?

+ for postgres to refuse to write non-wal files, if it's on
   the same device as the WAL and less than max_wal_size bytes are free?

>
> The root of the problem is that you created the index in the default
> tablespace.  You should have
>
>   ALTER TABLE x ADD UNIQUE ... USING INDEX TABLESPACE bigtblspc;
>

Thank you, was reading the docs but didn't realize this
syntax is valid. I thought it was only for CREATE/ALTER INDEX.


>> Needless to say, I would have hoped the database cleaned-up after itself
>> even after an uncontrolled crash, or that it provided some tools for the
>> job. (I tried VACUUM FULL on the table, but the orphaned files did not go
>> away).
>
> That is not so simple... Also, it would slow down crash recovery.
>
> But I agree that it would be nice to have a tool that reports or
> cleans up orphaned files.
>
> Yours,
> Laurenz Albe
>



Re: Orphan files filling root partition after crash

От
Adrian Klaver
Дата:
On 2/28/24 11:30, Dimitrios Apostolou wrote:
> Thanks for the feedback Laurenz,
> 
> On Wed, 28 Feb 2024, Laurenz Albe wrote:
> 
>> On Wed, 2024-02-28 at 12:16 +0100, Dimitrios Apostolou wrote:
>>>
>>> I ended up doing some risky actions to remediate the problem: Find the
>>> filenames that have no identically named "oid" in pg_class, and delete
>>> (move to backup) the biggest ones while the database is stopped.
>>> Fortunately the database started up fine after that!
>>
>> Lucky you.  It should have been "relfilenode" rather than "oid",
>> and some catalog tables don't have their files listed in the catalog,
>> because they are needed *before* the database can access tables.
> 
> I actually double checked that the filenames don't appear anywhere in
> SELECT * FROM pg_class
> and that the files were multi-GB in size including all the
> 1GB-pieces. But luck was definitely a part of the equation, I didn't know
> that the files might be accessed before tables (at db startup?) or that
> "relfilenode" would be more appropriate. Why is that, where can I read

https://www.postgresql.org/docs/current/storage-file-layout.html


Caution

Note that while a table's filenode often matches its OID, this is not 
necessarily the case; some operations, like TRUNCATE, REINDEX, CLUSTER 
and some forms of ALTER TABLE, can change the filenode while preserving 
the OID. Avoid assuming that filenode and table OID are the same. Also, 
for certain system catalogs including pg_class itself, 
pg_class.relfilenode contains zero. The actual filenode number of these 
catalogs is stored in a lower-level data structure, and can be obtained 
using the pg_relation_filenode() function.



-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Orphan files filling root partition after crash

От
Laurenz Albe
Дата:
On Wed, 2024-02-28 at 20:30 +0100, Dimitrios Apostolou wrote:
> Lucky you.  It should have been "relfilenode" rather than "oid",
> > and some catalog tables don't have their files listed in the catalog,
> > because they are needed *before* the database can access tables.
>
> I actually double checked that the filenames don't appear anywhere in
> SELECT * FROM pg_class
> and that the files were multi-GB in size including all the
> 1GB-pieces. But luck was definitely a part of the equation, I didn't know
> that the files might be accessed before tables (at db startup?) or that
> "relfilenode" would be more appropriate. Why is that, where can I read
> more? I see that many (but not all) rows in pg_class have oid=relfilenode
> but for many rows relfilenode=0 which is meaningless as filename.

If you are curious about such things, start reading the source.
The object ID is immutable, and initially the filenode is the save,
but it changes whenever the table is rewritten (TRUNCATE, ALTER TABLE,
VACUUM (FULL), ...).

Yours,
Laurenz Albe



Re: Orphan files filling root partition after crash

От
"Peter J. Holzer"
Дата:
On 2024-02-28 20:30:35 +0100, Dimitrios Apostolou wrote:
> On Wed, 28 Feb 2024, Laurenz Albe wrote:
> > On Wed, 2024-02-28 at 12:16 +0100, Dimitrios Apostolou wrote:
> > > So what is the moral of the story? How to guard against this?
> >
> > Monitor disk usage ...
>
> It happened *fast*. And it was quite a big suprise coming
> from "just" a disk-full situation.

Been there.

To prevent this in the future I wrote a small script to monitor disk
space (on multiple hosts and multiple file systems) every few seconds
and invoke another script (which just terminates all database
connections - a bit drastic but effective) if free space runs low:
https://github.com/hjp/platzangst

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

Re: Orphan files filling root partition after crash

От
Laurenz Albe
Дата:
On Mon, 2024-03-04 at 00:04 +0100, Peter J. Holzer wrote:
> On 2024-02-28 20:30:35 +0100, Dimitrios Apostolou wrote:
> > On Wed, 28 Feb 2024, Laurenz Albe wrote:
> > > On Wed, 2024-02-28 at 12:16 +0100, Dimitrios Apostolou wrote:
> > > > So what is the moral of the story? How to guard against this?
> > >
> > > Monitor disk usage ...
> >
> > It happened *fast*. And it was quite a big suprise coming
> > from "just" a disk-full situation.
>
> Been there.
>
> To prevent this in the future I wrote a small script to monitor disk
> space (on multiple hosts and multiple file systems) every few seconds
> and invoke another script (which just terminates all database
> connections - a bit drastic but effective) if free space runs low:
> https://github.com/hjp/platzangst

And I maintain that "temp_file_limit" is the best solution.

Yours,
Laurenz Albe



回复:Orphan files filling root partition after crash

От
"赵宇鹏(宇彭)"
Дата:
Hello, it looks like I've run into the same issue as you. I exhausted the disk
space while executing DDL operations, and then after crash recovery, I found
there were orphaned files.
I believe the reason is that due to the lack of space, some of the WAL logs were
not persisted, such as the abort-type WAL logs. During the WAL replay phase,
the absence of abort-type WAL logs resulted in missing the corresponding unlink
operations.
I can replicate a similar scenario. For example, 16394 is an orphaned file that
was generated by the DDL but was not rolled back.

1. CREATE TABLE test(id int);
2. INSERT INTO test SELECT generate_series(1, 100000000);
3. ALTER TABLE test ALTER COLUMN id TYPE bigint;
  sudo chmod 000 pg_wal
4. sudo chmod 777 pg_wal
  pg_waldump xxx
5. pg_ctl -D /data start
6. ll -h /data/base/5

1.0G Mar 26 11:47 16387.1
1.0G Mar 26 11:47 16387.2
385M Mar 26 13:38 16387.3
888K Mar 26 13:38 16387_fsm
112K Mar 26 11:51 16387_vm
1.0G Mar 26 13:47 16394
179M Mar 26 13:47 16394.1
320K Mar 26 13:47 16394_fsm

postgres=# select pg_size_pretty(pg_database_size('postgres'));
pg_size_pretty
----------------
4668 MB
(1 row)
postgres=# select pg_size_pretty(pg_total_relation_size('test'));
pg_size_pretty
----------------
3458 MB
(1 row)
postgres=# select count(*) from pg_class where pg_relation_filenode(oid) = 16394;
count
-------
    0
(1 row)




------------------------------------------------------------------
发件人:Dimitrios Apostolou <jimis@gmx.net>
发送时间:2024年3月26日(星期二) 11:15
收件人:"pgsql-general"<pgsql-general@lists.postgresql.org>
主 题:Orphan files filling root partition after crash

Hello list,

yesterday I was doing:

ALTER TABLE x ADD UNIQUE ... , ADD PRIMARY KEY ...;

The table is almost a billion rows long but lies in its own TABLESPACE
that has plenty of space.  But apparently the ALTER TABLE command is
writing a lot to the default tablespace (not the temp_tablespace, that is
already moved to a different partition).

That quickly filled up the 50GB free space in my root partition:


20:18:04.222 UTC [94144] PANIC:  could not write to file "pg_wal/xlogtemp.94144": No space left on device
[...]
20:19:11.578 UTC [94140] LOG:  WAL writer process (PID 94144) was terminated by signal 6: Aborted
20:19:11.578 UTC [94140] LOG:  terminating any other active server processes


After postgresql crashed and restarted, the disk space in the root
partition was still not freed! I believe this is because of "orphaned
files" as discussed in mailing list thread [1].

[1] https://www.postgresql.org/message-id/CAN-RpxDBA7HbTsJPq4t4VznmRFJkssP2SNEMuG%3DoNJ%2B%3DsxLQew%40mail.gmail.com

I ended up doing some risky actions to remediate the problem: Find the
filenames that have no identically named "oid" in pg_class, and delete
(move to backup) the biggest ones while the database is stopped.
Fortunately the database started up fine after that!

So what is the moral of the story? How to guard against this?

Why did the database write so much to the default tablespace, even when
both the table and the temp tablespace are elsewhere?  Also should one
always keep the default tablespace away from the wal partition? (I guess
it would have helped to avoid the crash, but the ALTER TABLE command would
have still run out of space, so I'm not sure if the orphan files would
have been avoided)?

Needless to say, I would have hoped the database cleaned-up after itself
even after an uncontrolled crash, or that it provided some tools for the
job. (I tried VACUUM FULL on the table, but the orphaned files did not go
away).

My postgresql version is 16.2 installed on Ubuntu.

Thank you,
Dimitris