Thread: Options for more aggressive space reclamation in vacuuming?


Options for more aggressive space reclamation in vacuuming?

From: Wells Oliver
Date:
Noticed over the last few days how much more space VACUUM FULL reclaims (for obvious reasons) and I am wondering if there are any solutions for a more fruitful VACUUM without the locks of FULL?


Re: Options for more aggressive space reclamation in vacuuming?

From: Jeff Janes
Date:

There are lots of theories which are all compatible with the minimal amount of info given here.  If the space is freed from the indexes, then just reindexing (which can be done concurrently in recent versions) could reclaim the space. If it is from the table itself then something like pg_squeeze (https://github.com/cybertec-postgresql/pg_squeeze/) might help. Or you can implement your own squeezing, by using a CTE with a ctid scan to delete tuples from the end of the table and reinsert them in the front, then use regular vacuum to truncate the now-free space off the end of the table.
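
For illustration, a rough sketch of that manual squeeze, with a hypothetical table name (events) and cutoff block, assuming PostgreSQL 14+ (which has the tid comparison operators this relies on); the REINDEX line is the index-side variant mentioned above:

    -- move rows stored beyond block 100000 into free space earlier in the heap;
    -- run in batches, lowering the cutoff gradually, to keep transactions short
    WITH tail AS (
        DELETE FROM events
        WHERE ctid >= '(100000,1)'::tid
        RETURNING *
    )
    INSERT INTO events SELECT * FROM tail;

    VACUUM events;  -- a plain VACUUM can now truncate the emptied tail pages

    REINDEX TABLE CONCURRENTLY events;  -- if the bloat is in the indexes (PG 12+)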

I would think the more fruitful VACUUM would be to prevent the bloat from occurring in the first place, but without knowing the root cause I can't offer anything but platitudes there.

Cheers,

Jeff

Re: Options for more aggressive space reclamation in vacuuming?

From: Wells Oliver
Date:
I think our use-case here is nothing special: just very large tables that are deleted from and re-inserted with some frequency. They store monthly data and exist as partitions. As an example, July of 2022 (summer months are peak) had 1,630,558,336 rows. We delete from this and insert daily. We rarely get new, relevant data once the month is over. The n_dead_tup from pg_stat_all_tables here was over 7M, and clearing that out gave us back nearly 50 GB; the file size estimate on this partition was ~200 GB. These tables get auto-vacuumed, but clearly it's not, well, aggressively reclaiming space.
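
(For reference, dead-tuple counts and on-disk sizes like these can be pulled with a query along the following lines; the partition-name pattern is hypothetical:)

    SELECT relname,
           n_live_tup,
           n_dead_tup,
           pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
           last_autovacuum
      FROM pg_stat_all_tables
     WHERE relname LIKE 'stats_2022_%'
     ORDER BY n_dead_tup DESC;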

Given that they are partitions, VACUUM FULL locks the parent table, so new writes/deletes can't happen, and anyway people like to run queries against these tables quite frequently.

Glad to add any detail I can.


Re: Options for more aggressive space reclamation in vacuuming?

From: Paul Smith
Date:
Have you tuned the autovacuum settings? We've found the defaults are not great for large tables.

E.g. autovacuum_vacuum_scale_factor defaults to 0.2, so 20% of the tuples need to be deleted/updated before a vacuum is triggered. If there are a billion rows, that means 200 million deleted rows before autovacuum kicks in.

Maybe try changing autovacuum_vacuum_scale_factor to 0.001 or something for that large table.
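
A per-table override, for instance (partition name hypothetical):

    ALTER TABLE stats_2022_07
        SET (autovacuum_vacuum_scale_factor = 0.001);

At ~1.6 billion rows, that fires autovacuum at roughly 1.6 million dead tuples instead of the default's ~320 million.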

Paul


Re: Options for more aggressive space reclamation in vacuuming?

From: Wells Oliver
Date:
Yeah, we've tuned them down to a 0.1 factor, and they are indeed being auto-vacuumed (though perhaps not frequently enough). I feel like I'm just seeing autovacuum release a _lot_ less space than a full vacuum. Of course, I know that this is generally the case, but I am surprised by the factor... Nothing we can't live with; I was just wondering if there wasn't some clever solution.

Perhaps dropping the autovacuum scale factor even lower would be useful here too.




Re: Options for more aggressive space reclamation in vacuuming?

From: Ron
Date:
We regularly vacuum, and definitely vacuum after an archive-delete.


--
Born in Arizona, moved to Babylonia.

Re: Options for more aggressive space reclamation in vacuuming?

From: Wells Oliver
Date:
Just to be clear: you regularly VACUUM FULL, i.e. the locking, full-on, hard-core, all-out vacuum?


Re: Options for more aggressive space reclamation in vacuuming?

From: Ron
Date:
Nope. Bog-standard VACUUM. It marks space free, and PostgreSQL reuses it for new records.

--
Born in Arizona, moved to Babylonia.

Re: Options for more aggressive space reclamation in vacuuming?

From: Ron
Date:
On 5/14/23 12:45, Wells Oliver wrote:
> Noticed over the last few days how much more space VACUUM FULL reclaims 
> (for obvious reasons) and I am wondering if there are any solutions for a 
> more fruitful VACUUM without the locks of FULL?

Another tactic for UPDATE-heavy environments is to reduce the table's fillfactor. As long as an indexed field isn't modified, PostgreSQL will try to use in-place HOT (Heap Only Tuple) updates, which reuse the free space kept on the same page.
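
Something along these lines, with a hypothetical table name and an illustrative value:

    -- keep 10% of each heap page free so HOT updates can stay on the same page
    ALTER TABLE events SET (fillfactor = 90);
    -- affects only newly written pages; existing pages are repacked
    -- only by a rewrite (VACUUM FULL, pg_squeeze, etc.)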



-- 
Born in Arizona, moved to Babylonia.



Re: Options for more aggressive space reclamation in vacuuming?

From: Paul Smith
Date:
An autovacuum will not release much (if any) space back to the OS. That's perfectly normal. It just marks it as available for new data to occupy.

So, if data is being added/removed at a similar rate, the table size will reach an equilibrium, with lots of dead tuples that are being reused.

If autovacuum is happening too infrequently, there'll be far too many dead tuples. If it's happening too frequently, there'll be too much load on the I/O system.

Personally, I'd say 0.1 is still too high for a billion+ row table. That's still 100+ million dead tuples, which is a lot, and slow to process. I'd use 0.001 (1+ million dead tuples), or maybe even lower.

Depending on the delete rate, you could even set the scale_factor to 0 and autovacuum_vacuum_threshold to a number that gives an autovacuum every few days.
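
A sketch of that last variant (table name and threshold hypothetical):

    ALTER TABLE stats_2022_07 SET (
        autovacuum_vacuum_scale_factor = 0,       -- ignore the table's size
        autovacuum_vacuum_threshold    = 5000000  -- vacuum after ~5M dead tuples
    );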

Paul


Re: Options for more aggressive space reclamation in vacuuming?

From: Peter Geoghegan
Date:
On Sun, May 14, 2023 at 2:03 PM Wells Oliver <wells.oliver@gmail.com> wrote:
> I think our use-case here is nothing special: just very large tables that are deleted from and re-inserted with some frequency.

What Postgres version are you using?

If you're not on a recent version, upgrading (ideally to 14 or 15) may leave the index bloat situation much improved.

--
Peter Geoghegan



Re: Options for more aggressive space reclamation in vacuuming?

From: Laurenz Albe
Date:
On Sun, 2023-05-14 at 14:02 -0700, Wells Oliver wrote:
> I think our use-case here is nothing special: just very large tables that are deleted
> from and re-inserted with some frequency. They store monthly data and exist as partitions.
> As an example, July of 2022 (summer months are peak) had 1,630,558,336 rows. We delete
> from this and insert daily. We rarely get new, relevant data once the month is over.
> The n_dead_tup from pg_stat_all_tables here was over 7m rows, and clearing that out
> gave us back nearly 50GB, and the file size estimate on this partition was ~200 GB.
> These tables get auto-vacuumed but clearly it's not, well, aggressively reclaiming space.

As a rule, normal VACUUM will not reclaim space at all (only when, after the run, the pages at the end of a table are empty). That's why you use range partitioning: simply drop the partition that has expired. Ideally, you wouldn't go to the trouble of deleting data at all.
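
With a hypothetical parent table stats partitioned by month, that amounts to:

    ALTER TABLE stats DETACH PARTITION stats_2022_07;
    DROP TABLE stats_2022_07;  -- returns the space to the OS immediately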

Yours,
Laurenz Albe



Re: Options for more aggressive space reclamation in vacuuming?

From: Jeff Janes
Date:
On Sun, May 14, 2023 at 5:03 PM Wells Oliver <wells.oliver@gmail.com> wrote:
> I think our use-case here is nothing special: just very large tables that are deleted from and re-inserted with some frequency. They store monthly data and exist as partitions. As an example, July of 2022 (summer months are peak) had 1,630,558,336 rows. We delete from this and insert daily.

You delete and reinsert every row every day, or just some lesser number of rows each day?  In one giant transaction daily, or in a series of short mostly non-overlapping transactions throughout the day?

> We rarely get new, relevant data once the month is over. The n_dead_tup from pg_stat_all_tables here was over 7m rows, and clearing that out gave us back nearly 50GB, and the file size estimate on this partition was ~200 GB.

Are these sizes for the entire relation, or just for the core table (not TOAST, not indexes)? Index bloat is harder to prevent than table bloat, but is easier to deal with after the fact (REINDEX CONCURRENTLY).

> These tables get auto-vacuumed but clearly it's not, well, aggressively reclaiming space.

50 GB out of 200 GB seems roughly in accord with the default setting of autovacuum_vacuum_scale_factor. So maybe just lowering that from 0.2 to, say, 0.05 would be good enough, either globally or just for these tables.

> Given that they are partitions, VACUUM FULL locks the parent table, so new writes/deletes can't happen, and anyway people like to run queries against these tables quite frequently.

Since you rarely get new data once the month is over, you could lock the partition against concurrent changes (but still allow reads) once it is no longer "on the run", while you copy the data into a new tightly-packed table, index it, and add constraints. Then you need only a very brief access exclusive lock while you detach/drop the old partition and attach the new table in its place. Of course, you can't drop the weaker lock while you acquire the stronger one, lest the old table change during the gap, so lock management can be a bit tedious; but it is surely less restrictive than a VACUUM FULL.
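
A sketch of that dance, with hypothetical table names and partition bounds:

    BEGIN;
    -- SHARE mode blocks writes but still allows reads while we copy
    LOCK TABLE stats_2022_07 IN SHARE MODE;

    CREATE TABLE stats_2022_07_packed
        (LIKE stats_2022_07 INCLUDING ALL);
    INSERT INTO stats_2022_07_packed SELECT * FROM stats_2022_07;

    -- the swap itself takes the brief ACCESS EXCLUSIVE lock
    ALTER TABLE stats DETACH PARTITION stats_2022_07;
    ALTER TABLE stats ATTACH PARTITION stats_2022_07_packed
        FOR VALUES FROM ('2022-07-01') TO ('2022-08-01');
    DROP TABLE stats_2022_07;
    COMMIT;

Adding a CHECK constraint matching the partition bounds to the new table before the ATTACH lets PostgreSQL skip the validation scan while the lock is held.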

Cheers,

Jeff

Re: Options for more aggressive space reclamation in vacuuming?

From: Wolfgang Wilhelm
Date:
Hi,

What do you mean by "aggressively reclaiming space"? Do you expect to see more usable space in the file system? If so, you are mistaken about the way autovacuum works: it deletes dead tuples _in_ the blocks, and returns space to the file system only when a run of blocks at the end of the table file ends up empty.
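
(Since PostgreSQL 12, that tail truncation can also be steered explicitly; the table name here is hypothetical:)

    -- TRUNCATE is on by default; shown only to make the behaviour explicit
    VACUUM (VERBOSE, TRUNCATE true) stats_2022_07;
    -- or persistently, as a storage parameter:
    ALTER TABLE stats_2022_07 SET (vacuum_truncate = true);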

Regards
WW


Re: Options for more aggressive space reclamation in vacuuming?

From: Wells Oliver
Date:
This brings to mind another question: is there a clever solution for the situation where a 0.001 vacuum scale factor makes sense at the end of the month, when the table has reached its general capacity (maybe 1bn rows), but is too aggressive at the beginning of the month, when the much lower volume of records would cause too-frequent vacuuming? Simple manual adjustment through the month, or something easier?


Re: Options for more aggressive space reclamation in vacuuming?

From: Ron
Date:
(A billion row table really should be partitioned.  That ship sailed long ago, of course...)

Sure you can "manually" adjust it.  cron is your friend!!

But... how long does it take to vacuum a freshly VACUUM FULLed table? Because I'm wondering if it would be more efficient to proactively keep the problem at bay by, for example, doing a plain VACUUM every 2, 3, 4, or 6 hours (take your pick), along with a plain VACUUM right after the monthly purge.
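
For instance (database and table names hypothetical):

    # crontab: plain VACUUM (plus ANALYZE) every 4 hours via vacuumdb
    0 */4 * * * vacuumdb --dbname=stats_db --table=stats_2022_07 --analyze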

--
Born in Arizona, moved to Babylonia.

Re: Options for more aggressive space reclamation in vacuuming?

From: Wells Oliver
Date:
The billion row table _is_ the monthly partition! We are in the process of moving this scale of data to Redshift anyway, but that's a topic for another mailing list. Just trying to nail down what I can in the interim. I appreciate all the insights in this thread.



Re: Options for more aggressive space reclamation in vacuuming?

From: Wolfgang Wilhelm
Date:
If you are playing with this kind of idea, how about raising the fillfactor for the partitions that aren't updated anymore? Of course, you have to copy the data from the "sparse" partition into a new "dense" partition, detaching the sparse one and attaching the dense one. From the space point of view, a VACUUM FULL is about the same.
