Thread: Vacuum full: alternatives?


Vacuum full: alternatives?

From
Job
Date:
Hello,

we have a table with heavy traffic of pg_bulkload inserts and deletes of records.
The size can go, in a single day, for example from 1 GB to 4 GB and then back to 1 GB.

We have serious problems with the size, and the only way to reclaim free space is to issue a VACUUM FULL <table>.
But the operation is very slow, sometimes 2 to 4 hours, and the table is not available to services while it is locked.

We do not delete everything at once (in that case TRUNCATE would solve the problem).

Autovacuum (and likewise a normal VACUUM) is not able to free the space.

Are there any suggestions, or another way to manage this?

Thank you!
Francesco

Re: Vacuum full: alternatives?

From
Rakesh Kumar
Date:
Any reason why you need the space back? What is wrong with the space remaining constant at 4 GB?




Re: Vacuum full: alternatives?

From
Andreas Kretschmer
Date:

On 20.06.2016 at 11:18, Job wrote:
> We have serious problems with the size, and the only way to reclaim free
> space is to issue a VACUUM FULL <table>.
> But the operation is very slow, sometimes 2 to 4 hours, and the table is
> not available to services while it is locked.
>
> Autovacuum (and likewise a normal VACUUM) is not able to free the space.
>

Autovacuum marks space as free, but doesn't give the space back to the OS.

I would suggest running only autovacuum; over time you will see the
table stop growing. There is no need for VACUUM FULL.

Andreas


R: Vacuum full: alternatives?

From
Job
Date:
Hi Rakesh,
 
if I do not free disk space, after some days the disk can become full.
Every day we have a lot of pg_bulkload runs and deletes.
 
Thank you!
Francesco
 


R: Vacuum full: alternatives?

From
Job
Date:
Hi Andreas,

> I would suggest running only autovacuum; over time you will see the
> table stop growing. There is no need for VACUUM FULL.

So new records, when pg_bulkloaded, will reuse the "marked-free" space?

Thank you!
Francesco



Re: R: Vacuum full: alternatives?

From
Robert Wysocki
Date:


On Mon, 2016-06-20 at 11:43 +0200, Job wrote:
> Hi Andreas,
>
> > I would suggest running only autovacuum; over time you will see the
> > table stop growing. There is no need for VACUUM FULL.
>
> So new records, when pg_bulkloaded, will reuse the "marked-free" space?

Yes, but you may have to make the autovacuum process more aggressive. Even
then it might not cope with the frequency of your bulk I/U/D.

There are many tools to use instead of VACUUM FULL, though; have a look
at pg_reorg and pgcompact, for example. Do not be afraid to use an
awesome tool called Google as well ;-) (This is a very basic
problem everyone asks about, so you'll find many more in-depth answers
and articles; the phrase you want to google for is "postgresql bloat".)

Cheers,
R.



Re: R: Vacuum full: alternatives?

From
Andreas Kretschmer
Date:

On 20.06.2016 at 11:43, Job wrote:
> Hi Andreas,
>
>> I would suggest running only autovacuum; over time you will see the
>> table stop growing. There is no need for VACUUM FULL.
> So new records, when pg_bulkloaded, will reuse the "marked-free" space?


exactly, that's the task for vacuum


Andreas


Re: R: Vacuum full: alternatives?

From
Rakesh Kumar
Date:
But then autovacuum avoids this. Granted, it won't give back free space to the OS, but it won't let it grow either (or am I missing something?).






Re: R: Vacuum full: alternatives?

From
Melvin Davidson
Date:




> We do not delete everything at once (in that case TRUNCATE would solve the problem).

Please, it is very important that you provide the PostgreSQL version & O/S, as improvements to VACUUM may play a role here.

Is there any reason you cannot partition the table? Moving the data to separate partitions
(based on a date or key field) will allow you to VACUUM FULL only one partition at a time.
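
As a rough, untested sketch (made-up table and column names, using plain inheritance-based partitioning):

CREATE TABLE events (id bigint, created date, payload text);

CREATE TABLE events_2016_06_20
    (CHECK (created = DATE '2016-06-20'))
    INHERITS (events);

-- load each day's data directly into that day's child table, then reclaim
-- an old day's space without touching the other partitions:
VACUUM FULL events_2016_06_20;
-- or, if the whole day is obsolete, simply TRUNCATE or DROP the child.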

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: R: Vacuum full: alternatives?

From
Melvin Davidson
Date:

On Mon, Jun 20, 2016 at 8:29 AM, Rakesh Kumar <rakeshkumar464a3@gmail.com> wrote:




> but it won't let it grow either (or am I missing something?).

Yes, you are missing something. By partitioning, and running VACUUM FULL only on the partition whose data is no longer needed, the rest of the data remains available to the users
AND space is reclaimed by the O/S, so it's the best of both worlds.

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: R: Vacuum full: alternatives?

From
Adarsh Sharma
Date:



A few things you can try:

1. Partition your table daily
2. Tune your autovacuum parameters if you think autovacuum is not keeping up with the rate of bloat, e.g.:

alter table table_name set (autovacuum_enabled=true, autovacuum_vacuum_threshold=5000, autovacuum_analyze_threshold=5000, autovacuum_vacuum_scale_factor=0.1, autovacuum_analyze_scale_factor=0.2);

3. If you can recreate or alter your table, create/alter it with a fillfactor of 20 so that new row versions can stay in the same page (see the example below). It will use extra space, but you will face fewer fragmentation problems.
link : https://www.postgresql.org/docs/8.3/static/sql-createtable.html
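
For example (hypothetical table name; the new fillfactor only applies to pages written after the change):

alter table your_table set (fillfactor = 20);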

However, I have faced one problem in the past where we had a streaming replication setup with one master and 4 slaves. Even after all these tunings, autovacuum was not able to remove dead tuples and queries were getting slower and slower.
After stopping all applications and the streaming-replication slaves, I was able to defrag the table properly. The docs say autovacuum will not remove dead tuples that are still referenced anywhere, but I am not sure how to find those dead tuples which are still being referenced :)

Thanks,
Adarsh Sharma


Re: R: Vacuum full: alternatives?

From
Martín Marqués
Date:
On 20/06/16 at 09:50, Melvin Davidson wrote:
>
>
>> but it won't let it grow either (or am I missing something?).
>
> Yes, you are missing something. By partitioning, and running VACUUM FULL
> only on the partition whose data is no longer needed, the rest of the data
> remains available to the users AND space is reclaimed by the O/S, so it's
> the best of both worlds.

That's not entirely true. Think about a SELECT which has to scan all
child tables.

You are also adding another layer of complexity to the system.

--
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: R: Vacuum full: alternatives?

From
Vik Fearing
Date:
On 20/06/16 16:23, Martín Marqués wrote:
> That's not entirely true. Think about a SELECT which has to scan all
> child tables.

Or any SELECT on the parent at all.  The planner needs to examine the
CHECK constraints on the children and can't do it if the child is locked
in ACCESS EXCLUSIVE mode.
--
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: R: Vacuum full: alternatives?

From
Alex Ignatov
Date:
On 20.06.2016 17:30, Vik Fearing wrote:
> Or any SELECT on the parent at all.  The planner needs to examine the
> CHECK constraints on the children and can't do it if the child is locked
> in ACCESS EXCLUSIVE mode.
+1


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: R: Vacuum full: alternatives?

From
Martín Marqués
Date:
2016-06-20 11:30 GMT-03:00 Vik Fearing <vik@2ndquadrant.fr>:
> On 20/06/16 16:23, Martín Marqués wrote:
>>
>> That's not entirely true. Think about a SELECT which has to scan all
>> child tables.
>
> Or any SELECT on the parent at all.  The planner needs to examine the
> CHECK constraints on the children and can't do it if the child is locked
> in ACCESS EXCLUSIVE mode.

Yeah, that totally skipped my mind, so partitioning is actually a bad
idea if that's all they are looking to solve.

Thanks, Vik, for pointing out the oversight.

--
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: R: Vacuum full: alternatives?

From
Jeff Janes
Date:
On Mon, Jun 20, 2016 at 3:13 AM, Andreas Kretschmer
<andreas@a-kretschmer.de> wrote:
>
> On 20.06.2016 at 11:43, Job wrote:
>> So new records, when pg_bulkloaded, will reuse the "marked-free" space?
>
> exactly, that's the task for vacuum

Are you sure that that is the case with pg_bulkload specifically?  It
bypasses the shared buffers, so it would not surprise  me if it
bypasses the free space map as well, and thus always appends its data
to the end of the table.


Cheers,

Jeff


Re: R: Vacuum full: alternatives?

From
Scott Mead
Date:


On Mon, Jun 20, 2016 at 6:13 AM, Andreas Kretschmer <andreas@a-kretschmer.de> wrote:


exactly, that's the task for vacuum


I believe that free space is only available to UPDATE, not INSERT.

 





--
--
Scott Mead
Sr. Architect
OpenSCG

Re: R: Vacuum full: alternatives?

From
Melvin Davidson
Date:




Martin and Vik,

>...Think about a SELECT which has to scan all child tables.

You are really digging for a corner case.
If a query has to scan all child tables, then
A. it negates the point of having partitions that can be skipped,
and
B. the SELECT query is poorly crafted.

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: R: Vacuum full: alternatives?

From
Guillaume Lelarge
Date:
2016-06-20 17:03 GMT+02:00 Scott Mead <scottm@openscg.com>:


I believe that free space is only available to UPDATE, not INSERT.


No, it's available for both.


--

Re: R: Vacuum full: alternatives?

From
Martín Marqués
Date:
On 20/06/16 at 12:06, Melvin Davidson wrote:
>
> Martin and Vik,
>
>> ...Think about a SELECT which has to scan all child tables.
>
> You are really digging for a corner case.
> If a query has to scan all child tables, then
> A. it negates the point of having partitions that can be skipped,
> and
> B. the SELECT query is poorly crafted.

And you haven't read Vik's reply. :)

--
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: R: Vacuum full: alternatives?

From
Melvin Davidson
Date:



>And you haven't read Vik's reply. :)
Yes I have. Vacuum will not lock all tables at once, only the ones it is currently working on, so the planner may have a slight delay,
but it will not be gigantic.
I have proposed a reasonable solution to solve the problem in its entirety. Do you have a better one?

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: R: Vacuum full: alternatives?

From
Martín Marqués
Date:
On 20/06/16 at 11:52, Jeff Janes wrote:
> Are you sure that that is the case with pg_bulkload specifically?  It
> bypasses the shared buffers, so it would not surprise  me if it
> bypasses the free space map as well, and thus always appends its data
> to the end of the table.

I didn't do a super intensive check of pg_bulkload, but AFAICS it does
batches of COPY with PQputCopyData.

If the relation has free space which was reclaimed by vacuum/autovacuum
it will try to use that space and not extend the relation (which is more
expensive). This happens if used space on those pages is lower than the
fillfactor set for that table.

IMO, he should start by setting autovacuum more aggressively, or by
running aggressive manual vacuums, and see how that works.

Also, install pgstattuple and check the relation to see how many dead
tuples there are and how much free space remains.
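
For example (made-up table name):

CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('public.your_table');  -- look at dead_tuple_percent and free_percent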

Those are my 2 cents.

--
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Re: R: Vacuum full: alternatives?

From
John R Pierce
Date:
On 6/20/2016 8:03 AM, Scott Mead wrote:
>
> I believe that free space is only available to UPDATE, not INSERT.

incorrect.   in fact, an update is performed identically to an INSERT +
DELETE(old)


--
john r pierce, recycling bits in santa cruz



Re: Vacuum full: alternatives?

From
Chris Ernst
Date:
On 06/20/2016 03:18 AM, Job wrote:
> We have serious problems with the size, and the only way to reclaim free
> space is to issue a VACUUM FULL <table>.
> But the operation is very slow, sometimes 2 to 4 hours, and the table is
> not available to services while it is locked.
>
> Are there any suggestions, or another way to manage this?

Hi Francesco,

We use pg_repack (http://reorg.github.io/pg_repack/) for a similar
workload.  It allows what amounts to an online vacuum full.  The only
caveat is that you need to have the available disk space to fully
rebuild the table in parallel.
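
The invocation is roughly this (made-up table and database names; check the docs of whichever pg_repack version you install for the exact options): install the extension once in the target database with CREATE EXTENSION pg_repack; and then run, from the shell,

pg_repack --table=mytable mydb

which rewrites the table online, taking only brief locks at the start and end.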

Hope that helps.

Cheers!

    - Chris


Re: R: Vacuum full: alternatives?

From
Jeff Janes
Date:
On Mon, Jun 20, 2016 at 7:23 AM, Martín Marqués <martin@2ndquadrant.com> wrote:
> That's not entirely true. Think about a SELECT which has to scan all
> child tables.
>

Yes, for the partitioning to be a good option, you would probably have
to arrange it such that you can prove that all tuples in a given
partition are eligible for deletion (or have already been deleted),
and then either truncate or dis-inherit the partition.  That still
requires a stringent lock, but it is only held for a very short time.
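
Sketching that with made-up names (untested, assuming day-based children like the earlier example):

TRUNCATE events_2016_06_20;
-- or detach the child first and drop it whenever convenient:
ALTER TABLE events_2016_06_20 NO INHERIT events;
DROP TABLE events_2016_06_20;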

> You are also adding another layer of complexity to the system.

I think that using pg_bulkload adds more complexity to the system than
partitioning would.  I wonder if they really need to use that, or if
they just picked it over COPY because it sounded like a free lunch.

I've just tested pg_bulkload with the default settings, and it
definitely isn't using the fsm to re-use freed space in the table.  If
they use WRITER = BUFFERED it would, though.
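
For reference, the change is presumably a single control-file line; something like this (made-up table and file names, not verified against their pg_bulkload version), where WRITER = BUFFERED replaces the default DIRECT mode so the load goes through shared buffers:

OUTPUT = public.mytable
INPUT = /path/to/data.csv
TYPE = CSV
WRITER = BUFFERED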

Cheers,

Jeff


Re: Vacuum full: alternatives?

From
"David G. Johnston"
Date:
On Monday, June 20, 2016, John R Pierce <pierce@hogranch.com> wrote:
On 6/20/2016 8:03 AM, Scott Mead wrote:

I believe that free space is only available to UPDATE, not INSERT.

incorrect.   in fact, an update is performed identically to an INSERT + DELETE(old)


Except for the heap-only-tuple optimization, right?  We cannot build a HOT chain if the user requests a delete separately since there is no longer an association to trace from the old record.

I suspect this affects free space usage to some degree as well but I agree and believe that the reclaimed space is not forbidden to be used (I wouldn't rely on my word though and haven't tried to find relevant documentation).

David J.

 

Re: Vacuum full: alternatives?

From
John R Pierce
Date:
On 6/20/2016 8:51 AM, David G. Johnston wrote:

incorrect.   in fact, an update is performed identically to an INSERT + DELETE(old)


Except for heap-only-tuple optimization, right?  We cannot build a HOT chain if the user requests a delete separately since their is no longer an association to trace from the old record.

I suspect this affects free space usage to some degree as well but I agree and believe that the reclaimed space is not forbidden to be used (I wouldn't rely on my word though and haven't tried to find relevant documentation).

yeah, HOT only works on updates that don't modify any indexed fields, and only if there's adequate free space in the same block.   If you have an update-intensive table that's a candidate for HOT, I've been recommending setting that table's fillfactor to 50-70% prior to populating it to leave free space in every block.



-- 
john r pierce, recycling bits in santa cruz

Re: R: Vacuum full: alternatives?

From
Vik Fearing
Date:
On 20/06/16 17:25, Melvin Davidson wrote:
>>And you haven't read Vik's reply. :)
>
> Yes I have. Vacuum will not lock all tables at once, only the ones it is
> currently working on, so the planner may have a slight delay,
> but it will not be gigantic.

I think you should try it.

> I have proposed a reasonable solution to solve the problem in it's
> entirety. Do you have a better one?

You mean by partitioning?  That doesn't really solve any problem, except
that vacfull-ing a partition should be faster than doing the whole
enchilada.
--
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: R: Vacuum full: alternatives?

From
Melvin Davidson
Date:

Vik,
Your comments make no sense to me.

>Or any SELECT on the parent at all.  The planner needs to examine the
>CHECK constraints on the children and can't do it if the child is locked
>in ACCESS EXCLUSIVE mode.

Nowhere in the documentation does it say that the planner needs to take locks
or is even concerned with them. Locks are transient, so they do not figure into
the query plan. If I am wrong, kindly point me to the documentation or URL that
shows the contrary.

>I think you should try it.

Why would I even attempt that? We do not know the PostgreSQL version or O/S as yet.
I do not have any info regarding table structure or any data. I have given a suggestion
that will probably help solve the problem. I am not here to do any actual work.

>That doesn't really solve any problem, except
>that vacfull-ing a partition should be faster than doing the whole
>enchilada.

That is exactly the point, because based on the original problem description, the
data is transient.

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: Vacuum full: alternatives?

From
Scott Marlowe
Date:
On Mon, Jun 20, 2016 at 3:18 AM, Job <Job@colliniconsulting.it> wrote:
>
> We have serious problems with the size, and the only way to reclaim free
> space is to issue a VACUUM FULL <table>.
> But the operation is very slow, sometimes 2 to 4 hours, and the table is
> not available to services while it is locked.
>
> Autovacuum (and likewise a normal VACUUM) is not able to free the space.
>
> Are there any suggestions, or another way to manage this?

First off, from your comments I'm not sure you really get PostgreSQL's
way of freeing space and reusing it via autovacuum. Basically, the
autovacuum process marks space as free, and the backends write new data
(inserts or updates) into that free space. You eventually reach an
equilibrium of sorts where vacuum is freeing up space as quickly as it's
being consumed, or faster. The problem occurs when vacuum can't keep up
with your combined delete / write / update rate. If this is happening
you need to:

A: Make sure your IO subsystem is fast enough to handle BOTH your
update rate AND the vacuuming needed to keep up. You're better off
with a machine that can do 15,000 transactions per second running a
load of 1,000 than trying to handle it with a machine that can do
1,500 tps etc. Sizing the hardware is a whole other conversation.

AND

B: Make your autovacuum aggressive enough to NOT fall behind.

It's important to remember that autovacuum was built and designed in a
time when most databases lived on spinning media. It's designed to not
overload spinning discs with too much random IO. A super fast RAID-10
array from that time period could do 200 to 1,000 transactions per
second and that only with a top notch RAID controller etc. Regular
spinning discs max out at random write rates of roughly 100
operations per second.

My laptop, with no pgsql optimizations, can do 850 tps on its SSD. A
server with 10 SSDs in RAID-5 can do 15,000 tps.  If you have a fast
IO subsystem and wish to utilize it with pgsql you're going to have to
examine whether or not autovacuum with default settings is fast enough
to keep up. Remember, autovacuum, by design, is slow and tries not to
get in the way. It's fast enough for most mundane uses, but can't keep
up with a fast machine running hard. The default settings for
autovacuum to look at here are first these two:

autovacuum_vacuum_cost_delay = 20ms
autovacuum_vacuum_cost_limit = 200

They govern how hard autovac works. By default autovac doesn't work
hard. Making it work too hard for a given machine can cause system
performance issues. I.e. it "gets in the way".

Lowering cost_delay is usually enough. As you approach 1ms autovac
starts to use a lot more bandwidth. I find that even on pretty fast
machines that are routinely doing 1,000 writes per second or more, 3ms
is fast enough to keep up with a cost limit of 200. 5ms is a good
compromise without getting too aggressive.
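
As an illustrative starting point (example values only; tune for your
own hardware and workload):

autovacuum_vacuum_cost_delay = 3ms
autovacuum_vacuum_cost_limit = 200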

In contrast to autovacuum, REGULAR vacuum, by default, runs at full
throttle. It hits your db server hard, performance-wise. It has  zero
cost delay, so it works very hard. If you run it midday on a hard
working server you will almost certainly see the performance drop. The
difference between regular vacuum with a delay time of 0 and autovac
with a delay of 20ms is huge.

These settings become important if you have a LOT of tables or dbs.
Otherwise they're probably fine.

autovacuum_max_workers =3 # Adjust this last, unless you have
thousands of tables or dbs.
autovacuum_naptime = 1 min # How long to wait before checking the next
db. Default is usually fine unless you have a lot of dbs.

These settings tell autovacuum when to kick in. Keeping these low
enough to keep autovac busy is a good idea too:

autovacuum_vacuum_scale_factor
autovacuum_vacuum_threshold
autovacuum_analyze_scale_factor
autovacuum_analyze_threshold

I tend to go for threshold, which is an absolute number of rows
changed before autovac kicks off. Scale factor can be dangerous
because what seems small at the beginning, gets big fast. If it's 0.1
then that's 10%. 10% of 1,000 is 100 rows, but 10% of 1,000,000 is
100,000, which is a LOT of rows to ignore until you have more than
that that need vacuuming. Setting it to something like 100 or 1,000
will keep your db from growing hundreds of thousands of dead tuples
in a big table.
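
For illustration, with a made-up table name, the per-table settings look like:

ALTER TABLE busy_table SET (autovacuum_vacuum_scale_factor = 0,
                            autovacuum_vacuum_threshold = 1000);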

Either way you need to make sure your autovacuum is aggressive enough
to keep up with your db's throughput.

Checking for bloat. You can see what parts of your db are getting too
big.  First, go here:
https://www.keithf4.com/checking-for-postgresql-bloat/

The script there will let you check all your tables AND indexes for
bloat. This will let you know if you've got a simple space problem or
a vacuuming problem.

Assuming you DO have bloating, one of the first things you can do is
rebuild all the indexes except for PK ones with new indexes then drop
the old ones. On some tables this saves a LOT of space. Since you can
"create index concurrently ... " this is a non-blocking operation.

You can get a list of indexes for a table:

select indexdef from pg_indexes where tablename='pgbench_accounts';

Then build "create index concurrently ..." statements for each one.
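
As an untested sketch with a made-up index name:

CREATE INDEX CONCURRENTLY pgbench_accounts_bid_idx_new
    ON pgbench_accounts (bid);
DROP INDEX CONCURRENTLY pgbench_accounts_bid_idx;   -- DROP ... CONCURRENTLY needs 9.2+
ALTER INDEX pgbench_accounts_bid_idx_new RENAME TO pgbench_accounts_bid_idx;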

Assuming the table is bloated and you HAVE to recover space for normal
operation (let's say you've got 80MB of data in a 200GB table etc)...
I'm gonna head in a different direction here. Slony.

Slony can subscribe your table, or your whole db, up to you, to a new
one, either on a different machine or on the same machine. It's
actually pretty easy to set it up and subscribe one table, get it
caught up, drop access to db, swap tables, and bring access to the db
back up. The downtime is measured in seconds.

If one table one time is all you need, that's fine, but you can also use
it to set up a replica of the bloated machine, sans bloat, and
switch over the whole db operation to another machine / db.


R: R: Vacuum full: alternatives?

From
Job
Date:
Hello,
very interesting comments and contributions, thank you.

>I've just tested pg_bulkload with the default settings, and it
>definitely isn't using the fsm to re-use freed space in the table.  If
>they use WRITER = BUFFERED it would, though.

So with WRITER = BUFFERED it would be slower, but the space marked as free would be reused again?

Thank you!
Francesco



R: Vacuum full: alternatives?

From
Job
Date:
Excellent Scott!
Thank you!
Francesco
