Обсуждение: Non-pausing table scan on 9.6 replica?

Поиск
Список
Период
Сортировка

Non-pausing table scan on 9.6 replica?

От
Mark Fletcher
Дата:
Hi All,

On a 9.6 streaming replica, we do table scans for stats and other things. During these scans, the replication is paused (the 'recovering' postgres process has 'waiting' appended to it). We're not using transactions with these scans. Is there anything we can do to prevent the pausing?

Thanks,
Mark

Re: Non-pausing table scan on 9.6 replica?

От
Andreas Kretschmer
Дата:

Am 06.03.19 um 01:26 schrieb Mark Fletcher:
> Hi All,
>
> On a 9.6 streaming replica, we do table scans for stats and other 
> things. During these scans, the replication is paused (the 
> 'recovering' postgres process has 'waiting' appended to it). We're not 
> using transactions with these scans. Is there anything we can do to 
> prevent the pausing?
>
>


have you set ```max_standby_streaming_delay``? The default is 30 
seconds, which means that this will be the maximum time allowed for a 
replication lag caused by a conflicting query.
You can use ``hot_standby_feedback = on``, but the downside will be more 
bloat on the tables.


Regards, Andreas

-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



Re: Non-pausing table scan on 9.6 replica?

От
Mark Fletcher
Дата:
Thank you for responding to my email.

On Tue, Mar 5, 2019 at 9:20 PM Andreas Kretschmer <andreas@a-kretschmer.de> wrote:

have you set ```max_standby_streaming_delay``? The default is 30
seconds, which means that this will be the maximum time allowed for a
replication lag caused by a conflicting query.

Yes, we've bumped that up a lot.
 
You can use ``hot_standby_feedback = on``, but the downside will be more
bloat on the tables.

I'm not sure I understand. I'm not worried about the query being cancelled on the replica. max_standby_streaming_delay fixes that for us. I'm worried about the streaming replication being paused while this table scan is running. If the table scan takes several minutes, then the replica becomes several minutes out of sync with the master. I'd prefer that not to happen and I'm wondering if there's a way to do that.

Thanks,
Mark
 

Re: Non-pausing table scan on 9.6 replica?

От
Andreas Kretschmer
Дата:

Am 06.03.19 um 06:41 schrieb Mark Fletcher:
> Thank you for responding to my email.
>
> On Tue, Mar 5, 2019 at 9:20 PM Andreas Kretschmer 
> <andreas@a-kretschmer.de <mailto:andreas@a-kretschmer.de>> wrote:
>
>
>     have you set ```max_standby_streaming_delay``? The default is 30
>     seconds, which means that this will be the maximum time allowed for a
>     replication lag caused by a conflicting query.
>
>
> Yes, we've bumped that up a lot.

i tought so.


>     You can use ``hot_standby_feedback = on``, but the downside will
>     be more
>     bloat on the tables.
>
> I'm not sure I understand. I'm not worried about the query being 
> cancelled on the replica. max_standby_streaming_delay fixes that for 
> us. I'm worried about the streaming replication being paused while 
> this table scan is running. If the table scan takes several minutes, 
> then the replica becomes several minutes out of sync with the master. 
> I'd prefer that not to happen and I'm wondering if there's a way to do 
> that.
>

You can choose. max_standby_streaming_delay with delay in the streaming 
(hence the name) or hot_standby_feedback, with the downside of possible 
more bloat.


Regards, Andreas


-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



Re: Non-pausing table scan on 9.6 replica?

От
Sameer Kumar
Дата:


On Wed, Mar 6, 2019 at 1:41 PM Mark Fletcher <markf@corp.groups.io> wrote:
Thank you for responding to my email.

On Tue, Mar 5, 2019 at 9:20 PM Andreas Kretschmer <andreas@a-kretschmer.de> wrote:

have you set ```max_standby_streaming_delay``? The default is 30
seconds, which means that this will be the maximum time allowed for a
replication lag caused by a conflicting query.

Yes, we've bumped that up a lot.

Did you encounter cases where the queries were canceled on standby because of streaming replication? Did you face any kind of recovery conflict because of which you bumped this up?

 
You can use ``hot_standby_feedback = on``, but the downside will be more
bloat on the tables.

I'm not sure I understand. I'm not worried about the query being cancelled on the replica. max_standby_streaming_delay fixes that for us. I'm worried about the streaming replication being paused while this table scan is running. If the table scan takes several minutes, then the replica becomes several minutes out of sync with the master. I'd prefer that not to happen and I'm wondering if there's a way to do that.

Typically all operations on Master will be replicated as soon as possible and standby will apply them as it receives them. There could be situations when the operation on master cannot be replicated because it would conflict with the running query e.g. you fired a select (a long-running report) and in the meantime someone updated a row. Now because of MVCC, if your SELECT query was running on master it would still give consistent result. It will also be the case with standby, standby will also maintain older version and select would get consistent result.
Then VACUUM gets fired by virtue of acuto-vacuum. If a SELECT was running on master, the auto-vacuum will factor in the older version of rows which currently running SELECT queries would require and will not remove them.
But master has no idea about what is happening on standby and will remove rows which are used/required by a SELECT on standby. When this VACUUM gets replicated to standby it will realize that there is a conflict in on going query and replicated VACUUM operation so it will be stalled (by the virtue of max_standby_streaming_delay). Hence you end up with replication delay. A much smarter thing would be if Master knew which row versions should not be removed (because they are in use on standby), that is what "hot_standby_feedback offers to achieve.

But conflict on standby could also happen because of other operations e.g. a query is reading from a table on standby and you did a DDL operation on the master that modifies the table. In that case an exclusive lock will be acquired on master, but when that gets replicated it will result in conflict and standby will stall replication.

If your priority is to have replica as close as possible to master then disable max_standby_streaming_delay



Thanks,
Mark
 

Re: Non-pausing table scan on 9.6 replica?

От
Mark Fletcher
Дата:
Andreas, Sameer,

Thank you for replying. I did not understand the purpose of hot_standby_feedback, and your explanations helped. I turned it on, and the pausing stopped.

Thanks,
Mark

Re: Non-pausing table scan on 9.6 replica?

От
Sameer Kumar
Дата:


On Thu, Mar 7, 2019 at 5:16 AM Mark Fletcher <markf@corp.groups.io> wrote:
Andreas, Sameer,

Thank you for replying. I did not understand the purpose of hot_standby_feedback, and your explanations helped. I turned it on, and the pausing stopped.

Great!
But do bear in mind that this is also not without its own implications. As pointed out by Andreas, this could lead to bloats.

Which version of PostgreSQL are you using? Sorry if I have missed that information.


 Thanks,
Mark