Re: Non-pausing table scan on 9.6 replica?

Поиск
Список
Период
Сортировка
От Sameer Kumar
Тема Re: Non-pausing table scan on 9.6 replica?
Дата
Msg-id CADp-Sm7ijpiBf3pKPAk3dZGZj__kHvkhG=0=xDLebDAnxqDQEg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Non-pausing table scan on 9.6 replica?  (Mark Fletcher <markf@corp.groups.io>)
Ответы Re: Non-pausing table scan on 9.6 replica?  (Mark Fletcher <markf@corp.groups.io>)
Список pgsql-general


On Wed, Mar 6, 2019 at 1:41 PM Mark Fletcher <markf@corp.groups.io> wrote:
Thank you for responding to my email.

On Tue, Mar 5, 2019 at 9:20 PM Andreas Kretschmer <andreas@a-kretschmer.de> wrote:

have you set ```max_standby_streaming_delay``? The default is 30
seconds, which means that this will be the maximum time allowed for a
replication lag caused by a conflicting query.

Yes, we've bumped that up a lot.

Did you encounter cases where the queries were canceled on standby because of streaming replication? Did you face any kind of recovery conflict because of which you bumped this up?

 
You can use ``hot_standby_feedback = on``, but the downside will be more
bloat on the tables.

I'm not sure I understand. I'm not worried about the query being cancelled on the replica. max_standby_streaming_delay fixes that for us. I'm worried about the streaming replication being paused while this table scan is running. If the table scan takes several minutes, then the replica becomes several minutes out of sync with the master. I'd prefer that not to happen and I'm wondering if there's a way to do that.

Typically all operations on Master will be replicated as soon as possible and standby will apply them as it receives them. There could be situations when the operation on master cannot be replicated because it would conflict with the running query e.g. you fired a select (a long-running report) and in the meantime someone updated a row. Now because of MVCC, if your SELECT query was running on master it would still give consistent result. It will also be the case with standby, standby will also maintain older version and select would get consistent result.
Then VACUUM gets fired by virtue of acuto-vacuum. If a SELECT was running on master, the auto-vacuum will factor in the older version of rows which currently running SELECT queries would require and will not remove them.
But master has no idea about what is happening on standby and will remove rows which are used/required by a SELECT on standby. When this VACUUM gets replicated to standby it will realize that there is a conflict in on going query and replicated VACUUM operation so it will be stalled (by the virtue of max_standby_streaming_delay). Hence you end up with replication delay. A much smarter thing would be if Master knew which row versions should not be removed (because they are in use on standby), that is what "hot_standby_feedback offers to achieve.

But conflict on standby could also happen because of other operations e.g. a query is reading from a table on standby and you did a DDL operation on the master that modifies the table. In that case an exclusive lock will be acquired on master, but when that gets replicated it will result in conflict and standby will stall replication.

If your priority is to have replica as close as possible to master then disable max_standby_streaming_delay



Thanks,
Mark
 

В списке pgsql-general по дате отправления:

Предыдущее
От: Andreas Kretschmer
Дата:
Сообщение: Re: Non-pausing table scan on 9.6 replica?
Следующее
От: Rob Sargent
Дата:
Сообщение: query has no destination for result data