Re: Non-pausing table scan on 9.6 replica?

From: Sameer Kumar <sameer(dot)kumar(at)ashnik(dot)com>
To: Mark Fletcher <markf(at)corp(dot)groups(dot)io>
Cc: Andreas Kretschmer <andreas(at)a-kretschmer(dot)de>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Non-pausing table scan on 9.6 replica?
Date: 2019-03-06 05:56:56
Message-ID: CADp-Sm7ijpiBf3pKPAk3dZGZj__kHvkhG=0=xDLebDAnxqDQEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Mar 6, 2019 at 1:41 PM Mark Fletcher <markf(at)corp(dot)groups(dot)io> wrote:

> Thank you for responding to my email.
>
> On Tue, Mar 5, 2019 at 9:20 PM Andreas Kretschmer <andreas(at)a-kretschmer(dot)de>
> wrote:
>
>>
>> have you set ```max_standby_streaming_delay``? The default is 30
>> seconds, which means that this will be the maximum time allowed for a
>> replication lag caused by a conflicting query.
>>
>
> Yes, we've bumped that up a lot.
>

Did you encounter cases where the queries were canceled on standby because
of streaming replication? Did you face any kind of recovery conflict
because of which you bumped this up?

> You can use ``hot_standby_feedback = on``, but the downside will be more
>> bloat on the tables.
>>
>> I'm not sure I understand. I'm not worried about the query being
> cancelled on the replica. max_standby_streaming_delay fixes that for us.
> I'm worried about the streaming replication being paused while this table
> scan is running. If the table scan takes several minutes, then the replica
> becomes several minutes out of sync with the master. I'd prefer that not to
> happen and I'm wondering if there's a way to do that.
>

Typically all operations on Master will be replicated as soon as possible
and standby will apply them as it receives them. There could be situations
when the operation on master cannot be replicated because it would conflict
with the running query e.g. you fired a select (a long-running report) and
in the meantime someone updated a row. Now because of MVCC, if your SELECT
query was running on master it would still give consistent result. It will
also be the case with standby, standby will also maintain older version and
select would get consistent result.
Then VACUUM gets fired by virtue of acuto-vacuum. If a SELECT was running
on master, the auto-vacuum will factor in the older version of rows which
currently running SELECT queries would require and will not remove them.
But master has no idea about what is happening on standby and will remove
rows which are used/required by a SELECT on standby. When this VACUUM gets
replicated to standby it will realize that there is a conflict in on going
query and replicated VACUUM operation so it will be stalled (by the virtue
of max_standby_streaming_delay). Hence you end up with replication delay. A
much smarter thing would be if Master knew which row versions should not be
removed (because they are in use on standby), that is what
"hot_standby_feedback offers to achieve.

But conflict on standby could also happen because of other operations e.g.
a query is reading from a table on standby and you did a DDL operation on
the master that modifies the table. In that case an exclusive lock will be
acquired on master, but when that gets replicated it will result in
conflict and standby will stall replication.

If your priority is to have replica as close as possible to master then
disable max_standby_streaming_delay

> Thanks,
> Mark
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Etsuro Fujita 2019-03-06 06:10:38 Re: Update does not move row across foreign partitions in v11
Previous Message Andreas Kretschmer 2019-03-06 05:47:49 Re: Non-pausing table scan on 9.6 replica?