Re: [PROPOSAL] Use SnapshotAny in get_actual_variable_range

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dmitriy Sarafannikov <dsarafannikov(at)yandex(dot)ru>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Borodin Vladimir <root(at)simply(dot)name>, Хомик Кирилл <khomikki(at)yandex-team(dot)ru>
Subject: Re: [PROPOSAL] Use SnapshotAny in get_actual_variable_range
Date: 2017-09-07 21:30:57
Message-ID: 25744.1504819857@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dmitriy Sarafannikov <dsarafannikov(at)yandex(dot)ru> writes:
> [ snapshot_non_vacuumable_v3.patch ]

Starting to look at this. I think that the business with choosing
RecentGlobalXmin vs. RecentGlobalDataXmin is just wrong. What we
want to do is accept any tuple that would not be considered killable
by heap_hot_search_buffer, and that looks only at RecentGlobalXmin.

It's possible that it'd be sensible for heap_hot_search_buffer to
use RecentGlobalDataXmin when dealing with a non-catalog table,
allowing it to consider more tuples killable. But I'm not sure;
it looks like the decision which to use costs some effort, which
we probably don't want to be spending in heap_hot_search_buffer.
In any case that would be a different patch topic.

With the patch as it stands, if we have a user-data tuple that is dead
more recently than RecentGlobalXmin but not more recently than
RecentGlobalDataXmin, the snapshot will reject it so the planner's
indexscan will keep searching. But heap_hot_search_buffer won't
tell the index AM to kill the tuple, so the index entry stays marked
as live, which means the next get_actual_variable_range call will have
to scan past it again. So we don't get the hoped-for benefit with
tuple deaths in that range. I do not know whether this effect might
explain your finding that the patch didn't entirely fix your performance
problem.

In short I think we should just set up the threshold as RecentGlobalXmin.
However, this seems like a decision that might be pretty specific to
get_actual_variable_range's use-case, so I'm inclined to have the
threshold be chosen by get_actual_variable_range itself not hard-wired
into InitNonVacuumableSnapshot.

Another point is that I think it might be possible for RecentGlobalXmin
to advance after get_actual_variable_range looks at it (for example,
if we have to take a new catalog snapshot during index scan setup).
This creates the possibility that the snapshot accepts tuples that
heap_hot_search_buffer would consider dead. That seems fairly harmless
though, so I'm inclined not to fuss about it. We could arrange for
HeapTupleSatisfiesNonVacuumable to use RecentGlobalXmin directly,
but that seems like it makes it too special-purpose.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2017-09-07 21:34:41 Re: GnuTLS support
Previous Message Tomas Vondra 2017-09-07 21:08:14 Re: Remove 1MB size limit in tsvector