Re: Set visibility map bit after HOT prune

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Set visibility map bit after HOT prune
Date: 2013-01-08 06:17:17
Message-ID: CABOikdPpMWT4tDyEJ4j9XM1_rM=sL-L+Ds6hWOG4WcCbSMxWBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sorry for a long pause on this thread. A new arrival at home kept me
occupied all the time.

This thread saw a lot of ideas and suggestions from different people. I
don't think we had an agreement one way or the other on any of them, but
let me summarize the discussion for archival and taking further action if
deemed necessary.

*Suggestion 1: Set the visibility map bit after HOT prune
*
The rational for this idea is to improve the chances of an index-only scan
happening after HOT prune. This is especially interesting when the a table
gets random updates or deletes, each of which will clear the VM bit. The
table may not still come up for vacuum, either because the number of
updates/deletes are not over the vac threshold or because subsequent HOT
prune did not leave any work for vacuum. The only place where we set the VM
bits again is during vacuum. So this idea would add another path where VM
bits are set. This would also help vacuums to avoid visiting those heap
pages that don't have any work to be done.

The main objection to this idea is that this may result in too much
flip-flopping of the bit, especially if the HOT prune is to be followed by
an UPDATE to the page. This is a valid concern. But the way HOT prune works
today, it has no linkage to the future UPDATE operations other than the
fact that it frees up space for future UPDATE operations. But the prune can
happen even in a select code path. Suggestion 2 below is about changing
this behavior, but my point is to consider 1 unless and until we do 2. Tom
and Simon opposed saying we need to take a holistic view. Another concern
with this idea is that VM bit set operation now generates WAL and repeated
setting/clearing of the bit may increase WAL activity. I suggested to piggy
back the VM bit set logging with the HOT prune WAL log. Robert raised some
doubts regarding increased full-page writes if VM set LSN is recorded in
the heap page LSN. I am not sure if that applies if we piggy back the
operation though because HOT prune WAL would anyway record LSN in the heap
page.

If we do this, we should also consider updating FSM after prune because
vacuum may not scan this page at all.

*Suggestion 2: Move HOT prune logic somewhere else
*
Tom/Simon suggested that we ought to consider moving HOT prune to some
other code path. When we implemented HOT a few years back, we wanted to
make it as less invasive as possible. But now the code has proven
stability, we can experiment a few more things. Especially, we would like
to prune only if the page is going to receive an UPDATE soon. Otherwise,
pruning may unnecessarily add overheads to a simple read-only query and the
space freed up by prune may not even be used soon/ever. Tom suggested that
we can teach planner/executor to distinguish between a scan on a normal
relation vs result relation. I'd some concerns that even if we have such
mechanism, it may not be enough because a scan does not guarantee that the
tuple will be finally updated because it may fail qualification etc.

Simon has strong views regarding burdening SELECTs with maintenance work,
but Robert and I are not convinced that its necessarily a bad idea to let
SELECTs do a little extra work which can help to keep the overall state
healthy. But in general, it might be a good idea to try such approaches
and see if we can extract more out of the system. Suggestion 5 and 6 also
popped up to handle this problem in a slightly different way.

*Suggestion 3: Don't clear visibility map bit after HOT update
*
I proposed this during the course of discussion and Andreas F
liked/supported the idea. This could be useful when most/all updates are
HOT updates. So the page does not need any work during vacuum (assuming HOT
prune will take care of it) and index-only scans still work because the
index pointers will be pointing to a valid HOT chain. Tom/Simon didn't
quite like it because they were worried that this will change the meaning
on the VM. I (and I think even Andreas) don't think that way. Of course,
there are some concerns because this will break the use of PD_ALL_VISIBLE
flag for avoiding MVCC checks during heap scans. There are couple of
suggestions to fix that like having another page level bit to differentiate
these two states. Doing that will help us skip MVCC checks even if there
are one or more DEAD line pointers in the page. We should also run some
performance tests to see how much benefit is really served by skipping MVCC
checks in heap scans. We can weigh that against the benefit of keeping the
VM bit set.

*Suggestion 4: Freeze tuples during HOT prune*
*
*
I suggested that we can also freeze tuples during HOT prune. The rational
for doing it this way is to remove unnecessary work from vacuum by
piggy-backing the freeze logging in the HOT prune WAL record. Today vacuum
will generate additional WAL and dirty the buffers again just to freeze the
tuples. There are couple of objections to this idea. One is pushes
background work into foreground operation. My explanation to that is the
additional work is not much since we are already doing a lot other things
in HOT prune and the extra work is justified because it will save us much
more extra work later on. Another objection to the idea is that we might
freeze a tuple which gets unfreezed again because say it gets updated or
deleted. Also, we may lose forensic information by overwriting the xmin too
soon. A possible solution (and Robert mentioned that it was suggested even
before) for the latter issue is to have a separate tuple header flag to
designate frozen tuples. This might be good in any case since we then don't
lose forensic information ever.
*
Suggestion 5: Add a cost model so that a transaction does only limited
maintenance activity such as hint bit setting and HOT prune*
*
*
Simon liked the idea and suggested having a GUC like
transaction_cleanup_cost. One can set this at a session level and the
amount of extra work that is done by any transaction will be governed by
its value. The default would be the current behavior and we might place
some lower limit so that every transaction at least contribute that much
toward maintenance activities. I did not see any objections to the idea.
*
Suggestion 6: Move maintenance activity to background workers
*
The idea here is to leave the maintenance activities to the background
threads. For example, when a backend finds that a heap page needs pruning,
it can send that information to the background thread instead of doing that
itself. Once 9.3 has background thread infrastructure, this might be worth
exploring further.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message james 2013-01-08 06:45:50 Re: json api WIP patch
Previous Message Bruce Momjian 2013-01-08 03:51:21 pg_upgrade with parallel tablespace copying