Re: pgsql: Improve gist XLOG code to follow the coding

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: pgsql-committers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: pgsql: Improve gist XLOG code to follow the coding
Date: 2006-03-31 15:35:26
Message-ID: 8471.1143819326@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> Here I am. Oleg now is in expedition to solar eclipse (should return soon).

Cool ... I hope to see one someday ...

> * gistxlog.c:gistContinueInsert
> /*
> * XXX fall out to avoid making LOG message at bottom of routine.
> * I think the logic for when to emit that message is all wrong...
> */
> At line 543 itup vector was filled by invalid tuple, so newly filled root
> will contains only invalid tuples. Hence, reindex/vacuum full is required.

Hmm. That probably needs to be redesigned then. The problem is that as
the thing is written, you *must* reinit the buffer here --- the code I
removed that checked the page LSN was unsafe. If you want to depend on
the prior page contents during replay, you have to give XLogInsert the
opportunity to save the whole page when the xlog entry is made.

> Sorry, I missed something, what is torn-page problem?

The torn-page problem is where a page only gets partially written to
disk during a power failure. When we come back up, the LSN might look
like the page is up to date, when in reality the data is
old/inconsistent. We deal with this by having the first WAL record that
touches a given page after a checkpoint contain a copy of the whole
page, and then we can rewrite the page from that copy instead of trying
to replay the incremental update. I've fixed the code to do this
correctly for the gistRedoPageUpdateRecord case, but I'm not sure what
we do for gistContinueInsert.

BTW, there's another problem with gistContinueInsert: it's not making
WAL entries for the actions it takes. It needs to do so --- consider
for example the PITR case, where the log will be shipped somewhere else
and then followed verbatim. So you've got to add WAL entries for any
recovery actions you take after reading the existing WAL entries.

The critical-section problem is not related to that; it's about being
sure that we don't make any changes to shared buffers that don't have
an associated WAL record. (See "WAL dirty-buffer management bug" thread
on -hackers.) I fixed the gist code so that it holds a critical section
throughout doing an insertion or vacuum operation, but that's really far
too wide --- there's too much chance of an elog() somewhere in the
process. In particular, we really don't want to call any user-defined
datatype functions inside the critical section. So the process needs to
be to compute all the required changes *but not make them*, then start
the critical section, then apply the changes and make the WAL record.
This seemed like a large enough change that I figured I'd better talk
with you about it. One idea I had was to generate the WAL record as the
output of the decision-making code, and then the critical section would
use the WAL record as its guide to what to do to the buffers.

BTW, I'm confused about gistadjscans: seems to me that's either broken
or unnecessary. Since we no longer hold an exclusive lock on a gist
index while inserting into it, the comment at gistscan.c line 33 is
certainly wrong now. If the adjustment is still necessary then some
other mechanism needs to be devised to get the information to other
backends. If it's not necessary then I think we should take it out.
I'm not totally familiar with the gist logic, but it looks to me like
gistadjscans is only called during an insert into a non-leaf page,
which makes me think it might be unnecessary --- do gist index scans
ever stop on non-leaf pages?

regards, tom lane

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Teodor Sigaev 2006-03-31 16:27:40 Re: pgsql: Improve gist XLOG code to follow the coding
Previous Message Teodor Sigaev 2006-03-31 09:35:23 Re: pgsql: Improve gist XLOG code to follow the coding