Re: WIP: Access method extendability

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Access method extendability
Date: 2016-03-29 16:25:43
Message-ID: 20160329162543.GA854956@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Petr Jelinek wrote:

> And here it is. It's not perfect but it's better (I am not native speaker
> either). It's same as v12, just changed comments somewhat.

I think this can still be improved a bit more, in particular this large
comment, which I edit inline for expediency.

> + /*-------------------------------------------------------------------------
> + * API for construction of generic xlog records
> + *
> + * This API allows user to construct generic xlog records which describe
> + * difference between pages in a generic way. This is useful for
> + * extensions which provide custom access methods because they can't
> + * register their own WAL redo routines.
> + *
> + * Each record must be constructed by following these steps:
> + * 1) GenericXLogStart(relation) - start construction of a generic xlog
> + * record for the given relation.
> + * 2) GenericXLogRegister(buffer, isNew) - register one or more buffers
> + * for the record. This function returns a copy of the page
> + * image where modifications can be performed. The second argument
> + * indicates if the block is new (i.e. a full page image should be taken).
> + * 3) Apply modification of page images obtained in the previous step.
> + * 4) GenericXLogFinish() - finish construction of generic xlog record.
> + *
> + * The xlog record construction can be canceled at any step by calling
> + * GenericXLogAbort(). All changes made to page images copies will be
> + * discarded.
> + *
> + * Please, note the following points when constructing generic xlog records.
> + * - No direct modifications of page images are allowed! All modifications
> + * must be done in the copies returned by GenericXLogRegister(). In other
> + * words the code which makes generic xlog records must never call
> + * BufferGetPage().
> + * - Registrations of buffers (step 2) and modifications of page images
> + * (step 3) can be mixed in any sequence. The only restriction is that
> + * you can only modify page image after registration of corresponding
> + * buffer.
> + * - After registration, the buffer also can be unregistered by calling
> + * GenericXLogUnregister(buffer). In this case the changes made in
> + * that particular page image copy will be discarded.
> + * - Generic xlog assumes that pages are using standard layout, i.e., all
> + * data between pd_lower and pd_upper will be discarded.
> + * - Maximum number of buffers simultaneously registered for a generic xlog
> + * record is MAX_GENERIC_XLOG_PAGES. An error will be thrown if this limit
> + * is exceeded.
> + * - Since you modify copies of page images, GenericXLogStart() doesn't
> + * start a critical section. Thus, you can do memory allocation, error
> + * throwing etc between GenericXLogStart() and GenericXLogFinish().
> + * The actual critical section is present inside GenericXLogFinish().
> + * - GenericXLogFinish() takes care of marking buffers dirty and setting their
> + * LSNs. You don't need to do this explicitly.
> + * - For unlogged relations, everything works the same except there is no
> + * WAL record produced. Thus, you typically don't need to do any explicit
> + * checks for unlogged relations.
> + * - If registered buffer isn't new, generic xlog record contains delta
> + * between old and new page images. This delta is produced by per byte
> + * comparison. This current delta mechanism is not effective for data shifts
> + * inside the page and may be improved in the future.
> + * - Generic xlog redo function will acquire exclusive locks on buffers
> + * in the same order they were registered. After redo of all changes,
> + * the locks will be released in the same order.
> + *
> + *
> + * Internally, delta between pages consists of set of fragments. Each
> + * fragment represents changes made in given region of page. A fragment is
> + * described as follows:
> + *
> + * - offset of page region (OffsetNumber)
> + * - length of page region (OffsetNumber)
> + * - data - the data to place into described region ('length' number of bytes)
> + *
> + * Unchanged regions of page are not represented in the delta. As a result,
> + * the delta can be more compact than full page image. But if the unchanged region
> + * of the page is less than fragment header (offset and length) the delta
> + * would be bigger than the full page image. For this reason we break into fragments
> + * only if the unchanged region is bigger than MATCH_THRESHOLD.
> + *
> + * The worst case for delta size is when we didn't find any unchanged region
> + * in the page. Then size of delta would be size of page plus size of fragment
> + * header.
> + */
> + #define FRAGMENT_HEADER_SIZE (2 * sizeof(OffsetNumber))
> + #define MATCH_THRESHOLD FRAGMENT_HEADER_SIZE
> + #define MAX_DELTA_SIZE BLCKSZ + FRAGMENT_HEADER_SIZE

> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("GenericXLogUnregister: registered buffer not found")));
> + }

These error messages do not conform to our style guideline. This
particular seems like an internal error, perhaps it should be reported
with elog()? Not really sure about this. As for most of the others I
saw, they all have the calling function name in the errmsg() which we
don't do.

> diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
> new file mode 100644
> index 13af485..262deb2
> *** a/src/backend/replication/logical/decode.c
> --- b/src/backend/replication/logical/decode.c
> *************** LogicalDecodingProcessRecord(LogicalDeco
> *** 143,148 ****
> --- 143,149 ----
> case RM_BRIN_ID:
> case RM_COMMIT_TS_ID:
> case RM_REPLORIGIN_ID:
> + case RM_GENERIC_ID:
> /* just deal with xid, and done */
> ReorderBufferProcessXid(ctx->reorder, XLogRecGetXid(record),
> buf.origptr);

I'm really unsure about this one. Here we're saying that logical
decoding won't deal at all with stuff that was done using generic WAL
records. I think that's fine for index AMs, since logical decoding
doesn't deal with indexes anyway, but if somebody tries to implement
something that's not an index using generic WAL records, it will fail
miserably. Now maybe that's a restriction we're okay with, but if
that's so we should say that explicitely.

I think this patch should be documented in the WAL chapter.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-03-29 16:28:40 Re: [PROPOSAL] Client Log Output Filtering
Previous Message Tom Lane 2016-03-29 16:24:04 Re: More stable query plans via more predictable column statistics