Re: Proposal : For Auto-Prewarm.

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal : For Auto-Prewarm.
Date: 2017-05-24 12:11:16
Message-ID: CA+TgmoZ9OZ5M0wAfq27z_GjfvZ8zqJA4Nv5r-j_Nax+mFj7OBQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 24, 2017 at 6:28 AM, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com> wrote:
> On Tue, May 23, 2017 at 7:06 PM, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com> wrote:
>> Thanks, Andres,
>>
>> I have tried to fix all of your comments.
>
> There was a typo in the previous patch 07 where I had used != instead of
> ==, and a mistake in the comments. I have corrected both now.

+/*
+ * autoprewarm :
+ *
+ * What is it?
+ * ===========
+ * A bgworker which automatically records information about blocks which were
+ * present in buffer pool before server shutdown and then prewarm the buffer
+ * pool upon server restart with those blocks.
+ *
+ * How does it work?
+ * =================
+ * When the shared library "pg_prewarm" is preloaded, a
+ * bgworker "autoprewarm" is launched immediately after the server has reached
+ * consistent state. The bgworker will start loading blocks recorded in the
+ * format BlockInfoRecord
+ * <<DatabaseId,TableSpaceId,RelationId,Forknum,BlockNum>> in
+ * $PGDATA/AUTOPREWARM_FILE, until there is no free buffer left in the buffer
+ * pool. This way we do not replace any new blocks which were loaded either by
+ * the recovery process or the querying clients.
+ *
+ * Once the "autoprewarm" bgworker has completed its prewarm task, it will
+ * start a new task to periodically dump the BlockInfoRecords related to blocks
+ * which are currently in shared buffer pool. Upon next server restart, the
+ * bgworker will prewarm the buffer pool by loading those blocks. The GUC
+ * pg_prewarm.dump_interval will control the dumping activity of the bgworker.
+ */

Make this part of the file header comment. Also, add an enabling GUC.
The default can be true, but it should be possible to preload the
library (so that the SQL functions are available without a dynamic
library load at call time) without also being forced to get the
auto-prewarm behavior. I suggest pg_prewarm.autoprewarm = true / false.
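
A minimal sketch of how that might look in _PG_init(); the variable
name and the PGC_POSTMASTER context are just assumptions on my part:

static bool autoprewarm = true;     /* pg_prewarm.autoprewarm */

void
_PG_init(void)
{
    DefineCustomBoolVariable("pg_prewarm.autoprewarm",
                             "Starts the autoprewarm worker.",
                             NULL,
                             &autoprewarm,
                             true,
                             PGC_POSTMASTER,
                             0,
                             NULL,
                             NULL,
                             NULL);

    if (autoprewarm)
    {
        /* register the autoprewarm bgworker here */
    }
}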

Your SigtermHandler and SighupHandler routines are still capitalized
in a way that's not very consistent with what we do elsewhere. I
think all of our other signal handlers have names_like_this() not
NamesLikeThis().
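
For example, something in the conventional style would be (just a
sketch; the apw_ prefix is only a suggestion):

static volatile sig_atomic_t got_sigterm = false;

static void
apw_sigterm_handler(SIGNAL_ARGS)
{
    int         save_errno = errno;

    got_sigterm = true;
    SetLatch(MyLatch);
    errno = save_errno;
}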

+ * ============== types and variables used by autoprewam =============

Spelling.

+ * Meta-data of each persistent block which is dumped and used to load.

Metadata

+typedef struct BlockInfoRecord
+{
+    Oid         database;       /* database */
+    Oid         spcNode;        /* tablespace */
+    Oid         filenode;       /* relation's filenode. */
+    ForkNumber  forknum;        /* fork number */
+    BlockNumber blocknum;       /* block number */
+} BlockInfoRecord;

spcNode is capitalized differently from all of the other members.

+ LWLock *lock; /* protects SharedState */

Just declare this as LWLock lock, and initialize it using
LWLockInitialize. The way you're doing it is more complicated.
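
Something like this, roughly (struct and symbol names here are
placeholders, not a finished design):

typedef struct AutoPrewarmSharedState
{
    LWLock      lock;           /* protects the fields that follow */
    /* ... other shared fields ... */
} AutoPrewarmSharedState;

/* in the shared-memory setup code */
bool        found;

state = ShmemInitStruct("autoprewarm", sizeof(AutoPrewarmSharedState),
                        &found);
if (!found)
    LWLockInitialize(&state->lock, LWLockNewTrancheId());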

+static dsa_area *AutoPrewarmDSA = NULL;

DSA seems like more than you need here. There's only one allocation
being performed. I think it would be simpler and less error-prone to
use DSM directly. I don't even think you need a shm_toc. You could
just do:

seg = dsm_create(segsize, 0);
blkinfo = dsm_segment_address(seg);

Then pass dsm_segment_handle(seg) to the worker using bgw_main_arg or
bgw_extra, and have it call dsm_attach. An advantage of this approach
is that you only allocate the memory you actually need, whereas DSA
will allocate more, expecting that you might do further allocations.
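
A rough sketch of that flow, with names invented and error handling
omitted:

/* In the process that sets up the load: */
dsm_segment *seg;
BlockInfoRecord *blkinfo;
BackgroundWorker worker;
BackgroundWorkerHandle *handle;

seg = dsm_create(mul_size(sizeof(BlockInfoRecord), num_blocks), 0);
blkinfo = (BlockInfoRecord *) dsm_segment_address(seg);
/* ... fill in blkinfo[0] through blkinfo[num_blocks - 1] ... */
/* dsm_pin_mapping(seg) may be needed if the mapping must outlive the
 * current resource owner. */

/* ... fill in the rest of the BackgroundWorker struct ... */
worker.bgw_main_arg = UInt32GetDatum(dsm_segment_handle(seg));
RegisterDynamicBackgroundWorker(&worker, &handle);

/* In the worker's main function, with main_arg as passed above: */
seg = dsm_attach(DatumGetUInt32(main_arg));
blkinfo = (BlockInfoRecord *) dsm_segment_address(seg);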

+ pg_qsort(block_info_array, num_blocks, sizeof(BlockInfoRecord),
+ blockinfo_cmp);

I think it would make more sense to sort the array on the load side,
in the autoprewarm worker, rather than when dumping. First, many
dumps will never be reloaded, so there's no real reason to waste time
sorting them. Second, the way you've got the code right now, it
relies heavily on the assumption that the dump file will be sorted,
but that doesn't seem like a necessary assumption. We don't really
expect users to hand-edit the dump files, but if they do, it'd be nice
if it didn't randomly break things unnecessarily.
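
In other words, keep a comparator along these lines but invoke it from
the worker once the file has been read; a sketch using the struct
fields quoted above:

static int
blockinfo_cmp(const void *a, const void *b)
{
    const BlockInfoRecord *ra = (const BlockInfoRecord *) a;
    const BlockInfoRecord *rb = (const BlockInfoRecord *) b;

    /* order by database, tablespace, relation, fork, block */
    if (ra->database != rb->database)
        return ra->database < rb->database ? -1 : 1;
    if (ra->spcNode != rb->spcNode)
        return ra->spcNode < rb->spcNode ? -1 : 1;
    if (ra->filenode != rb->filenode)
        return ra->filenode < rb->filenode ? -1 : 1;
    if (ra->forknum != rb->forknum)
        return ra->forknum < rb->forknum ? -1 : 1;
    if (ra->blocknum != rb->blocknum)
        return ra->blocknum < rb->blocknum ? -1 : 1;
    return 0;
}

/* in the worker, right after reading AUTOPREWARM_FILE into the array */
pg_qsort(block_info_array, num_blocks, sizeof(BlockInfoRecord),
         blockinfo_cmp);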

+ errmsg("autoprewarm: could not open \"%s\": %m",
+ dump_file_path)));

Use standard error message wordings! Don't create new translatable
messages by adding "autoprewarm" to error messages. There are
multiple instances of this throughout the patch.
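
For instance, the usual wording for this particular one would be
something like:

ereport(ERROR,
        (errcode_for_file_access(),
         errmsg("could not open file \"%s\": %m",
                AUTOPREWARM_FILE)));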

+ snprintf(dump_file_path, sizeof(dump_file_path),
+ "%s", AUTOPREWARM_FILE);

This is completely pointless. You can get rid of the dump_file_path
variable and just use AUTOPREWARM_FILE wherever you would have used
dump_file_path. It's just making a copy of a string you already have.
Also, this code interrupts the flow of the surrounding logic in a
weird way; even if we needed it, it would make more sense to put it up
higher, where we construct the temporary path, or down lower, where
the value is actually needed.

+ SPI_connect();
+ PushActiveSnapshot(GetTransactionSnapshot());

It is really unclear why you need this, since the code does not use
SPI for anything, or do anything that needs a snapshot.

+ goto end_load;

I think you should try to rewrite this function so that it doesn't
need "goto". I also think in general that this function could be
written in a much more direct way by getting rid of the switch and the
BLKTYPE_* constants altogether. connect_to_db() can only happen once,
so do that at the beginning. After that, the logic can look roughly like
this:

BlockInfoRecord *old_blk = NULL;

while (!got_sigterm && pos < maxpos && have_free_buffer())
{
    BlockInfoRecord *blk = block_info[pos];

    /* Quit if we've reached records for another database. */
    if (old_blk != NULL && old_blk->dbOid != blk->dbOid)
        break;

    /*
     * When we reach a new relation, close the old one.  Note, however,
     * that the previous try_relation_open may have failed, in which case
     * rel will be NULL.
     */
    if (old_blk != NULL && old_blk->relOid != blk->relOid && rel != NULL)
    {
        relation_close(rel, AccessShareLock);
        rel = NULL;
    }

    /*
     * Try to open each new relation, but only once, when we first
     * encounter it.  If it's been dropped, skip the associated blocks.
     */
    if (old_blk == NULL || old_blk->relOid != blk->relOid)
    {
        Assert(rel == NULL);
        rel = try_relation_open(blk->relOid, AccessShareLock);
    }
    if (!rel)
    {
        /* Remember this record so we don't retry the open per block. */
        old_blk = blk;
        ++pos;
        continue;
    }

    /* Once per fork, check for fork existence and size. */
    if (old_blk == NULL || old_blk->forkNumber != blk->forkNumber)
    {
        RelationOpenSmgr(rel);
        if (smgrexists(rel->rd_smgr, blk->forkNumber))
            nblocks = RelationGetNumberOfBlocksInFork(...);
        else
            nblocks = 0;
    }

    /* Prewarm buffer. */
    buf = ReadBufferExtended(...);
    ...
    old_blk = blk;
    ++pos;
}

+ errhint("Kill all remaining database processes and restart"
+ " the database.")));

Don't break strings across lines. Just put it all on one (long) line.
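That is, just:

errhint("Kill all remaining database processes and restart the database.")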

I don't think you should really need default_database. It seems like
it should be possible to jigger things so that those blocks are loaded
together with some other database (say, let the first worker do it).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
