Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby
Date: 2016-02-21 12:44:42
Message-ID: CAB7nPqSO6HVJ0T6LUT84PCy+_ihitdt64Ze2D+SJrHZy72Y0wg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Fri, Feb 19, 2016 at 10:33 PM, Amit Kapila wrote:
> On Fri, Feb 12, 2016 at 6:57 PM, Michael Paquier wrote:
> You doesn't seem to have taken care of below typo in your patch as
> pointed out by me earlier.
>
> + * to not rely on taking an exclusive lock an all the WAL insertion locks,
>
> /an all/on all

Sorry about that. Hopefully fixed now.

> Does this in anyway take care of the case when there is an activity
> on unlogged tables?
> I think avoiding to perform checkpoints in an idle system is introduced
> in commit 4d14fe0048cf80052a3ba2053560f8aab1bb1b22 whereas
> unlogged relation is introduced by commit
> 53dbc27c62d8e1b6c5253feba04a5094cb8fe046 which is much later.
> Now, I think one might consider it okay to not do anything for
> unlogged tables as it is not done previously and this patch is
> anyway improving the current situation, but I feel if we agree
> that checkpoints will get skipped if the only operations that
> are happening in the system are on unlogged relations, then
> it is better to add it in comments as an improvement even if we
> don't want to address it as part of this patch.

Nope, nothing is done, just to not complicate the patch more than
necessary. I agree though that we had better not update the progress
LSN when logging something related to INIT_FORKNUM, there is little
point to run non-forced checkpoints in this case as on recovery
unlogged relations are wiped out.

> + elog(LOG, "Not a forced or shutdown checkpoint: progress_lsn %X/%X, ckpt
> %X/%X",
> + (uint32) (progress_lsn >> 32), (uint32) progress_lsn,
> + (uint32) (ControlFile->checkPoint >> 32), (uint32)
> ControlFile->checkPoint);
>
> Are you proposing to have the newly introduced elog messages
> as part of commit or are these just for debugging purpose? If you
> are proposing for commit, then I think it is worth to justify the need
> of same and we should discuss what is the appropriate log level,
> otherwise, I think you can have these in an additional patch just for
> verification as the main patch is now very close to being
> ready for committer.

I have never intended to have those logs being part of the patch that
should be committed, and putting them at LOG level avoided to have a
bunch of garbage in the logs during the tests (got lazy to grep on the
entries for those tests). I am no:t against adding a comment, here is
one I just came up with:
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -448,6 +448,12 @@ typedef struct XLogwrtResult
* hence reducing the impact of the activity lookup. This takes also
* advantage to avoid 8-byte torn reads on some platforms by using the
* fact that each insert lock is located on the same cache line.
+ * XXX: There is still room for more improvements here, particularly
+ * WAL operations related to unlogged relations (INIT_FORKNUM) should not
+ * update the progress LSN as those relations are reset during crash
+ * recovery so enforcing buffers of such relations to be flushed for
+ * example in the case of a load only on unlogged relations is a waste
+ * of disk write.
*/

> Also, I think it is worth to once take the performance data for
> write tests (using pgbench 30 minute run or some other way) with
> minimum checkpoint_timeout (i.e 30s) to see if the additional locking
> has any impact on performance. I think taking locks at intervals
> of 15s or 30s should not matter much, but it is better to be safe.

I don't think performance will be impacted, but there are no reasons
to not do any measurements either. I'll try to get some graphs
tomorrow with runs on my laptop, mainly looking for any effects of
this patch on TPS when checkpoints show up.

For now attached is an updated patch, with a second patch (b)
including the logs for testing.
--
Michael

Attachment Content-Type Size
hs-checkpoints-v7-b.patch text/x-patch 1.7 KB
hs-checkpoints-v7-a.patch text/x-patch 47.8 KB

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Andrew Dunstan 2016-02-21 15:46:18 Re: BUG #13936: jsonb_object() -> ERROR: unknown type of jsonb container
Previous Message Brian Ghidinelli 2016-02-20 03:16:45 Re: BUG #13970: Vacuum hangs on particular table; cannot be terminated - requires `kill -QUIT pid`

Browse pgsql-hackers by date

  From Date Subject
Next Message Bill Moran 2016-02-21 12:56:09 Re: JDBC behaviour
Previous Message Christoph Berg 2016-02-21 11:32:36 Re: Relaxing SSL key permission checks