Re: Proposal : For Auto-Prewarm.

From: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Thom Brown <thom(at)linux(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal : For Auto-Prewarm.
Date: 2017-07-05 12:55:50
Message-ID: CAD__OuiPDAUstffn2TtA1gc-kQB7Cn5xFag_efLUiSX9xFp2XA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 3, 2017 at 3:34 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> Few comments on the latest patch:
>
> 1.
> + LWLockRelease(&apw_state->lock);
> + if (!is_bgworker)
> + ereport(ERROR,
> + (errmsg("could not perform block dump because dump file is being
> used by PID %d",
> + apw_state->pid_using_dumpfile)));
> + ereport(LOG,
> + (errmsg("skipping block dump because it is already being performed by PID %d",
> + apw_state->pid_using_dumpfile)));
>
>
> The above code looks confusing as both the messages are saying the
> same thing in different words. I think you keep one message (probably
> the first one) and decide error level based on if this is invoked for
> bgworker. Also, you can move LWLockRelease after error message,
> because if there is any error, then it will automatically release all
> lwlocks.

ERROR is used for autoprewarm_dump_now which is called from the backend.
LOG is used for bgworker.
wordings used are to match the context if failing to dump is
acceptable or not. In the case of bgworker, it is acceptable we are
not particular about the start time of dump but the only interval
between the dumps. So if already somebody doing it is acceptable. But
one who calls autoprewarm_dump_now might be particular about the start
time of dump so we throw error making him retry same.

The wording's are suggested by Robert(below snapshot) in one of his
previous comments and I also agree with it. If you think I should
reconsider this and I am missing something I am open to suggestions.

On Wed, May 31, 2017 at 10:18 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
+If we go to perform
+an immediate dump process and finds a non-zero value already just does
+ereport(ERROR, ...), including the PID of the other process in the
+message (e.g. "unable to perform block dump because dump file is being
+used by PID %d"). In a background worker, if we go to dump and find
+the file in use, log a message (e.g. "skipping block dump because it
+is already being performed by PID %d", "skipping prewarm because block
+dump file is being rewritten by PID %d").

Thanks moved the LWLockRelease after ERROR call.

> 2.
> +autoprewarm_dump_now(PG_FUNCTION_ARGS)
> +{
> + uint32 num_blocks = 0;
> +
> ..
> + PG_RETURN_INT64(num_blocks);
> ..
> }
>
> Is there any reason for using PG_RETURN_INT64 instead of PG_RETURN_UINT32?

Return type autoprewarm_dump_now() is pg_catalog.int8 to accommodate
uint32 so I used PG_RETURN_INT64. I think PG_RETURN_UINT32 can be used
as well I have replaced now.

> 3.
> +dump_now(bool is_bgworker)
> {
> ..
> + if (buf_state & BM_TAG_VALID)
> + {
> + block_info_array[num_blocks].database = bufHdr->tag.rnode.dbNode;
> + block_info_array[num_blocks].tablespace = bufHdr->tag.rnode.spcNode;
> + block_info_array[num_blocks].filenode = bufHdr->tag.rnode.relNode;
> + block_info_array[num_blocks].forknum = bufHdr->tag.forkNum;
> + block_info_array[num_blocks].blocknum = bufHdr->tag.blockNum;
> + ++num_blocks;
> + }
> ..
> }
>I think there is no use of writing Unlogged buffers unless the dump is
>for the shutdown. You might want to use BufferIsPermanent to detect the same.

-- I do not think that is true pages of the unlogged table are also
read into buffers for read-only purpose. So if we miss to dump them
while we shut down then the previous dump should be used.

> 4.
> +static uint32
> +dump_now(bool is_bgworker)
> {
> ..
> + for (num_blocks = 0, i = 0; i < NBuffers; i++)
> + {
> + uint32 buf_state;
> +
> + if (!is_bgworker)
> + CHECK_FOR_INTERRUPTS();
> ..
> }
>
> Why checking for interrupts is only for non-bgwroker cases?

-- autoprewarm_dump_now is directly called from the backend. In such
case, we have to handle signals registered for backend in dump_now().
For bgworker dump_block_info_periodically caller of dump_now() handles
SIGTERM, SIGUSR1 which we are interested in.

> 5.
> + * Each entry of BlockRecordInfo consists of database, tablespace, filenode,
> + * forknum, blocknum. Note that this is in the text form so that the dump
> + * information is readable and can be edited, if required.
> + */
>
> In the above comment, you are saying that the dump file is in text
> form whereas in the code you are using binary form. I think code
> should match comments. Is there a reason of preferring binary over
> text or vice versa?

-- Previously I used the write() on file descriptor. Sorry I should
have changed the mode of opening to text mode when I moved the code to
use AllocateFile Sorry fixed same now.

> 6.
> +dump_now(bool is_bgworker)
> {
> ..
> + (void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
> + apw_state->pid_using_dumpfile = InvalidPid;
> ..
> }
>
> How will pid_using_dumpfile be set to InvalidPid in the case of error
> for non-bgworker cases?

-- I have a try() - catch() in autoprewarm_dump_now I think that is okay.

>
> 7.
> +dump_now(bool is_bgworker)
> {
> ..
> + (void) durable_rename(transient_dump_file_path, AUTOPREWARM_FILE, ERROR);
>
> ..
> }
>
> How will transient_dump_file_path be unlinked in the case of error in
> durable_rename? I think you need to use PG_TRY..PG_CATCH to ensure
> same?

-- If durable_rename is failing that seems basic functionality of
autoperwarm is failing so I want it to be an ERROR. I do not want to
remove the temp file as we always truncate before reusing it again. So
I think there is no need to catch all ERROR in dump_now() just to
remove the temp file.

> 8.
> + file = AllocateFile(transient_dump_file_path, PG_BINARY_W);
> + if (!file)
> + ereport(ERROR,
> + (errcode_for_file_access(),
> + errmsg("could not open file \"%s\": %m",
> + transient_dump_file_path)));
> +
> + ret = fprintf(file, "<<%u>>\n", num_blocks);
> + if (ret < 0)
> + {
> + int save_errno = errno;
> +
> + FreeFile(file);
>
> I think you don't need to close the file in case of error, it will be
> automatically taken care in case of error (via transaction abort
> path).

-- I was trying to close the file before unlinking. Agree it is not
necessary to do so but just did it as a practice. I have removed it
now.

> 9.
> + /* Register autoprewarm load. */
> + setup_autoprewarm(&autoprewarm_worker, "autoprewarm", "autoprewarm_main",
> + Int32GetDatum(TASK_PREWARM_BUFFERPOOL), 0, 0);
>
> What does "load" signify in above comment? Do you want to say worker instead?

-- Sorry fixed now.

In addition to above I have made one more change, per-database
autoprewarm bgworker has been renamed from "autoprewarm" to
"per-database autoprewarm"

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
autoprewarm_18.patch application/octet-stream 34.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2017-07-05 13:53:35 Re: Request more documentation for incompatibility of parallelism and plpgsql exec_run_select
Previous Message Mithun Cy 2017-07-05 12:55:28 Re: Proposal : For Auto-Prewarm.