Re: Proposal : For Auto-Prewarm.

From: Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Beena Emerson <memissemerson(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal : For Auto-Prewarm.
Date: 2017-08-18 06:23:29
Message-ID: CAD__OuhXaGPy7Xfhvm+aCaipXQ967tmfPXjYHbnbr2TqcJOM+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 16, 2017 at 2:08 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jul 14, 2017 at 8:17 AM, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com> wrote:
>> [ new patch ]
> It's quite possible that in making all of these changes I've
> introduced some bugs, so I think this needs some testing and review.
> It's also possible that some of the changes that I made are actually
> not improvements and should be reverted, but it's always hard to tell
> that about your own code. Anyway, please see the attached version.

Sorry, Robert, I was on vacation so could not pick this immediately. I
have been testing and reviewing the patch and I found following
issues.

1. Hang in apw_detach_shmem.
+/*
+ * Clear our PID from autoprewarm shared state.
+ */
+static void
+apw_detach_shmem(int code, Datum arg)
+{
+ LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+ if (apw_state->pid_using_dumpfile == MyProcPid)
+ apw_state->pid_using_dumpfile = InvalidPid;
+ if (apw_state->bgworker_pid == MyProcPid)
+ apw_state->bgworker_pid = InvalidPid;
+ LWLockRelease(&apw_state->lock);
+}

The reason is that we might already be under the apw_state->lock when
we error out and jump to apw_detach_shmem. So we should not be trying
to take the lock again. For example, in autoprewarm_dump_now(),
apw_dump_now() will error out under the lock if bgworker is already
using dump file.

=======
+autoprewarm_dump_now(PG_FUNCTION_ARGS)
+{
+ int64 num_blocks;
+
+ apw_init_shmem();
+
+ PG_TRY();
+ {
+ num_blocks = apw_dump_now(false, true);
+ }
+ PG_CATCH();
+ {
+ apw_detach_shmem(0, 0);
+ PG_RE_THROW();
+ }
+ PG_END_TRY();
+
+ PG_RETURN_INT64(num_blocks);
+}

=======
+ LWLockAcquire(&apw_state->lock, LW_EXCLUSIVE);
+ if (apw_state->pid_using_dumpfile == InvalidPid)
+ apw_state->pid_using_dumpfile = MyProcPid;
+ else
+ {
+ if (!is_bgworker)
+ ereport(ERROR,
+ (errmsg("could not perform block dump because
dump file is being used by PID %d",
+ apw_state->pid_using_dumpfile)));

This attempt to take lock again hangs the autoprewarm module. I think
there is no need to take lock while we reset those variables as we
reset only if we have set it ourselves.

2) I also found one issue which was my own mistake in my previous patch 19.
In "apw_dump_now" I missed calling FreeFile() on first write error,
whereas on othercases I am already calling the same.
ret = fprintf(file, "<<" INT64_FORMAT ">>\n", num_blocks);
+ if (ret < 0)
+ {
+ int save_errno = errno;
+
+ unlink(transient_dump_file_path);

Other than this, the patch is working as it was previously doing. If
you think my presumed fix(not to take lock) to hang issue is right I
will produce a patch for the same.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-08-18 06:30:31 Re: recovery_target_time = 'now' is not an error but still impractical setting
Previous Message Amit Khandekar 2017-08-18 06:05:33 Re: expanding inheritance in partition bound order