Re: postgres_fdw vs. force_parallel_mode on ppc

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: postgres_fdw vs. force_parallel_mode on ppc
Date: 2016-03-04 20:58:05
Message-ID: 5789.1457125085@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Mar 4, 2016 at 11:17 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Huh? Parallel workers are read-only; what would they be doing sending
>> any of these messages?

> Mumble. I have no idea what's happening here.

OK, after inserting a bunch of debug logging I have figured out what is
happening. The updates on trunc_stats_test et al, being updates, are
done in the session's main backend. But we also have these queries:

-- do a seqscan
SELECT count(*) FROM tenk2;
-- do an indexscan
SELECT count(*) FROM tenk2 WHERE unique1 = 1;

These can be, and are, done in parallel worker processes (and not
necessarily the same one, either). AFAICT, the parallel worker
processes send their stats messages to the stats collector more or
less immediately after processing their queries. However, because
of the rate-limiting logic in pgstat_report_stat, the main backend
doesn't. The point of that "pg_sleep(1.0)" (which was actually added
*after* wait_for_stats) is to ensure that the half-second delay in
the rate limiter has been soaked up, and the stats messages sent,
before we start waiting for the results to become visible in the
stats collector's output.

So the sequence of events when we get a failure looks like

1. parallel workers send stats updates for seqscan and indexscan
on tenk2.

2. stats collector emits output files, probably as a result of
an autovacuum request.

3. session's main backend finishes "pg_sleep(1.0)" and sends
stats updates for what it's done lately, including the
updates on trunc_stats_test et al.

4. wait_for_stats() observes that the tenk2 idx_scan count has
already advanced and figures it need not wait at all.

5. We print stale stats for trunc_stats_test et al.

So it appears to me that to make this robust, we need to adjust
wait_for_stats to verify advances on *all three of* the tenk2
seq_scan count, the tenk2 idx_scan count, and at least one of
the trunc_stats_test tables' counters, because those could be
coming from three different backend processes.

If we ever allow parallel workers to do writes, this will really
become a mess.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-03-04 20:58:28 Re: Default Roles
Previous Message Alvaro Herrera 2016-03-04 20:55:21 Re: Relaxing SSL key permission checks