Re: auto_explain vs. parallel query

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: auto_explain vs. parallel query
Date: 2016-11-01 13:15:27
Message-ID: CA+TgmoZE4eWGXexfAu8Qza0CwjbTR97HiLL-WJQ3+QCoOAWcYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 31, 2016 at 6:35 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> While debugging something on 9.6, I've noticed that auto_explain handles
> parallel queries in a slightly strange way - both the leader and all the
> workers log their chunk of the query (i.e. the leader logs explain for the
> whole query, while workers only log the parts they've been executing).
>
> So for example for a query with 2 workers, the log look like this:
>
> 2016-10-31 23:10:23.481 CET [12278] LOG: duration: 32.562 ms pl ...
> Query Text: <parallel query> ...
> Parallel Seq Scan on tables (cost=0.00..15158.25 rows=220 wi ...
> Filter: ((table_datoid < '1000000000'::oid) AND (table_nspo ...
> Rows Removed by Filter: 140614 ...
> 2016-10-31 23:10:23.481 CET [12277] LOG: duration: 32.325 ms pl ...
> Query Text: <parallel query> ...
> Parallel Seq Scan on tables (cost=0.00..15158.25 rows=220 wi ...
> Filter: ((table_datoid < '1000000000'::oid) AND (table_nspo ...
> Rows Removed by Filter: 80948 ...
> 2016-10-31 23:10:23.483 CET [12259] LOG: duration: 38.997 ms pl ...
> Query Text: explain analyze select * from tables where table_ ...
> Gather (cost=1000.00..16211.15 rows=529 width=356) (actual t ...
> Workers Planned: 2 ...
> Workers Launched: 2 ...
> -> Parallel Seq Scan on tables (cost=0.00..15158.25 rows= ...
> Filter: ((table_datoid < '1000000000'::oid) AND (tabl ...
> Rows Removed by Filter: 105570 ...
>
> I'd say that's fairly surprising, and I haven't found any discussion about
> auto_explain vs. parallel queries in the archives (within the last year), so
> I assume this may not be entirely intentional.

Yeah, that's not entirely intentional. :-(

> Not only this does not match the output when running EXPLAIN ANALYZE
> manually, it also provides no information about which messages from workers
> belong to which "leader" message.
>
> Another thing is that this interacts with sampling in a rather unfortunate
> way, because each process evaluates the sampling condition independently. So
> for example with auto_explain.sample_rate=0.5 a random subset of
> worker/leader explain plans will be logged.
>
> But this also applies to the conditions in ExecutorStart, which enables
> instrumentation only when the query gets sampled - so when the leader gets
> sampled but not all the workers, the counters in the query logged by the
> leader are incomplete.
>
> I do believe those are bugs that make auto_explain rather unusable with
> parallel queries. Adding IsParallelWorker() to the condition in ExecutorEnd
> should fix the extra logging messages (and log only from the leader).
>
> Not sure how to fix the sampling - simply adding IsParallelWorker() to
> ExecutorStart won't do the trick, because we don't want to enable
> instrumentation only for sample queries. So I guess this needs to be decided
> in the leader, and communicated to the workers somehow.

Yes, deciding in the leader and communicating to workers seems like
the way to go, but I'm not sure how simple that's going to be to
implement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julian Markwort 2016-11-01 13:20:43 Re: [PATCH] pgpassfile connection option
Previous Message Julian Markwort 2016-11-01 12:58:41 Re: [PATCH] pgpassfile connection option