Re: pgsql: Add parallel-aware hash joins.

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-committers <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Add parallel-aware hash joins.
Date: 2017-12-22 08:16:10
Message-ID: CAEepm=0WxwzpHVHt3PcWHBV=L3k3FDb6dvMq1A2Li49LGBa7TA@mail.gmail.com
Lists: pgsql-committers pgsql-hackers

On Fri, Dec 22, 2017 at 1:48 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> I don't think that's quite it, because it should never have set
> 'writing' for any batch number >= nbatch.
>
> It's late here, but I'll take this up tomorrow and either find a fix
> or figure out how to avoid antisocial noise levels on the build farm
> in the meantime.

Not there yet, but I learned some things and am still working on it. I
spent a lot of time trying to reproduce the assertion failure, and
succeeded exactly once. Unfortunately, the one time I managed to do
that I'd built with clang -O2 and got a core file that I couldn't get
much useful info out of, and I've been trying to do it again with -O0
ever since without luck. The time I succeeded, I reproduced it by
creating the tables "simple" and "bigger_than_it_looks" from join.sql
and then doing this in a loop:

set min_parallel_table_scan_size = 0;
set parallel_setup_cost = 0;
set work_mem = '192kB';

explain analyze select count(*) from simple r join
bigger_than_it_looks s using (id);
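
For anyone who wants to try the same thing, here is a minimal sketch of
one way to drive that loop from Python through psql. It is not the
script I actually used: the function names, the database name
"regression" and the iteration cap are all assumptions made up for
illustration, and it assumes the two tables from join.sql already exist
in that database.

```python
import subprocess

# Settings and query copied from the message above.
SETTINGS = [
    "set min_parallel_table_scan_size = 0;",
    "set parallel_setup_cost = 0;",
    "set work_mem = '192kB';",
]
QUERY = (
    "explain analyze select count(*) from simple r join "
    "bigger_than_it_looks s using (id);"
)


def build_script(repeats=1):
    """Return the SQL fed to one psql session (hypothetical helper)."""
    return "\n".join(SETTINGS + [QUERY] * repeats) + "\n"


def run_until_failure(dbname, max_iterations=1000):
    """Run the script in fresh psql sessions until one exits non-zero.

    Returns (iteration, stderr) for the first failing run, or None if
    every run succeeded. The database name is an assumption.
    """
    for i in range(1, max_iterations + 1):
        result = subprocess.run(
            ["psql", "-X", "-q", "-v", "ON_ERROR_STOP=1", "-d", dbname],
            input=build_script(),
            text=True,
            capture_output=True,
        )
        if result.returncode != 0:
            return i, result.stderr
    return None


if __name__ == "__main__":
    failure = run_until_failure("regression")  # assumed database name
    print(failure if failure else "no failure observed")
```

A non-zero psql exit status is only a rough signal that a backend died;
the server log and any core file still have to be checked by hand.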

The machine that it happened on is resource constrained, and exhibits
another problem: though the above query normally runs in ~20ms,
sometimes it takes several seconds and occasionally much longer. That
never happens on fast development systems or test servers, which run it
quickly every time, and it doesn't happen on my slow 2-core system if
I have only two workers (or one worker + leader). I dug into that,
figured out what was going wrong and wrote it up separately[1],
because I think it's an independent bug that needs to be fixed, not the
root cause here. However, I think it could easily be contributing to
the timing required to trigger the bug we're looking for.

Andres, your machine francolin crashed -- got a core file?

[1] https://www.postgresql.org/message-id/CAEepm%3D0NWKehYw7NDoUSf8juuKOPRnCyY3vuaSvhrEWsOTAa3w%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com
