From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-committers <pgsql-committers(at)postgresql(dot)org> |
Subject: | Re: pgsql: Add parallel-aware hash joins. |
Date: | 2017-12-22 08:16:10 |
Message-ID: | CAEepm=0WxwzpHVHt3PcWHBV=L3k3FDb6dvMq1A2Li49LGBa7TA@mail.gmail.com |
Lists: | pgsql-committers pgsql-hackers |
On Fri, Dec 22, 2017 at 1:48 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> I don't think that's quite it, because it should never have set
> 'writing' for any batch number >= nbatch.
>
> It's late here, but I'll take this up tomorrow and either find a fix
> or figure out how to avoid antisocial noise levels on the build farm
> in the meantime.
Not there yet but I learned some things and am still working on it. I
spent a lot of time trying to reproduce the assertion failure, and
succeeded exactly once. Unfortunately the one time I managed to do
that I'd built with clang -O2 and got a core file that I couldn't get
much useful info out of, and I've been trying to do it again with -O0
ever since without luck. The time I succeeded, I reproduced it by
creating the tables "simple" and "bigger_than_it_looks" from join.sql
and then doing this in a loop:
  set min_parallel_table_scan_size = 0;
  set parallel_setup_cost = 0;
  set work_mem = '192kB';
  explain analyze select count(*) from simple r join
    bigger_than_it_looks s using (id);
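One way such a loop could be driven from a shell is sketched below. It is not the script used for the run described here. The function names, the `PSQL_CMD` and `ITERATIONS` variables, and the iteration count are assumptions made for illustration. It also assumes the two tables already exist and that psql can reach the target database through the usual environment settings (PGDATABASE and friends).

```shell
#!/bin/sh
# Hypothetical reproduction loop (illustrative only, not from the original
# message). PSQL_CMD and ITERATIONS are assumed names; override as needed.
PSQL_CMD="${PSQL_CMD:-psql -X -q -v ON_ERROR_STOP=1}"
ITERATIONS="${ITERATIONS:-1000}"

# Emit the statements quoted in the message, unchanged.
repro_sql() {
    cat <<'EOF'
set min_parallel_table_scan_size = 0;
set parallel_setup_cost = 0;
set work_mem = '192kB';
explain analyze select count(*) from simple r join
bigger_than_it_looks s using (id);
EOF
}

# Feed the statements to a fresh session on every iteration and stop at
# the first non-zero exit status (for example, a lost connection).
run_repro() {
    i=1
    while [ "$i" -le "$ITERATIONS" ]; do
        repro_sql | $PSQL_CMD || return 1
        i=$((i + 1))
    done
}
```

After sourcing the file, calling `run_repro` starts the loop. Each iteration opens a new session, so the three `set` commands are reissued every time.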
The machine that it happened on is resource constrained, and exhibits
another problem: though the above query normally runs in ~20ms,
sometimes it takes several seconds and occasionally much longer. That
never happens on fast development systems or test servers which run it
quickly every time, and it doesn't happen on my 2 core slow system if
I have only two workers (or one worker + leader). I dug into that and
figured out what was going wrong and wrote that up separately[1],
because I think it's an independent bug needing to be fixed, not the
root cause here. However, I think it could easily be contributing to
the timing required to trigger the bug we're looking for.
Andres, your machine francolin crashed -- got a core file?
--
Thomas Munro
http://www.enterprisedb.com