Re: Parallel Seq Scan

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bert <biertie(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-11-19 03:41:41
Message-ID: CAA4eK1K-+hGmAkk1g62D+3EW9G77OH_XLb28Sih=dkO+t0-s+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 17, 2015 at 1:21 AM, Bert <biertie(at)gmail(dot)com> wrote:

> Hey,
>
> I've just pulled and compiled the new code.
> I'm running a TPC-DS like test on different PostgreSQL installations, but
> running (max) 12queries in parallel on a server with 12cores.
> I've configured max_parallel_degree to 2, and I get messages that backend
> processes crash.
> I am running the same test now with 6queries in parallel, and parallel
> degree to 2, and they seem to work. for now. :)
>
> This is the output I get in /var/log/messages
> Nov 16 20:40:05 woludwha02 kernel: postgres[22918]: segfault at
> 7fa3437bf104 ip 0000000000490b56 sp 00007ffdf2f083a0 error 6 in
> postgres[400000+5b5000]
>
>
Thanks for reporting the issue.

I think whats going on here is that when any of the session doesn't
get any workers, we shutdown the Gather node which internally destroys
the dynamic shared memory segment as well. However the same is
needed as per current design for doing scan by master backend as
well. So I think the fix would be to just do shutdown of workers which
actually won't do anything in this scenario. I have tried to reproduce
this issue with a simpler test case as below:

Create two tables with large data:
CREATE TABLE t1(c1, c2) AS SELECT g, repeat('x', 5) FROM
generate_series(1, 10000000) g;

CREATE TABLE t2(c1, c2) AS SELECT g, repeat('x', 5) FROM
generate_series(1, 1000000) g;

Set max_worker_processes = 2 in postgresql.conf

Session-1
set max_parallel_degree=4;
set parallel_tuple_cost=0;
set parallel_setup_cost=0;
Explain analyze select count(*) from t1 where c1 > 10000;

Session-2
set max_parallel_degree=4;
set parallel_tuple_cost=0;
set parallel_setup_cost=0;
Explain analyze select count(*) from t2 where c1 > 10000;

The trick to reproduce is that the Explain statement in Session-2
needs to be executed immediately after Explain statement in
Session-1.

Attached patch fixes the issue for me.

I think here we can go for somewhat more invasive fix as well which is
if the statement didn't find any workers, then reset the dsm and also
reset the execution tree (which in case of seq scan means clear the
parallel scan desc and may be few more fields in scan desc) such that it
performs seq scan. I am not sure how future-proof such a change would
be, because resetting some of the fields in execution tree and expecting
it to work in all cases might not be feasible for all nodes.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
fix_early_dsm_destroy_v1.patch application/octet-stream 446 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2015-11-19 03:48:18 Re: proposal: LISTEN *
Previous Message Marko Tiikkaja 2015-11-19 03:40:00 proposal: LISTEN *