Re: Parallel Seq Scan

From: Thom Brown <thom(at)linux(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-03-25 11:46:08
Message-ID: CAA-aLv6JMAsDOg7R6DzvcWgLCSukGK_Ap4gRfiC+1NgWaqHAVw@mail.gmail.com
Lists: pgsql-hackers

On 25 March 2015 at 10:27, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Fri, Mar 20, 2015 at 5:36 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> >
> > So the patches have to be applied in below sequence:
> > HEAD Commit-id : 8d1f2390
> > parallel-mode-v8.1.patch [2]
> > assess-parallel-safety-v4.patch [1]
> > parallel-heap-scan.patch [3]
> > parallel_seqscan_v11.patch (Attached with this mail)
> >
> > The reason for not using the latest commit in HEAD is that latest
> > version of assess-parallel-safety patch was not getting applied,
> > so I generated the patch at commit-id where I could apply that
> > patch successfully.
> >
> > [1] - http://www.postgresql.org/message-id/CA+TgmobJSuefiPOk6+i9WERUgeAB3ggJv7JxLX+r6S5SYydBRQ@mail.gmail.com
> > [2] - http://www.postgresql.org/message-id/CA+TgmoZJjzYnpXChL3gr7NwRUzkAzPMPVKAtDt5sHvC5Cd7RKw@mail.gmail.com
> > [3] - http://www.postgresql.org/message-id/CA+TgmoYJETgeAXUsZROnA7BdtWzPtqExPJNTV1GKcaVMgSdhug@mail.gmail.com
> >
>
> Fixed the reported issue on assess-parallel-safety thread and another
> bug caught while testing joins and integrated with latest version of
> parallel-mode patch (parallel-mode-v9 patch).
>
> Apart from that, I have moved the initialization of the DSM segment from
> the InitNode phase to ExecFunnel() (on first execution), as per the
> suggestion from Robert. The main idea is that, since it creates a large
> shared memory segment, the work should be done only when it is really
> required.
>
>
> HEAD Commit-Id: 11226e38
> parallel-mode-v9.patch [2]
> assess-parallel-safety-v4.patch [1]
> parallel-heap-scan.patch [3]
> parallel_seqscan_v12.patch (Attached with this mail)
>
> [1] - http://www.postgresql.org/message-id/CA+TgmobJSuefiPOk6+i9WERUgeAB3ggJv7JxLX+r6S5SYydBRQ@mail.gmail.com
> [2] - http://www.postgresql.org/message-id/CA+TgmoZfSXZhS6qy4Z0786D7iU_AbhBVPQFwLthpSvGieczqHg@mail.gmail.com
> [3] - http://www.postgresql.org/message-id/CA+TgmoYJETgeAXUsZROnA7BdtWzPtqExPJNTV1GKcaVMgSdhug@mail.gmail.com
>

Okay, with my pgbench_accounts partitioned into 300, I ran:

SELECT DISTINCT bid FROM pgbench_accounts;

The query never returns, and I also get this:

grep -r 'starting background worker process "parallel worker for PID 12165"' postgresql-2015-03-25_112522.log | wc -l
2496

2,496 workers? This is with parallel_seqscan_degree set to 8. If I set it
to 2, the number goes down to 626, and with 16 it goes up to 4,320.
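
As a rough sanity check on those figures, here is a small Python sketch that divides each logged count by the degree setting. The assumption here (not confirmed by the patch) is that every Funnel node that starts executing launches up to parallel_seqscan_degree workers of its own, so the quotient approximates how many Funnel nodes launched workers. The function name is made up for illustration.

```python
# Worker-start counts taken from the log, keyed by parallel_seqscan_degree.
observed = {2: 626, 8: 2496, 16: 4320}


def starts_per_degree(counts):
    """Divide each worker-start count by its degree setting.

    Assumption: each Funnel node launches up to `degree` workers, so the
    quotient roughly estimates the number of Funnel nodes that launched any.
    """
    return {degree: starts / degree for degree, starts in counts.items()}


ratios = starts_per_degree(observed)
for degree, ratio in sorted(ratios.items()):
    print(f"degree={degree:2d} starts={observed[degree]:5d} starts/degree={ratio:.1f}")
```

The quotients come out at 313, 312 and 270, so the worker count appears to scale with the number of scans rather than staying near the degree setting.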

Here's the query plan:

                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=38856527.50..38856529.50 rows=200 width=4)
   Group Key: pgbench_accounts.bid
   ->  Append  (cost=0.00..38806370.00 rows=20063001 width=4)
         ->  Seq Scan on pgbench_accounts  (cost=0.00..0.00 rows=1 width=4)
         ->  Funnel on pgbench_accounts_1  (cost=0.00..192333.33 rows=100000 width=4)
               Number of Workers: 8
               ->  Partial Seq Scan on pgbench_accounts_1  (cost=0.00..1641000.00 rows=100000 width=4)
         ->  Funnel on pgbench_accounts_2  (cost=0.00..192333.33 rows=100000 width=4)
               Number of Workers: 8
               ->  Partial Seq Scan on pgbench_accounts_2  (cost=0.00..1641000.00 rows=100000 width=4)
         ->  Funnel on pgbench_accounts_3  (cost=0.00..192333.33 rows=100000 width=4)
               Number of Workers: 8
 ...
               ->  Partial Seq Scan on pgbench_accounts_498  (cost=0.00..10002.10 rows=210 width=4)
         ->  Funnel on pgbench_accounts_499  (cost=0.00..1132.34 rows=210 width=4)
               Number of Workers: 8
               ->  Partial Seq Scan on pgbench_accounts_499  (cost=0.00..10002.10 rows=210 width=4)
         ->  Funnel on pgbench_accounts_500  (cost=0.00..1132.34 rows=210 width=4)
               Number of Workers: 8
               ->  Partial Seq Scan on pgbench_accounts_500  (cost=0.00..10002.10 rows=210 width=4)

I'm still not sure why 8 workers are needed for each partial scan. I would
expect 8 workers to be shared across 8 separate scans. Perhaps this is just
my misunderstanding of how this feature works.
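
To make the two readings concrete, here is a toy Python model. It is only a sketch of my expectation, not of what the patch does: both function names are invented, and the round-robin assignment is just one plausible way a fixed pool could divide the scans.

```python
from collections import defaultdict


def launches_per_funnel(num_funnels, degree):
    """Upper bound on worker launches if every Funnel node starts its own set."""
    return num_funnels * degree


def assign_scans_to_pool(num_scans, degree):
    """Spread scans round-robin over one fixed pool of `degree` workers.

    Hypothetical model: each worker takes whole partition scans, so the
    number of workers never exceeds `degree`.
    """
    pool = defaultdict(list)
    for scan in range(1, num_scans + 1):
        pool[(scan - 1) % degree].append(f"pgbench_accounts_{scan}")
    return dict(pool)


# 300 partitions and a degree of 8, as in the run described above.
print(launches_per_funnel(300, 8))        # workers if each Funnel launches 8
print(len(assign_scans_to_pool(300, 8)))  # workers if one pool is shared
```

With 300 partitions the per-Funnel reading gives 2400 launches, which is near but not equal to the 2,496 in the log, so the model is only approximate. The shared-pool reading never uses more than 8 workers.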

--
Thom
