Re: Parallel Seq Scan

From: Thom Brown <thom(at)linux(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2014-12-31 16:16:20
Message-ID: CAA-aLv4CRgko6C_KaY1gazS1NwTHY=h-Rq8a-VteGHyDqKHRtg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 31 December 2014 at 14:20, Thom Brown <thom(at)linux(dot)com> wrote:

> On 18 December 2014 at 16:03, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
>>
>>
>> On Thu, Dec 18, 2014 at 9:22 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> >
>> > On Mon, Dec 8, 2014 at 10:40 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> > >
>> > > On Sat, Dec 6, 2014 at 5:37 PM, Stephen Frost <sfrost(at)snowman(dot)net>
>> wrote:
>> > > >
>> > >
>> > > So to summarize my understanding, below are the set of things
>> > > which I should work on and in the order they are listed.
>> > >
>> > > 1. Push down qualification
>> > > 2. Performance Data
>> > > 3. Improve the way to push down the information related to worker.
>> > > 4. Dynamic allocation of work for workers.
>> > >
>> > >
>> >
>> > I have worked on the patch to accomplish above mentioned points
>> > 1, 2 and partly 3 and would like to share the progress with community.
>>
>> Sorry forgot to attach updated patch in last mail, attaching it now.
>>
>
> When attempting to recreate the plan in your example, I get an error:
>
> ➤ psql://thom(at)[local]:5488/pgbench
>
> # create table t1(c1 int, c2 char(500)) with (fillfactor=10);
> CREATE TABLE
> Time: 13.653 ms
>
> ➤ psql://thom(at)[local]:5488/pgbench
>
> # insert into t1 values(generate_series(1,100),'amit');
> INSERT 0 100
> Time: 4.796 ms
>
> ➤ psql://thom(at)[local]:5488/pgbench
>
> # explain select c1 from t1;
> ERROR: could not register background process
> HINT: You may need to increase max_worker_processes.
> Time: 1.659 ms
>
> ➤ psql://thom(at)[local]:5488/pgbench
>
> # show max_worker_processes ;
> max_worker_processes
> ----------------------
> 8
> (1 row)
>
> Time: 0.199 ms
>
> # show parallel_seqscan_degree ;
> parallel_seqscan_degree
> -------------------------
> 10
> (1 row)
>
>
> Should I really need to increase max_worker_processes to >=
> parallel_seqscan_degree? If so, shouldn't there be a hint here along with
> the error message pointing this out? And should the error be produced when
> only a *plan* is being requested?
>
> Also, I noticed that where a table is partitioned, the plan isn't
> parallelised:
>
> # explain select distinct bid from pgbench_accounts;
>
>
> QUERY
> PLAN
>
> ----------------------------------------------------------------------------------------
> HashAggregate (cost=1446639.00..1446643.99 rows=499 width=4)
> Group Key: pgbench_accounts.bid
> -> Append (cost=0.00..1321639.00 rows=50000001 width=4)
> -> Seq Scan on pgbench_accounts (cost=0.00..0.00 rows=1 width=4)
> -> Seq Scan on pgbench_accounts_1 (cost=0.00..4279.00
> rows=100000 width=4)
> -> Seq Scan on pgbench_accounts_2 (cost=0.00..2640.00
> rows=100000 width=4)
> -> Seq Scan on pgbench_accounts_3 (cost=0.00..2640.00
> rows=100000 width=4)
> -> Seq Scan on pgbench_accounts_4 (cost=0.00..2640.00
> rows=100000 width=4)
> -> Seq Scan on pgbench_accounts_5 (cost=0.00..2640.00
> rows=100000 width=4)
> -> Seq Scan on pgbench_accounts_6 (cost=0.00..2640.00
> rows=100000 width=4)
> -> Seq Scan on pgbench_accounts_7 (cost=0.00..2640.00
> rows=100000 width=4)
> ...
> -> Seq Scan on pgbench_accounts_498 (cost=0.00..2640.00
> rows=100000 width=4)
> -> Seq Scan on pgbench_accounts_499 (cost=0.00..2640.00
> rows=100000 width=4)
> -> Seq Scan on pgbench_accounts_500 (cost=0.00..2640.00
> rows=100000 width=4)
> (504 rows)
>
> Is this expected?
>

Another issue (FYI, pgbench2 initialised with: pgbench -i -s 100 -F 10
pgbench2):

➤ psql://thom(at)[local]:5488/pgbench2

# explain select distinct bid from pgbench_accounts;
QUERY
PLAN
-------------------------------------------------------------------------------------------
HashAggregate (cost=245833.38..245834.38 rows=100 width=4)
Group Key: bid
-> Parallel Seq Scan on pgbench_accounts (cost=0.00..220833.38
rows=10000000 width=4)
Number of Workers: 8
Number of Blocks Per Workers: 208333
(5 rows)

Time: 7.476 ms

➤ psql://thom(at)[local]:5488/pgbench2

# explain (analyse, buffers, verbose) select distinct bid from
pgbench_accounts;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Time: 14897.991 ms

The logs say:

2014-12-31 15:21:42 GMT [9164]: [240-1] user=,db=,client= LOG: registering
background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [241-1] user=,db=,client= LOG: registering
background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [242-1] user=,db=,client= LOG: registering
background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [243-1] user=,db=,client= LOG: registering
background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [244-1] user=,db=,client= LOG: registering
background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [245-1] user=,db=,client= LOG: registering
background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [246-1] user=,db=,client= LOG: registering
background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [247-1] user=,db=,client= LOG: registering
background worker "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [248-1] user=,db=,client= LOG: starting
background worker process "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [249-1] user=,db=,client= LOG: starting
background worker process "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [250-1] user=,db=,client= LOG: starting
background worker process "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [251-1] user=,db=,client= LOG: starting
background worker process "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [252-1] user=,db=,client= LOG: starting
background worker process "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [253-1] user=,db=,client= LOG: starting
background worker process "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [254-1] user=,db=,client= LOG: starting
background worker process "backend_worker"
2014-12-31 15:21:42 GMT [9164]: [255-1] user=,db=,client= LOG: starting
background worker process "backend_worker"
2014-12-31 15:21:46 GMT [9164]: [256-1] user=,db=,client= LOG: worker
process: backend_worker (PID 10887) exited with exit code 1
2014-12-31 15:21:46 GMT [9164]: [257-1] user=,db=,client= LOG:
unregistering background worker "backend_worker"
2014-12-31 15:21:50 GMT [9164]: [258-1] user=,db=,client= LOG: worker
process: backend_worker (PID 10888) exited with exit code 1
2014-12-31 15:21:50 GMT [9164]: [259-1] user=,db=,client= LOG:
unregistering background worker "backend_worker"
2014-12-31 15:21:57 GMT [9164]: [260-1] user=,db=,client= LOG: server
process (PID 10869) was terminated by signal 9: Killed
2014-12-31 15:21:57 GMT [9164]: [261-1] user=,db=,client= DETAIL: Failed
process was running: explain (analyse, buffers, verbose) select distinct
bid from pgbench_accounts;
2014-12-31 15:21:57 GMT [9164]: [262-1] user=,db=,client= LOG: terminating
any other active server processes

Running it again, I get the same issue. This is with
parallel_seqscan_degree set to 8, and the crash occurs with 4 and 2 too.

This doesn't happen if I set the pgbench scale to 50. I suspect this is a
OOM issue. My laptop has 16GB RAM, the table is around 13GB at scale 100,
and I don't have swap enabled. But I'm concerned it crashes the whole
instance.

I also notice that requesting BUFFERS in a parallel EXPLAIN output yields
no such information. Is that not possible to report?
--
Thom

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-12-31 16:19:52 Re: Publish autovacuum informations
Previous Message Robert Haas 2014-12-31 16:10:30 Re: The return value of allocate_recordbuf()