Re: modeling parallel contention (was: Parallel Append implementation)

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: modeling parallel contention (was: Parallel Append implementation)
Date: 2017-05-09 05:28:55
Message-ID: CAJrrPGcUCOtb21j7uou2Ng0ikrrRbKz44OVCkz_MgWddDbncXg@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 8, 2017 at 11:39 AM, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:

>
> We really need a machine with good IO concurrency, and not too much
> RAM to test these things out. It could well be that for a suitably
> large enough table we'd want to scan a whole 1GB extent per worker.
>
> I did post a patch to have heap_parallelscan_nextpage() use atomics
> instead of locking over in [1], but I think doing atomics there does
> not rule out also adding batching later. In fact, I think it
> structures things so batching would be easier than it is today.
>

As part of our internal PostgreSQL project, we developed a parallel seq
scan that works in batch mode only. The problem we faced with batch mode
was making sure that all the parallel workers finish at almost the same
time, which requires an even distribution of data pages. Otherwise, one
worker may still be working on the last batch after all the others have
finished their work. In such cases, we cannot achieve good performance.

With the current approach, by contrast, the most that the last worker
can be left doing alone is scanning a single page, the last one of the table.
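
To illustrate the difference, here is a rough simulation sketch in Python.
It is not our internal code and not how PostgreSQL hands out pages. It
assumes every page costs one time unit and that an idle worker claims the
next batch_pages pages from a shared position; simulate_scan and its
parameters are names made up for this illustration.

```python
import heapq


def simulate_scan(total_pages, workers, batch_pages, page_cost=lambda page: 1.0):
    """Return each worker's finish time when idle workers claim
    batch_pages consecutive pages at a time from a shared position."""
    # Heap entries are (time at which the worker becomes idle, worker id).
    free_at = [(0.0, w) for w in range(workers)]
    heapq.heapify(free_at)
    finish = [0.0] * workers
    next_page = 0
    while next_page < total_pages:
        # The worker that goes idle first claims the next batch.
        now, w = heapq.heappop(free_at)
        end_page = min(next_page + batch_pages, total_pages)
        done = now + sum(page_cost(p) for p in range(next_page, end_page))
        finish[w] = done
        next_page = end_page
        heapq.heappush(free_at, (done, w))
    return finish


if __name__ == "__main__":
    for batch in (1, 300):
        times = simulate_scan(total_pages=1300, workers=4, batch_pages=batch)
        print(f"batch={batch}: first done at {min(times)}, last done at {max(times)}")
```

With 1300 pages and 4 workers under these assumptions, batches of 300 pages
leave one worker running until time 400 while the other three are idle from
time 300, whereas page-at-a-time assignment has every worker finish at 325.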

If we go with batching of 1GB per worker, there is a chance that the
data satisfying the query condition falls into only one extent; in that
case, too, batching may not yield good results.
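
A small sketch of that skew case, again in Python and only under stated
assumptions: the table is scaled down to 8 extents of 100 pages, a page in
the one extent holding matching rows is assumed to cost 10 time units
against 1 for any other page, and an idle worker claims the next batch from
a shared position. All names and numbers here are hypothetical.

```python
import heapq

EXTENT_PAGES = 100  # scaled-down stand-in for a 1GB extent
EXTENTS = 8
WORKERS = 4
HOT_EXTENT = 5      # assumed: the only extent that holds matching rows
HOT_COST = 10.0     # assumed cost of scanning a page with matching rows
COLD_COST = 1.0     # assumed cost of scanning any other page


def page_cost(page):
    """Cost of one page, depending on which extent it belongs to."""
    return HOT_COST if page // EXTENT_PAGES == HOT_EXTENT else COLD_COST


def makespan(batch_pages):
    """Time at which the last worker finishes for a given batch size."""
    total_pages = EXTENT_PAGES * EXTENTS
    free_at = [0.0] * WORKERS  # min-heap of times at which workers go idle
    heapq.heapify(free_at)
    start = 0
    while start < total_pages:
        now = heapq.heappop(free_at)  # earliest idle worker claims a batch
        end = min(start + batch_pages, total_pages)
        heapq.heappush(free_at, now + sum(page_cost(p) for p in range(start, end)))
        start = end
    return max(free_at)


if __name__ == "__main__":
    print("one extent per batch:", makespan(EXTENT_PAGES))
    print("one page per batch:  ", makespan(1))
```

Under these assumptions the whole expensive extent lands on a single worker
when the batch is one extent, giving a total time of 1100 units, while
page-at-a-time assignment spreads it over all four workers and finishes in 425.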

Regards,
Hari Babu
Fujitsu Australia
