From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: group locking: incomplete patch, just for discussion
Date: 2014-10-29 08:48:51
Message-ID: CA+U5nMLExvfQxipqYvW8wan6HLJW0-vsNZzWP-+Sh_TSkR4Q3A@mail.gmail.com
Lists: pgsql-hackers

On 28 October 2014 23:24, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>> You asked for my help, but I'd like to see some concrete steps towards
>> an interim feature so I can see some benefit in a clear direction.
>>
>> Can we please have the first step we discussed? Parallel CREATE INDEX?
>> (Note the please)
>
> What I've been thinking about trying to work towards is parallel
> sequential scan. I think that it would actually be pretty easy to
> code up a mostly-working version using the existing infrastructure,
> but the patch would be rejected with a bazooka, because the
> non-working parts would include things like:
>
> 1. The cooperating backends might not all be using the same snapshot,
> because that requires sharing the snapshot, combo CID hash, and
> transaction state.
> 2. The quals that got pushed down to the workers might not return the
> same answers that they would have produced with a single backend,
> because we have no mechanism for assessing pushdown-safety.
> 3. Deadlock detection would be to some greater or lesser degree
> broken, the details depending on the implementation choices you made.
>
> There is a bit of a chicken-and-egg problem here. If I submit a patch
> for parallel sequential scan, it'll (justifiably) get rejected because
> it doesn't solve those problems. So I'm trying to solve those
> above-enumerated problems first, with working and at least
> somewhat-useful examples that show how the incremental bits of
> infrastructure can be used to do stuff. But that leads to your
> (understandable) complaint that this isn't very real yet.
>
> Why am I now thinking about parallel sequential scan instead of
> parallel CREATE INDEX? You may remember that I posted a patch for a
> new memory allocator some time ago, and it came in for a fair amount
> of criticism and not much approbation. Some of that criticism was
> certainly justified, and perhaps I was as hard on myself as anyone
> else was. However you want to look at it, I see the trade-off between
> parallel sort and parallel seq-scan this way: parallel seq-scan
> requires dealing with the planner (ouch!) but parallel sort requires
> dealing with memory allocation in dynamic shared memory segments
> (ouch!). Both of them require solving the three problems listed
> above.
>
> And maybe a few others, but I think those are the big ones - and I
> think proper deadlock detection is the hardest of them. A colleague
> of mine has drafted patches for sharing snapshots and combo CIDs
> between processes, and as you might expect that's pretty easy.
> Sharing the transaction state (so we can test whether a transaction ID
> is "our" transaction ID inside the worker) is a bit trickier, but I
> think not too hard. Assessing pushdown-safety will probably boil down
> to adding some equivalent of proisparallel. Maybe not the most
> elegant, but defensible, and if you're looking for the shortest path
> to something usable, that's probably it. But deadlock detection ...
> well, I don't see any simpler solution than what I'm trying to build
> here.

OK, I see where you are headed, and perhaps why.

I said I would help and I mean to do so. Here are my thoughts:

Solving your 3 problems requires significant research, coding and
agreement that you're unlikely to get for 9.5, because there are no
working solutions at the end of it; it's all theoretical design, which
makes it more arguable. And even if you do succeed, Parallel (||) Seq
Scan isn't interesting except for marketing purposes, since its use in
real queries is limited: most queries need to do something else after
the SeqScan, which opens up a fuller discussion of parallel query that
we've never had. My feeling is that releasing only || Seq Scan would
backfire, since we don't yet do enough other stuff to be a full
solution.

If you do wish to pursue || Seq Scan, then a working prototype would
help. It would let everyone see the open source solution whose
problems we are working to solve. People could benchmark it and
understand the benefits and the issues it raises, which would help
focus attention on the infrastructure problems you are trying to
solve. People may also have suggestions you hadn't thought of for
solving or avoiding those problems.

Personally, IMHO, it's quite late in the cycle to be releasing such a
prototype, since we have only 6-7 weeks until the major patch
submission deadline. Hence my request: can we get *something* good for
users into 9.5, since, with respect, infrastructure alone isn't useful
to them. (If you take such comments negatively, we might discuss
similar issues with BDR; but on that project too we are working to
make simple features available in each release.)

My proposal is that we do a parallel index build scan, just as we
discussed earlier, so you would be following the direction set by the
Dev Meeting, not just a proposal of mine.

As I mentioned previously when you started discussing shared memory
segments, parallel sort does NOT require shared memory. The only
things you need to share are files. Split the problem into N pieces,
sort them to produce N files, and then merge the files using existing
code. That only applies to large sorts, but those are exactly the ones
you cared about doing in parallel anyway.
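The split/sort/merge scheme above can be sketched in a few lines. This
is an illustrative toy, not anything from the PostgreSQL tree: the
function names, the even split, and the use of heapq.merge for the
final pass are all my own assumptions.

```python
import heapq
import os
import tempfile

def spill_sorted(rows, tmpdir, i):
    """Sort one partition and spill it to its own file, one row per line
    (this is the work each of the N workers would do independently)."""
    path = os.path.join(tmpdir, "run%d.txt" % i)
    with open(path, "w") as f:
        for r in sorted(rows):
            f.write("%d\n" % r)
    return path

def read_run(path):
    """Stream a sorted run back from disk."""
    with open(path) as f:
        for line in f:
            yield int(line)

def parallel_sort(data, nworkers=4):
    """Split into N pieces, sort each to a file, then merge the files."""
    tmpdir = tempfile.mkdtemp()
    chunk = (len(data) + nworkers - 1) // nworkers or 1
    paths = [spill_sorted(data[i * chunk:(i + 1) * chunk], tmpdir, i)
             for i in range(nworkers)]
    # heapq.merge plays the role of the existing merge code:
    # a k-way merge over already-sorted inputs.
    return list(heapq.merge(*(read_run(p) for p in paths)))
```

The point of the sketch is that the only shared state is the set of run
files; the workers never touch each other's memory.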

CREATE INDEX already holds a lock on the table. Worker tasks can
simply be banned from acquiring new locks and from performing many
other actions. We can prevent transaction chains, so neither the
normal nor the concurrent version has complex combo CID or transaction
state issues. Your 3 problems therefore come down to nothing that
needs to be solved immediately. Pushdown-safety can be assumed for the
functions used by the existing index types: we can verify this from
our internal code, so we are not at risk there.
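A minimal sketch of the kind of verification alluded to here: treat
parallel safety as a whitelist and refuse pushdown unless every
function a worker would call is on it. The comparison function names
are real btree support functions, but the whitelist itself and the
check are hypothetical, standing in for a catalog flag like the
proisparallel idea mentioned above.

```python
# Hypothetical whitelist of functions known, by inspection of our own
# internal code, to be safe to run inside a worker. In a real design
# this would live in the catalog rather than in a hard-coded set.
PARALLEL_SAFE_FUNCS = {"btint4cmp", "bttextcmp", "btfloat8cmp"}

def pushdown_safe(funcs_used):
    """Allow pushdown only if every function involved is whitelisted."""
    return all(f in PARALLEL_SAFE_FUNCS for f in funcs_used)
```

An arbitrary user-defined function would fail the check and keep the
build single-process, which is the conservative behaviour we want.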

So my proposal is that we do a parallel index build scan, a limited
subset of parallel seq scan; we put the data from that into a private
sort node, then output a series of files. The plan for this is the
following:

Parallel Sort Merge
  -> Local Sort
       -> Local IndexBuildScanSegment

Local IndexBuildScanSegment scans a portion of the table being
indexed, a "segment" - this is the heart of the division of labour
within the parallel scan. Local Sort is just a normal sort node,
except that it outputs a file rather than a stream of sorted tuples.
Parallel Sort Merge waits for all the Local Sorts to finish and then
merges their output files using the existing sort-merge logic.
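The division of labour in Local IndexBuildScanSegment might look
something like this: carve the table's block range into contiguous
segments, one per worker. A toy sketch; the function name and the
even-split policy are my own assumptions, not a proposed API.

```python
def assign_segments(nblocks, nworkers):
    """Split the block range [0, nblocks) into at most nworkers
    contiguous segments, each handed to one worker's scan."""
    per = (nblocks + nworkers - 1) // nworkers  # ceiling division
    return [(start, min(start + per, nblocks))
            for start in range(0, nblocks, per)]
```

Each worker then scans only its own (start, end) block range, sorts
what it finds, and spills the result, so no two workers ever read the
same block.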

Now I'm sure you can see ways to improve that plan using shmem. But
the above is a workable solution for 9.5 within the time available.
We'll learn much by making this happen, and it will fuel everyone for
more work in the following release.

For the next release, I suggest that || hash join is a more useful
goal than || seq scan, since it allows typical queries on big tables
to be optimized; but we can discuss that in more detail another time.

Happy to have a more detailed phone call or meeting to discuss.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
