Re: Don't Thread On Me (PostgreSQL related)

From: Chris Travers <chris(dot)travers(at)gmail(dot)com>
To: Eduardo Morras <nec556(at)retena(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Don't Thread On Me (PostgreSQL related)
Date: 2012-01-27 11:03:48
Message-ID: CAKt_ZftsiKtshbaBr8Fysqpi2yLx11cc0GAwqrmCoHxHfd86aA@mail.gmail.com
Lists: pgsql-general

On Fri, Jan 27, 2012 at 1:28 AM, Eduardo Morras <nec556(at)retena(dot)com> wrote:

> At 00:32 27/01/2012, you wrote:
>
>> There are cases where intraquery parallelism would be helpful. As far as
>> I understand it, PostgreSQL is the only major, solid (i.e. excluding MySQL)
>> RDBMS which does not offer some sort of intraquery parallelism, and when
>> running queries across very large databases, it might be helpful to be able
>> to, say, scan different partitions simultaneously using different threads.
>> So I think it is wrong to simply dismiss the need out of hand. However, I
>> am not sure that the cases where this need really comes to the fore are
>> typical of single-server instances, and so this brings me to the
>> bigger question.
>>
>> The question in my mind though is a more basic one: How should
>> intraquery parallelism be handled? Is it something PostgreSQL needs to do
>> or is it something that should be the work of an external project like
>> Postgres-XC? Down the road is there value in merging the codebases,
>> perhaps making stand-alone/data/coordination node a compile time option?
>>
>
> I still don't think threads are the solution for this scenario. You can do
> intraquery parallelism with multiple processes more easily and safely than
> with multiple threads. You launch a process with the whole query; it divides
> the work into chunks and assigns them to different processes instead of
> threads. You can use shared resources for communication between processes.
> When all the work is done, they pass their results to the original process,
> which joins them. The principal advantage of doing it with processes is that
> if one of the child processes dies, it can be killed and relaunched without
> any damage to the work of its siblings, but if you use threads, the whole
> process and all the work done is lost.
>

Well, I am assuming that when anything regarding a query crashes, the work
for that query should be lost so I don't see that as a big issue provided
that you still have one process per session.
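
For concreteness, the multi-process scheme described in the quote above could
be sketched roughly as follows. This is only an illustration in Python, not
PostgreSQL code: the names spawn_worker, wait_worker, parallel_map and
parallel_sum are hypothetical, it assumes a POSIX system with fork(), and
plain pipes stand in for the "shared resources" used to pass results back.

```python
import os
import pickle


def spawn_worker(func, chunk):
    """Fork a child that computes func(chunk) and pickles the result into a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: do the work, report through the pipe, then exit
        # immediately so none of the parent's cleanup code runs here.
        os.close(read_fd)
        try:
            with os.fdopen(write_fd, "wb") as out:
                pickle.dump(func(chunk), out)
        except BaseException:
            os._exit(1)
        os._exit(0)
    # Parent process: keep only the read end of the pipe.
    os.close(write_fd)
    return pid, read_fd


def wait_worker(pid, read_fd):
    """Return (ok, result) for one child; ok is False if the child died."""
    with os.fdopen(read_fd, "rb") as src:
        payload = src.read()
    _, status = os.waitpid(pid, 0)
    ok = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    return ok, (pickle.loads(payload) if ok else None)


def parallel_map(func, chunks, max_retries=1):
    """Run func over each chunk in its own process; return results in order."""
    workers = [spawn_worker(func, chunk) for chunk in chunks]
    results = []
    for chunk, (pid, read_fd) in zip(chunks, workers):
        ok, value = wait_worker(pid, read_fd)
        retries = 0
        while not ok and retries < max_retries:
            # Only the failed chunk is relaunched; its siblings keep their work.
            retries += 1
            ok, value = wait_worker(*spawn_worker(func, chunk))
        if not ok:
            # Sketch only: children still running are not reaped on this path.
            raise RuntimeError(f"worker for chunk {chunk!r} failed")
        results.append(value)
    return results


def parallel_sum(numbers, workers=4):
    """Split numbers into roughly equal chunks, sum each in a child, then join."""
    size = max(1, -(-len(numbers) // workers))  # ceiling division
    chunks = [numbers[i:i + size] for i in range(0, len(numbers), size)]
    return sum(parallel_map(sum, chunks))
```

Calling parallel_sum(list(range(1000))) splits the list into four chunks and
returns the same total as the built-in sum. In this sketch a crash costs only
the failed chunk; abandoning the whole query instead, as I assume above, would
just mean raising on the first failure rather than retrying.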

The larger issue with threads would be rewriting the backend so that it is
thread-safe, which would also complicate QA. For this reason, I assume for
now that this is not the way to go.

>
> It's not the only advantage of using processes vs threads. Some years ago,
> one of the problems on multi-socket servers was with shared memory and
> communication between the sockets. The inter-CPU speed was too slow
> and the latency too high. Now we have multiple CPUs in one socket and faster
> inter-socket communication, and this is not a problem anymore. Even better,
> the speed and latency of communication between 2 or more servers (not sockets
> or CPUs) is reaching levels where PostgreSQL could have shared memory between
> them, for example using HyperTransport cards or modern FC, and it's easier,
> a lot easier, to launch a remote process than a remote thread.

But this gets back to my question: are there significant use cases where
intraquery parallelism makes sense where clustering across servers does
not? The reason I ask is that if there are not, then the work that's going
into Postgres-XC would get us there entirely, in a multi-process
(single-threaded), two-tiered, network-transparent model that would
potentially scale up well.

Best Wishes,
Chris Travers
