Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile

From: Sergey Koposov <koposov(at)ast(dot)cam(dot)ac(dot)uk>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile
Date: 2012-05-31 01:10:29
Message-ID: alpine.LRH.2.02.1205310148440.6351@calx046.ast.cam.ac.uk
Lists: pgsql-hackers

On Wed, 30 May 2012, Jeff Janes wrote:

>> But the question now is whether there is a *PG* problem here or not, or is
>> it Intel's or Linux's problem ? Because still the slowdown was caused by
>> locking. If there wouldn't be locking there wouldn't be any problems (as
>> demonstrated a while ago by just cat'ting the files in multiple threads).
>
> You cannot have a traditional RDBMS without locking. From your

I understand the need for significant locking when there are concurrent
writes, but not when there are only reads. But I'm not an RDBMS expert, so
maybe that's a misunderstanding on my side.

> description of the problem, I probably wouldn't be using a traditional
> database system at all for this, but rather flat files and Perl.

Flat files and Perl for 25-50 TB of data over a few years is a bit extreme
;)

> Or
> at least, I would partition the data before loading it to the DB,
> rather than trying to do it after.

I intentionally did otherwise, because I thought that PG would
be much smarter than me at juggling the data I'm ingesting (~ tens of
gigabytes each day), joining the appropriate bits of data and then splitting
the result into partitions. Unfortunately I see that there are some
scalability issues on the way, which I didn't expect. Those aren't fatal,
but slightly disappointing.

> But anyway, is idt_match a fairly static table? If so, I'd partition
> that into 16 tables, and then have each one of your tasks join against
> a different one of those tables. That should relieve the contention
> on the index root block, and might have some other benefits as well.

No, idt_match is getting filled by multi-threaded copy() and then joined
with 4 other big tables like idt_phot. The result is then split into
partitions. And I was trying different approaches to fully utilize the
CPUs and/or I/O and somehow parallelize the queries. That's the
reasoning for the somewhat contrived queries in my test.
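
For illustration only, here is a minimal sketch of the "partition first, then
let each task join its own piece" idea from the quoted suggestion. It is plain
in-memory Python, not PostgreSQL code and not what my pipeline runs. The
column names (objid, ra, mag), the modulo partitioning rule and the helper
names are all assumptions made up for the example; only the count of 16
partitions comes from the quoted text.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

N_PARTS = 16  # partition count taken from the quoted suggestion


def partition_rows(rows, key, n_parts=N_PARTS):
    """Split rows into n_parts buckets by (integer key) modulo n_parts.

    Assumption: the join key is an integer; a real table might need a hash.
    """
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[row[key] % n_parts].append(row)
    return parts


def join_partition(match_part, phot_part, key):
    """Hash-join one bucket of each table on key (inner join)."""
    index = defaultdict(list)
    for m in match_part:
        index[m[key]].append(m)
    # Rows with the same key always land in the same bucket number,
    # so each task only ever touches its own small index.
    return [{**m, **p} for p in phot_part for m in index.get(p[key], [])]


def parallel_join(match_rows, phot_rows, key, n_parts=N_PARTS):
    """Partition both inputs the same way, then join bucket i with bucket i."""
    match_parts = partition_rows(match_rows, key, n_parts)
    phot_parts = partition_rows(phot_rows, key, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = pool.map(join_partition, match_parts, phot_parts,
                           [key] * n_parts)
        return [row for part in results for row in part]


if __name__ == "__main__":
    # Hypothetical stand-ins for idt_match and idt_phot.
    match = [{"objid": i, "ra": float(i)} for i in range(100)]
    phot = [{"objid": i, "mag": i * 0.1} for i in range(0, 100, 2)]
    joined = parallel_join(match, phot, "objid")
    print(len(joined), "joined rows")
```

The point of the sketch is only that no two tasks share a lookup structure, 
which is the analogue of not sharing one index root block. In my case the
complication is that idt_match is not static, so the partitioning step would
have to happen as part of the load.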

Cheers,
S

*****************************************************
Sergey E. Koposov, PhD, Research Associate
Institute of Astronomy, University of Cambridge
Madingley Road, CB3 0HA, Cambridge, UK
Tel: +44-1223-337-551 Web: http://www.ast.cam.ac.uk/~koposov/
