Re: Postgres vs other Postgres based MPP implementations

From: Ondrej Ivanič <ondrej(dot)ivanic(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres vs other Postgres based MPP implementations
Date: 2011-11-08 06:49:37
Message-ID: CAM6mie+eR8COxcvpgTQwSW0+vaPWPPk+ZfK6Xx9amtLrdVgaXQ@mail.gmail.com
Lists: pgsql-general

Hi,

On 8 November 2011 16:58, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:
> Which one(s) are you referring to? In what kind of workloads?
>
> Are you talking about Greenplum or similar?

Yes, mainly Greenplum and nCluster (AsterData). I haven't played much
with GridSQL or pgpool-II's parallel query mode. Queries are simple
aggregations/drill downs/roll ups/... -- mostly heavy read workloads,
but OLTP-like response times are required (e.g. run a query over a
100m+ dataset in 15 sec).

> Pg isn't very good at parallelism within a single query. It handles lots of
> small queries concurrently fairly well, but isn't as good at using all the
> resources of a box on one big query because it can only use one CPU per
> query and has only a very limited ability to do concurrent I/O on a single
> query too.

Usually CPU is not the bottleneck, but it was when I put Postgres on
FusionIO. The problem is that PG spreads reads too much. iostat
reports very low drive utilisation and very low queue size.

> That said, you should be tuning effective_io_concurrency to match your
> storage; if you're not, then you aren't getting the benefit of the
> concurrent I/O that PostgreSQL *is* capable of. You'll also need to have
> tweaked your shared_buffers, work_mem etc appropriately for your query
> workload.

I've played with effective_io_concurrency (went through the entire
range: 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000) but nothing improved.
Is there a way to get PG backend I/O stats using the stock CentOS (5.7)
kernel and tools? (I can't change my env easily)
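
A sweep like the one described above can be scripted so that every
setting is timed against the same query. What follows is only a
minimal sketch: build_sweep_script is a hypothetical helper, the query
is a placeholder, and EXPLAIN (ANALYZE, BUFFERS) assumes PostgreSQL 9.0
or later. It generates SQL text to feed to psql and does not connect to
anything. As far as I know, effective_io_concurrency only influences
bitmap heap scans, so it is worth checking that the plan contains one.

```python
# Values tried in the message above.
SWEEP_VALUES = (1, 2, 5, 10, 20, 50, 100, 200, 500, 1000)


def build_sweep_script(query, values=SWEEP_VALUES):
    """Return SQL text that runs `query` once per effective_io_concurrency value.

    Hypothetical helper: it only builds a string and never touches a database.
    """
    # Normalise the query so exactly one trailing semicolon is emitted.
    body = query.strip().rstrip(";")
    lines = []
    for value in values:
        # SET changes the parameter for the current session only.
        lines.append("SET effective_io_concurrency = %d;" % value)
        # BUFFERS (assumed PostgreSQL 9.0+) adds block hit/read counts to the plan.
        lines.append("EXPLAIN (ANALYZE, BUFFERS) %s;" % body)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder query; substitute the real aggregation.
    print(build_sweep_script("SELECT count(*) FROM facts WHERE day > 100"), end="")
```

The output can be saved to a file and run with psql -f, comparing the
reported run times and buffer counts between settings.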

> queries it won't perform all that well without something to try to
> parallelise the queries outside Pg.

Yeah, I have one monster query which needs half a day to finish, but it
finishes in less than two hours on the same hw if it is executed in
parallel...
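
Running one big aggregation "in parallel" outside Pg, as described
above, can be sketched roughly as follows: split the key range into
chunks, run one query per chunk on its own connection, and combine the
partial results. This is an illustration under assumptions, not the
setup actually used. split_range and parallel_sum are hypothetical
names, and run_chunk stands in for whatever opens a connection and
runs something like SELECT sum(x) FROM t WHERE id >= lo AND id < hi.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(lo, hi, parts):
    """Split the half-open range [lo, hi) into up to `parts` contiguous chunks."""
    step, extra = divmod(hi - lo, parts)
    bounds = []
    start = lo
    for i in range(parts):
        # The first `extra` chunks take one additional key each.
        end = start + step + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when parts > number of keys
            bounds.append((start, end))
        start = end
    return bounds


def parallel_sum(run_chunk, lo, hi, workers=4):
    """Call run_chunk(chunk_lo, chunk_hi) for each chunk concurrently and add the results.

    run_chunk is a hypothetical callable supplied by the caller; in practice it
    would use its own database connection so each chunk gets its own backend.
    """
    chunks = split_range(lo, hi, workers)
    # Threads suffice here because the heavy work would happen in the database
    # backends, not in the Python process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: run_chunk(*c), chunks))
    return sum(partials)


if __name__ == "__main__":
    # Stand-in for a real per-chunk query: sums the keys themselves.
    fake_chunk = lambda a, b: sum(range(a, b))
    print(parallel_sum(fake_chunk, 0, 1000, workers=4))
```

This only works directly for aggregates that decompose, such as sum,
count, min and max; an average would need per-chunk sums and counts
combined at the end.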

> I'm not at all surprised by that. PostgreSQL couldn't use the full resources
> of your system when it was expressed as just one query.

This is a very interesting area to work in, but my lack of C/C++ and PG
internals knowledge puts me out of the game :)

--
Ondrej Ivanic
(ondrej(dot)ivanic(at)gmail(dot)com)
