Re: Postgres configuration for 64 CPUs, 128 GB RAM...

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres configuration for 64 CPUs, 128 GB RAM...
Date: 2007-08-01 18:10:14
Message-ID: C2D61A96.3A533%llonergan@greenplum.com
Lists: pgsql-performance

Marc,

You should expect that for the kind of OLAP workload you describe in steps 2
and 3 you will have exactly one CPU working for you in Postgres.

If you want to accelerate this processing by a factor of 100 or more on this
machine, you should try Greenplum DB, which is Postgres 8.2 compatible. Based
on the overall setup you describe, you could run a hybrid installation with
GPDB handling the reporting / OLAP workload and the other Postgres databases
handling the customer workloads.

- Luke

On 7/24/07 7:38 AM, "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de> wrote:

>
> Hello,
>
> thank you for all your comments and recommendations.
>
> I'm aware that the conditions for this benchmark are not ideal, mostly
> due to the lack of time to prepare it. We will also need an additional
> benchmark on a less powerful - more realistic - server to better
> understand the scalability of our application.
>
>
> Our application is based on Java and generates dynamic reports from
> log file content. Dynamic means here that a report is calculated from
> the Postgres data the first time it is requested (it is then cached).
> Java is used to drive the data preparation and to handle/generate the
> report requests.
>
> This is much more an OLAP system than an OLTP one, at least as far as
> our performance concerns go.
>
>
>
>
> Data preparation:
>
> 1) Parsing the log files, with heavy use of Perl (regular expressions),
> to generate CSV files. Prepared statements also maintain reference
> tables in the DB. Postgres performance is not an issue for this first
> step.
>
> 2) Loading the CSV files with COPY. As around 70% of the data to load
> goes into a single daily table, we don't allow concurrent jobs for this
> step. We have between a few and a few hundred files to load into a
> single table; they are processed one after the other. A primary key is
> always defined; when the required indexes are already built and the new
> data exceed a given size, we use a "shadow" table instead (without the
> indexes), build the indexes after the import and then replace the live
> table with the shadow one (see the sketch below).
> For example, we have a table of 13 GB plus 11 GB of indexes (5 of them).
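>
> A minimal sketch of what such a shadow-table swap could look like
> (table, column and file names below are made up for illustration):
>
>     -- load into an index-free copy of the live table
>     CREATE TABLE daily_stats_shadow (LIKE daily_stats INCLUDING DEFAULTS);
>     COPY daily_stats_shadow FROM '/data/csv/daily_stats_20070724.csv' WITH CSV;
>
>     -- build the indexes only after the bulk load
>     ALTER TABLE daily_stats_shadow ADD PRIMARY KEY (log_time, session_id);
>     CREATE INDEX daily_stats_shadow_user_idx ON daily_stats_shadow (user_id);
>
>     -- swap the shadow table in atomically
>     BEGIN;
>     DROP TABLE daily_stats;
>     ALTER TABLE daily_stats_shadow RENAME TO daily_stats;
>     COMMIT;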
>
> Performances :
>
> a) Is there an "ideal" size to aim for with our CSV files (100 files
> of 10 MB, or rather one file of 1 GB)?
> b) maintenance_work_mem: I'll use around 1 GB, as recommended by
> Stefan (see the snippet below).
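>
> A possible way to do that without touching postgresql.conf is to raise
> the setting only in the loading session (Postgres 8.2 accepts unit
> suffixes; on older versions the value is given in kB):
>
>     SET maintenance_work_mem = '1GB';  -- only affects this session
>     -- ... CREATE INDEX / ALTER TABLE ... ADD PRIMARY KEY ...
>     RESET maintenance_work_mem;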
>
> 3) Data aggregation. This is the heaviest part for Postgres. On our
> current system some queries need more than one hour, with phases of
> around 100% CPU use alternating with periods of heavy I/O load when
> temporary results are written to and read from disk (pgsql_tmp). During
> the aggregation, other Postgres activity is low (or at least should
> be), as this takes place at night. Currently we have a locking
> mechanism to avoid more than one such query running concurrently (see
> the sketch below). This may be too strict for the benchmark server, but
> it better reflects our current hardware capabilities.
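>
> If the target is 8.2, one way to serialize these jobs (assuming they
> all connect to the same database, and using an arbitrary lock key) is
> an advisory lock rather than an application-side mechanism:
>
>     SELECT pg_advisory_lock(42);    -- blocks while another job holds key 42
>     -- ... run the aggregation queries ...
>     SELECT pg_advisory_unlock(42);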
>
> Performance: Here we should favor a single huge transaction and can
> assume a low probability of another transaction requiring large sort
> space at the same time. Given this, is it reasonable to set work_mem
> to 3 GB? (I guess I should raise this parameter dynamically in the
> session before running the aggregation queries.)
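>
> Raising it per session would look like the snippet below. Note that
> work_mem is a per-sort / per-hash limit, so a single complex query can
> use several multiples of it; 3 GB is an assumption to validate in the
> benchmark, not a recommendation:
>
>     SET work_mem = '3GB';
>     -- ... aggregation queries ...
>     RESET work_mem;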
>
> 4) Queries (report generation)
>
> We have only a few queries with unsatisfactory performance, and those
> involve large sort operations. The data are structured in different
> aggregation levels (minutes, hours, days) with logical time-based
> partitions in order to limit the amount of data to compute for a given
> report. Moreover, we can scale our infrastructure by using separate or
> dedicated Postgres servers for different customers. Smaller customers
> may share the same instance, each of them having its own schema (the
> lock mechanism for large aggregations applies to a whole Postgres
> instance, not to a single customer). The benchmark will help us plan
> such a distribution (see the sketch below).
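>
> For the time-based partitions and per-customer schemas, a rough sketch
> of what this could look like on 8.1+ (inheritance partitioning with
> constraint exclusion; all names are hypothetical):
>
>     -- one child table per day, pruned at plan time
>     CREATE TABLE hits_minute_20070724 (
>         CHECK (log_time >= DATE '2007-07-24' AND log_time < DATE '2007-07-25')
>     ) INHERITS (hits_minute);
>     SET constraint_exclusion = on;   -- off by default in 8.1/8.2
>
>     -- one schema per smaller customer sharing an instance
>     CREATE SCHEMA customer_a;
>     SET search_path TO customer_a, public;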
>
> During the benchmark, we will probably not have more than 50 non-idle
> connections simultaneously. It is a bit too early for us to fine-tune
> this part. The benchmark will mainly focus on steps 1 to 3.
>
> During the benchmark, the DB will reach a size of about 400 GB,
> simulating 3 different customers, with the data split roughly equally
> across 3 schemas.
>
>
>
> I will post our configuration(s) later on.
>
>
>
> Thanks again for all your valuable input.
>
> Marc Mamin
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
