Re: Index creation takes more time?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Herouth Maoz <herouth(at)unicell(dot)co(dot)il>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, pgsql-general(at)postgresql(dot)org, tv(at)fuzzy(dot)cz
Subject: Re: Index creation takes more time?
Date: 2012-09-18 00:19:06
Message-ID: CAMkU=1z1piiKVy0TbKARJ=28CBXk2hZOwH7QxVJV88xubREcqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Sep 17, 2012 at 4:07 AM, Herouth Maoz <herouth(at)unicell(dot)co(dot)il> wrote:
> Yes, thank you, I did notice it, but I decided to wait a week to the next
> archive schedule, to see if the problem persists, especially since the
> previous time ran with relatively low disk space because we kept the old
> database files around. We have removed them during the week.
>
> Unfortunately, the problem persists.

You changed hardware, pgsql versions, and kernel versions all at the
same time, so that could make it hard to pin down the cause of the
change. Especially if you don't have a test environment you can use
for doing experiments. If not, hopefully you can get away with doing
some tests on the production systems.

> So here is the information I could glean.
>
> First, the variable "maintenance_work_mem" has not been changed between the
> old and new postgresql. In fact, it is commented out, so I assume it's the
> default 16MB for both installations.
>
> The server that runs the 9.1 is generally better and faster than the PC that
> runs the 8.3 (it does every other operation - inserts, updates, selects -
> much faster than the PC). More specifically:
>
> Server running 9.1:
> 3373252k of memory
> Two hard disks, separate for system and database. The database disk is
> 15000RPM, 825G.
> CPU: Xeon, 2.0GHz, 4 cores (or two CPUs with 2 cores, I'm not sure)
>
> PC running 8.3:
> 3073344k of memory
> One SATA hard disk (used for both system and database), 7200RPM, 915G.
> CPU: Pentium dual-core 2.80GHz

What are the minor release numbers? Is is 8.3.9 and 9.1.5?

I'd run a very simple benchmark test on both machines, from psql:

\timing on
set work_mem = 16384;
select count(distinct foo) from (select random() as foo from
generate_series(1,100000000)) asdf;

This is what I get, running both on the same hardware:

8.3.9
Time: 569041.153 ms

9.1.4
Time: 374607.288 ms

So the newer version is faster for me. If 9.1.4 is slower for you,
then the next thing I would do is install 8.3.9 on your new hardware
where the 9.1.x version is running, running on a different port, and
see how it does there on the same query.

>
> In both machines postgreSQL is set up with shared_buffers of 1800M.

Can you give us the full configuration?

http://wiki.postgresql.org/wiki/Server_Configuration

I wonder if they have different encoding/collations.

...

> CREATE INDEX billinga_msisdn_sme_reference ON
> sms.billing__archive(msisdn,sme_reference) ;
...

>
> Statistics collection:
>
> For the sake of experimentation, I dropped and created the
> billinga_msisdn_sme_reference in both machines, timed it, and ran vmstat,
> iostat and sar in the background at intervals of 1 minute.

A case with that many columns is unwieldy as a test case. To make a
simpler test case, what if you do:

create table foo as select msisdn,sme_reference from
sms.billing__archive limit 1000000;

And then index that table (on each system)? Do you still see the
discrepancy, or has it gone away?

It it has gone away, do the same thing again but without the limit,
and see if it is still gone.

> On the PC, the creation of the index took 40 minutes 35 seconds.
>
> The server (9.1) has not finished yet. I set up stats to run for an hour,
> and I'm sending this hour's worth of stats.
>
> I'm attaching the stats files in tarballs. I'm not sure what I'm supposed to
> look at.

In vmstat, you can see that the process is purely CPU limited (user
time, "us", is nearly 25%, that is, 1 out of 4 CPUs, the entire time,
and IO wait time, "wa", is near zero) on the newer hardware.

For the older hardware, it is partially CPU limited ("us" is not near
50, (1 out of 2 CPUs) but also not near zero), and partially IO
limited.

The SAR report show about the same thing, just in a different way.

Cheers,

Jeff

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Anibal David Acosta 2012-09-18 01:27:20 Re: Windows Services and Postgresql 9.1.3
Previous Message Mike Christensen 2012-09-17 23:44:47 Re: RFE: Column aliases in WHERE clauses