Re: Index creation takes more time?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Herouth Maoz <herouth(at)unicell(dot)co(dot)il>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, pgsql-general(at)postgresql(dot)org, tv(at)fuzzy(dot)cz
Subject: Re: Index creation takes more time?
Date: 2012-09-18 17:19:27
Message-ID: CAMkU=1yNA3zp1FcFdXDeXd4LN1Lfpr-R1B2h8pyAAo1bGYw2Lw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Sep 18, 2012 at 1:13 AM, Herouth Maoz <herouth(at)unicell(dot)co(dot)il> wrote:
> I think you hit the nail right on the head when you asked:
>
>
>
>> I wonder if they have different encoding/collations.
>
> [headdesk]Of course. One of the requirements of the upgrade was to change
> the database encoding to unicode, because previously it was in an 8-bit
> encoding and we couldn't handle international text, which has become an
> absolute necessity. So when I restored the database, I took care to create
> it in unicode first:

> So, I must, at this point, draw the conclusion that string comparison is a
> much, much heavier task in utf-8 than it is in an 8-bit encoding - or that
> the collation is the problem.
>
> Running a different test, which involves string comparison, shows a bigger
> discrepancy:
>
> select count( foo ), foo from ( select cast(random() as varchar(14)) >
> cast(random() as varchar(14)) as foo
> from generate_series (1,100000000)) asdf
> group by foo;
>
> PC: Time: 308152.090 ms
> Server: Time: 499631.553 ms

I think the one below will show an even larger discrepancy. You are
doing 2 casts for each comparison,
so I think the casts overhead will dilute out the comparison.

select count(distinct foo) from ( select cast(random() as varchar(14)) as foo
from generate_series (1,100000000)) asdf;

> Finally, I created a test table, as you asked:
>
>
>> create table foo as select msisdn,sme_reference from
>> sms.billing__archive limit 1000000;
>
> Then I created an index on the msisdn and sme_reference columns together.
> 99% of the data in the msisdn field consist of 11-digit phone numbers.
> Result:
>
> PC: 5792.641 ms
> Server: 23740.470 ms
>
> Huge discrepancy there.

try:
create index ON foo (msisdn COLLATE "C", sme_reference) ;

This can only be done on 9.1 server, as that feature is new to that
release. It should be much faster to create than the index with
default collation.

(or change the collation of msisdn column definition, rather than just
in the index).

This assumes you just need the index for equality, not for some
precise locale-specific ordering (which for phone numbers seems like a
safe bet).

Cheers,

Jeff

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Eden Cardim 2012-09-18 18:20:31 Re: Column aliases in WHERE clauses
Previous Message Rob Sargent 2012-09-18 16:11:26 Re: Time-based trigger