Re: Re : Solaris Performance - Profiling

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark kirkwood <markir(at)slingshot(dot)co(dot)nz>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Re : Solaris Performance - Profiling
Date: 2002-04-02 16:02:51
Message-ID: 25091.1017763371@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Mark kirkwood <markir(at)slingshot(dot)co(dot)nz> writes:
> I am a little concerned that the cpu time recored by the profiler seems
> to be a bit on the short side at 4047.53 seconds (67 minutes), as
> opposed to 119.2 for the backend itself. Are these two numbers supposed
> to match closely ? (they did for shorter queries..)

Hmm. Where exactly did you get those numbers from? I see 4118.54 sec
as the total CPU accounted for in the profile.

Some versions of gprof hide the time spent in the profiler subroutine,
leading to discrepancies between the time accounted for in the profile
and the actual process CPU time. However, yours doesn't seem to be one
of them --- internal_mcount is right there.

> Anyway in the hope that this captured output is interesting enough, here
> it is...or rather here is some of it (biggish file), so I am leaving a
> complete copy under :
> http://homepages.slingshot.co.nz/~markir/tar/solaris/
> Let me know if further information is likely to be helpful. I have
> profiled the same query (on the same dataset) on Freebsd and have the
> output available at :
> http://homepages.slingshot.co.nz/~markir/tar/freebsd/

Hmm. Assuming that the profile data is trustworthy and the queries are
indeed the same (did you compare EXPLAIN output?), it seems that
Solaris' problem is a spectacularly bad qsort() implementation.
The FreeBSD profile shows:

1 qsort [29]
0.00 0.31 1/2 load_password_cache [44]
0.00 0.31 1/2 tuplesort_performsort [45]
[29] 9.8 0.01 0.62 2+1 qsort [29]
0.01 0.62 148033/148033 qsort_comparetup [30]
1 qsort [29]

where Solaris has

0.00 1387.23 1/2 load_password_cache [18]
0.00 1387.23 1/2 tuplesort_performsort [19]
[4] 68.5 0.00 2774.46 2 qsort [4]
19.73 2754.58 1/1 qst [5]
0.01 0.14 88012/1680561053 qsort_comparetup [6]
0.00 0.00 1/61473 .umul [154]

-----------------------------------------------

19.73 2754.58 1/1 qsort [4]
[5] 68.5 19.73 2754.58 1 qst [5]
167.74 2586.81 1680473041/1680561053 qsort_comparetup [6]
0.02 0.00 59630/59703 .udiv [114]
0.01 0.00 59630/61473 .umul [154]

and all the rest of the top profile entries are explained by the fact
that qsort_comparetup is called 1.68 billion times instead of 148K
times.

Can these really be the same dataset? Can Solaris' qsort really be that
outstandingly incompetent? How many rows are actually being sorted
here, anyway?

It might be entertaining to snarf a qsort off the net (from glibc,
perhaps) and link it into Postgres to see if you get better results.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message David Esposito 2002-04-02 16:12:10 Re: Core dump on PG 7.1.3
Previous Message Doug McNaught 2002-04-02 15:56:36 Re: v7.2.1 Released: Critical Bug Fix

Browse pgsql-hackers by date

  From Date Subject
Next Message Reinoud van Leeuwen 2002-04-02 16:09:26 status of IPv6 Support for INET/CIDR types
Previous Message Doug McNaught 2002-04-02 15:56:36 Re: v7.2.1 Released: Critical Bug Fix