Re: Major performance problem after upgrade from 8.3 to 8.4

From: Gerhard Wiesinger <lists(at)wiesinger(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Andreas Kretschmer <akretschmer(at)spamfence(dot)net>
Subject: Re: Major performance problem after upgrade from 8.3 to 8.4
Date: 2010-09-15 18:39:33
Message-ID: alpine.LFD.2.01.1009152031140.22693@bbs.intern
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 15 Sep 2010, Merlin Moncure wrote:

> On Wed, Sep 15, 2010 at 2:32 AM, Gerhard Wiesinger <lists(at)wiesinger(dot)com> wrote:
>> On Tue, 14 Sep 2010, Merlin Moncure wrote:
>>>
>>> np -- this felt particularly satisfying for some reason. btw, I think
>>> you have some more low hanging optimization fruit.  I think (although
>>> it would certainly have to be tested) hiding your attribute
>>> description under keyid is buying you nothing but headaches.  If you
>>> used natural key style, making description primary key of
>>> key_description (or unique), and had log_details have a description
>>> column that directly referenced that column, your subquery:
>>>
>>> (
>>>  SELECT value FROM log_details d WHERE l.id = d.fk_id AND d.fk_keyid =
>>>  (
>>>   SELECT keyid FROM key_description WHERE description =
>>> 'Kesselsolltemperatur'
>>>  )
>>> ) AS Kesselsolltemperatur,
>>>
>>> would look like this:
>>> (
>>>  SELECT value FROM log_details d WHERE l.id = d.fk_id AND
>>> d.description = 'Kesselsolltemperatur'
>>> ) AS Kesselsolltemperatur,
>>>
>>> your index on log_details(fk_id, description) is of course fatter, but
>>> quite precise...does require rebuilding your entire dataset however.
>>> food for thought.
>>
>> I think your suggestion might be slower because the WHERE clause and
>> possible JOINS with BIGINT is much faster (especially when a lot of data is
>> queried) than with a VARCHAR. With the latest query plan key_description is
>> only queried once per subselect which is perfect. I've also chosen that
>> indirection that I can change description without changing too much in data
>> model and all data rows on refactoring.
>
> You're not joining -- you're filtering (and your assumption that
> bigint is always going to be faster is quite debatable depending on
> circumstances). The join is skipped because of the key (yes, it's
> cheap lookup, but w/50 columns each doing it, nothing is cheap).

I know that I'm not JOINing in that case - as discussed I ment possible
JOINs in other query scenarios.

BTW: Latest query plan is also optimal that only the
used columns from the view are evaluated. With the full joined version
all columns where used even when dropped in the result-set, e.g.:
SELECT col1, col2 FROM view1; -- Equivalent to SELECT * FROM view1; as col1, col2 are all colums in that view
SELECT col1 FROM view1; -- less effort with subselects when less columns are needed, joins have same "full view" effort here

Ciao,
Gerhard

--
http://www.wiesinger.com/
>From pgsql-performance-owner(at)postgresql(dot)org Wed Sep 15 15:48:33 2010
Received: from maia.hub.org (maia-5.hub.org [200.46.204.29])
by mail.postgresql.org (Postfix) with ESMTP id 7F05F634580
for <pgsql-performance-postgresql(dot)org(at)mail(dot)postgresql(dot)org>; Mon, 13 Sep 2010 13:32:17 -0300 (ADT)
Received: from mail.postgresql.org ([200.46.204.86])
by maia.hub.org (mx1.hub.org [200.46.204.29]) (amavisd-maia, port 10024)
with ESMTP id 29820-03
for <pgsql-performance-postgresql(dot)org(at)mail(dot)postgresql(dot)org>;
Mon, 13 Sep 2010 16:32:09 +0000 (UTC)
X-Greylist: domain auto-whitelisted by SQLgrey-1.7.6
Received: from mail-bw0-f46.google.com (mail-bw0-f46.google.com [209.85.214.46])
by mail.postgresql.org (Postfix) with ESMTP id 1253A634253
for <pgsql-performance(at)postgresql(dot)org>; Mon, 13 Sep 2010 13:32:08 -0300 (ADT)
Received: by bwz11 with SMTP id 11so4631169bwz.19
for <pgsql-performance(at)postgresql(dot)org>; Mon, 13 Sep 2010 09:32:07 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=gmail.com; s=gamma;
h=domainkey-signature:received:mime-version:received:in-reply-to
:references:from:date:message-id:subject:to:cc:content-type;
bh=SrL7EJgmKb4ZXmks4pzWfl5U/NaF4iQL9b5uuzAI9zY=;
b=MbYVt2cy4SheQa3rY10fGMj2aAUSIM0QM0N+gDp9ubgBdqeIi0osny60Aq6y+9X3Pu
2LqxUkr0JFlvs4EdiqKoropSOzCQnlTmFovGKxeiShFz5xvzeIEaXZmb6D3rSzolNqYT
xqwPAfUsSukZ4w9iznPM1qjrX+2eFJjTUtoFA=
DomainKey-Signature: a=rsa-sha1; c=nofws;
d=gmail.com; s=gamma;
h=mime-version:in-reply-to:references:from:date:message-id:subject:to
:cc:content-type;
b=h8xCcEXv5qjPA38c6d7fceJbo1LpCSclqkEpZfI9pMxwEJTBHRCZPKiFTb8E7UTfto
ImSZoiaTFZBF3caY23NEQPZh66+GbxF2nCmJW9SYS+jsWiZkZPGgIdBrEke+V/jiepn2
/l5VMbeqC85lFQCswN0ciUiU78radGvPsbvUc=
Received: by 10.204.60.145 with SMTP id p17mr3407724bkh.56.1284395527479; Mon,
13 Sep 2010 09:32:07 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.204.10.11 with HTTP; Mon, 13 Sep 2010 09:31:47 -0700 (PDT)
In-Reply-To: <20100911020337(dot)GH26232(at)tamriel(dot)snowman(dot)net>
References: <AANLkTinQu1vhokjuAZDd3d2b6oeSqERF4biNGP-k-6UV(at)mail(dot)gmail(dot)com>
<5323(dot)1284170438(at)sss(dot)pgh(dot)pa(dot)us> <20100911020337(dot)GH26232(at)tamriel(dot)snowman(dot)net>
From: Mason Harding <mason(dot)harding(at)gmail(dot)com>
Date: Mon, 13 Sep 2010 09:31:47 -0700
Message-ID: <AANLkTikweUAnRDw_kTfFSc26-cTDtCA5WbFJBGq82Xr8(at)mail(dot)gmail(dot)com>
Subject: Re: Slow SQL lookup due to every field being listed in SORT KEY
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Content-Type: multipart/alternative; boundary=001636c5b4562aa80e049026a395
X-Virus-Scanned: Maia Mailguard 1.0.1
X-Spam-Status: No, hits=-1.899 tagged_above=-10 required=5
tests=BAYES_00=-1.9, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001
X-Spam-Level:
X-Archive-Number: 201009/89
X-Sequence-Number: 40292

--001636c5b4562aa80e049026a395
Content-Type: text/plain; charset=ISO-8859-1

Thanks all for your help. I didn't really understand why it was sorting on
every field, but it now makes sense. What I ended up doing was replacing
the

SELECT DISTINCT * FROM .... JOIN ... WHERE ... ORDER BY... LIMIT ...
with

SELECT * FROM ... WHERE id in (SELECT DISTINCT id FROM .... JOIN ... WHERE
... ) ORDER BY... LIMIT ...
This reduced the lookup time down to 19 ms, which is much faster than just
upping the work_mem, as that still took 800ms

Thanks all,
Mason

On Fri, Sep 10, 2010 at 7:03 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> > The reason it's sorting by all the columns is the DISTINCT
>
> You might also verify that you actually need/*should* have the DISTINCT,
> if it's included today.. Often developers put that in without
> understanding why they're getting dups (which can often be due to
> missing pieces from the JOIN clause or misunderstanding of the database
> schema...).
>
> Stephen
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
>
> iEYEARECAAYFAkyK43kACgkQrzgMPqB3kihX4ACfVboO4jRzFO3hkckdHfrSeAgF
> sysAnjmeoV7BA7uClEY8gXT4nEYhSx0u
> =y556
> -----END PGP SIGNATURE-----
>
>

--001636c5b4562aa80e049026a395
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Thanks all for your help.=A0 I didn&#39;t really understand why it was sort=
ing on every field, but it now makes sense.=A0 What I ended up doing was re=
placing the <br><br>SELECT DISTINCT * FROM .... JOIN ... WHERE ... ORDER BY=
... LIMIT ...<br>

with<br><br>SELECT * FROM ... WHERE id in (SELECT DISTINCT id FROM .... JOI=
N ... WHERE ... ) ORDER BY... LIMIT ...<br>This reduced the lookup time dow=
n to 19 ms, which is much faster than just upping the work_mem, as that sti=
ll took 800ms<br>

<br>Thanks all,<br>Mason<br><br><div class=3D"gmail_quote">On Fri, Sep 10, =
2010 at 7:03 PM, Stephen Frost <span dir=3D"ltr">&lt;<a href=3D"mailto:sfro=
st(at)snowman(dot)net">sfrost(at)snowman(dot)net</a>&gt;</span> wrote:<br><blockquote cla=
ss=3D"gmail_quote" style=3D"margin: 0pt 0pt 0pt 0.8ex; border-left: 1px sol=
id rgb(204, 204, 204); padding-left: 1ex;">

<div class=3D"im">* Tom Lane (<a href=3D"mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us">tgl(at)sss(dot)=
pgh.pa.us</a>) wrote:<br>
&gt; The reason it&#39;s sorting by all the columns is the DISTINCT<br>
<br>
</div>You might also verify that you actually need/*should* have the DISTIN=
CT,<br>
if it&#39;s included today.. =A0Often developers put that in without<br>
understanding why they&#39;re getting dups (which can often be due to<br>
missing pieces from the JOIN clause or misunderstanding of the database<br>
schema...).<br>
<font color=3D"#888888"><br>
=A0 =A0 =A0 =A0Stephen<br>
</font><br>-----BEGIN PGP SIGNATURE-----<br>
Version: GnuPG v1.4.9 (GNU/Linux)<br>
<br>
iEYEARECAAYFAkyK43kACgkQrzgMPqB3kihX4ACfVboO4jRzFO3hkckdHfrSeAgF<br>
sysAnjmeoV7BA7uClEY8gXT4nEYhSx0u<br>
=3Dy556<br>
-----END PGP SIGNATURE-----<br>
<br></blockquote></div><br>

--001636c5b4562aa80e049026a395--

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tobias Brox 2010-09-15 19:07:24 Re: locking issue on simple selects?
Previous Message Tom Lane 2010-09-15 13:39:47 Re: locking issue on simple selects?