Re: strategies for dealing with frequently updated tables

From: Dave Vitek <dvitek(at)grammatech(dot)com>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: strategies for dealing with frequently updated tables
Date: 2012-02-29 00:41:39
Message-ID: 4F4D7443.4050905@grammatech.com
Lists: pgsql-general

On 2/28/2012 5:21 PM, Andy Colson wrote:
> On 2/28/2012 2:09 PM, Dave Vitek wrote:
>>>> The following query takes about 100 minutes (3 seconds per tuple):
>>>> SELECT count(id) from T
>>>
>>> so table T has 18,000 rows? (100 * 60 * 3)
>> 100 minutes * 60 (sec/min) / 3 (sec/tuple) = 2000 tuples
>
> Well... math was never my strong point :-)
>
>>> So you have a large row, who knows how many, and the problem is
>>> "SELECT count(id) from T" is slow?
>> Correct. I think anything requiring a sequential scan of the table takes
>> 100 minutes to run. However, I wanted to be careful about drawing
>> conclusions and just present data.
>>>
>>>
>>> -Andy
>>> ps: based on you're email gramma, I almost changed all your to you're,
>>> just to see if it annoyed you. Sounds like you work for grammar
>>> technology.
>> How embarrassing! After writing code all day I tend to have some natural
>> language issues.
>
> No, no, that's not what I meant. Your email address grammatech.com,
> makes it sound like you are doing grammar technology.
>
> Have you checked for table bloat? I thought I saw a sql statement
> someplace that did that.
>
> I'm not sure what it actually means, but this row would make me worry:
>
> INFO: "T": found 0 removable, 1444 nonremovable row versions in 1522
> out of 299964 pages
> DETAIL: 30 dead row versions cannot be removed yet.
>
> Hopefully somebody else is reading along that can offer insight.
>
> Have you always had this problem, or is it something new? Or is
> "select count..." new?
>
> Also, not sure if you have seen it elsewhere, but "select count"'ing
> an entire table is probably one of PG's slowest operations. If you
> really need the count you might consider a separate summary table that
> stores the count.
>
> -Andy
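
[The summary-table approach Andy suggests above is usually maintained with triggers. A minimal sketch, for readers of the archive -- the names row_counts, t_count, and t_count_trig are made up for illustration, and T stands in for the table discussed in the thread:]

```sql
-- Maintain a running row count so the application never has to
-- sequentially scan T just to count its rows.
CREATE TABLE row_counts (tbl text PRIMARY KEY, n bigint NOT NULL);
INSERT INTO row_counts SELECT 'T', count(*) FROM T;

CREATE FUNCTION t_count() RETURNS trigger AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    UPDATE row_counts SET n = n + 1 WHERE tbl = 'T';
  ELSIF TG_OP = 'DELETE' THEN
    UPDATE row_counts SET n = n - 1 WHERE tbl = 'T';
  END IF;
  RETURN NULL;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER t_count_trig AFTER INSERT OR DELETE ON T
  FOR EACH ROW EXECUTE PROCEDURE t_count();
```

[Note the trade-off: every insert and delete now contends on the single row_counts row, which may matter on a frequently updated table.]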

Andy,

Thanks for the discussion. The name has more to do with attribute
grammars, iirc. Anyway, I'm sure the table is quite bloated. I tried a
query yesterday from somewhere on the internet that claimed to report
the amount of bloat, but it reported 100.0% bloat, which seems
suspicious given that there is some measurable payload in there. Still,
probably not far from the truth.
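
[For an exact figure rather than the estimate those queries from the internet produce, the pgstattuple contrib module physically scans the table and reports dead tuples and free space directly. A sketch, assuming the contrib module is installed and the table is named T:]

```sql
-- Exact physical-space accounting; note this scans the whole table,
-- so on a badly bloated table it will take a while.
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT table_len,
       tuple_len,
       dead_tuple_count,
       free_space,
       round(100 * free_space::numeric / table_len, 1) AS free_pct
FROM pgstattuple('T');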

The count(id) query is an example. The application almost never does
sequential scans of this table... except when the server is starting and
for one user query that displays a significant portion of the table.
I'd rather not create still more indices just to dodge the sequential
scan -- additional indices would interfere with HOT updates.
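
[Whether the existing indices are already defeating HOT can be checked from the statistics collector; a sketch, assuming the table's (lowercase) name in pg_stat_user_tables is 't':]

```sql
-- Compare total vs. HOT updates; a low HOT percentage suggests an
-- index on a frequently updated column is blocking HOT updates.
SELECT n_tup_upd,
       n_tup_hot_upd,
       round(100 * n_tup_hot_upd::numeric / nullif(n_tup_upd, 0), 1)
         AS hot_pct
FROM pg_stat_user_tables
WHERE relname = 't';
```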

I've made a schema change that moves a large, immutable text column out
of this table. With this change, the remaining columns are simple
scalars (integer, float, timestamp) with the exception of a username and
file path. I'd be pretty shocked if any tuple had more than a kilobyte
of payload now. It eliminates use of TOAST... I'll be clustering and I
guess we'll see what happens over the next year or so.
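
[For the record, CLUSTER rewrites the table from scratch, so it also discards the accumulated dead space. A sketch -- t_pkey is a placeholder for whichever index the table is clustered on:]

```sql
-- CLUSTER rewrites T in index order, reclaiming bloat, but takes an
-- ACCESS EXCLUSIVE lock for the duration; run it in a quiet window.
CLUSTER T USING t_pkey;
ANALYZE T;  -- refresh planner statistics after the rewrite
```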

- Dave
