Re: strategies for dealing with frequently updated tables

From: Dave Vitek <dvitek(at)grammatech(dot)com>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: strategies for dealing with frequently updated tables
Date: 2012-02-28 20:09:43
Message-ID: 4F4D3487.2030907@grammatech.com
Lists: pgsql-general

On 2/27/2012 10:29 PM, Andy Colson wrote:
> On 02/27/2012 06:55 PM, Dave Vitek wrote:
>> Hi all,
>>
>> I have a relation where a tuple typically undergoes a lifecycle
>> something like:
>> 1) Created
>> 2) Updated maybe thousands of times (no updates to indexed columns
>> though)
>> 3) Rarely or never modified again
>>
>> The following query takes about 100 minutes (3 seconds per tuple):
>> SELECT count(id) from T
>>
>> (2) is causing a lot of auto vacuum/analyze activity (which is OK).
>> HOT seems to get used for about 90% of the updates, but there are
>> enough updates that don't use it to cause issues. I'm using pg
>> version 9.0.3 on a 32-bit windows xp machine with 3GB of RAM. The
>> .conf file is using default settings.
>>
>> Table Info:
>> n_live_tup 1799
>> n_dead_tup 191
>> pg_relation_size 2343mb
>> indexsize 10mb
>> toastsize 552kb
>> toastindexsize 16kb
>>
>> This reports under 10kb for most tuples:
>> psql -A -c "select * from T where id=123" | wc -c
>>
>> auto-vacuum and auto-analyze both ran yesterday with default
>> settings. There are only one or two new tuples since yesterday. The
>> database is fairly old (was probably created using pg_restore about
>> when 9.0.3 came out).
>>
>> Here is the output from VACUUM VERBOSE:
>> INFO: vacuuming "public.T"
>> 0 pages are entirely empty.
>> CPU 0.00s/0.00u sec elapsed 0.56 sec.
>> VACUUM
>>
>>
>> I imagine CLUSTERing the table would make things happier, but I'm
>> hoping for a permanent solution that avoids periodic downtime.
>>
>> One thought was to partition so rows that are still changing live in
>> a separate table from the more stable rows. I imagine the cardinality
>> of (2) rarely exceeds 10. Can I still get into performance trouble
>> with a table that small after enough updates? Anyone have other ideas?
>>
>> - Dave
>>
>
> I'm a little confused... what's the problem?
Sequential scans of this table seem to be pathologically slow for a
table with only 2000 rows.
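
For what it's worth, a quick way to confirm the time is going into
scanning dead space (a minimal check; EXPLAIN's BUFFERS option is
available as of 9.0):

EXPLAIN (ANALYZE, BUFFERS) SELECT count(id) FROM T;
-- With ~2.3GB of heap at 8kB per page, the Seq Scan node should
-- report on the order of 300,000 buffers read for only ~1800 live
-- rows, i.e. nearly all the I/O is dead or empty pages.
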
>
>
>> The following query takes about 100 minutes (3 seconds per tuple):
>> SELECT count(id) from T
>
> so table T has 18,000 rows? (100 * 60 * 3)
100 minutes * 60 (sec/min) / 3 (sec/tuple) = 2000 tuples
>
>> Table Info:
>> n_live_tup 1799
>
> Oh, so the table has 1,800 rows?
Correct.
>
>
>> This reports under 10kb for most tuples:
>> psql -A -c "select * from T where id=123" | wc -c
>
> 10 kb? what? So that's one row? And that one row is about 10,000 bytes?
Correct. I mentioned this since pg_relation_size would suggest that one
tuple is using over a meg including overhead, fragmentation, and free
space. So only about 1% of the space is used by the payload, which
might be a red flag?
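
Here's that back-of-the-envelope as a query, for anyone following along
(a sketch; lowercase t stands in for the real table name, which isn't
quoted anywhere):

SELECT pg_relation_size('t') / NULLIF(n_live_tup, 0) AS bytes_per_live_tup
  FROM pg_stat_user_tables
 WHERE relname = 't';
-- 2343MB / 1799 live tuples is roughly 1.3MB per tuple, against
-- ~10kB of payload per row, so ~99% of the heap is overhead.
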
>
>
> So you have a large row, who knows how many, and the problem is
> "SELECT count(id) from T" is slow?
Correct. I think anything requiring a sequential scan of the table takes
100 minutes to run. However, I wanted to be careful about drawing
conclusions and just present data.
>
> You say:
>
>> 2) Updated maybe thousands of times (no updates to indexed columns
>> though)
>
> Are the updates slow? How slow, and how fast do you need them to be?
The updates are perfectly responsive.
> Is ID a unique key? Or does "update ... where id=123" update more
> than one row?
It's the primary key, it's unique, and it updates a single row.
>
> What does the actual table look like? Looks like it has lots of
> indexes, but I'm just guessing! Are you sure you need all those
> indexes? updating a row has to update all those indexes.. I assume,
> but you give no details.
It has 5 indices, all of which are warranted, and writes are performing
fine -- it's just sequential scans that seem to be pathological. Index
scans are fine too.
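
For completeness, the per-index sizes can be pulled like this (t again
standing in for the real table name):

SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
  FROM pg_stat_user_indexes
 WHERE relname = 't';
-- All five indices together are only ~10MB, consistent with the
-- bloat being confined to the heap rather than the indexes.
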
>
> Also... in the end, you're on windows, so you probably can't tell me if
> you are IO bound, but you probably need a better IO subsystem. And
> stop running on raid 5 (which I'm assuming since you don't mention
> anything)
I think this is typical consumer hardware from maybe 3 years ago.
Copying a 1G file on the disk takes... 11 minutes when the machine is
under heavy load. There are many instances of this database schema+app,
so making it work well on low end hardware is important.
>
> Wow. Re-reading that I seem to be angry. Please don't take it that
> way. Blood sugar is high, makes me angry person. Others will be
> along to scold me. Bad Andy! Bad! Go to your room until you find
> your happy face.
No problem, I should have been clearer about sequential scans being the
issue.
>
> -Andy
> ps: based on you're email gramma, I almost changed all your to you're,
> just to see if it annoyed you. Sounds like you work for grammar
> technology.
How embarrassing! After writing code all day I tend to have some
natural language issues.
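
Closing thought for the archives: the direction I'm currently leaning
for a permanent fix is per-table tuning rather than partitioning. This
is just a sketch -- the numbers are guesses, and the one-time rewrite
still needs a maintenance window:

-- Leave slack in each page so more updates qualify for HOT
-- (heap-only tuple updates don't have to touch the indices):
ALTER TABLE t SET (fillfactor = 50);

-- Vacuum after ~1% of the table changes instead of the default 20%:
ALTER TABLE t SET (autovacuum_vacuum_scale_factor = 0.01,
                   autovacuum_vacuum_threshold = 50);

-- One-time rewrite to reclaim the existing dead space (takes an
-- exclusive lock; on 9.0, VACUUM FULL rewrites the table, as does
-- CLUSTER):
VACUUM FULL t;
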
