Re: strategies for dealing with frequently updated tables

From: Dave Vitek <dvitek(at)grammatech(dot)com>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: strategies for dealing with frequently updated tables
Date: 2012-02-28 20:09:43
Message-ID: 4F4D3487.2030907@grammatech.com
Lists: pgsql-general
On 2/27/2012 10:29 PM, Andy Colson wrote:
> On 02/27/2012 06:55 PM, Dave Vitek wrote:
>> Hi all,
>>
>> I have a relation where a tuple typically undergoes a lifecycle 
>> something like:
>> 1) Created
>> 2) Updated maybe thousands of times (no updates to indexed columns 
>> though)
>> 3) Rarely or never modified again
>>
>> The following query takes about 100 minutes (3 seconds per tuple):
>> SELECT count(id) from T
>>
>> (2) is causing a lot of auto vacuum/analyze activity (which is OK). 
>> HOT seems to get used for about 90% of the updates, but there are 
>> enough updates that don't use it to cause issues. I'm using pg 
>> version 9.0.3 on a 32-bit windows xp machine with 3GB of RAM. The 
>> .conf file is using default settings.
>>
>> Table Info:
>> n_live_tup 1799
>> n_dead_tup 191
>> pg_relation_size 2343mb
>> indexsize 10mb
>> toastsize 552kb
>> toastindexsize 16kb
>>
>> This reports under 10kb for most tuples:
>> psql -A -c "select * from T where id=123" | wc -c
>>
>> auto-vacuum and auto-analyze both ran yesterday with default 
>> settings. There are only one or two new tuples since yesterday. The 
>> database is fairly old (was probably created using pg_restore about 
>> when 9.0.3 came out).
>>
>> Here is the output from VACUUM VERBOSE:
>> INFO: vacuuming "public.T"
>> 0 pages are entirely empty.
>> CPU 0.00s/0.00u sec elapsed 0.56 sec.
>> VACUUM
>>
>>
>> I imagine CLUSTERing the table would make things happier, but I'm 
>> hoping for a permanent solution that avoids periodic downtime.
>>
>> One thought was to partition so rows that are still changing live in 
>> a separate table from the more stable rows. I imagine the cardinality 
>> of (2) rarely exceeds 10. Can I still get into performance trouble 
>> with a table that small after enough updates? Anyone have other ideas?
>>
>> - Dave
>>
>
> I'm a little confused... whats the problem?
Sequential scans of this table seem to be pathologically slow for a 
table with only 2000 rows.
>
>
>> The following query takes about 100 minutes (3 seconds per tuple):
>> SELECT count(id) from T
>
> so table T has 18,000 rows?   (100 * 60 * 3)
100 minutes * 60 (sec/min) / 3 (sec/tuple) = 2000 tuples
>
>> Table Info:
>> n_live_tup 1799
>
> Oh, so the table has 1,800 rows?
Correct.
>
>
>> This reports under 10kb for most tuples:
>> psql -A -c "select * from T where id=123" | wc -c
>
> 10 kb?  what? So that's one row?  And that one row is about 10,000 bytes?
Correct.  I mentioned this because pg_relation_size suggests that each 
tuple is using over a megabyte once overhead, fragmentation, and free 
space are included.  So only about 1% of the space is actual payload, 
which might be a red flag?
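(For what it's worth, that dead-space-to-payload ratio can be measured directly with the pgstattuple contrib module, assuming it's installed -- on 9.0 it ships as a contrib script rather than an extension. Table name T as above:)

```sql
-- Compare live tuple payload against the total relation size.
-- pgstattuple does a full scan of T, so expect it to be slow on a bloated table.
SELECT pg_size_pretty(pg_relation_size('T')) AS table_size,
       tuple_len,        -- bytes of live tuple data
       dead_tuple_len,   -- bytes held by dead (not yet vacuumed) tuples
       free_space,       -- reusable free space within pages
       free_percent      -- free space as a percentage of the table
FROM pgstattuple('T');
```

If tuple_len is a tiny fraction of the table size and free_percent is huge, the table is bloated rather than genuinely large.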
>
>
> So you have a large row, who knows how many, and the problem is 
> "SELECT count(id) from T" is slow?
Correct. I think anything requiring a sequential scan of the table takes 
100 minutes to run.  However, I wanted to be careful about drawing 
conclusions and just present data.
>
> You say:
>
>> 2) Updated maybe thousands of times (no updates to indexed columns 
>> though)
>
> Are the updates slow?  How slow, and how fast to you need them to be?
The updates are perfectly responsive.
>   Is ID a unique key? Or does "update ... where id=123" update more 
> than one row?
It's the primary key, it's unique, and it updates a single row.
>
> What does the actual table look like?  Looks like it has lots of 
> indexes, but I'm just guessing!  Are you sure you need all those 
> indexes?  updating a row has to update all those indexes.. I assume, 
> but you give no details.
It has 5 indexes, all of which are warranted, but writes are performing 
fine -- it's just sequential scans that seem to be pathological.  Index 
scans are fine too.
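(A one-off way to reclaim the space would be something like the following -- both forms rewrite the table compactly, and both take an ACCESS EXCLUSIVE lock, so they mean a window of downtime. The index name is an assumption; substitute whatever the primary key index on T is actually called:)

```sql
-- Rewrite T in primary-key order; assumes the pkey index is named t_pkey.
CLUSTER T USING t_pkey;

-- Or, without imposing an ordering (on 9.0 this also fully rewrites the table):
VACUUM FULL T;
```

Either one should bring the 2.3GB relation back down near the ~20MB of actual payload.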
>
> Also... in the end, you're on windows, so you probably cant tell me if 
> you are IO bound, but you probably need a better IO subsystem.  And 
> stop running on raid 5 (which I'm assuming since you don't mention 
> anything)
I think this is typical consumer hardware from maybe 3 years ago.  
Copying a 1G file on the disk takes... 11 minutes when the machine is 
under heavy load.  There are many instances of this database schema+app, 
so making it work well on low end hardware is important.
>
> Wow.  Re-reading that I seem to be angry.  Please don't take it that 
> way.  Blood sugar is high, makes me angry person.  Others will be 
> along to scold me.  Bad Andy!  Bad!  Go to your room until you find 
> your happy face.
No problem, I should have been clearer about sequential scans being the 
issue.
>
> -Andy
> ps: based on you're email gramma, I almost changed all your to you're, 
> just to see if it annoyed you.  Sounds like you work for grammar 
> technology.
How embarrassing!  After writing code all day I tend to have some 
natural language issues.

