Slow sequential scans on one DB but not another; fragmentation?

From: Stephen Harris <lists(at)spuddy(dot)org>
To: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Slow sequential scans on one DB but not another; fragmentation?
Date: 2007-03-28 13:55:35
Message-ID: 20070328135535.GA12849@pugwash.spuddy.org
Lists: pgsql-general

This is PostgreSQL 8.0.8 on Solaris 8. Yes, I know: old technology, but we
can't upgrade yet.

I have two identical servers. One is production; overnight we do a complete
dump and scp the results to the other, which is the standby. The standby is
currently holding data that's about 7 months old because we haven't needed
to fail over yet.

I have one specific table ("sweep_users") that has 900,000 rows in prod
and 630,000 on the standby. On the standby a "select count(*) from
sweep_users" takes a couple of seconds. On production it takes... 240
seconds!

Data updates on this table consist of the following meta-logic:

for host in list_of_hosts
    delete from sweep_users where hostid=host
    for user in users_for_host
        insert into sweep_users ....
vacuum analyze sweep_users
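
For reference, the first thing I plan to compare on both boxes is what the
catalog thinks the table looks like; a rough sketch (relpages/reltuples are
only refreshed by VACUUM/ANALYZE, and the 8kB page size is just the default
assumption):

    select relname, relpages, reltuples,
           relpages * 8 / 1024 as approx_mb  -- assumes the default 8kB block size
      from pg_class
     where relname = 'sweep_users';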

I'm at a loss to understand why the production server is so slow. While
the query is running, "iostat -x" reports values like these (the data lives
on a striped mirror):

device      r/s    w/s     kr/s   kw/s  wait  actv  svc_t  %w  %b
md30      764.0    0.0  92110.5    0.0   0.0   1.3    1.6   0  87
md30     1666.8    0.0  67254.3    0.0   0.0   1.7    1.0   0  92
md30      844.6    0.4  75716.1    0.3   0.0   1.3    1.6   0  90

The disk on the standby machine shows only about 1/20th of that activity.
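
To see whether the difference is really in the amount of data being read,
I was also going to compare the size of the table's underlying file on the
two machines; a sketch, assuming the table sits in the default tablespace
under $PGDATA:

    -- get the file name the table is stored under, plus the database's oid
    select relfilenode from pg_class where relname = 'sweep_users';
    select oid from pg_database where datname = current_database();
    -- then at the shell: ls -lh $PGDATA/base/<database oid>/<relfilenode>*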

Now, there is a difference in tuning between the two servers: on the
production side we tried to allocate more memory to the database, to make
queries more likely to be served from RAM (but we could have made a
mistake)...

% diff prod standby
< effective_cache_size | 262144
---
> effective_cache_size | 1000
92c92
< maintenance_work_mem | 524288
---
> maintenance_work_mem | 16384
106c106
< random_page_cost | 1.5
---
> random_page_cost | 4
113c113
< shared_buffers | 30000
---
> shared_buffers | 1000
141c141
< work_mem | 20480
---
> work_mem | 1024
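
That diff was taken from saved listings, so in case it's stale, this is how
I'd double-check the live values on each server:

    select name, setting
      from pg_settings
     where name in ('effective_cache_size', 'maintenance_work_mem',
                    'random_page_cost', 'shared_buffers', 'work_mem');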

The only idea I have is that our update pattern is somehow causing excessive
fragmentation, either at the DB level or the OS file level.
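
If that's what's happening, I'd expect a plain vacuum to report far more
pages than the row count justifies; roughly:

    vacuum verbose sweep_users;
    -- the output reports removed row versions and the total number of pages;
    -- if the page count stays huge after vacuuming, the free space map
    -- (max_fsm_pages) may be too small, or a VACUUM FULL / CLUSTER may be
    -- needed to actually give the space back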

Anyone else have any thoughts?

--

rgds
Stephen
