Re: REINDEX takes half a day (and still not complete!)

From: Phoenix <phoenix(dot)kiula(at)gmail(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: REINDEX takes half a day (and still not complete!)
Date: 2011-04-17 16:59:44
Message-ID: BANLkTi=j-+Cauczk-q+pJuASp_PPOjXgEQ@mail.gmail.com
Lists: pgsql-performance

Thanks Scott.

I have shared a great deal of information in my off-list emails to
Merlin and to you; I intentionally kept it out of the public thread.
Apologies if you are growing tired of this.

The reason I need to REINDEX is because a simple SELECT query based on
the index column is taking ages. It used to take less than a second. I
want to make sure that the index is properly in place, at least.
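
For context, the slow query has this shape (the table, column, and
value below are placeholders, not my real schema):

EXPLAIN ANALYZE
SELECT * FROM bigtable            -- placeholder table name
WHERE keycol = 'some-key';        -- placeholder indexed column

-- A Seq Scan here, instead of an Index Scan on keycol's index,
-- would point to the planner no longer using the index.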

We went through some bloat reports, and Merlin's conclusion was that
there is no significant bloat.
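
For reference, a quick size comparison can be done along these lines
('bigtable' and 'bigtable_pkey' below are placeholder names, not my
real ones):

SELECT relname,
       pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('bigtable', 'bigtable_pkey');

-- A grossly bloated index would show up here as an index far larger
-- than the table data it covers.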

A manual VACUUM right now takes ages too. My autovacuum settings are below.

It's a RAID 1 setup with two 10,000rpm Raptor disks.

top does not show much beyond "postmaster" processes. How should I use
top, and what information can I give you? This is what it looks like:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14231 root      18   0  4028  872  728 R 93.8  0.0 28915:37 exim_dbmbuild
11001 root      25   0  4056  864  716 R 93.8  0.0 23111:06 exim_dbmbuild
16400 root      25   0  4824  864  720 R 92.5  0.0 33843:52 exim_dbmbuild
 4799 postgres  15   0  532m  94m  93m D  0.7  1.2  0:00.14 postmaster
12292 nobody    15   0 48020  14m 5088 S  0.7  0.2  0:00.06 httpd
12943 root      17   0  2828 1224  776 R  0.7  0.0  0:00.04 top
 7236 mysql     16   0  224m  64m 3692 S  0.3  0.8 26:43.46 mysqld
31421 postgres  15   0  530m  12m  12m S  0.3  0.2  0:03.08 postmaster
31430 postgres  15   0 10456  576  224 S  0.3  0.0  0:00.08 postmaster
  955 postgres  15   0  532m  91m  90m S  0.3  1.1  0:00.15 postmaster
 1054 postgres  15   0  532m 196m 195m S  0.3  2.4  0:00.37 postmaster
 1232 postgres  15   0  532m  99m  98m D  0.3  1.2  0:00.27 postmaster
 1459 postgres  15   0  532m  86m  85m S  0.3  1.1  0:00.12 postmaster
 4552 postgres  15   0  532m  86m  85m S  0.3  1.1  0:00.08 postmaster
 7187 postgres  15   0  532m 157m 155m S  0.3  1.9  0:00.19 postmaster
 7587 postgres  15   0  532m 175m 173m D  0.3  2.2  0:00.23 postmaster
 8131 postgres  15   0  532m 154m 152m S  0.3  1.9  0:00.15 postmaster
 9473 nobody    16   0 48268  15m 5800 S  0.3  0.2  0:00.34 httpd
 9474 nobody    15   0 48096  14m 5472 S  0.3  0.2  0:00.27 httpd
10688 nobody    16   0     0    0    0 Z  0.3  0.0  0:00.20 httpd <defunct>
12261 nobody    15   0 47956  13m 4296 S  0.3  0.2  0:00.08 httpd
12278 nobody    15   0 47956  13m 4052 S  0.3  0.2  0:00.04 httpd
12291 nobody    15   0 47972  14m 4956 S  0.3  0.2  0:00.07 httpd
12673 nobody    15   0 47912  13m 4180 S  0.3  0.2  0:00.02 httpd
12674 nobody    15   0 47936  13m 4924 S  0.3  0.2  0:00.02 httpd
12678 nobody    16   0 47912  13m 4060 S  0.3  0.2  0:00.01 httpd
12727 nobody    15   0 47912  13m 4024 S  0.3  0.2  0:00.03 httpd
12735 nobody    15   0 47912  13m 4144 S  0.3  0.2  0:00.02 httpd
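
If it is more useful, I can also capture a batch-mode snapshot of just
the postgres processes with something like the command below (assuming
this version of top supports the -u flag):

top -b -n 1 -u postgres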

"vmstat 10" shows this:

 r  b   swpd  free  buff   cache si so   bi bo   in     cs us sy id wa
 3 14  99552 17900 41108 7201712  0  0   42 11    0      0  8 34 41 16
 2 17  99552 16468 41628 7203012  0  0 1326 84 1437 154810  7 66 12 15
 3  7  99476 16796 41056 7198976  0  0 1398 96 1453 156211  7 66 21  6
 3 17  99476 17228 39132 7177240  0  0 1325 68 1529 156111  8 65 16 11

The output of "iostat -xd 10" is:

Device:  rrqm/s  wrqm/s    r/s   w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.24   24.55   9.33  4.41  111.31  231.75   55.65  115.88    24.97     0.17  12.09   6.67   9.17
sdb        0.06   97.65   2.21  3.97   91.59  389.58   45.80  194.79    77.84     0.06   9.95   2.73   1.69
sdc        1.46   62.71 187.20 29.13  132.43  311.72   66.22  155.86     2.05     0.36   1.65   1.12  24.33

Device:  rrqm/s  wrqm/s    r/s   w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    7.41   0.30  3.50    2.40   87.29    1.20   43.64    23.58     0.13  32.92  10.03   3.81
sdb        0.00    0.00   0.00  0.00    0.00    0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
sdc        0.00   18.32 158.26  4.10 2519.32  180.98 1259.66   90.49    16.63    13.04  79.91   6.17 100.11

Device:  rrqm/s  wrqm/s    r/s   w/s  rsec/s  wsec/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    6.21   0.00  1.40    0.00   60.86    0.00   30.43    43.43     0.03  20.07  15.00   2.10
sdb        0.00    0.00   0.00  0.00    0.00    0.00    0.00    0.00     0.00     0.00   0.00   0.00   0.00
sdc        0.10   10.31 159.06  2.50 2635.44  101.70 1317.72   50.85    16.94    12.82  79.44   6.20 100.12

There is 8GB of memory in total, with about 1GB devoted to PostgreSQL
during these operations. Otherwise, my settings are as follows (and
yes, I did make the vacuum settings more aggressive based on your
email, which has had no apparent impact):

max_connections = 350
shared_buffers = 500MB
effective_cache_size = 1250MB
max_fsm_relations = 1500
max_fsm_pages = 950000
work_mem = 100MB
maintenance_work_mem = 200MB
temp_buffers = 4096
authentication_timeout = 10s
ssl = off
checkpoint_warning = 3600
random_page_cost = 1

What else can I share?

Thanks much for offering to help.

On Sun, Apr 17, 2011 at 11:44 PM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
> On Sun, Apr 17, 2011 at 9:30 AM, Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com> wrote:
>> Sorry, rejuvenating a thread that was basically unanswered.
>>
>> I closed the database for any kinds of access to focus on maintenance
>> operations, killed all earlier processes so that my maintenance is the
>> only stuff going on.
>>
>> REINDEX is still taking 3 hours -- and it is still not finished!
>>
>> Similarly, if I cancel the REINDEX and issue a VACUUM ANALYZE VERBOSE,
>> this too seems to just hang there on my big table.
>>
>> I changed the maintenance_work_men to 2GB for this operation. It's
>> highly worrisome -- the above slow times are with 2GB of my server
>> dedicated to Postgresql!!!!
>>
>> Surely this is not tenable for enterprise environments? I am on a
>> 64bit RedHat server with dual CPU Intel Woodcrest or whatever that was
>> called. Postgres is 8.2.9.
>>
>> How do DB folks do this with small maintenance windows? This is for a
>> very high traffic website so it's beginning to get embarrassing.
>>
>> Would appreciate any thoughts or pointers.
>
> Upgrade to something more modern than 8.2.x.  Autovacuum was still
> very much in its infancy back then.  9.0 or higher is a good choice.
> What do iostat -xd 10 and vmstat 10 and top say about these processes
> when they're running?  "It's taking a really long time and seems like
> it's hanging" tells us nothing useful.  Your OS has tools to let you
> figure out what's bottlenecking your operations, so get familiar with
> them and let us know what they tell you.  These are all suggestions I
> made before which you have now classified as "not answering your
> questions" so I'm getting a little tired of helping you when you don't
> seem interested in helping yourself.
>
> What are your vacuum and autovacuum costing values set to?  Can you
> make vacuum and / or autovacuum more aggressive?
>
