Re: REINDEX takes half a day (and still not complete!)

From: Jim Nasby <jim(at)nasby(dot)net>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: REINDEX takes half a day (and still not complete!)
Date: 2011-05-04 13:28:03
Message-ID: 097E9E9A-BD13-4DDB-965E-EEFF80AB94C3@nasby.net
Lists: pgsql-performance

On Apr 30, 2011, at 9:34 AM, Kenneth Marshall wrote:
>> I suppose that's what I am going to do on a periodic basis from now
>> on. There is a lot of DELETE/UPDATE activity. But I wonder if the
>> vacuum stuff really should do something that's similar in function?
>> What do the high-end enterprise folks do -- surely they can't be
>> dumping/restoring every quarter or so....or are they?
>>
>> Anyway, many many thanks to the lovely folks on this list. Much appreciated!
>>
>
> The autovacuum and space management in 9.0 is dramatically more effective
> and efficient than that of 8.2. Unless you have an odd corner case there
> really should be no reason for a periodic dump/restore. This is not your
> grandmother's Oldsmobile... :)

In 10+ years of using Postgres, I've never come across a case where you actually *need* to dump and restore on a regular basis. However, you can certainly run into scenarios where vacuum simply can't keep up. If your restored database is 1/3 the size of the original, that's almost certainly what was happening on your 8.2 setup.
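As an illustration (not from the original thread): one quick way to see whether vacuum is keeping up is to compare dead-tuple counts to live-tuple counts in pg_stat_user_tables. A rough sketch, with arbitrary thresholds you'd want to tune for your own workload:

```sql
-- Sketch: tables where dead tuples are a large fraction of live ones,
-- which suggests vacuum is falling behind. Thresholds are arbitrary.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
  AND n_dead_tup > 0.2 * n_live_tup
ORDER BY n_dead_tup DESC;
```

Tables that show up here repeatedly are candidates for more aggressive per-table autovacuum settings.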

As Kenneth mentioned, 9.0 is far better in this regard than 8.2, though it's still possible that you're doing something that gives it fits. I suggest running a weekly vacuumdb -av, capturing the output, and running it through pgFouine. That will give you a ton of useful information about the amount of bloat in each table. I would definitely look at anything with over 20% bloat.
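For concreteness, the weekly run could look something like the following crontab fragment. This is a sketch, not from the original thread; the log path and the exact pgFouine invocation depend on your installation:

```shell
# Hypothetical cron entry: every Sunday at 02:00, vacuum all databases
# verbosely and capture the output (VACUUM VERBOSE reports on stderr,
# hence the 2>&1 redirect) for later analysis with pgFouine.
0 2 * * 0  vacuumdb -a -v > /var/log/postgresql/vacuum-weekly.log 2>&1
```

Run it as the postgres OS user (or set PGUSER/PGHOST accordingly) so vacuumdb can connect to every database.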

BTW, in case you're still questioning using Postgres in an enterprise setting: all of our production OLTP databases run on Postgres. The largest one is ~1.5TB and does over 650 TPS on average (with peaks that are much higher). Unplanned downtime on that database would cost us well over $100k/hour, and we're storing financial information, so data quality issues are not an option (data quality was one of the primary reasons we moved away from MySQL in 2006). So yes, you can absolutely run very large Postgres databases in a high-workload environment. BTW, that database is also still on version 8.3.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net
