Re: Bloated pg_shdepend_depender_index

From: adey <adey11(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, pgsql-admin(at)postgresql(dot)org, "Gregory Maxwell" <gmaxwell(at)gmail(dot)com>
Subject: Re: Bloated pg_shdepend_depender_index
Date: 2006-03-25 00:53:19
Message-ID: 1c66bda80603241653o79be7adaq8975e5d6a9498227@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Two questions in this regard please?
1) Is tuple theory not the root of this problem
2) Vacuum does much the same as a traditional database reorg, and online
reorgs are a reality now

1) If I understand tuple theory correctly, copies of rows are created
through normal Postgres processing, that expire after a period, leaving
multiple copies of redundant data to be vacuumed dead, then vacuumed out (or
both). Most databases have been built using one copy of a row with
sophisticated locking control mechanism that Postgres has some of anyway,
and the industry has developed methods and designs to exploit locking to
best advantage. Even with tuples, locking is still evident in Postgres.
OR
2) Can vacuum full not be redesigned to run online without locking tables
and users, like a conventional online reorg, eg: work on 1 data page at a
time instead of locking the whole table with a shorter period at the end to
lock the table and "compress" the remaining populated data pages and release
disk space back to the OS; or one data file at a time, and have vacuum full
*per table* reduce / tidy up the wraparound value, thereby avoiding a full
DB vacuum for longer periods. In this way vacuum can be performed regularly
and be less intrusive. Nowadays 24x7 is more of a reality for systems and we
can't afford to take systems down for many hours to perform regular
maintenance.

(It would be extremely helpful to DBA's with little OS experience or
access to have more automation in PGAdmin, especially task scheduling and
alerting, so SQL can be scheduled in PGAmin instead of crontab, which is
usually a sysadmin function).

On 3/25/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > Am Freitag, 24. März 2006 05:48 schrieb Tom Lane:
> >> Well, the VACUUM FULL algorithm is incapable of shrinking indexes ---
> >> the only way is REINDEX, or something else that reconstructs indexes
> >> from scratch, such as CLUSTER. One of the things we need to look into
> >> is putting more smarts into VACUUM so that it automatically does
> >> something reasonable when faced with extreme cases like these.
>
> > If the user is running VACUUM FULL, he has presumably determined that
> the
> > table is too bloated to be recovered in a graceful way, and quite likely
> the
> > indexes are going to be bloated similarly. So seemingly one might as
> well
> > launch a reindexing on the table after VACUUM FULL has done its thing.
> > Whether that should be automatic is another question but perhaps the
> advice
> > should be documented somewhere?
>
> Actually, I wonder whether VACUUM FULL shouldn't be thrown away and
> replaced by something else entirely. That algorithm only really works
> nicely when just a small percentage of the rows need to be moved to
> re-compact the table --- if you're moving lots of rows, it makes the
> index bloat situation *worse* not better because of the transient need
> for index entries pointing to both copies of moved rows. Lazy VACUUM
> has become the de-facto standard for situations where there's not a huge
> amount of empty space, and so it's not clear where the sweet spot is for
> VACUUM FULL anymore. If you've got enough disk space, a rewrite (like
> CLUSTER or ALTER TABLE) is going to blow the doors off VACUUM FULL,
> let alone VACUUM FULL plus REINDEX. Not to mention that for
> sufficiently huge tables, VACUUM FULL fails outright because it runs out
> of RAM.
>
> We need to fix CLUSTER to make it MVCC-safe (ie, not discard
> recently-dead rows), and it'd be nice to have something like it that
> didn't worry about ordering but just did a seqscan of the source table.
> Then I'd be inclined to recommend that instead of VACUUM FULL for most
> cases of severe bloat.
>
> Unfortunately this all breaks down for shared system catalogs and the
> core (nailed-in) catalogs, because we can't change their relfilenodes
> and so the crash-safe CLUSTER/REINDEX approach doesn't work. We still
> need a new idea or two there.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message ogjunk-pgjedan 2006-03-25 03:22:59 Renaming a sequence?
Previous Message Sriram Dandapani 2006-03-25 00:29:32 WAL file naming convention