Re: Vacuum dead tuples that are "between" transactions

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Paul Tillotson <spam1011(at)adelphia(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Vacuum dead tuples that are "between" transactions
Date: 2006-02-28 16:22:00
Message-ID: 20060228162200.GQ82012@pervasive.com
Lists: pgsql-hackers

On Tue, Feb 28, 2006 at 01:22:35PM +0000, Simon Riggs wrote:
> Paul, you mention serializable transactions, but your root issue seems
> to be that VACUUM clears up fewer rows when pg_dump is running, yes? Have
> you tried using an on-line hot backup with archive_command set (PITR)?
> That doesn't suffer from the same issue and is faster too.
>
> OTOH a few hackers discussed this recently and found that nobody used
> serializable transactions (ST) except during pg_dump. It seems a
> reasonable observation that *most* STs are pg_dumps, or at very least:
> the longest running STs are pg_dumps. So rather than changing all
> transaction modes, or even special-casing STs, why not put in some
> infrastructure to cope specifically with the problems that pg_dump can
> cause?

While it doesn't currently run as a serializable transaction, another big
candidate IMO is vacuum itself. Vacuuming a large table in a database that
also sees heavy update activity can be a real nightmare, because dead space
piles up in the updated tables while the long vacuum is running. Although
there are probably any number of ways this problem could be addressed, making
vacuum a serializable transaction (which shouldn't be an issue, afaik) and
creating a generic framework that optimizes for that case would be a win in
more than one place.

Also, does this really apply only to serializable transactions? As the OP
stated, if a row couldn't possibly be visible to a specific (old)
transaction, it should be safe to vacuum it...
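To make that concrete, here's a rough sketch of the kind of per-table cutoff
I'm picturing (and which your proposal below seems to imply). None of these
names exist in the backend -- ProcSnapshotInfo, GetPerTableOldestXmin and
done_with_table are made up for illustration, and the real thing would of
course live in the proc array code and handle xid wraparound:

/* Illustrative sketch only -- none of these types or functions exist in
 * the backend today; real code would use TransactionIdPrecedes() to cope
 * with xid wraparound instead of a plain "<" comparison. */
#include <stdbool.h>

typedef unsigned int TransactionId;

typedef struct ProcSnapshotInfo
{
    TransactionId xmin;            /* oldest xid this backend might still see */
    bool          is_serializable; /* running in SERIALIZABLE mode? */
    bool          done_with_table; /* has advertised "no further access"
                                    * to the table being vacuumed */
} ProcSnapshotInfo;

/*
 * Compute the xmin cutoff VACUUM could use for one particular table.
 * Backends that have promised never to touch this table again are
 * ignored, so a long-running pg_dump no longer holds back the cutoff
 * for tables it has already finished dumping.
 */
static TransactionId
GetPerTableOldestXmin(const ProcSnapshotInfo *procs, int nprocs,
                      TransactionId next_xid)
{
    TransactionId cutoff = next_xid;   /* nothing can see xids >= this */

    for (int i = 0; i < nprocs; i++)
    {
        if (procs[i].is_serializable && procs[i].done_with_table)
            continue;                  /* can't see this table again: skip */
        if (procs[i].xmin < cutoff)    /* simplification: ignores wraparound */
            cutoff = procs[i].xmin;
    }
    return cutoff;
}

Ordinary transactions would never set done_with_table, so they'd pay
essentially nothing for this, which seems to line up with your point about
low overhead for normal operation.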

> A general facility that would allow STs to identify which tables they
> would/would not touch again could be used by pg_dump to advertise useful
> information. That information could then be picked up by a VACUUM: when
> locking to get xmin it would see an ST, then retrieve the information to
> allow it to work out a per-table xmin. Non-VACUUM transactions would
> ignore any special ST information, causing very low overhead for normal
> operation (checking whether each current transaction was an ST, which
> mostly will be predicted correctly as "no" by the CPU).
>
> You could take that further and get pg_dump to use a list file like
> pg_restore. You would then be able to *where possible* alter the
> sequence of data dumping so that heavily updated tables were dumped
> first so the dumping ST could then advertise "no further access" to
> particular tables. VACUUMs could then proceed as if the ST were not
> there at all.
>
> Or maybe at least the idea of some special case ST behaviour might be
> worthy of some thought.
>
> I've no intention of working on this myself, especially since PITR
> provides an alternate backup solution anyway (even in combination with
> other techniques), but the idea seems worth recording for others to
> discuss.
>
> Best Regards, Simon Riggs
>

--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
