Re: 9.3: load path to mitigate load penalty for checksums

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.3: load path to mitigate load penalty for checksums
Date: 2012-06-13 02:02:35
Message-ID: CA+TgmoabGc3nUGAfMNa9ng8EaskwLf2rCmoJtn==+eaL9bLMyg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 12, 2012 at 6:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The part I think is actually hard is how to clean up if the inserting
> xact doesn't reach commit.  I think what we're basically looking at here
> is pushing more cost into that path in order to avoid cost in successful
> cases.  The first design that comes to mind is
>
> (1) the inserting xact remembers which tables it's inserted pre-hinted
> tuples into, and if it has to abort, it first seqscans those tables to
> reset the hint bits;

I don't think we can count on that to be safe in an arbitrarily chosen
abort path. Anything FATAL, for instance. I think we're going to
need to keep track of some kind of table-xmin value, representing the
oldest operation on the table that's not cleaned up yet, and make it
autovacuum's job to clean any that precede OldestXmin. If the backend
can clean itself up, great, but there has to be some kind of allowance
for the case where that doesn't happen.
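
A minimal sketch of the table-xmin bookkeeping described above. It assumes plain 32-bit xids compared modulo 2^32, as PostgreSQL does for normal xids. TableCleanupState, note_prehinted_insert(), needs_autovacuum_cleanup() and finish_autovacuum_cleanup() are hypothetical names for illustration, not existing PostgreSQL structures. The sketch tracks the newest pending xid as well as the oldest so that autovacuum can tell when nothing is left to clean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit TransactionId. */
typedef uint32_t Xid;
#define INVALID_XID ((Xid) 0)

/* Wraparound-aware "a is older than b": compare modulo 2^32. */
static bool xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical per-table state: the range of xids whose pre-hinted
 * inserts may still need cleanup. INVALID_XID means none pending. */
typedef struct
{
    Xid pending_xmin;   /* oldest xid not yet cleaned up */
    Xid pending_xmax;   /* newest xid that inserted pre-hinted tuples */
} TableCleanupState;

/* Called whenever a transaction inserts pre-hinted tuples. */
static void note_prehinted_insert(TableCleanupState *t, Xid xid)
{
    assert(xid != INVALID_XID);
    if (t->pending_xmin == INVALID_XID || xid_precedes(xid, t->pending_xmin))
        t->pending_xmin = xid;
    if (t->pending_xmax == INVALID_XID || xid_precedes(t->pending_xmax, xid))
        t->pending_xmax = xid;
}

/* Autovacuum's test: is any pending operation older than OldestXmin,
 * i.e. finished but possibly never cleaned up by its own backend? */
static bool needs_autovacuum_cleanup(const TableCleanupState *t, Xid oldest_xmin)
{
    return t->pending_xmin != INVALID_XID &&
           xid_precedes(t->pending_xmin, oldest_xmin);
}

/* After a cleanup pass has handled every operation older than
 * oldest_xmin, advance the watermark conservatively. */
static void finish_autovacuum_cleanup(TableCleanupState *t, Xid oldest_xmin)
{
    if (xid_precedes(t->pending_xmax, oldest_xmin))
    {
        /* every recorded operation has now been handled */
        t->pending_xmin = INVALID_XID;
        t->pending_xmax = INVALID_XID;
    }
    else
        t->pending_xmin = oldest_xmin;  /* newer operations may still be pending */
}
```

Because only two xids are kept per table, this is conservative: after a pass, the table stays flagged until OldestXmin moves past the newest pending insert. The "backend cleans itself up" fast path is not modeled here.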

I'm also skeptical about the notion that "scan the whole table" is
going to be a good idea. It really will have to be a full sequential
scan if we're setting visibility map bits as we go, not just a scan
of the pages that are not all-visible, as vacuum normally does. I think
if we want to go this route, we need to log the TID of every tuple we
write into the heap into some kind of undo fork (or maybe just the
block numbers), so that if the transaction aborts, we (or autovacuum)
can go back and find all of those TIDs and mark the tuples dead
without having to scan through (potentially) terabytes of data.
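
A toy model of the undo-fork idea, assuming a fixed-size in-memory heap and undo array. ToyHeap, UndoFork, heap_insert_prehinted(), undo_on_abort() and undo_on_commit() are hypothetical names for illustration. A real implementation would also have to make each undo record durable before the heap tuple it describes, which this sketch ignores. The point it shows is that abort cost scales with the number of tuples written, not with the size of the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PAGES 8
#define TUPLES_PER_PAGE 4
#define UNDO_CAPACITY 64

typedef enum { SLOT_UNUSED, SLOT_LIVE_PREHINTED, SLOT_DEAD } SlotState;

/* Hypothetical TID: block number plus line-pointer offset. */
typedef struct { uint32_t block; uint16_t offset; } Tid;

/* Toy heap: a fixed grid of tuple slots. */
typedef struct { SlotState slots[NUM_PAGES][TUPLES_PER_PAGE]; } ToyHeap;

/* Hypothetical per-transaction undo fork: the TIDs it has written. */
typedef struct { Tid tids[UNDO_CAPACITY]; size_t count; } UndoFork;

/* Insert one pre-hinted tuple, logging its TID to the undo fork first.
 * Returns 0 on success, -1 if the heap or the undo fork is full. */
static int heap_insert_prehinted(ToyHeap *heap, UndoFork *undo, Tid *out)
{
    if (undo->count >= UNDO_CAPACITY)
        return -1;
    for (uint32_t b = 0; b < NUM_PAGES; b++)
        for (uint16_t o = 0; o < TUPLES_PER_PAGE; o++)
            if (heap->slots[b][o] == SLOT_UNUSED)
            {
                Tid tid = { b, o };
                undo->tids[undo->count++] = tid;      /* undo record first */
                heap->slots[b][o] = SLOT_LIVE_PREHINTED;
                *out = tid;
                return 0;
            }
    return -1;
}

/* Abort path (backend or autovacuum): visit only the logged TIDs and
 * mark those tuples dead instead of scanning the whole table.
 * Returns the number of tuples marked dead. */
static size_t undo_on_abort(ToyHeap *heap, UndoFork *undo)
{
    size_t marked = 0;
    for (size_t i = 0; i < undo->count; i++)
    {
        Tid t = undo->tids[i];
        assert(t.block < NUM_PAGES && t.offset < TUPLES_PER_PAGE);
        if (heap->slots[t.block][t.offset] == SLOT_LIVE_PREHINTED)
        {
            heap->slots[t.block][t.offset] = SLOT_DEAD;
            marked++;
        }
    }
    undo->count = 0;   /* undo fork fully applied */
    return marked;
}

/* Commit path: the pre-hinted tuples are correct as written; drop the log. */
static void undo_on_commit(UndoFork *undo)
{
    undo->count = 0;
}
```

Logging only block numbers, the other option mentioned above, would shrink the fork at the cost of rechecking every tuple on each listed block during abort.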

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
