Re: Proposal for CSN based snapshots

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Markus Wanner <markus(at)bluegap(dot)ch>
Subject: Re: Proposal for CSN based snapshots
Date: 2014-08-26 19:32:50
Message-ID: 1409081570.2335.391.camel@jeff-desktop
Lists: pgsql-hackers

On Tue, 2014-08-26 at 19:25 +0100, Greg Stark wrote:
> I don't immediately see how to make that practical. One thought would
> be to have a list of xids in the page header with their corresponding
> csn -- which starts to sound a lot like Oracle's "Interested
> Transaction List". But I don't see how to make that work for the
> hundreds of possible xids on the page.

I feel like that's moving in the wrong direction. That's still causing a
lot of modifications to a data page when the data is not changing, and
that's bad for a lot of cases that I'm interested in (checksums are one
example).

We are mixing two kinds of data: user data and visibility information.
Each is changed under different circumstances and has different
characteristics, and I'm beginning to think they shouldn't be mixed at
all.

What if we devised a structure specially designed to hold visibility
information, put all of the visibility information there, and changed
data pages only when there is a real, user-initiated I/U/D? Vacuum could
still clear out dead tuples from the data area, but it would do the rest
of its work on the visibility structure. It could even be a clever
structure that could compress away large static areas until they become
active again.

Maybe this wouldn't work for all tables, but it could be an option for
big tables with low update rates.
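
To make the idea concrete, here is a minimal sketch in C of a visibility
store kept apart from the data pages. Every name in it (VisStore,
VisEntry, CHUNK_SLOTS, FROZEN_XID and the vis_* functions) is
hypothetical and is not existing PostgreSQL code. It assumes tuples are
addressed by a flat slot number, ignores xid wraparound and concurrency,
and represents a "static" area as a chunk with no per-tuple entries at
all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t Xid;

#define INVALID_XID 0u
#define FROZEN_XID  2u     /* stand-in for "visible to every snapshot" */
#define CHUNK_SLOTS 256u   /* tuple slots covered by one chunk (assumed) */
#define MAX_CHUNKS  1024u  /* fixed capacity, to keep the sketch short */

typedef struct { Xid xmin; Xid xmax; } VisEntry;

/* chunks[i] == NULL means chunk i is compressed: all its slots are static. */
typedef struct { VisEntry *chunks[MAX_CHUNKS]; } VisStore;

static void vis_init(VisStore *vs)
{
    memset(vs, 0, sizeof *vs);
}

/* Read visibility info for a slot; compressed chunks cost no storage. */
static VisEntry vis_lookup(const VisStore *vs, uint32_t slot)
{
    uint32_t c = slot / CHUNK_SLOTS;
    assert(c < MAX_CHUNKS);
    if (vs->chunks[c] == NULL) {
        VisEntry frozen = { FROZEN_XID, INVALID_XID };
        return frozen;
    }
    return vs->chunks[c][slot % CHUNK_SLOTS];
}

/* Record a delete: only the visibility store changes, not a data page. */
static bool vis_mark_deleted(VisStore *vs, uint32_t slot, Xid xid)
{
    uint32_t c = slot / CHUNK_SLOTS;
    assert(c < MAX_CHUNKS);
    if (vs->chunks[c] == NULL) {
        /* The area became active again: expand it on demand. */
        VisEntry *e = malloc(CHUNK_SLOTS * sizeof *e);
        if (e == NULL)
            return false;
        for (uint32_t i = 0; i < CHUNK_SLOTS; i++) {
            e[i].xmin = FROZEN_XID;
            e[i].xmax = INVALID_XID;
        }
        vs->chunks[c] = e;
    }
    vs->chunks[c][slot % CHUNK_SLOTS].xmax = xid;
    return true;
}

/* Stand-in for vacuum reclaiming a slot and its contents becoming static. */
static void vis_reset_slot(VisStore *vs, uint32_t slot)
{
    uint32_t c = slot / CHUNK_SLOTS;
    assert(c < MAX_CHUNKS);
    if (vs->chunks[c] != NULL) {
        VisEntry *e = &vs->chunks[c][slot % CHUNK_SLOTS];
        e->xmin = FROZEN_XID;
        e->xmax = INVALID_XID;
    }
}

/* Vacuum-like pass: drop a chunk's entries once every slot is static. */
static bool vis_try_compress(VisStore *vs, uint32_t chunk)
{
    assert(chunk < MAX_CHUNKS);
    VisEntry *e = vs->chunks[chunk];
    if (e == NULL)
        return true;
    for (uint32_t i = 0; i < CHUNK_SLOTS; i++)
        if (e[i].xmin != FROZEN_XID || e[i].xmax != INVALID_XID)
            return false;
    free(e);
    vs->chunks[chunk] = NULL;
    return true;
}
```

In this sketch a large table with a low update rate would keep almost
all chunks compressed, and the only writes caused by visibility changes
land in the store rather than on data pages.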

> The worst case for visibility resolution is you have a narrow table
> that has random access DML happening all the time, each update is a
> short transaction and there is a very high rate of such transactions
> spread out uniformly over a very large table. That means any given
> page has over 200 rows with random xids spread over a very large range
> of xids.

That's not necessarily a bad case, unless the CLOG/CSNLOG lookup is a
significant fraction of the effort to update a tuple.

That would be a bad case if you introduce scans into the equation as
well, but that's not a problem if the all-visible bit is set.

> Currently the invariant hint bits give us is that each xid needs to be
> looked up in the clog only a more or less fixed number of times, in
> that scenario only once since the table is very large and the
> transactions short lived.

A backend-local cache might accomplish that as well (it would still
need to do a lookup, but without locks or contention). There would be
some challenges around invalidation (for xid wraparound) and pre-warming
the cache (so that establishing a lot of connections doesn't cause a lot
of CLOG access).
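
As a rough illustration of such a cache, the sketch below uses a small
direct-mapped table in backend-private memory. All names (XidCache,
xid_cache_get, demo_lookup and so on) are hypothetical; demo_lookup is
only a stand-in for the shared CLOG/CSNLOG lookup. The sketch assumes
that only final states (committed or aborted) are safe to cache, and it
handles wraparound crudely by flushing the whole cache. Pre-warming is
not addressed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Xid;

typedef enum {
    XS_UNKNOWN = 0,   /* empty cache slot */
    XS_IN_PROGRESS,
    XS_COMMITTED,
    XS_ABORTED
} XidStatus;

/* Stand-in for the shared (locked, contended) CLOG/CSNLOG lookup. */
typedef XidStatus (*SlowLookup)(Xid xid);

#define CACHE_SLOTS 1024u

typedef struct { Xid xid; XidStatus status; } CacheSlot;

typedef struct {
    CacheSlot     slots[CACHE_SLOTS];
    SlowLookup    slow;
    unsigned long misses;   /* number of times the slow path was taken */
} XidCache;

static void xid_cache_init(XidCache *c, SlowLookup slow)
{
    memset(c, 0, sizeof *c);
    c->slow = slow;
}

/* Crude invalidation: flush everything, e.g. before xids can be reused. */
static void xid_cache_invalidate(XidCache *c)
{
    memset(c->slots, 0, sizeof c->slots);
}

static XidStatus xid_cache_get(XidCache *c, Xid xid)
{
    assert(c->slow != NULL);
    CacheSlot *s = &c->slots[xid % CACHE_SLOTS];

    /* Hit: purely local memory, no locks and no shared state touched. */
    if (s->status != XS_UNKNOWN && s->xid == xid)
        return s->status;

    c->misses++;
    XidStatus st = c->slow(xid);

    /* Cache only final states; an in-progress xid can still change. */
    if (st == XS_COMMITTED || st == XS_ABORTED) {
        s->xid = xid;
        s->status = st;
    }
    return st;
}

/* Toy lookup for demonstration: high xids are still running, and
 * otherwise even xids committed and odd xids aborted. */
static XidStatus demo_lookup(Xid xid)
{
    if (xid >= 1000000u)
        return XS_IN_PROGRESS;
    return (xid % 2u == 0u) ? XS_COMMITTED : XS_ABORTED;
}
```

With this shape, each finished xid costs a backend one shared lookup
until its slot is overwritten or the cache is flushed, which is close
to the "looked up a more or less fixed number of times" property that
hint bits provide today, but per backend rather than globally.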

Regards,
Jeff Davis
