Re: lazy snapshots?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: lazy snapshots?
Date: 2010-11-04 21:56:46
Message-ID: AANLkTimgZyQ7KqHxdBj30546_3pN_PvrWF22MM8unQ6D@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 20, 2010 at 8:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I'm less than convinced by the hypothesis that most transactions would
>>> avoid taking snapshots in this regime, anyway.  It would only hold up
>>> if there's little locality of reference in terms of which tuples are
>>> getting examined/modified by concurrent transactions, and that's a
>>> theory that seems much more likely to be wrong than right.
>
>> There will certainly be workloads where most transactions acquire a
>> snapshot, but just to take one obvious example, suppose we have a data
>> warehouse where every night we bulk load the day's data, and then we
>> run reporting queries all day.  Except during the overnight bulk
>> loads, there's no concurrent write activity at all, and thus no need
>> for snapshots.
>
> Well, yeah, but in this scenario there's also no contention involved in
> taking snapshots --- there are only readers of ProcArray and (IIRC) they
> only need shared locks on the array.  If you want to make any meaningful
> improvement in this area, you need something that solves the ProcArray
> access contention caused by a heavy mixed read/write transaction load.

There is certainly some improvement to be had even in eliminating
shared ProcArrayLocks, and the work that must be done while holding
them. For example, you have to hold an exclusive lock to end a
transaction, so that's going to compete with any shared lockers.

But suppose we do have a heavy mixed read/write transaction load. Who
is to say that requires a snapshot? It will require a snapshot if one
transaction reads data that has been recently written by another
transaction, but I'm not convinced that'll necessarily happen that
often. For example, suppose you have a table with a million records
and you have 200 concurrent database connections. Each connection
repeatedly starts a transaction where it reads 10 records and then
writes 10 records. When a new transaction starts, it overlaps 199
other transactions; if all of those transactions have done all of
their reads and writes already, there are 3,980 records in the table
for which we'll need a snapshot to determine tuple visibility;
assuming we look at no other tuples (all the accesses using index
scans rather than sequential scans), the chances that we'll need to
take a snapshot are only 1-((1-(3980/1000000))^20) = ~7.7%. That's
pretty good, and of course if you have fewer overlapping transactions
or fewer operations per transaction it gets better very quickly.
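
The arithmetic above can be checked with a short Python sketch. The
function and parameter names are illustrative, not anything in
PostgreSQL, and the model assumes what the example assumes: accesses are
uniformly random and independent, every overlapping transaction has
already touched all of its records, and those records are all distinct.

```python
def snapshot_probability(table_rows, connections, reads_per_txn, writes_per_txn):
    """Estimate the chance that one transaction needs a snapshot.

    Assumption (from the example, not from PostgreSQL itself): a snapshot
    is needed only if the transaction touches a row that one of the
    overlapping transactions has already read or written.
    """
    ops_per_txn = reads_per_txn + writes_per_txn
    # Rows touched by the other, overlapping transactions.
    hot_rows = (connections - 1) * ops_per_txn
    p_hit = hot_rows / table_rows
    # Probability that at least one of our accesses lands on a hot row.
    return 1 - (1 - p_hit) ** ops_per_txn


hot = (200 - 1) * (10 + 10)
p = snapshot_probability(1_000_000, 200, 10, 10)
print(hot)          # 3980
print(f"{p:.1%}")   # 7.7%
```

Lowering the connection count or the operations per transaction in the
call shows how quickly the estimate falls.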

Now, of course, if you have a lot of locality of reference, things
aren't going to be nearly so good. If we assume that the accesses are
spread across only 100,000 records instead of 1,000,000, then each
transaction has a better-than-even chance of needing a snapshot.
However, in that situation, you're going to have other contention
problems, too: there's a very significant probability that two backends
will actually try to update the same tuple, and one will sleep until
the other commits. So maybe the cost of taking snapshots won't be
the biggest problem in that case anyway (he said hopefully).
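
A rough sketch of the locality case, under the same simplifying
assumptions (uniform, independent accesses; 200 connections; 10 reads and
10 writes per transaction). The helper name is hypothetical, and the
write-conflict figure is only a back-of-the-envelope estimate that counts
a conflict whenever one of our 10 writes lands on a row another in-flight
transaction has written.

```python
def at_least_one_hit(hot_rows, table_rows, accesses):
    """Probability that at least one of `accesses` uniformly random
    row accesses lands on one of `hot_rows` out of `table_rows`."""
    return 1 - (1 - hot_rows / table_rows) ** accesses


others = 200 - 1          # overlapping transactions
reads, writes = 10, 10    # per transaction, as in the example

# Chance of needing a snapshot when accesses cover only 100,000 rows.
p_snapshot = at_least_one_hit(others * (reads + writes), 100_000, reads + writes)

# Rough chance that one of our writes collides with another
# transaction's write (assumption: only write/write overlap counts).
p_write_conflict = at_least_one_hit(others * writes, 100_000, writes)

print(f"snapshot needed: {p_snapshot:.0%}")
print(f"write conflict:  {p_write_conflict:.0%}")
```

Under these assumptions the snapshot probability comes out a little above
one half, consistent with the "better-than-even" figure, and the
write-conflict estimate is a bit under one in five per transaction.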

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
