Re: cheaper snapshots redux

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Markus Wanner <markus(at)bluegap(dot)ch>
Cc: Jim Nasby <jim(at)nasby(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: cheaper snapshots redux
Date: 2011-08-25 14:48:20
Message-ID: CA+Tgmobb3ZX1h5yQpVfR9ETvM63R8QpUJY72tZCjawYbndvYxg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 25, 2011 at 10:19 AM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
> Note, however, that for imessages, I've also had the policy in place
> that a backend *must* consume its message before sending any.  And that
> I took great care for all receivers to consume their messages as early
> as possible.  Nonetheless, I kept incrementing the buffer size (to
> multiple megabytes) to make this work.  Maybe I'm overcautious because
> of that experience.

What's a typical message size for imessages?

>> - a handful of XIDs at most - because, on the average, transactions
>> are going to commit in *approximately* increasing XID order
>
> This assumption quickly turns false, if you happen to have just one
> long-running transaction, I think.  Or in general, if transaction
> duration varies a lot.

Well, one long-running transaction that only has a single XID is not
really a problem: the snapshot is still small. But one very old
transaction that also happens to have a large number of
subtransactions, all of which have XIDs assigned, might be a good way
to stress the system.

>> So the backend taking a snapshot only needs
>> to be able to copy < ~64 bytes of information from the ring buffer
>> before other backends write ~27k of data into that buffer, likely
>> requiring hundreds of other commits.
>
> You said earlier, that "only the latest snapshot" is required.  It takes
> only a single commit for such a snapshot to not be the latest anymore.
>
> Instead, if you keep around older snapshots for some time - as what your
> description here implies - readers are free to copy from those older
> snapshots while other backends are able to make progress concurrently
> (writers or readers of other snapshots).
>
> However, that either requires keeping track of readers of a certain
> snapshot (reference counting) or - as I understand your description -
> you simply invalidate all concurrent readers upon wrap-around, or something.

Each reader decides which data he needs to copy from the buffer, and
then copies it, and then checks whether any of it got overwritten
before the copy was completed. So there's a lively possibility that
the snapshot that was current when the reader began copying it will no
longer be current by the time he finishes copying it, because a commit
has intervened. That's OK: it just means that, effectively, the
snapshot is taken at the moment the start and stop pointers are read,
and won't take into account any commits that happen later, which is
exactly what a snapshot is supposed to do anyway.

There is a hopefully quite small possibility that, by the time the
reader finishes copying, so much new data will have been written to
the buffer that it will have wrapped around and clobbered the portion
the reader was interested in. That needs to be rare.

>> Now, as the size of the snapshot gets bigger, things will eventually
>> become less good.
>
> Also keep configurations with increased max_connections in mind.  With
> that, not only do the snapshots get bigger, but more processes have to
> share CPU time, on average making memcpy slower for a single process.

Right. I'm imagining making the default buffer size proportional to
max_connections.

>> Of course even if wraparound turns out not to be a problem there are
>> other things that could scuttle this whole approach, but I think the
>> idea has enough potential to be worth testing.  If the whole thing
>> crashes and burns I hope I'll at least learn enough along the way to
>> design something better...
>
> That's always a good motivation.  In that sense: happy hacking!

Thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
