Re: Snapshot synchronization, again...

From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Snapshot synchronization, again...
Date: 2011-02-28 01:33:44
Message-ID: AANLkTikAjgAu6nLfv7UmJ-AhXaf8fkFs3G5zWU_JBndY@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Feb 27, 2011 at 3:04 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Why exactly, Heikki do you think the hash is more troublesome?
> It just feels wrong to rely on cryptography just to save some shared memory.

Remember that it's not only about saving shared memory, it's also
about making sure that the snapshot reflects a state of the database
that has actually existed at some point in the past. Furthermore, we
can easily invalidate a snapshot that we have published earlier by
deleting its checksum in shared memory as soon as the original
transaction commits/aborts. And for these two a checksum seems to be a
good fit. Saving memory then comes as a benefit and makes all those
happy who don't want to argue about how many slots to reserve in
shared memory or don't want to have another GUC for what will probably
be a low-usage feature.

> I realize that there are conflicting opinions on this, but from user
> point-of-view the hash is just a variant of the idea of passing the snapshot
> through shared memory, just implemented in an indirect way.

The user will never see the hash, why should he bother? The user point
of view is that he receives data and can obtain the same snapshot if
he passed that data back. This user experience is no different from
any other way of passing the snapshot through the client. And from the
previous discussions this seemed to be what most people wanted.

>> And how
>> could we validate/invalidate snapshots without the checksum (assuming
>> the through-the-client approach instead of storing the whole snapshot
>> in shared memory)?
>
> Either you accept anything that passes sanity checks, or you store the whole
> snapshot in shared memory (or a temp file). I'm not sure which is better,
> but they both seem better than the hash.

True, both might work but I don't see a real technical advantage over
the checksum approach for any of them, rather the opposite.

Nobody has come up with a use case for the accept-anything option so
far, so I don't see why we should commit ourselves to this feature at
this point, given that we have a cheap and easy way of
validating/invalidating snapshots. And I might be just paranoid but I
also fear that someone could raise security issues for the fact that
you would be able to request an arbitrary database state from the past
and inspect changes of other peoples' transactions. We might want to
allow that later though and I realize that we have to allow it for a
standby server that would take over a snapshot from the master anyway,
but I don't want to add this complexity into this first patch. I want
however be able to possibly allow this in the future without touching
the external API of the feature.

And for the tempfile approach, I don't see that the creation and
removal of the temp file is any less code complexity than flipping a
number in shared memory. Also it seemed that people rather wanted to
go with the through-the-client approach because it seems to be more
flexible.

Maybe you should just look at it as a conservative accept-anything
approach that uses a checksum to allow only snapshots that we know
have existed and have been published. Does the checksum still look so
weird from this perspective?

Joachim

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-02-28 03:07:01 Re: Native XML
Previous Message Tatsuo Ishii 2011-02-28 00:32:06 Re: [OT] Christchurch