Re: Repeatable read and serializable transactions see data committed after tx start

From: Álvaro Hernández Tortosa <aht(at)8Kdata(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Repeatable read and serializable transactions see data committed after tx start
Date: 2014-11-06 00:17:24
Message-ID: 545ABE14.4090508@8Kdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 06/11/14 00:42, Robert Haas wrote:
> On Mon, Nov 3, 2014 at 2:14 PM, Álvaro Hernández Tortosa <aht(at)8kdata(dot)com> wrote:
>> Given a transaction started with "BEGIN.... (REPEATABLE READ |
>> SERIALIZABLE)", if a concurrent session commits some data before *any* query
>> within the first transaction, that committed data is seen by the
>> transaction. This is not what I'd expect.
> I think the problem is with your expectation, not the behavior.

But my expectation is derived from the documentation:

"The Repeatable Read isolation level only sees data committed before the
transaction began;"

In PostgreSQL you will see data committed after a BEGIN ...
(REPEATABLE READ | SERIALIZABLE) statement (only before the first
query). And it's reasonable to "think" that transaction begins when you
issue a BEGIN statement. It's also reasonable to think this way as:

- now() is frozen at BEGIN time, as Nasby pointed out
- pg_stat_activity says that the transaction is started, as Kevin mentioned

So if the behavior is different from what the documentation says
and what other external indicators may point out, I think at least
documentation should be clear about this precise behavior, to avoid
confusing users.

> Serializable means that the transactions execute in such a fashion
> that their parallel execution is equivalent to some serial order of
> execution. It doesn't say that it must be equivalent to any
> particular order that you might imagine, such as the order in which
> the transactions commit, or, as you propose, the order in which they
> begin. Generally, I think that's a good thing, because transaction
> isolation is expensive: even at repeatable read, but for the need to
> provide transaction isolation, there would be no such thing as bloat.
> The repeatable read and serializable levels add further overhead of
> various kinds. We could provide a super-duper serializable level that
> provides even tighter guarantees, but (1) I can't imagine many people
> are keen to make the cost of serialization even higher than it already
> is and (2) if you really want that behavior, just do some trivial
> operation sufficient to cause a snapshot to be taken immediately after
> the BEGIN.
>

I'm not really asking for a new isolation level, just that either
BEGIN really freezes (for repeatable read and serializable) or if that's
expensive and not going to happen, that the documentation clearly states
the fact that freeze starts at first-query-time, and that if you need to
freeze before your first real query, you should do a dummy one instead
(like SELECT 1). Also, if this "early freeze" is a performance hit -and
for that reason BEGIN is not going to be changed to freeze- then that
also should be pointed out in the documentation, so that users that
freeze early with "SELECT 1"-type queries understand that.

Regards,

Álvaro

--
Álvaro Hernández Tortosa

-----------
8Kdata

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Steve Singer 2014-11-06 00:31:52 Re: tracking commit timestamps
Previous Message Álvaro Hernández Tortosa 2014-11-06 00:04:34 Re: Repeatable read and serializable transactions see data committed after tx start