Re: cheaper snapshots redux

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: 'Robert Haas' <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: cheaper snapshots redux
Date: 2011-09-07 03:06:47
Message-ID: DEA262E072764EBD82E0916334414EE3@china.huawei.com
Lists: pgsql-hackers

I wanted to clarify my understanding and have some doubts.

> What I'm thinking about instead is using a ring buffer with three pointers:
> a start pointer, a stop pointer, and a write pointer. When a transaction
> ends, we advance the write pointer, write the XIDs or a whole new snapshot
> into the buffer, and then advance the stop pointer. If we wrote a whole new
> snapshot, we advance the start pointer to the beginning of the data we just
> wrote. Someone who wants to take a snapshot must read the data between the
> start and stop pointers, and must then check that the write pointer hasn't
> advanced so far in the meantime that the data they read might have been
> overwritten before they finished reading it. Obviously, that's a little
> risky, since we'll have to do the whole thing over if a wraparound occurs,
> but if the ring buffer is large enough it shouldn't happen very often.
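
To make the quoted scheme concrete, here is a minimal single-threaded sketch
of the pointer arithmetic only. It is not taken from any patch: the names
SnapRing, RingCursor, ring_append, ring_read_begin, ring_copy and
ring_read_still_valid, and the tiny RING_SIZE, are all hypothetical. It
assumes the three pointers are monotonically increasing 64-bit positions
that are reduced modulo the buffer size on access, and it leaves out the
memory barriers and writer serialization a real shared-memory version would
need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8             /* hypothetical; tiny so wraparound is easy to hit */

typedef uint32_t Xid;           /* stand-in for a transaction ID */

typedef struct SnapRing
{
    uint64_t start;             /* first position a reader has to read */
    uint64_t stop;              /* one past the last fully written position */
    uint64_t write;             /* one past the last position claimed by a writer */
    Xid      slots[RING_SIZE];
} SnapRing;

typedef struct RingCursor
{
    uint64_t start;
    uint64_t stop;
} RingCursor;

/* Writer side, in the order described in the quoted text. */
static void
ring_append(SnapRing *r, const Xid *xids, size_t n, bool whole_snapshot)
{
    uint64_t begin = r->write;

    /* Assumption: if the readable range would outgrow the ring, the
     * caller writes a whole new snapshot instead of appending XIDs. */
    assert((whole_snapshot ? n : begin + n - r->start) <= RING_SIZE);

    r->write = begin + n;                       /* advance the write pointer */
    for (size_t i = 0; i < n; i++)              /* write the data */
        r->slots[(begin + i) % RING_SIZE] = xids[i];
    r->stop = r->write;                         /* advance the stop pointer */
    if (whole_snapshot)
        r->start = begin;                       /* start moves to the new snapshot */
}

/* Reader step 1: remember the range that is about to be read. */
static RingCursor
ring_read_begin(const SnapRing *r)
{
    RingCursor c = { r->start, r->stop };
    return c;
}

/* Reader step 2: copy [start, stop); out must hold RING_SIZE entries. */
static size_t
ring_copy(const SnapRing *r, RingCursor c, Xid *out)
{
    size_t n = (size_t) (c.stop - c.start);

    assert(n <= RING_SIZE);
    for (size_t i = 0; i < n; i++)
        out[i] = r->slots[(c.start + i) % RING_SIZE];
    return n;
}

/* Reader step 3: the copy is usable only if the write pointer has not
 * moved more than one full ring past the position the read started at. */
static bool
ring_read_still_valid(const SnapRing *r, RingCursor c)
{
    return r->write - c.start <= RING_SIZE;
}
```

A reader would call ring_read_begin(), then ring_copy(), then
ring_read_still_valid(), and repeat all three steps whenever the last call
returns false. Because the check uses the write pointer rather than the stop
pointer, it also covers data that a writer has claimed but not finished
writing.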

Clarification
------------------
1. With the above, you want to reduce or remove the contention between
GetSnapshotData() [used at the beginning of SQL command execution] and
ProcArrayEndTransaction() [used at end of transaction]. The contention is
mainly on ProcArrayLock, which GetSnapshotData() takes in shared mode and
ProcArrayEndTransaction() takes in exclusive mode.
There may be other instances of a similar thing, but this is the main one
you want to resolve.

2. You want to resolve it by using a ring buffer so that readers don't need
to take any lock.

Is my understanding above correct?

Doubts
------------

1. Two writers: won't two different sessions that try to commit at the same
time get the same write pointer?
I assume this will be protected, as I understood from one of your
replies.

2. One reader, one writer: it might be the case that somebody has written a
new snapshot and advanced the stop pointer, and at that point a reader comes
and reads between the start pointer and the stop pointer. The reader will
then see the following:
snapshot, few XIDs, snapshot

So will it handle this situation such that it reads only the latest
snapshot?

3. How will you detect an overwrite?

4. Won't it have an effect if we don't update xmin every time and just note
the committed XIDs? The reason I am asking is that xmin is used in the tuple
visibility check, so with the new idea, in some cases, instead of returning
right at the beginning by checking xmin, it has to go through the committed
XID list.
I understand that such cases may be few, or that the improvement from your
idea may supersede this minimal effect. However, some cases could come out
worse.
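
The concern in point 4 can be illustrated with a small sketch. This is an
assumption-laden model, not PostgreSQL's actual visibility code: the
SketchSnapshot struct, its fields, and the functions xid_in_list and
xid_is_running are hypothetical, XIDs are compared as plain integers
(wraparound is ignored), and "committed since the base snapshot" is modelled
as a simple array. It shows that a snapshot with a stale xmin gives the same
answer as a freshly computed one, but only after scanning the lists instead
of returning from the xmin check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;           /* stand-in for a transaction ID */

typedef struct SketchSnapshot
{
    Xid        xmin;            /* every XID below this has finished */
    Xid        xmax;            /* every XID at or above this counts as running */
    const Xid *xip;             /* XIDs running when the base snapshot was taken */
    size_t     xcnt;
    const Xid *committed;       /* XIDs noted as committed since then */
    size_t     ncommitted;
} SketchSnapshot;

/* Linear search; returns true if xid appears in list[0..n). */
static bool
xid_in_list(const Xid *list, size_t n, Xid xid)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] == xid)
            return true;
    return false;
}

/*
 * Returns true if xid should be treated as still running.
 * *used_fast_path reports whether the xmin/xmax range check alone
 * was enough to decide, i.e. no list had to be scanned.
 */
static bool
xid_is_running(const SketchSnapshot *s, Xid xid, bool *used_fast_path)
{
    assert(s->xmin <= s->xmax);

    *used_fast_path = true;
    if (xid < s->xmin)
        return false;           /* finished before the snapshot */
    if (xid >= s->xmax)
        return true;            /* started after the snapshot */

    *used_fast_path = false;
    if (xid_in_list(s->committed, s->ncommitted, xid))
        return false;           /* committed after the base snapshot */
    return xid_in_list(s->xip, s->xcnt, xid);
}
```

For example, if XID 100 was running in the base snapshot and has since
committed, a snapshot that keeps xmin at 100 has to scan the committed list
to decide about XID 100, while one that recomputed xmin as 103 answers from
the first comparison.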


--
With Regards,
Amit Kapila.


-----Original Message-----
From: pgsql-hackers-owner(at)postgresql(dot)org
[mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Robert Haas
Sent: Sunday, August 28, 2011 7:17 AM
To: Gokulakannan Somasundaram
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] cheaper snapshots redux

On Sat, Aug 27, 2011 at 1:38 AM, Gokulakannan Somasundaram
<gokul007(at)gmail(dot)com> wrote:
> First, I respectfully disagree with you on the point of 80MB. I would say
> that it's very rare that a small system (with <1 GB RAM) might have a long
> running transaction sitting idle, while 10 million transactions are sitting
> idle. Should an optimization be left, for the sake of a very small system to
> achieve high enterprise workloads?

With the design where you track commit-visibility sequence numbers
instead of snapshots, you wouldn't need 10 million transactions that
were all still running. You would just need a snapshot that had been
sitting around while 10 million transactions completed meanwhile.

That having been said, I don't necessarily think that design is
doomed. I just think it's going to be trickier to get working than
the design I'm now hacking on, and a bigger change from what we do
now. If this doesn't pan out, I might try that one, or something
else.

> Second, if we make use of the memory mapped files, why should we think that
> all the 80MB of data will always reside in memory? Won't they get paged out
> by the operating system, when it is in need of memory? Or do you have some
> specific OS in mind?

No, I don't think it will all be in memory - but that's part of the
performance calculation. If you need to check on the status of an XID
and find that you need to read a page of data in from disk, that's
going to be many orders of magnitude slower than anything we do with a
snapshot now. Now, if you gain enough elsewhere, it could still be a
win, but I'm not going to just assume that.

As I play with this, I'm coming around to the conclusion that, in
point of fact, the thing that's hard about snapshots has a great deal
more to do with memory than it does with CPU time. Sure, using the
snapshot has to be cheap. But it already IS cheap. We don't need to
fix that problem; we just need to not break it. What's not cheap is
constructing the snapshot - principally because of ProcArrayLock, and
secondarily because we're grovelling through fairly large amounts of
shared memory to get all the XIDs we need.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
