Re: snapshot too old, configured by time

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Steve Singer <steve(at)ssinger(dot)info>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: snapshot too old, configured by time
Date: 2015-09-13 18:57:16
Message-ID: 1019061478.341047.1442170636363.JavaMail.yahoo@mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks to both Steve and Álvaro for review and comments. I won't
be able to address all of those within the time frame of the
current CF, so I have moved it to the next CF and flagged it as
"Waiting on Author". I will post a patch to address all comments
before that CF starts. I will discuss now, though.

Steve Singer <steve(at)ssinger(dot)info> wrote:

> The attached testcase fails I replace the cursor in your test
> case with direct selects from the table.
> I would have expected this to generate the snapshot too old error
> as well but it doesn't.

Good idea for an additional test; it does indeed show a case where
the patch could do better. I've reviewed the code and see how I
can adjust the calculations to fix this.

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> Kevin Grittner wrote:

>> 4. Tests have been added. They are currently somewhat minimal,
>> since this is using a whole new technique for testing from any
>> existing committed tests -- I wanted to make sure that this
>> approach to testing was OK with everyone before expanding it.
>> If it is, I assume we will want to move some of the more generic
>> portions to a .pm file to make it available for other tests.

> I think your test module is a bit unsure about whether it's
> called tso or sto.

Oops. Yeah, will fix. They all should be sto for "snapshot too
old". There is heavy enough use of timestamp fields name ts in the
code that my fingers got confused.

> I would place it inside src/test/modules, so that buildfarm
> runs it automatically; I'm not sure it will pick up things in
> src/test.

As it stands now, the test is getting run as part of `make
check-world`, and it seems like src/test/modules is about testing
separate executables, so I don't think it makes sense to move the
tests -- but I could be convinced that I'm missing something.

> I haven't written or reviewed our TestLib stuff so I can't
> comment on whether the test code is good or not. How long is the
> test supposed to last?

Well, I can't see how to get a good test of some code with a
setting of 0 (snapshot can become "too old" immediately). That
setting may keep parts of the test fast, but to exercise much of
the important code you need to configure to '1min' and have the
time pass the 0 second mark for a minute at various points in the
test; so it will be hard to avoid having this test take a few
minutes.
> It bothers me a bit to add #include rel.h to snapmgr.h because of
> the new macro definition. It seems out of place.

Yeah, I couldn't find anywhere to put the macro that I entirely
liked.

> I would instead move the macro closer to bufmgr headers or rel.h
> itself perhaps.

I'll try that, or something similar.

> Also, the definition isn't accompanied by an explanatory comment
> (other than why is it a macro instead of a function).

Will fix.

> So if I understand correctly, every call to BufferGetPage needs
> to have a TestForOldSnapshot immediately afterwards?

No, every call that is part of a scan. Access for internal
purposes (such as adding an index entry or vacuuming) does not need
this treatment.

> It seems easy to later introduce places that fail to test for old
> snapshots. What happens if they do?

That could cause incorrect results for those using the "snapshot
too old" feature, because a query might fail to recognize that one
or more rows it should see are missing.

> Does vacuum remove tuples anyway and then the query returns
> wrong results?

Right. That.

> That seems pretty dangerous. Maybe the snapshot could be an
> argument of BufferGetPage?

It would need that, the Relation (or go find it from the Buffer),
and an indication of whether this access is due to a scan.

> Please have the comments be clearer on what "align to minute
> boundaries" means. Is it floor or ceil?

ok

> Also, in the OldSnapshotControlData big comment you talk about
> "length" and then the variable is called "used". Maybe that
> should be more consistent, for clarity.

Will work on that comment.

> How large is the struct kept in shmem?

To allow a setting up to 60 days, 338kB.

> I don't think the size is configurable, is it?

Not in the posted patch.

> I would have guessed that the size would depend on the GUC param ...

I had been trying to make this GUC PGC_SIGHUP, but found a lot of
devils hiding in the details of that. When I gave up on that and
went back to PGC_POSTMASTER I neglected to change this allocation
back. I agree that it should be changed, and that will make the
size depend on the configuration. If the feature is not used, no
significant RAM will be allocated. If, for example, someone wants
to configure to '5h' they would need only about 1.2kB for this
structure.

Again, thanks to both of you for the review!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2015-09-13 19:59:35 Re: RLS open items are vague and unactionable
Previous Message Andreas Seltenreich 2015-09-13 18:32:30 Re: 9.3.9 and pg_multixact corruption