Re: READ COMMITTED vs. index-only scans

From: Melvin Davidson <melvin6925(at)gmail(dot)com>
To: Jacek Kołodziej <kolodziejj(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: READ COMMITTED vs. index-only scans
Date: 2018-01-17 16:04:24
Message-ID: CANu8Fiwc7mdZBLxPaWMtcjkYxBiQMm_afe4RVUGSFBa+9sXwHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Jan 17, 2018 at 10:45 AM, Jacek Kołodziej <kolodziejj(at)gmail(dot)com>
wrote:

>
>
> 17.01.2018 3:30 PM "Melvin Davidson" <melvin6925(at)gmail(dot)com> napisał(a):
>
>
>
> On Wed, Jan 17, 2018 at 8:08 AM, Jacek Kołodziej <kolodziejj(at)gmail(dot)com>
> wrote:
>
>> Hello everyone,
>>
>> this is my first post here and I'm starting with asking a question about
>> data consistency between two consecutive SELECTs using PostgreSQL 9.6.
>>
>> I'm sorry if that's something that was already discussed - I couldn't
>> find it either in archives, nor in _general internet_. If it is, I would
>> appreciate pointing it out.
>>
>> I have an "append-only" events table - only INSERT and SELECT queries are
>> issued to it. It has an integer (sequence) ID as a primary key.
>>
>> I'm issuing following two queries (within the same READ COMMITTED
>> transaction) to that table:
>> - first query (A): SELECT id FROM events ORDER BY events.id DESC LIMIT 1
>> - I'm saving the result as a "max ID" for the second query
>> - second query (B): SELECT id, ... FROM events WHERE id > "min ID" AND id
>> <= "max ID" AND ...
>> - I won't dig into what's "min ID" but you can assume it's at most 100
>> less than "max ID"
>>
>> Concurrently, rows are being added to that table.
>>
>> Please note that there's enough data in the index for the first query (A)
>> to perform an index-only scan. I'm not sure if that's relevant but "B"
>> query does an index scan because of other conditions I haven't mentioned
>> but still needs to fetch data from the table's heap.
>>
>> Here's what happening to me: the "A" query occasionally (in my case: on
>> the order of tenths per day) returns an ID _higher_ than any ID present in
>> second query's result (other conditions I haven't specified do _not_ filter
>> any more rows than "id <= max ID") - as if some entries were visible for
>> the first query, but not for the second one. This is an inconsistency that
>> is very problematic for me.
>>
>> All I can think of is that it might be caused by the index-only-scan and
>> READ COMMITTED transaction isolation level but from what I gather from
>> documentation, it should not be possible due to the use of visibility
>> map[0][1][2].
>>
>> And yet it happens, likely for some other reason but I can't think of
>> any. I've tried switching transaction isolation level to REPEATABLE READ
>> (on the basis that it's an faulty phenomenon occurring during to some bug)
>> but that didn't help.
>>
>> Where am I wrong? What am I missing? What information may I provide to
>> help with investigating this?
>>
>> [0] https://www.postgresql.org/docs/9.6/static/indexes-index-onl
>> y-scans.html
>> [1] https://www.postgresql.org/docs/9.6/static/storage-vm.html
>> [2] https://wiki.postgresql.org/wiki/Index-only_scans
>>
>> --
>> Kind regards,
>> Jacek Kołodziej
>> http://kolodziejj.info
>>
>
>
> *>- first query (A): SELECT id FROM events ORDER BY events.id
> <http://events.id> DESC LIMIT 1 - I'm saving the result as a "max ID" for
> the second query*
>
> *Just a suggestion. The first query is not really needed.*
> *You can simply do:*
>
> *second query (B): *
>
>
>
>
>
>
>
> *SELECT id, ... FROM events WHERE id > MIN(ID) AND id <= MAX(ID)
> AND ...*
>
>
> *See https://www.postgresql.org/docs/9.6/static/functions-aggregate.html
> <https://www.postgresql.org/docs/9.6/static/functions-aggregate.html> *
>
> *MAX and MIN functions*
>
>
> Hi Melvin, thank you for the suggestion. Unfortunately it won't do in my
> case. Sorry for not providing enough context in the first place.
>
> After making the first query (A), I'm iterating over that table (with
> LIMIT 100 and increasing OFFSET) - using a query "B" - until another
> condition is met; overall, code is supposed to gather a number of rows from
> the table. I'm also using the "max ID" for another purpose.
>
>
>
> --
> *Melvin Davidson*
> I reserve the right to fantasize. Whether or not you
> wish to share my fantasy is entirely up to you.
>
>
>
*Oops, my bad. I didn't have my morning coffee yet.*

*Try this instead:*

*WITH minmax AS*
* (SELECT MIN(id) as min_id,*
* MAX(id AS max_id)*

* FROM events *

* ) *

*SELECT id, ... FROM events e,*

* minmax m*

* WHERE e.id <http://e.id> > m.min_id AND e.id <http://e.id> <=
m.max_id AND ...*

--
*Melvin Davidson*
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Petr Jelinek 2018-01-17 17:07:31 Re: [PATCH] Logical decoding of TRUNCATE
Previous Message Brysounds 2018-01-17 16:01:52 Re: Configuration of pgaudit settings in postgreSQL.conf causes postgreSQL to fail to start