Re: drop/truncate table sucks for large values of shared buffers

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: drop/truncate table sucks for large values of shared buffers
Date: 2015-06-29 13:48:52
Message-ID: CANP8+jJUqocFBwj=j0E-d3+VtFiu64bCpKSFyYRd6b0Qu8+wCA@mail.gmail.com
Lists: pgsql-hackers

On 28 June 2015 at 17:17, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On 27 June 2015 at 15:10, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> I don't like this too much because it will fail badly if the caller
> >> is wrong about the maximum possible page number for the table, which
> >> seems not exactly far-fetched. (For instance, remember those kernel
> bugs
> >> we've seen that cause lseek to lie about the EOF position?)
>
> > If that is true, then our reliance on lseek elsewhere could also cause
> data
> > loss, for example by failing to scan data during a seq scan.
>
> The lseek point was a for-example, not the entire universe of possible
> problem sources for this patch. (Also, underestimating the EOF point in
> a seqscan is normally not an issue since any rows in a just-added page
> are by definition not visible to the scan's snapshot. But I digress.)
>
> > The consequences of failure of lseek in this case are nowhere near as
> dire,
> > since by definition the data is being destroyed by the user.
>
> I'm not sure what you consider "dire", but missing a dirty buffer
> belonging to the to-be-destroyed table would result in the system being
> permanently unable to checkpoint, because attempts to write out the buffer
> to the no-longer-extant file would fail. You could only get out of the
> situation via a forced database crash (immediate shutdown), followed by
> replaying all the WAL since the time of the problem. In production
> contexts that could be pretty dire.
>

Yes, it's bad, but we do notice that it has happened. We can also put in
code to specifically avoid this error at checkpoint time.

If lseek fails badly then SeqScans would give *silent* data loss, which in
my view is worse. Just-added pages aren't the only thing we might miss if
lseek is badly wrong.
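For context, the size check under discussion boils down to asking the kernel where EOF is. A minimal illustration (in Python for brevity; PostgreSQL's mdnblocks() does the equivalent in C) of deriving a relation's page count from lseek, assuming the default 8 kB block size:

```python
import os
import tempfile

BLCKSZ = 8192  # default PostgreSQL page size

def nblocks_via_lseek(fd):
    """Number of full BLCKSZ pages, per the kernel's report of EOF.

    If lseek under-reports EOF (the kernel bug discussed above), this
    count is too small, and any caller that trusts it as the maximum
    possible page number will skip pages -- or, for drop/truncate,
    leave stale buffers behind in shared memory.
    """
    eof = os.lseek(fd, 0, os.SEEK_END)
    return eof // BLCKSZ

with tempfile.TemporaryFile() as f:
    f.truncate(3 * BLCKSZ)                 # a 3-page relation file
    print(nblocks_via_lseek(f.fileno()))   # -> 3
    f.truncate(3 * BLCKSZ + 100)           # partial fourth page
    print(nblocks_via_lseek(f.fileno()))   # still 3 full blocks
```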

So, I think this patch still has legs. We can check that the cleanup has
been 100% complete when we do the buffer scan at the start of the
checkpoint - that way we do just one scan of the buffer pool and move a
time-consuming operation into a background process.
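The idea of folding the verification into the checkpoint's existing sweep can be sketched as follows. This is an illustrative toy model, not PostgreSQL code: buffers are represented as dicts, `relfilenode` stands in for the full buffer tag, and `pending_drops` is a hypothetical set of relations dropped or truncated since the last checkpoint:

```python
# Toy model of a shared buffer pool: each buffer is tagged with the
# relation (relfilenode) it caches, plus dirty/valid flags.
buffer_pool = [
    {"relfilenode": 100, "blocknum": 0, "dirty": True,  "valid": True},
    {"relfilenode": 200, "blocknum": 7, "dirty": True,  "valid": True},  # stale: rel 200 dropped
    {"relfilenode": 100, "blocknum": 1, "dirty": False, "valid": True},
]

pending_drops = {200}  # relations dropped/truncated since the last checkpoint

def checkpoint_scan(pool, dropped):
    """One pass over the buffer pool at checkpoint start.

    Writes dirty buffers as usual, but first invalidates any buffer
    belonging to a dropped relation -- so a buffer missed at drop time
    can never reach the write path and fail against a missing file.
    Returns the number of stale buffers cleaned up.
    """
    stale = 0
    for buf in pool:
        if not buf["valid"]:
            continue
        if buf["relfilenode"] in dropped:
            buf["valid"] = False     # discard; never write it out
            buf["dirty"] = False
            stale += 1
        elif buf["dirty"]:
            buf["dirty"] = False     # stands in for writing the page to disk
    return stale

print(checkpoint_scan(buffer_pool, pending_drops))  # -> 1
```

The single pass keeps the cost O(shared_buffers) per checkpoint rather than per DROP/TRUNCATE, which is the point of moving the work into the background.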

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
