Re: PROC_IN_ANALYZE stillborn 13 years ago

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PROC_IN_ANALYZE stillborn 13 years ago
Date: 2020-08-06 08:17:44
Message-ID: CANP8+jJsQJRHe54ZsApTKuCN9aDWzWfM0D3jcQ2xbXf-kQuc2w@mail.gmail.com
Lists: pgsql-hackers

On Thu, 6 Aug 2020 at 02:07, Andres Freund <andres(at)anarazel(dot)de> wrote:

>
> On 2020-08-05 19:55:49 -0400, Alvaro Herrera wrote:
> > ... which means the flag I had added two days earlier has never been
> > used for anything. We've carried the flag forward to this day for
> > almost 13 years, dutifully turning it on and off ... but never checking
> > it anywhere.
> >
> > I propose to remove it, as in the attached patch.
>
> I'm mildly against that, because I'd really like to start making use of
> the flag. Not so much for cancellations, but to avoid the drastic impact
> analyze has on bloat. In OLTP workloads with big tables, and without
> disabled cost limiting for analyze (or slow IO), the snapshot that
> analyze holds is often by far the transaction with the oldest xmin.
>
> It's not entirely trivial to fix (just ignoring it could lead to
> detoasting issues), but also not that hard.
>
> Only mildly against because it'd not be hard to reintroduce once we need
> it.
>

Good points, both.

The most obvious way to avoid long-lived analyze snapshots is to have
analyze take multiple snapshots as it runs, rather than try to invent some
clever way of ignoring the analyze snapshot (which, as Alvaro points out,
we never did). All we need to do is have each analyze snapshot last for at
most N rows, but keep scanning until we have the desired sample size. Doing
that would mean the analyze sample wouldn't come from a single snapshot,
but then who cares? There is no requirement for consistency - the sample
would arguably be *more* stable because it comes from multiple points in
time, not just one.
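
To make the idea concrete, here is a rough sketch in Python rather than
PostgreSQL's C. It is not a patch and not how ANALYZE is implemented:
take_snapshot, is_visible and the function name are hypothetical stand-ins,
and plain reservoir sampling stands in for whatever sampling method analyze
actually uses. It only shows the control flow: replace the snapshot after at
most N scanned rows, and keep scanning while maintaining a sample of the
desired size.

```python
import random
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

Row = TypeVar("Row")
Snap = TypeVar("Snap")


def sample_with_bounded_snapshots(
    rows: Iterable[Row],
    take_snapshot: Callable[[], Snap],        # hypothetical: acquire a fresh snapshot
    is_visible: Callable[[Row, Snap], bool],  # hypothetical: visibility check
    sample_size: int,
    max_rows_per_snapshot: int,               # the "N" from the text above
    rng: Optional[random.Random] = None,
) -> Tuple[List[Row], int]:
    """Return (sample, snapshots_taken) for one pass over rows."""
    rng = rng or random.Random()
    sample: List[Row] = []
    visible_seen = 0
    snapshots_taken = 0
    snapshot: Optional[Snap] = None
    rows_under_snapshot = 0

    for row in rows:
        # Replace the snapshot once it has covered N scanned rows, so no
        # single snapshot lives for the whole scan.
        if snapshot is None or rows_under_snapshot >= max_rows_per_snapshot:
            snapshot = take_snapshot()
            snapshots_taken += 1
            rows_under_snapshot = 0
        rows_under_snapshot += 1

        if not is_visible(row, snapshot):
            continue

        # Simple reservoir sampling over the visible rows.
        visible_seen += 1
        if len(sample) < sample_size:
            sample.append(row)
        else:
            j = rng.randrange(visible_seen)
            if j < sample_size:
                sample[j] = row

    return sample, snapshots_taken
```

With 1000 rows and N = 100, a scan like this would go through ten
short-lived snapshots instead of holding one for the whole run, and the
sample ends up drawn from rows judged visible at different points in time.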

--
Simon Riggs http://www.2ndQuadrant.com/
Mission Critical Databases
