Re: Turning off HOT/Cleanup sometimes

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Turning off HOT/Cleanup sometimes
Date: 2015-04-16 09:36:46
Message-ID: CABOikdM6zPgaDyeJrvQMXwmJx94DZRKmFUdKpU_zPtT9ZA0enQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 16, 2015 at 2:47 PM, Greg Stark <stark(at)mit(dot)edu> wrote:

>
> On 15 Apr 2015 15:43, "Simon Riggs" <simon(at)2ndquadrant(dot)com> wrote:
> >
> > It all depends upon who is being selfish. Why is a user "selfish" for
> > not wanting to clean every single block they scan, when the people
> > that made the mess do nothing and go faster 10 minutes from now?
> > Randomly and massively penalising large SELECTs makes no sense. Some
> > cleanup is OK, with reasonable limits, which is why that is proposed.
>
> I don't think it's productive to think of a query as a different actor
> with only an interest in its own performance and no interest in overall
> system performance.
>
> From a holistic point of view the question is how many times is a given
> hit chain going to need to be followed before it's pruned. Or to put it
> another way, how expensive is creating a hot chain. Does it cause a single
> prune? a fixed number of chain readers followed by a prune? Does the amount
> of work depend on the workload or is it consistent?
>

IMO the size or traversal of the HOT chain is not that expensive compared
to the cost of either pruning too frequently, which generates WAL as well
as makes buffers dirty. OTOH cost of less frequent pruning could also be
very high. It can cause severe table bloat which may just stay for a very
long time. Even if dead space is recovered within a page, truncating a
bloated heap is not always possible. In such cases, even SELECTs would be
slowed down just because they need to read/scan far more pages than they
otherwise would have. IOW its probably wrong to assume that not-pruning
quickly enough will have impact only on the non-SELECT queries.

I also concur with arguments upthread that this change needs to be
carefully calibrated because it can lead to significant degradation for
certain workloads.

> My intuition, again, is that what we need is a percentage such as "do 10
> prunes then ignore the next 1000 clean pages with hot chains. That
> guarantees that after 100 selects the hot chains will all be pruned but
> each select will only prune 1% of the clean pages it sees.
>
I think some such proposal was made in the last. There could be knob to
control how much a read-only query (or may be a read-only transaction)
should do HOT cleanup, say as a percentage of pages it looks at. The
default can be left at 100% in the first release so that the current
behaviour is not suddenly disrupted. But it will allow others to play with
the percentages and then based on field reports, we can change defaults in
the next releases.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2015-04-16 10:50:03 Re: Optimization for updating foreign tables in Postgres FDW
Previous Message Andres Freund 2015-04-16 09:23:36 Re: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0