Skip site navigation (1) Skip section navigation (2)

Re: Further pg_upgrade analysis for many tables

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: Further pg_upgrade analysis for many tables
Date: 2013-01-10 03:46:01
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
On Wednesday, January 9, 2013, Simon Riggs wrote:

> On 23 November 2012 22:34, Jeff Janes <jeff(dot)janes(at)gmail(dot)com <javascript:;>>
> wrote:
> > I got rid of need_eoxact_work entirely and replaced it with a short
> > list that fulfills the functions of indicating that work is needed,
> > and suggesting which rels might need that work.  There is no attempt
> > to prevent duplicates, nor to remove invalidated entries from the
> > list.   Invalid entries are skipped when the hash entry is not found,
> > and processing is idempotent so duplicates are not a problem.
> >
> > Formally speaking, if MAX_EOXACT_LIST were 0, so that the list
> > overflowed the first time it was accessed, then it would be identical
> > to the current behavior or having only a flag.  So formally all I did
> > was increase the max from 0 to 10.
> ...
> > It is not obvious what value to set the MAX list size to.
> A few questions, that may help you...
> Why did you pick 10, when your create temp table example needs 110?

The 110 is the size of the RelationIdCache at the end of my augmented
pgbench transaction.  But, only one of those entries needs any work, so for
that example a MAX of 1 would suffice.  But 1 seems to be cutting it rather
close, so I picked the next largest power of 10.

The downsides of making the MAX larger are:

1) For ordinary work loads, each backend needs a very little bit more
memory for the static array. (this would change if we want to extend this
to EOsubXACT as well as EOXACT, beause there can be only 1 XACT but an
unlimited number of SubXACT)

2) For pathological work loads that add the same relation to the list over
and over again thousands of times, they have to
grovel through that list at EOX, which in theory could be more work than
going through the entire non-redundant RelationIdCache hash.  (I have no
idea what a pathological work load might actually look like in practice,
but it seems like a good idea to assume that one might exist)

We could prevent duplicates from being added to the list in the first
place, but the overhead need to do that seems like a sure loser for
ordinary work loads.

By making the list over-flowable, we fix a demonstrated pathological
workload (restore of huge schemas); we impose no detectable penalty to
normal workloads; and we fail to improve, but also fail to make worse, a
hypothetical pathological workload.  All at the expense of a few bytes per

If the list overflowed at 100 rather than 10, the only cost would probably
be the extra bytes used per process.  (Because the minimum realistic size
of RelationIdCache is 100, and I assume iterating over 100 hash tags which
may or may not exist and/or be redundant is about the same amount of work
as iterating over a hash which has at least 100 entries)

If we increase the overflow above 100, we might be making things worse for
some pathological workload whose existence is entirely hypothetical--but
the workload that benefits from setting it above 100 is also hypothetical.
 So while 10 might be too small, above 100 doesn't seem to be defensible in
the absence of known cases.

> Why does the list not grow as needed?

It would increase the code complexity for no concretely-known benefit.

If we are concerned about space, the extra bytes of compiled code needed to
implement dynamic growth would certainly exceed the bytes need to just jack
the MAX setting up to static setting 100 or 500 or so.

For dynamic growth to be a win, would have to have a work-load that
satisfies these conditions:

1) It would have to have some transactions that cause >10 or >100 of
relations to need clean up.
2) It would have to have even more hundreds of relations
in RelationIdCache but which don't need cleanup (otherwise, if most
of RelationIdCache needs cleanup then iterating over that hash would be
just as efficient as iterating over a list which contains most of the said
3) The above described transaction would have to happen over and over
again, because if it only happens once there is no point in worrying about
a little inefficiency.



In response to


pgsql-hackers by date

Next:From: Tom LaneDate: 2013-01-10 03:48:10
Subject: Re: PL/perl should fail on configure, not make
Previous:From: Tom LaneDate: 2013-01-10 03:27:54
Subject: Re: Lock levels for ALTER TABLE

Privacy Policy | About PostgreSQL
Copyright © 1996-2018 The PostgreSQL Global Development Group