From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-08-07 11:35:38
Message-ID: CA+hUKGLYoQjNaSuFYnw9xFR75eENabRDu4EiWE7CL22yZu61XA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 1, 2019 at 1:22 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Jul 31, 2019 at 10:13 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Tue, Jul 30, 2019 at 5:26 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > > but
> > > here's a small thing: I managed to reach an LWLock self-deadlock in
> > > the undo worker launcher:
> > >
> >
> > I could see the problem, will fix in next version.
>
> Fixed both of these problems in the patch just posted by me [1].

I reran the script that found that problem so I could play with the
linger logic. It creates N databases, then creates tables in random
databases (because I'm testing with the orphaned table cleanup patch)
and commits or rolls back at (say) 100 tx/sec. While it's doing that,
you can look at the pg_stat_undo_logs view and see the discard and
insert pointers whizzing along nicely, but if you look at the process
table with htop or similar you can see that it's forking undo apply
workers at 100/sec (the pid keeps changing) whenever there is more
than one database involved. With a single database the worker lingers
as I was expecting (and then creates problems when you want to drop
the database).

What I was expecting to see is that if you configure the test to
generate undo work in 2, 3 or 4 databases, and you have
max_undo_workers set to 4, then you should finish up with 4 undo
apply workers hanging around to service the work calmly, without any
new forking happening. If you generate undo work in more than 4
databases, I was expecting to see the undo workers exiting and being
forked so that a total of 4 workers (at any time) work their way
around the more-than-4 databases, but not switching as fast as they
can, so that we don't waste all our energy on forking and setup (how
fast exactly they should switch, I don't know; that's what I wanted
to see).

A more advanced thing to worry about, not yet tested, is how well
they'll handle asymmetrical work distributions (not enough workers,
and some databases producing a lot of undo work and some a little).
Script attached.
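
In case the attachment doesn't travel well, here's a rough sketch of
what the script does. This is a hedged reconstruction, not the
attached file verbatim: it assumes psycopg2, and the database names,
table names and constants are illustrative.

#!/usr/bin/env python3
# Rough sketch of test_undo_worker_load_balancing.py: generate
# commit/abort traffic spread across several databases.  Assumes
# psycopg2 and a server running with the orphaned table cleanup patch.
import random
import time

import psycopg2

N_DATABASES = 4        # how many databases should generate undo work
TX_PER_SEC = 100       # approximate transaction rate
ROLLBACK_RATIO = 0.5   # fraction of transactions rolled back

def main():
    # CREATE DATABASE can't run inside a transaction block, so use
    # autocommit for the setup connection.
    admin = psycopg2.connect(dbname="postgres")
    admin.autocommit = True
    cur = admin.cursor()
    for i in range(N_DATABASES):
        cur.execute("DROP DATABASE IF EXISTS undo_test_%d" % i)
        cur.execute("CREATE DATABASE undo_test_%d" % i)
    admin.close()

    # Keep one connection per database open, so that what we observe
    # is undo worker behaviour rather than backend connection churn.
    conns = [psycopg2.connect(dbname="undo_test_%d" % i)
             for i in range(N_DATABASES)]

    n = 0
    while True:
        # Create a table in a randomly chosen database.  Rolling back
        # the CREATE TABLE leaves an orphaned file for an undo apply
        # worker to clean up.
        conn = random.choice(conns)
        conn.cursor().execute("CREATE TABLE t%d (i int)" % n)
        n += 1
        if random.random() < ROLLBACK_RATIO:
            conn.rollback()
        else:
            conn.commit()
        time.sleep(1.0 / TX_PER_SEC)

if __name__ == "__main__":
    main()

Then watch pg_stat_undo_logs and the process table while it runs,
varying N_DATABASES around max_undo_workers.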

--
Thomas Munro
https://enterprisedb.com

Attachment: test_undo_worker_load_balancing.py (text/x-python-script, 1.9 KB)
