Re: Extent Locks

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extent Locks
Date: 2013-05-17 03:55:37
Message-ID: 20130517035537.GX4361@tamriel.snowman.net
Lists: pgsql-hackers

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> I think it's pretty unrealistic to suppose that this can be made to
> work. The most obvious problem is that a sequential scan is coded to
> assume that every block between 0 and the last block in the relation
> is worth reading,

You don't change that. However, when a seq scan asks the storage layer
for blocks that it knows don't actually exist, it can simply skip over
them or return "empty" records or something equivalent... Yes, that's
hand-wavy, but I also think it's doable.
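
To make the hand-waving slightly more concrete, here is a minimal sketch in Python. It is not PostgreSQL code. It assumes a hypothetical storage layer (`SparseStorage`) that knows which block numbers actually exist; the scan still walks every block number from 0 to the last one and skips those the storage layer reports as missing. All class and function names are invented for illustration.

```python
from typing import Dict, Iterator, List, Optional, Tuple


class SparseStorage:
    """Hypothetical storage layer where not every block number is allocated."""

    def __init__(self) -> None:
        self._blocks: Dict[int, List[str]] = {}

    def write_block(self, blkno: int, rows: List[str]) -> None:
        self._blocks[blkno] = list(rows)

    def nblocks(self) -> int:
        # One past the highest allocated block number (0 if nothing is stored).
        return max(self._blocks) + 1 if self._blocks else 0

    def read_block(self, blkno: int) -> Optional[List[str]]:
        # None means "this block does not exist"; the caller may skip it.
        return self._blocks.get(blkno)


def seq_scan(storage: SparseStorage) -> Iterator[Tuple[int, str]]:
    # The scan loop itself is unchanged: it still visits 0 .. nblocks-1.
    for blkno in range(storage.nblocks()):
        rows = storage.read_block(blkno)
        if rows is None:
            continue  # hole in the block-number space: nothing to read
        for row in rows:
            yield blkno, row
```

Whether the real code base tolerates holes like this everywhere block numbers are used is exactly the open question; the sketch only covers the scan loop.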

> I suspect there are
> slightly less obvious problems that would turn out to be highly
> intractable.

Entirely possible. :)

> The assumption that block numbers are dense is probably
> embedded in the system in a lot of subtle ways; if we start trying to
> change it, I think we're dooming ourselves to an unending series of crocks
> trying to undo the mess we've created.

Perhaps.

> Also, I think that's really a red herring anyway. Relation extension
> per se is not slow - we can grow a file by adding zero bytes at a
> pretty good clip, and don't really gain anything at the database level
> by spreading the growth across multiple files.

That's true when the file is on a single filesystem and a single set of
drives. Make them be split across multiple filesystems/volumes where
you get more drives involved...

> The problem is the
> relation extension LOCK, and I think that's where we should be
> focusing our attention. I'm pretty confident we can find a way to
> take the pressure off the lock without actually changing anything at
> all at the storage layer.

That would certainly be very neat and if possible might render my idea
moot, which I would be more than happy with.

> As a thought experiment, suppose for example
> that we have a background process that knows, by magic, how many new
> blocks will be needed in each relation. And it knows this just enough
> in advance to have time to extend each such relation by the requisite
> number of blocks and add those blocks to the free space map. Since
> only that process ever needs a relation extension lock, there is no
> longer any contention for any such lock. Problem solved!

Sounds cute, but perhaps a bit too cute to be realistic (that's
certainly been my opinion when suggested by others, which it has been,
in the past).

> Actually, I'm not convinced that a background process is the right
> approach at all, and of course there's no actual magic that lets us
> foresee exact extension needs. But I still feel like that thought
> experiment indicates that there must be a solution here just by
> rejiggering the locking, and maybe with a bit of modest pre-extension.
> The mediocre results of my last couple tries must indicate that I
> wasn't entirely successful in getting the backends out of each others'
> way, but I tend to think that's just an indication that I don't
> understand exactly what's happening in the contention scenarios yet,
> rather than a fundamental difficulty with the approach.

Perhaps.

> > How many concurrent writers did you have and what kind of filesystem was
> > backing this? Was it a temp filesystem where writes are essentially to
> > memory, causing this relation extension lock to be much more
> > contentious?
>
> 10. ext4. No.

Ok.

> If I took 30 seconds to pre-extend the relation before writing any
> data into it, then writing the data went pretty much exactly 10 times
> faster with 10 writers than with 1.

That's rather fantastic..

> But small on-the-fly
> pre-extensions during the write didn't work as well. I don't remember
> exactly what formulas I tried, but I do remember that the few I tried
> were not really any better than "always pre-extend by 1 extra block";
> and that alone eliminated about half the contention, but then I
> couldn't do better.

That seems quite odd to me; I would have thought extending by more than
2 blocks would have helped with the contention. Still, it sounds like
extending requires a fair bit of writing, and that sucks in its own
right because we're just going to rewrite that. Is that correct? If so,
I like my proposal even more...
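
The expectation behind that can be put as back-of-the-envelope arithmetic. The sketch below is Python and purely illustrative: it assumes that each acquisition of the extension lock adds one needed block plus a fixed number of extra blocks, and it counts acquisitions only. It says nothing about how long the lock is held, which may be why the measured results did not follow this model. The function name and parameters are hypothetical.

```python
def lock_acquisitions(blocks_needed: int, extra_blocks: int) -> int:
    """Count extension-lock acquisitions under a simple model.

    Assumption: every acquisition extends the relation by
    1 + extra_blocks blocks, and no block is wasted.
    """
    if blocks_needed < 0 or extra_blocks < 0:
        raise ValueError("arguments must be non-negative")
    per_acquisition = 1 + extra_blocks
    # Integer ceiling division, avoiding floating point.
    return -(-blocks_needed // per_acquisition)
```

Under this model, one extra block halves the number of acquisitions (1000 blocks need 500 rather than 1000), and seven extra blocks would cut it to 125, so larger pre-extensions would be expected to keep helping.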

> I wonder if I need to use LWLockAcquireOrWait().

I'm not seeing how/why that might help?

Thanks,

Stephen
