From: Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: _mdfd_getseg can be expensive
On 2014-03-31 12:10:01 +0200, Andres Freund wrote:
> I recently have seen some perf profiles in which _mdfd_getseg() was in
> the top #3 when VACUUMing large (~200GB) relations. Called by mdread(),
> mdwrite(). Looking at its implementation, I am not surprised. It
> iterates over all segment entries a relation has, for every read or
> write. That's not painful for smaller relations, but at a couple of
> hundred GB it starts to be annoying. Especially if kernel readahead has
> already read in all data from disk.
> I don't have a good idea what to do about this yet, but it seems like
> something that should be fixed mid-term.
> The best I can come up with is caching the last mdvec used, but that's
> fairly ugly. Alternatively it might be a good idea to not store MdfdVec
> as a linked list, but as a densely allocated array.
I've seen this a couple of times more since. On larger relations it gets
even more pronounced. When sequentially scanning a 2TB relation,
_mdfd_getseg() gets up to 80% proportionate CPU time towards the end of
the scan.
I wrote the attached patch that gets rid of that essentially quadratic
behaviour, by replacing the mdfd chain/singly linked list with an
array. Since we seldom grow files by a whole segment, I can't see the
slightly bigger memory reallocations mattering significantly. In pretty
much every other case the array is bound to be a winner.
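To illustrate the difference (a simplified sketch, not the actual md.c code: the real MdfdVec also carries a virtual file descriptor, and names here are invented for demonstration), compare the two lookup strategies. With the chain, finding segment N means chasing N pointers on every read or write; with a dense array, the segment number is itself the index:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for md.c's MdfdVec. */
typedef struct ListSeg
{
	int				segno;
	struct ListSeg *next;
} ListSeg;

/* Chain lookup: O(n) pointer chases per read/write, which is what
 * makes a sequential scan of a large relation quadratic overall. */
static ListSeg *
list_getseg(ListSeg *head, int target)
{
	ListSeg    *v = head;

	while (v != NULL && v->segno != target)
		v = v->next;
	return v;
}

/* Dense array: the segment number is the index, so lookup is O(1).
 * This is also why mdfd_segno becomes redundant in the array layout. */
typedef struct ArraySegs
{
	int			nsegs;
	int		   *segnos;		/* segnos[i] == i by construction */
} ArraySegs;

static int *
array_getseg(ArraySegs *segs, int target)
{
	if (target < 0 || target >= segs->nsegs)
		return NULL;
	return &segs->segnos[target];
}
```

Growing the array by one entry per new 1GB segment means reallocations are rare, which is why the slightly larger copies on growth should not matter in practice.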
Does anybody have fundamental arguments against that idea?
With some additional work we could save a bit more memory by getting rid
of the mdfd_segno as it's essentially redundant - but that's not
entirely trivial and I'm unsure if it's worth it.
I've also attached a second patch that makes PageIsVerified() noticeably
faster when the page is new. That's helpful and related because it makes
it easier to test the correctness of the md.c rewrite by faking full 1GB
segments. It's also pretty simple, so there's imo little reason not to
apply it.
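The expensive part of verifying a new page is confirming it is all zeroes. A minimal sketch of the idea (illustrative only; the function name and loop are assumptions, not the patch itself): scanning the page in word-sized chunks rather than byte by byte lets the compiler generate much tighter, often vectorized, code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define BLCKSZ 8192				/* PostgreSQL's default page size */

/* Hypothetical helper: check whether a page is entirely zeroes by
 * reading it one size_t word at a time.  Real Postgres pages are
 * MAXALIGNed, so the word-sized access pattern is safe there. */
static bool
page_is_all_zeroes(const char *page)
{
	const size_t *p = (const size_t *) page;
	size_t		nwords = BLCKSZ / sizeof(size_t);

	for (size_t i = 0; i < nwords; i++)
	{
		if (p[i] != 0)
			return false;
	}
	return true;
}
```

Seeking the first nonzero word bails out early on pages that are not new, so the fast path only pays the full cost for genuinely all-zero pages.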
The research leading to these results has received funding from the
European Union's Seventh Framework Programme (FP7/2007-2013) under
grant agreement n° 318633
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services