Speed dblink using alternate libpq tuple storage

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Speed dblink using alternate libpq tuple storage
Date: 2012-01-14 04:55:47
Message-ID: 4F110AD3.2090607@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

One patch that fell off the truck during a turn in the November
CommitFest was "Allow substitute allocators for PGresult". Re-reading
the whole thing again, it actually turned into a rather different
submission in the middle, and I know I didn't follow that shift
correctly then. I'm replying to its thread but have changed the subject
to reflect that change. From a procedural point of view, I don't feel
right kicking this back to its author on a Friday night when the
deadline for resubmitting it would be Sunday. Instead I've refreshed
the patch myself and am adding it to the January CommitFest. The new
patch is a single file; it's easy enough to split out the dblink changes
if someone wants to work with the pieces separately.

After my meta-review I think we should get another reviewer familiar
with using dblink to look at this next. This is fundamentally a
performance patch now. Some results and benchmarking code were
submitted along with it; the other issues are moot if those aren't
reproducible. The secondary goal for a new review here is to provide
another opinion on the naming issues and abstraction concerns raised so far.

To clear out the original line of thinking, this is not a replacement
low-level storage allocator anymore. The idea of using such a mechanism
to help catch memory leaks has also been dropped.

Instead this adds and documents a new path for libpq callers to more
directly receive tuples, for both improved speed and lower memory
usage. dblink has been modified to use this new mechanism.
Benchmarking by the author suggests no significant change in libpq speed
when only that change was made, while the modified dblink using the new
mechanism was significantly faster. It jumped from 332K tuples/sec to
450K, a 35% gain, and had a lower memory footprint too. Test
methodology and those results are at
http://archives.postgresql.org/pgsql-hackers/2011-12/msg00008.php

Robert Haas did a quick code review of this already, it along with
author response mixed in are at
http://archives.postgresql.org/pgsql-hackers/2011-12/msg01149.php I see
two areas of contention there:

-There are several naming bits no one is happy with yet. Robert didn't
like some of them, but neither did Kyotaro. I don't have an opinion
myself. Is it the case that some changes to the existing code's
terminology are what's actually needed to make this all better? Or is
this just fundamentally warty and there's nothing to be done about it.
Dunno.

-There is an abstraction wrapper vs. coding convenience trade-off
centering around PGresAttValue. It sounded to me like it raised always
fun questions like "where's the right place for the line between
lipq-fe.h and libpq-int.h to be?"

dblink is pretty popular, and this is a big performance win for it. If
naming and API boundary issues are the worst problems here, this sounds
like something well worth pursuing as part of 9.2's still advancing
performance theme.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachment Content-Type Size
alt-result-storage-v1.patch text/x-patch 42.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-01-14 05:27:08 Re: Disabled features on Hot Standby
Previous Message Noah Misch 2012-01-14 01:02:31 Re: Disabled features on Hot Standby