Skip site navigation (1) Skip section navigation (2)

Speed dblink using alternate libpq tuple storage

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Speed dblink using alternate libpq tuple storage
Date: 2012-01-14 04:55:47
Message-ID: 4F110AD3.2090607@2ndQuadrant.com (view raw or flat)
Thread:
Lists: pgsql-hackers
One patch that fell off the truck during a turn in the November 
CommitFest was "Allow substitute allocators for PGresult".  Re-reading 
the whole thing again, it actually turned into a rather different 
submission in the middle, and I know I didn't follow that shift 
correctly then.  I'm replying to its thread but have changed the subject 
to reflect that change.  From a procedural point of view, I don't feel 
right kicking this back to its author on a Friday night when the 
deadline for resubmitting it would be Sunday.  Instead I've refreshed 
the patch myself and am adding it to the January CommitFest.  The new 
patch is a single file; it's easy enough to split out the dblink changes 
if someone wants to work with the pieces separately.

After my meta-review I think we should get another reviewer familiar 
with using dblink to look at this next.  This is fundamentally a 
performance patch now.  Some results and benchmarking code were 
submitted along with it; the other issues are moot if those aren't 
reproducible.  The secondary goal for a new review here is to provide 
another opinion on the naming issues and abstraction concerns raised so far.

To clear out the original line of thinking, this is not a replacement 
low-level storage allocator anymore.  The idea of using such a mechanism 
to help catch memory leaks has also been dropped.

Instead this adds and documents a new path for libpq callers to more 
directly receive tuples, for both improved speed and lower memory 
usage.  dblink has been modified to use this new mechanism.  
Benchmarking by the author suggests no significant change in libpq speed 
when only that change was made, while the modified dblink using the new 
mechanism was significantly faster.  It jumped from 332K tuples/sec to 
450K, a 35% gain, and had a lower memory footprint too.  Test 
methodology and those results are at 
http://archives.postgresql.org/pgsql-hackers/2011-12/msg00008.php

Robert Haas did a quick code review of this already, it along with 
author response mixed in are at 
http://archives.postgresql.org/pgsql-hackers/2011-12/msg01149.php  I see 
two areas of contention there:

-There are several naming bits no one is happy with yet.  Robert didn't 
like some of them, but neither did Kyotaro.  I don't have an opinion 
myself.  Is it the case that some changes to the existing code's 
terminology are what's actually needed to make this all better?  Or is 
this just fundamentally warty and there's nothing to be done about it.  
Dunno.

-There is an abstraction wrapper vs. coding convenience trade-off 
centering around PGresAttValue.  It sounded to me like it raised always 
fun questions like "where's the right place for the line between 
lipq-fe.h and libpq-int.h to be?"

dblink is pretty popular, and this is a big performance win for it.  If 
naming and API boundary issues are the worst problems here, this sounds 
like something well worth pursuing as part of 9.2's still advancing 
performance theme.

-- 
Greg Smith   2ndQuadrant US    greg(at)2ndQuadrant(dot)com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


Attachment: alt-result-storage-v1.patch
Description: text/x-patch (42.1 KB)

In response to

Responses

pgsql-hackers by date

Next:From: Robert HaasDate: 2012-01-14 05:27:08
Subject: Re: Disabled features on Hot Standby
Previous:From: Noah MischDate: 2012-01-14 01:02:31
Subject: Re: Disabled features on Hot Standby

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group