Re: gprof SELECT COUNT(*) results

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: gprof SELECT COUNT(*) results
Date: 2005-11-25 15:38:50
Message-ID: 1132933130.2906.7.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2005-11-25 at 09:54 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > On Thu, 2005-11-24 at 23:48 -0500, Tom Lane wrote:
> >> Yeah, I was wondering the same. It'd be possible to rewrite the seqscan
> >> stuff so that we do the visibility tests for all the tuples on a given
> >> page at once, taking the buffer content lock just once, and saving aside
> >> the valid tuple IDs to return later. This should definitely be faster
> >> when all the tuples actually get fetched.
>
> > I was thinking of the brute force approach: take a complete copy of the
> > block when we read the first tuple off it. That way any wierdness
> > on-block is avoided until we logically attempt to read that tuple. It
> > also allows us to palloc the right amount of space first time.
>
> That occurred to me too, but I rejected it on two grounds:
> * All that data copying will be expensive.
> * We couldn't update tuple commit hint bits on the real page.

OK, the second one is the clincher.

> Also, I'm not at all sure that it's a good idea to release the buffer
> pin before we're genuinely done with the page. That sort of change
> would impact such things as the VACUUM interlock algorithm. Maybe it'd
> be OK but I'm disinclined to mess with it for a dubious speed gain.

I think its safer not to try it all in one go. It's worth coming back to
later though: VACUUM will never remove rows the scanner can see anyway.

> Even with my version of the proposal, it'd be risky to use the check-
> all-in-advance approach for non-MVCC snapshots such as SnapshotNow.
> I'm not sure that there are any places that would get broken by missing
> tuples that weren't yet committed when the page was first touched ...
> but I'm not sure there aren't, either. We could avoid this risk by
> having two paths through heapgettup, though.

That seems prudent. The main performance optimization is required for
MVCC access anyway; we seldom scan catalog tables and hopefully they
never get too big.

> > Are you, or will you be implementing this?
>
> I plan to take a look at it soon.

Much appreciated and well done to Qingqing for spotting this.

Best Regards, Simon Riggs

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Olivier Thauvin 2005-11-25 15:59:27 Re: gprof SELECT COUNT(*) results
Previous Message Tom Lane 2005-11-25 15:32:04 Re: Ipv6 and Postgresql 8.0.3