Re: So, is COUNT(*) fast now?

From: desmodemone <desmodemone(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: So, is COUNT(*) fast now?
Date: 2011-10-22 10:43:58
Message-ID: CAEs9oF=hAGBkGaJwpxsCGHEJUMMSBErL=OYsqdHtfzW5eQTQCg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2011/10/22 Andres Freund <andres(at)anarazel(dot)de>

> On Friday, October 21, 2011 08:14:12 PM Robert Haas wrote:
> > On Fri, Oct 21, 2011 at 2:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > >> On Fri, Oct 21, 2011 at 1:18 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > >>> I don't know why you'd imagine that touching an index is free, or
> even
> > >>> cheap, CPU-wise. The whole point of the index-only optimization is
> to
> > >>> avoid I/O. When you try it on a case where there's no I/O to be
> saved,
> > >>> and no shared-buffers contention to be avoided, there's no way it's
> > >>> going to be a win.
> > >>
> > >> Well, call me naive, but I would have thought touching six times less
> > >> data would make the operation run faster, not slower.
> > >
> > > It's not "touching six times less data". It's touching the exact same
> > > number of tuples either way, just index tuples in one case and heap
> > > tuples in the other.
> >
> > Yeah, but it works out to fewer pages.
> But access to those is not sequential. I guess if you measure cache hit
> ratios
> the index scan will come out significantly worse.
>
>
> Andres
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

"But access to those is not sequential" yes, I am agree.
In my opinion the problem is that. If the query needs to scan all the b-tree
index without to
access the table rows, the better way to read the index is like sequential
one,
in fact , query like count(*) or other not need the data are in "order" so I
think we
could read all blocks (better, "only" the leaf blocks) without to touching
too much the branch blocks.

For example query like this :

select column_a from table ;

is better to read the data from indexes like sequential

For query like this :

select column_a from table order by column_a ;

is better to read the data from indexes in range scan from root block to
first branch blocks and their leaf blocks, so we could "save"
the sorting.

Mat

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2011-10-22 12:24:53 Re: Synchronized snapshots versus multiple databases
Previous Message Andres Freund 2011-10-22 09:49:36 Re: So, is COUNT(*) fast now?