Quick Links

Re: hash aggregation

From:	Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>
To:	Korisk <Korisk(at)yandex(dot)ru>
Cc:	Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: hash aggregation
Date:	2012-10-12 07:10:06
Message-ID:	CAL_0b1trANSSN2f3AhWGc1+DG96D2ig-n-ZEv3kLsvAiwNr2aA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Thu, Oct 11, 2012 at 9:14 PM, Korisk <Korisk(at)yandex(dot)ru> wrote:
> Strange situation.
> After indexscan enabling the cost is seriously decreased.

AFAIK when the planner has to choose between index scans and seq scans
and both of this options are off it uses one of this strategies anyway
but puts 10000000000.00 as a lower cost for this (thanks Maxim Boguk
for the explanation in chat).

> -> Index Only Scan using hashcheck_name_rev_idx on public.hashcheck (cost=10000000000.00..10000466660.96 rows=25990002 width=32) (act
> ual time=0.166..3698.776 rows=25990002 loops=1)

So when you enabled one of these options it started using it as usual.

> hashes=# set enable_indexscan=on;
> SET
> hashes=# explain analyse verbose select name, count(name) as cnt from hashcheck group by name order by name desc;

[cut]

> -> Index Only Scan using hashcheck_name_rev_idx on public.hashcheck (cost=0.00..466660.96 rows=25990002 width=32) (actual time=0.129.
> .3653.848 rows=25990002 loops=1)

What I can not understand is why the seq scan's estimated cost is
better the index scan's one. It depends on the number of pages in
index/relation. May be the index is heavily bloated?

Let's see the sizes:

select pg_total_relation_size('hashcheck')
select pg_total_relation_size('hashcheck_name_rev_idx');

> hashes=# set enable_seqscan=on;
> SET
> hashes=# explain analyse verbose select name, count(name) as cnt from hashcheck group by name order by name desc;
> QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------------------------------------
> -----
> Sort (cost=565411.67..565412.17 rows=200 width=32) (actual time=21746.799..21747.026 rows=4001 loops=1)
> Output: name, (count(name))
> Sort Key: hashcheck.name
> Sort Method: quicksort Memory: 315kB
> -> HashAggregate (cost=565402.03..565404.03 rows=200 width=32) (actual time=21731.551..21733.277 rows=4001 loops=1)
> Output: name, count(name)
> -> Seq Scan on public.hashcheck (cost=0.00..435452.02 rows=25990002 width=32) (actual time=29.431..13383.812 rows=25990002 loop
> s=1)
> Output: id, name, value
> Total runtime: 21747.356 ms
> (9 rows)
>
>
>
>
>

--
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray(dot)ru(at)gmail(dot)com Skype: gray-hemp Phone: +14158679984

In response to

Re: hash aggregation at 2012-10-12 04:14:38 from Korisk

Responses

Re: hash aggregation at 2012-10-12 21:14:35 from Tomas Vondra

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Korisk	2012-10-12 14:37:48	Re: hash aggregation
Previous Message	Ondrej Ivanič	2012-10-12 04:32:53	Re: hash aggregation