From: "Jussi Pakkanen" <jpakkane(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4462: Adding COUNT to query causes massive slowdown
Date: 2008-10-10 19:55:10
Message-ID: 42d23b2e0810101255p59d5d1eyce79a0c917bd72ed@mail.gmail.com
Lists: pgsql-bugs

On Thu, Oct 9, 2008 at 6:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Given that PostgreSQL does the scan even with the huge seqscan
>> penalty, I can think of only two different causes:
>> 1) some sort of a bug in the query analyzer
>> 2) SELECT COUNT(DISTINCT x) for some reason requires information that
>> is not available in the index.
>
> Try (3) COUNT(DISTINCT x) ... or any DISTINCT aggregate for that matter
> ... is implemented by a sort-and-uniq step inside the aggregate function
> itself. You can't see it in the plan.
Does this mean that the sort-and-uniq will always lead to a full table
scan? It would seem so, because I could not force PostgreSQL to use
the index even with enable_seqscan set to off. I understand that in
some cases a table scan is faster, but it is very strange that the
query optimizer refuses to use the index no matter what.
A quick calculation says that the table scan needs to access 32
million elements (and sort them, and uniq them). An index scan needs
only 2 million (or 4 million I suppose, if you account for the higher
levels in the B-tree). That is an order of magnitude difference in
disk reads alone.
> I wouldn't actually think that this approach would be slower than an
> indexscan, btw, unless maybe the index were very nearly correlated with
> physical order --- but that would make the sort more efficient, too.
I have ordered the data table according to the "code" column by first
importing it into a temp table and then building the actual "log"
table with "CREATE TABLE log AS SELECT ... ORDER BY code".
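For reference, a minimal sketch of that load pattern (the staging
table name "log_temp" is hypothetical; only "log" and "code" appear in
this thread):

```sql
-- Load raw data into a staging table first, then build the real
-- table physically ordered by the indexed column.
CREATE TABLE log AS
    SELECT * FROM log_temp ORDER BY code;

CREATE INDEX log_code_idx ON log (code);
ANALYZE log;
```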
In this case the index is faster. A lot faster. SELECT DISTINCT using
the index and counting the rows takes 4 minutes. Building a view with
distinct "code"s and counting that also takes 4 minutes. But the
aggregate scan takes 11 minutes. Worst of all, COUNT(DISTINCT ...) is
the way all SQL books tell you to do these kinds of queries.
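To make the comparison concrete, the two formulations timed above
would look roughly like this (the timings are from my runs on this
particular table, not general claims):

```sql
-- Aggregate form: sort-and-uniq happens inside the aggregate,
-- invisible in EXPLAIN. ~11 minutes here.
SELECT COUNT(DISTINCT code) FROM log;

-- Subquery form (equivalent to counting a DISTINCT view): the
-- planner can use the index on "code". ~4 minutes here.
SELECT COUNT(*) FROM (SELECT DISTINCT code FROM log) AS codes;
```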
> Perhaps you need to raise work_mem enough to allow the sort to take
> place without spilling to disk? (Turning on trace_sort should let you
> see what's happening there.)
But isn't this just hiding the root cause? Working sets larger than
available memory are not that uncommon.
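For anyone who wants to try that suggestion, the session settings
would look something like this (the work_mem value is illustrative,
not a recommendation; trace_sort output goes to the server log):

```sql
-- Session-local settings for one experiment.
SET work_mem = '512MB';  -- try to keep the aggregate's sort in memory
SET trace_sort = on;     -- log sort progress to the server log

SELECT COUNT(DISTINCT code) FROM log;
```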