Re: Slow count(*) again...

From: Reid Thompson <reid(dot)thompson(at)ateb(dot)com>
To: reid(dot)thompson(at)ateb(dot)com
Cc: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-10 18:33:04
Message-ID: 4CB206E0.6080707@ateb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On 10/10/2010 11:02 AM, Reid Thompson wrote:
>>>> On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel<neil(dot)whelchel(at)gmail(dot)com>
>>>>
>> On the other hand, I copied a table out of one of my production servers that
>> has about 60,000 rows with 6 columns (numeric, numeric, bool, bool, timestamp,
>> text). The first numeric column has numbers evenly spread between 0 and 100
>> and it is indexed. I put the table in a pair of database servers both running
>> on the same physical hardware. One server is Postgres, the other is a popular
>> server (I am not mentioning names here). on Postgres: SELECT count(*) FROM
>> table where column>50; takes about 8 seconds to run. The other database server
>> took less than one second (about 25 ms) as it is using the index (I assume) to
>> come up with the results. It is true that this is not a fair test because both
>> servers were tested with their default settings, and the defaults for Postgres
>> are much more conservative, however, I don't think that any amount of settings
>> tweaking will bring them even in the same ball park. There has been discussion
>> about the other server returning an incorrect count because all of the indexed
>> rows may not be live at the time. This is not a problem for the intended use,
>> that is why I suggested another function like estimate(*). It's name suggests
>> that the result will be close, not 100% correct, which is plenty good enough
>> for generating a list of results pages in most cases. I am faced with a very
>> serious problem here. If the query to make a list of pages takes say 6 seconds
>> and it takes another 6 seconds to generate a page of results, the customer is
>> waiting 12 seconds. This is not going to work. If count made a quick estimate,
>> say less than a second, and it took 6 seconds to come up with the actual
>> results, I could live with that. Or if coming up with the window of results
>> via (OFFSET and LIMIT) and returned the total number of rows that would have
>> matched the query, then I would still have everything I need to render the
>> page in a reasonable time. I really think that this needs to be addressed
>> somewhere. It's not like I am the only one that does this. You see it nearly
>> everywhere a long list of results is (expected to be) returned in a web site.
>> Among the people I work with, this seems to be the most mentioned reason that
>> they claim that they don't use Postgres for their projects. t anyway.
>
> How big is your DB?
> How fast is your disk access?
> Any chance disks/RAM can be addressed?
>
> My disk access is pitiful...
> first run, 2.3 million rows.. 0m35.38s, subsequent runs.. real 0m2.55s
>
> rthompso(at)hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
> count
> ---------
> 2340704
> (1 row)
>
>
> real 0m35.38s
> user 0m0.25s
> sys 0m0.03s
>
> subsequent runs.... (count changes due to inserts.)
>
> rthompso(at)hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
> count
> ---------
> 2363707
> (1 row)
>
>
> real 0m2.70s
> user 0m0.27s
> sys 0m0.02s
> rthompso(at)hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
> count
> ---------
> 2363707
> (1 row)
>
>
> real 0m2.55s
> user 0m0.26s
> sys 0m0.02s
> rthompso(at)hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
> count
> ---------
> 2363707
> (1 row)
>
>
> real 0m2.50s
> user 0m0.26s
> sys 0m0.02s
>
> reporting=# SELECT pg_size_pretty(pg_total_relation_size('my_production_table'));
> pg_size_pretty
> ----------------
> 1890 MB
> (1 row)
>
>
forgot to note, my table schema is significantly larger.

rthompso(at)hw-prod-repdb1> time psql -c "\d my_production_table_201010" reporting
Table "public.my_production_table_201010"
Column | Type | Modifiers
-----------------------------+-----------------------------+----------------------------------------------------------------
| integer | not null default
nextval('my_production_table_parent_id_seq'::regclass)
| character varying(20) |
| character(1) |
| character varying(32) |
| character varying(32) |
| character varying(20) |
| character varying(5) |
| character varying(5) |
| date |
| character(1) |
| character varying(32) |
| character varying(32) |
| character varying(32) |
| character varying(2) |
| character varying(10) |
| character varying(10) |
| character varying(32) |
| character varying(7) |
| character varying(10) |
| character varying(2) |
| character varying(9) |
| character varying(9) |
| character varying(9) |
| character varying(10) |
| character varying(32) |
| character varying(32) |
| character varying(20) |
| character varying(5) |
| character varying(5) |
| character varying(32) |
| character varying(32) |
| character varying(32) |
| character varying(2) |
| character varying(10) |
| character varying(10) |
| character varying(10) |
| character varying(10) |
| integer |
| character varying(2) |
| character varying(32) |
| character varying(32) |
| integer |
| integer |
| text |
| character varying(3) |
| date |
| date |
| date |
| integer |
| integer |
| integer |
| integer |
| character varying(6) |
| character varying(10) |
| character varying(32) |
| character varying(32) |
| character varying(32) |
| character varying(10) |
| character varying(6) |
| character varying(8) |
| boolean |
| character(1) |
| date |
| integer |
| date |
| character varying(11) |
| character varying(4) |
| character(1) |
| date |
| character varying(5) |
| character varying(20) |
| date |
| character(1) |
| character(1) |
| character varying(2) |
| text |
| integer |
| integer |
| timestamp without time zone | default now()
| timestamp without time zone |
| character varying(64) |
| character varying(64) |
| character varying(64) |
Indexes:
"my_production_table_201010_pkey" PRIMARY KEY, btree (id)
"my_production_table_201010_date_idx" btree (xxxxdate), tablespace "indexspace"
"my_production_table_201010_epatient_idx" btree (storeid, xxxxxxxxxxxxx), tablespace "indexspace"
"my_production_table_201010_medicationname_idx" btree (xxxxxxxxxxxxxx), tablespace "indexspace"
"my_production_table_201010_ndc_idx" btree (xxx), tablespace "indexspace"
Check constraints:
"my_production_table_201010_filldate_check" CHECK (xxxxdate >= '2010-10-01'::date AND xxxxdate <
'2010-11-01'::date)
Foreign-key constraints:
"my_production_table_201010_pkgfileid_fkey" FOREIGN KEY (pkgfileid) REFERENCES my_production_tablefiles(id)
Inherits: my_production_table_parent

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2010-10-10 20:23:31 Re: Which file does the SELECT?
Previous Message Tom Lane 2010-10-10 18:06:46 Re: WIP: Triggers on VIEWs

Browse pgsql-performance by date

  From Date Subject
Next Message Neil Whelchel 2010-10-10 21:59:48 Re: Slow count(*) again...
Previous Message Mladen Gogala 2010-10-10 17:14:22 Re: Slow count(*) again...