Re: Slow count(*) again...

From: david(at)lang(dot)hm
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 15:52:48
Message-ID: alpine.DEB.2.00.1010120849380.21889@asgard.lang.hm
Lists: pgsql-hackers pgsql-performance

On Tue, 12 Oct 2010, Mladen Gogala wrote:

> david(at)lang(dot)hm wrote:
>> from a PR point of view, speeding up the trivial count(*) case could be
>> worth it, just to avoid people complaining about it not being fast.
>>
>>
> Fixing PR stuff is not the approach that I would take. People are complaining
> about select count(*) because they're using it in all the wrong places.

That may be the case, but if it's possible to make it less painful, more
people will use postgres: it will work better for them when they are running
suboptimal programs, and when people do their trivial testing of databases to
decide which one to use, they won't rule out postgres because "it's so slow".

The fact of the matter is that people do use count(*), and even though there
are usually ways to avoid it, making the programmer do something different
for postgres than for other databases raises a barrier against postgres
adoption.
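
For what it's worth, the usual workaround on the postgres side is to read the
planner's row estimate out of pg_class instead of scanning the table; it's
only approximate (kept current by VACUUM/ANALYZE and autovacuum) but returns
immediately. A minimal sketch, with 'mytable' standing in for whatever table
is being counted:

  -- exact count: has to scan the table
  SELECT count(*) FROM mytable;

  -- approximate count: reads the statistics the planner already keeps
  SELECT reltuples::bigint AS estimated_rows
  FROM pg_class
  WHERE relname = 'mytable';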

David Lang

> My
> assessment that there is a problem with sequential scan was wrong. Now, let's
> again take Oracle as the measure.
> Someone asked me about caching the data. Here it is:
>
> SQL> connect system/*********
> Connected.
> SQL> alter system flush buffer_cache;
>
> System altered.
>
> Elapsed: 00:00:12.68
> SQL> connect adbase/*********
> Connected.
> SQL> alter session set db_file_multiblock_read_Count=128;
>
> Session altered.
>
> Elapsed: 00:00:00.41
> SQL> select count(*) from ni_occurrence;
>
> COUNT(*)
> ----------
> 402062638
>
> Elapsed: 00:02:37.77
>
> SQL> select bytes/1048576 MB from user_segments
> 2 where segment_name='NI_OCCURRENCE';
>
> MB
> ----------
> 35329
>
> Elapsed: 00:00:00.20
> SQL>
>
>
> So, the results weren't cached the first time around. The explanation is
> that Oracle, as of version 10.2.0, reads the table into private process
> memory, not into the shared buffers. This table alone is 35 GB in size;
> Oracle took 2 minutes 37 seconds to read it using a full table scan.
> If I do the same thing with PostgreSQL and a comparable table, Postgres is,
> in fact, faster:
>
> psql (9.0.1)
> Type "help" for help.
>
> news=> \timing
> Timing is on.
> news=> select count(*) from moreover_documents_y2010m09;
>   count
> ----------
>  17242655
> (1 row)
>
> Time: 113135.114 ms
> news=> select pg_size_pretty(pg_table_size('moreover_documents_y2010m09'));
> pg_size_pretty
> ----------------
> 27 GB
> (1 row)
>
> Time: 100.849 ms
> news=>
>
> The number of rows is significantly smaller, but the table contains a rather
> large "text" field which consumes quite a bit of TOAST storage, so the sizes
> are comparable. Postgres read through 27 GB in 113 seconds, just under 2
> minutes, while Oracle took 2 minutes 37 seconds to read through 35 GB. I
> stand corrected: there is nothing wrong with the speed of the Postgres
> sequential scan.
>
>
>
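
On the caching question raised above: with the pg_buffercache contrib module
installed, PostgreSQL can show how much of a table is currently sitting in
shared_buffers. A rough sketch, assuming the default 8 kB block size and
using the table name from the quoted session:

  -- how many shared buffers currently hold blocks of this table
  SELECT c.relname,
         count(*)                        AS buffers,
         pg_size_pretty(count(*) * 8192) AS cached
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
  WHERE c.relname = 'moreover_documents_y2010m09'
  GROUP BY c.relname;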
