Re: about multiprocessingmassdata

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: about multiprocessingmassdata
Date: 2012-04-05 14:47:46
Message-ID: 4F7DB092.1000400@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 5.4.2012 15:44, superman0920 wrote:
> Sure, i will post that at tomorrow.
>
> Today I install PG and MySQL at a Server. I insert 850000 rows record
> to each db.
> I execute "select count(*) from poi_all_new" at two db.
> MySQL takes 0.9s
> PG takes 364s

First of all, keep the list (pgsql-performance(at)postgresql(dot)org) on the
CC. You keep responding to me directly, therefore others can't respond
to your messages (and help you).

Are you sure the comparison was fair, i.e. both machines containing the
same amount of data (not number of rows, amount of data), configured
properly etc.? Have you used the same table structure (how did you
represent geometry data type in MySQL)?

For example I bet you're using MyISAM. In that case, it's comparing
apples to oranges (or maybe cats, so different it is). MyISAM does not
do any MVCC stuff (visibility checking, ...) and simply reads the number
of rows from a catalogue. PostgreSQL actually has to scan the whole
table - that's a big difference. This is probably the only place where
MySQL (with MyISAM beats PostgreSQL). But once you switch to a proper
storage manager (e.g. InnoDB) it'll have to scan the data just like
PostgreSQL - try that.

Anyway, this benchmark is rubbish because you're not going to do this
query often - use queries that actually make sense for the application.

Nevertheless, it seems there's something seriously wrong with your
machine or the environment (OS), probably I/O.

I've done a quick test - I've created the table (without the 'geometry'
column because I don't have postgis installed), filled it with one
million of rows and executed 'select count(*)'. See this:

http://pastebin.com/42cAcCqu

This is what I get:

======================================================================
test=# SELECT pg_size_pretty(pg_relation_size('test_table'));
pg_size_pretty
----------------
1302 MB
(1 row)

test=#
test=# \timing on
Timing is on.
test=#
test=# SELECT count(*) from test_table;
count
---------
1000000
(1 row)

Time: 2026,695 ms
======================================================================

so it's running the 'count(*)' in two seconds. If I run it again, I get
this:

======================================================================
test=# SELECT count(*) from test_table;
count
---------
1000000
(1 row)

Time: 270,020 ms
======================================================================

Yes, that's 0,27 seconds. And this is *only* my workstation - Core i5 (4
cores), 8GB of RAM, nothing special.

These results obviously depend on the data being available in page
cache. If that's not the case, PostgreSQL needs to read them from the
drive (and then it's basically i/o bound) - I can get about 250 MB/s
from my drives, so I get this:

======================================================================
test=# SELECT count(*) from test_table;
count
---------
1000000
(1 row)

Time: 5088,739 ms
======================================================================

If you have slower drives, the dependency is about linear (half the
speed -> twice the time). So either your drives are very slow, or
there's something rotten.

I still haven's seen iostat / vmstat output ... that'd tell us much more
about the causes.

Tomas

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2012-04-05 15:01:25 Re: about multiprocessingmassdata
Previous Message Scott Marlowe 2012-04-05 14:38:20 Re: postgresql.conf setting for max_fsm_pages