From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tv(at)fuzzy(dot)cz> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: about multiprocessingmassdata |
Date: | 2012-04-09 22:37:54 |
Message-ID: | CAHyXU0zbxgrpSfBumRd-V4GvsXXM1Xh1WC-LTcjBY4c_S9PDmQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Thu, Apr 5, 2012 at 9:47 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> On 5.4.2012 15:44, superman0920 wrote:
>> Sure, i will post that at tomorrow.
>>
>> Today I install PG and MySQL at a Server. I insert 850000 rows record
>> to each db.
>> I execute "select count(*) from poi_all_new" at two db.
>> MySQL takes 0.9s
>> PG takes 364s
>
> First of all, keep the list (pgsql-performance(at)postgresql(dot)org) on the
> CC. You keep responding to me directly, therefore others can't respond
> to your messages (and help you).
>
> Are you sure the comparison was fair, i.e. both machines containing the
> same amount of data (not number of rows, amount of data), configured
> properly etc.? Have you used the same table structure (how did you
> represent geometry data type in MySQL)?
>
> For example I bet you're using MyISAM. In that case, it's comparing
> apples to oranges (or maybe cats, so different it is). MyISAM does not
> do any MVCC stuff (visibility checking, ...) and simply reads the number
> of rows from a catalogue. PostgreSQL actually has to scan the whole
> table - that's a big difference. This is probably the only place where
> MySQL (with MyISAM beats PostgreSQL). But once you switch to a proper
> storage manager (e.g. InnoDB) it'll have to scan the data just like
> PostgreSQL - try that.
>
> Anyway, this benchmark is rubbish because you're not going to do this
> query often - use queries that actually make sense for the application.
>
> Nevertheless, it seems there's something seriously wrong with your
> machine or the environment (OS), probably I/O.
>
> I've done a quick test - I've created the table (without the 'geometry'
> column because I don't have postgis installed), filled it with one
> million of rows and executed 'select count(*)'. See this:
>
> http://pastebin.com/42cAcCqu
>
> This is what I get:
>
> ======================================================================
> test=# SELECT pg_size_pretty(pg_relation_size('test_table'));
> pg_size_pretty
> ----------------
> 1302 MB
> (1 row)
>
> test=#
> test=# \timing on
> Timing is on.
> test=#
> test=# SELECT count(*) from test_table;
> count
> ---------
> 1000000
> (1 row)
>
> Time: 2026,695 ms
> ======================================================================
>
> so it's running the 'count(*)' in two seconds. If I run it again, I get
> this:
>
> ======================================================================
> test=# SELECT count(*) from test_table;
> count
> ---------
> 1000000
> (1 row)
>
> Time: 270,020 ms
> ======================================================================
>
> Yes, that's 0,27 seconds. And this is *only* my workstation - Core i5 (4
> cores), 8GB of RAM, nothing special.
>
> These results obviously depend on the data being available in page
> cache. If that's not the case, PostgreSQL needs to read them from the
> drive (and then it's basically i/o bound) - I can get about 250 MB/s
> from my drives, so I get this:
>
> ======================================================================
> test=# SELECT count(*) from test_table;
> count
> ---------
> 1000000
> (1 row)
>
> Time: 5088,739 ms
> ======================================================================
>
> If you have slower drives, the dependency is about linear (half the
> speed -> twice the time). So either your drives are very slow, or
> there's something rotten.
>
> I still haven's seen iostat / vmstat output ... that'd tell us much more
> about the causes.
geometry column can potentially quite wide. one thing we need to see
is the table has any indexes -- in particular gist/gin on the
geometry.
merlin
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2012-04-09 23:50:59 | Re: about multiprocessingmassdata |
Previous Message | Cesar Martin | 2012-04-09 16:24:26 | Re: H800 + md1200 Performance problem |