Re: Sorting performance vs. MySQL

From: Yang Zhang <yanghatespam(at)gmail(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Sorting performance vs. MySQL
Date: 2010-02-23 05:51:16
Message-ID: 9066fa251002222151u87ba137lb7785e42d66e59d6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

nnnnnOn Mon, Feb 22, 2010 at 3:45 PM, Scott Marlowe
<scott(dot)marlowe(at)gmail(dot)com> wrote:
> On Mon, Feb 22, 2010 at 12:53 PM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
>> On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
>>> On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
>>>> I have the exact same table of data in both MySQL and Postgresql. In Postgresql:
>>>
>>> Just wondering, are these on the same exact machine?
>>>
>>
>> Yes, on the same disk.
>
> I'm wondering how much of this could be caching effects.  Is the MySQL
> database "warmed up" before you started, and the pgsql database is
> "cold" and no caching has taken place?
>
> What do things like vmstat 10 say while the query is running on each
> db?  First time, second time, things like that.

Awesome -- this actually led me to discover the problem.

When running the query in MySQL InnoDB:

$ vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system--
-----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 13 13733604 83020 5648 2193884 3 3 936 168 2 1
4 2 89 5 0
1 12 13749952 80164 5600 2178032 0 4354 908 4379 3586 2638
0 1 38 60 0
0 19 13762228 80576 5556 2145220 208 3527 1280 3690 3668 2635
1 1 39 59 0
0 19 13778632 79420 5560 2135228 52 4186 1046 4191 3682 2418
0 1 37 62 0
0 19 13792964 77336 5592 2082520 41 3731 1698 3804 4102 2686
1 1 53 45 0
0 14 13810356 84036 5556 2049836 36 4241 797 4246 3913 2603
0 1 68 31 0
1 14 13825640 81336 5520 2001920 0 4212 958 4220 3848 2736
1 1 73 25 0
0 17 13844952 78036 5476 1976956 8 4685 923 4689 3832 2547
0 1 69 29 0
2 13 13863828 79812 5448 1954952 3 4627 692 4634 3744 2505
0 1 70 28 0
0 15 13883828 77764 5440 1920528 249 4544 972 4548 4345 2506
0 1 70 28 0
1 20 13898900 79132 5456 1890192 28 4341 723 4438 4982 3030
0 3 64 33 0
0 11 13915252 85184 5624 1865260 79 3668 752 3764 4472 2765
0 3 57 40 0
0 12 13933964 78448 5700 1832640 120 4327 1066 4434 4484 2777
1 3 52 45 0
0 19 13951748 77640 5816 1795720 94 4005 1159 4091 4580 2762
1 3 48 49 0
0 16 13972748 79884 5780 1753676 0 4737 787 4746 4385 2766
1 3 51 45 0
0 25 13988108 78936 5884 1726068 547 3954 1468 4116 4976 3502
0 4 44 52 0
1 20 14011500 77676 5868 1689136 161 4980 843 5506 5218 3131
0 3 34 62 0
0 22 14032472 81348 5816 1647884 270 4198 943 4369 4521 2826
1 3 40 56 0
0 23 14055220 81712 5804 1626872 193 4774 1408 4856 4617 2754
1 3 38 58 0
0 21 14075848 81844 5696 1576836 0 4738 974 4742 4528 2704
1 3 40 56 0
0 25 14097260 79788 5628 1536336 213 4512 922 4639 4726 2929
1 3 27 69 0
0 24 14123900 80820 5616 1488460 319 5033 1059 5128 4895 2780
2 3 17 78 0
1 26 14142632 77276 5660 1445592 445 4605 1434 4727 5401 3364
1 3 16 79 0
0 31 14165668 83736 5976 1387048 362 4288 1444 4428 4739 2963
2 3 17 78 0
1 28 14180104 77564 6324 1369232 387 4526 2222 4677 5748 3559
1 3 16 80 0

I'm guessing the swap numbers are because MySQL uses mmap?

Anyway, when running the query in Postgresql:

$ vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system--
-----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 2 13866336 1574540 25024 7878980 3 3 936 168 2 1
4 2 89 5 0
1 3 13861520 1163596 25328 8128360 1046 0 25429 433 4368 4175
4 2 80 14 0
0 3 13856484 803024 25600 8343220 1117 0 22062 688 4492 4590
4 2 73 20 0
0 6 13855304 678868 26052 8435540 160 0 9239 598 5195 7141
1 5 70 24 0
0 6 13853644 513568 26332 8563680 401 0 12480 7100 4775 4248
3 3 68 26 0
2 2 13851804 166228 26624 8775304 634 0 21466 1497 4680 4550
6 2 64 28 0
0 5 13861556 81896 26740 8825360 860 3547 6100 3847 5142 3386
6 2 57 35 0
0 6 13867688 91368 26808 8832712 653 3326 1835 3604 4738 2762
5 2 61 32 0
0 5 13870676 88524 26872 8849392 638 3272 2578 3517 4864 2909
4 2 55 39 0
0 5 13872748 79512 27004 8864456 629 1788 2086 2949 4337 2921
1 3 55 41 0
0 7 13876760 83124 27136 8867272 1018 2253 1713 2409 4321 2889
0 3 63 33 0
0 6 13878964 82876 27240 8874540 792 2119 1854 2314 4288 2813
2 2 72 24 0
3 4 13883204 81224 27280 8887068 661 3067 2995 3385 4558 2899
4 2 72 22 0
0 6 13886636 82036 27352 8905628 594 3726 2628 4013 4744 2765
4 2 69 25 0
0 8 13899852 85604 27400 8925800 638 4423 2689 4658 4903 2808
4 2 55 40 0
1 4 13905184 80544 27484 8940040 676 3501 3006 3799 4805 2932
4 2 66 28 0
0 9 13908480 80100 27516 8948476 668 2996 1720 3192 4594 2799
4 2 60 35 0

vmstat showed no swapping-out for a while, and then suddenly it
started spilling a lot. Checking psql's memory stats showed that it
was huge -- apparently, it's trying to store its full result set in
memory. As soon as I added a LIMIT 10000, everything worked
beautifully and finished in 4m (I verified that the planner was still
issuing a Sort).

I'm relieved that Postgresql itself does not, in fact, suck, but
slightly disappointed in the behavior of psql. I suppose it needs to
buffer everything in memory to properly format its tabular output,
among other possible reasons I could imagine.

Now it all almost seems obvious. My only solace is that it wasn't obv.
to the list either (thanks to everyone for their feedback!). Should've
just started with system profiling -- saves the day again.

>
> Also, just curios, what's shared_buffers set to on the pgsql instance?

shared_buffers
----------------
320000kB
(1 row)
--
Yang Zhang
http://www.mit.edu/~y_z/

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Yang Zhang 2010-02-23 05:55:33 Re: Sorting performance vs. MySQL
Previous Message Bruce Momjian 2010-02-23 05:31:27 Re: comment on constraint