Re: ANALYZE sampling is too good

From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Greg Stark <stark(at)mit(dot)edu>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-10 03:13:17
Message-ID: 52A686CD.6050104@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/12/13 15:32, Mark Kirkwood wrote:
> On 10/12/13 15:17, Mark Kirkwood wrote:
>> On 10/12/13 15:11, Mark Kirkwood wrote:
>>> On 10/12/13 15:04, Mark Kirkwood wrote:
>>>> On 10/12/13 13:53, Mark Kirkwood wrote:
>>>>> On 10/12/13 13:20, Mark Kirkwood wrote:
>>>>>> On 10/12/13 13:14, Mark Kirkwood wrote:
>>>>>>> On 10/12/13 12:14, Heikki Linnakangas wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> I took a stab at using posix_fadvise() in ANALYZE. It turned
>>>>>>>> out to be very easy, patch attached. Your mileage may vary, but
>>>>>>>> I'm seeing a nice gain from this on my laptop. Taking a 30000
>>>>>>>> page sample of a table with 717717 pages (ie. slightly larger
>>>>>>>> than RAM), ANALYZE takes about 6 seconds without the patch, and
>>>>>>>> less than a second with the patch, with
>>>>>>>> effective_io_concurrency=10. If anyone with a good test data
>>>>>>>> set loaded would like to test this and post some numbers, that
>>>>>>>> would be great.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I did a test run:
>>>>>>>
>>>>>>> pgbench scale 2000 (pgbench_accounts approx 25GB).
>>>>>>> postgres 9.4
>>>>>>>
>>>>>>> i7 3.5Ghz Cpu
>>>>>>> 16GB Ram
>>>>>>> 500 GB Velociraptor 10K
>>>>>>>
>>>>>>> (cold os and pg cache both runs)
>>>>>>> Without patch: ANALYZE pgbench_accounts 90s
>>>>>>> With patch: ANALYZE pgbench_accounts 91s
>>>>>>>
>>>>>>> So I'm essentially seeing no difference :-(
>>>>>>
>>>>>>
>>>>>> Arrg - sorry forgot the important bits:
>>>>>>
>>>>>> Ubuntu 13.10 (kernel 3.11.0-14)
>>>>>> filesystem is ext4
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> Doing the same test as above, but on a 80GB Intel 520 (ext4
>>>>> filesystem mounted with discard):
>>>>>
>>>>> (cold os and pg cache both runs)
>>>>> Without patch: ANALYZE pgbench_accounts 5s
>>>>> With patch: ANALYZE pgbench_accounts 5s
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> Redoing the filesystem on the 520 as btrfs didn't seem to make any
>>>> difference either:
>>>>
>>>> (cold os and pg cache both runs)
>>>> Without patch: ANALYZE pgbench_accounts 6.4s
>>>> With patch: ANALYZE pgbench_accounts 6.4s
>>>>
>>>>
>>>>
>>>
>>> Ah - I have just realized I was not setting effective_io_concurrency
>>> - so I'll redo the test. - Apologies.
>>>
>>>
>>
>> Redoing the test on the velociraptor gives me exactly the same
>> numbers as before (effective_io_concurrency = 10 instead of 1).
>>
>>
>
> Good grief - repeating the test gave:
>
> Without patch: ANALYZE pgbench_accounts 90s
> With patch: ANALYZE pgbench_accounts 42s
>
> pretty consistent *now*. No idea what was going on in the 1st run
> (maybe I happened to have it running at the same time as a
> checkpoint)? Anyway will stop now before creating more confusion.
>
>

Just one more...

The Intel 520 with ext4:

Without patch: ANALYZE pgbench_accounts 5s
With patch: ANALYZE pgbench_accounts 1s

And double checking -
With patch, but effective_io_concurrency = 1: ANALYZE pgbench_accounts 5s

These results look more like Heikki's. Which suggests more benefit on
SSD than spinning disks. Some more data points (apart from mine) would
be good to see tho.

Cheers

Mark

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2013-12-10 03:30:46 Re: [bug fix] pg_ctl always uses the same event source
Previous Message Bruce Momjian 2013-12-10 03:08:35 Re: [patch] Adding EXTRA_REGRESS_OPTS to all pg_regress invocations