Re: Abbreviated keys for text cost model fix

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Abbreviated keys for text cost model fix
Date: 2015-02-22 21:30:40
Message-ID: CAM3SWZR2PDCphC+sWi9y811uYrJZopCj0PSKfafnoWHji=qckw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Feb 22, 2015 at 1:19 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> In short, this fixes all the cases except for the ASC sorted data. I
> haven't done any code review, but I think we want this.
>
> I'll use data from the i5-2500k, but it applies to the Xeon too, except
> that the Xeon results are noisier and the speedups are less
> significant.
>
> For the 'text' data type, and 'random' dataset, the results are these:
>
> scale       datum   cost-model
> ------------------------------
> 100000       328%         323%
> 1000000      392%         391%
> 2000000       96%         565%
> 3000000       97%         572%
> 4000000       97%         571%
> 5000000       98%         570%
>
> The numbers are speedup vs. master, so 100% means exactly the same
> speed, 200% means twice as fast.
>
> So while the 'datum' patch gave a very nice speedup for small
> datasets (about 3-4x, up to 1M rows), larger datasets showed a small
> regression (~3% slower). With the cost model fix, we actually see a
> significant speedup (about 5.7x) for these cases.

Cool.

> I haven't verified whether this produces the same results, but if it
> does this is very nice.
>
> For 'DESC' dataset (i.e. data sorted in reverse order), we do get even
> better numbers, with up to 6.5x speedup on large datasets.
>
> But for 'ASC' dataset (i.e. already sorted data), we do get this:
>
> scale       datum   cost-model
> ------------------------------
> 100000        85%          84%
> 1000000       87%          87%
> 2000000       76%          96%
> 3000000       82%          90%
> 4000000       91%          83%
> 5000000       93%          81%
>
> Ummm, not that great, I guess :-(

You should try it with the data fully sorted like this, but with one
tiny difference: The very last tuple is out of order. How does that
look?
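
To be concrete about the dataset I mean, here is a rough sketch in Python. The function name and the md5-based strings are assumptions made purely for illustration, not the generator used for the benchmark, and Python compares strings by code point rather than by a database collation:

```python
import hashlib


def nearly_sorted_text(n):
    """Return n text values in ascending order, except for the last one."""
    # Hypothetical stand-in for the benchmark's text column:
    # md5 hex digests of the integers 0..n-1, sorted ascending.
    values = sorted(hashlib.md5(str(i).encode()).hexdigest() for i in range(n))
    # Move the smallest value to the end, so that only the very last
    # tuple is out of order.
    values.append(values.pop(0))
    return values


if __name__ == "__main__":
    data = nearly_sorted_text(100000)
    # Everything but the final element is in order.
    print(data[:-1] == sorted(data[:-1]), data[-1] < data[-2])
```

Loading values like these in this order and rerunning the same sort would give the comparison against the fully sorted 'ASC' case.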

--
Peter Geoghegan
