Re: Double sorting split patch

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Double sorting split patch
Date: 2011-10-05 12:59:47
Message-ID: CAPpHfdvt-8vkD3p5wUCH0A3fCWtmgKxzG6bp23K4aBvOrS5Vxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Path without allocating extra bytes is attached.
I run some more detailed tests on geonames and two smaller datasets from
rtreeportal.org.
Test tables are following:
1) test1 - geonames
2) test2 - California Roads, containing the MBRs of 2,249,727 streets
3) test3 - Tiger Streams, containing the MBRs of 194,971 streams
Scripts for test queries generation and execution are attached
(scripts.tar.gz).

Build times are given in the following table:
New linear Double sorting
test1 277.889630 273.766355
test2 73.262561 75.114079
test3 4.251563 4.425139
As we can see, build times differ insignificantly.

Index sizes are given in the following table:
New linear Double sorting
test1 572383232 578977792
test2 179929088 178569216
test3 15409152 15532032
As we can see, index sizes differ insignificantly.

Data about index quality testing are in the table test_results of the
following structure:
tablename - name of the queried table
count - actual count of rows matching query
nominal_count - count of rows matching query which test generator tried to
provide (test generator not always can create test query which return
exactly same amount of rows as it was expected)
strategy - split strategy: 1 - new liner split(current), 2 - double
sorting(this patch)
hits - number of pages hits for query execution.

Raw data are in the attachment (test_result.sql.gz). Report is below.
test=# select tablename, nominal_count, round(avg(count),2) as avg_count,
strategy, round(avg(hits),2) as avg_hits from test_results group by
tablename, nominal_count, strategy order by tablename, nominal_count,
strategy;
tablename | nominal_count | avg_count | strategy | avg_hits
-----------+---------------+-----------+----------+----------
test1 | 1 | 4.87 | 1 | 19.94
test1 | 1 | 4.87 | 2 | 6.46
test1 | 10 | 11.07 | 1 | 23.95
test1 | 10 | 11.07 | 2 | 7.36
test1 | 100 | 101.36 | 1 | 43.30
test1 | 100 | 101.36 | 2 | 10.19
test1 | 1000 | 998.70 | 1 | 86.48
test1 | 1000 | 998.70 | 2 | 24.21
test2 | 1 | 1.32 | 1 | 8.67
test2 | 1 | 1.32 | 2 | 5.99
test2 | 10 | 11.32 | 1 | 9.40
test2 | 10 | 11.32 | 2 | 6.71
test2 | 100 | 102.93 | 1 | 13.10
test2 | 100 | 102.93 | 2 | 9.02
test2 | 1000 | 999.67 | 1 | 32.16
test2 | 1000 | 999.67 | 2 | 23.51
test3 | 1 | 0.99 | 1 | 6.03
test3 | 1 | 0.99 | 2 | 4.32
test3 | 10 | 9.95 | 1 | 7.52
test3 | 10 | 9.95 | 2 | 5.09
test3 | 100 | 99.92 | 1 | 10.73
test3 | 100 | 99.92 | 2 | 7.73
test3 | 1000 | 999.75 | 1 | 27.47
test3 | 1000 | 999.75 | 2 | 22.44
(24 rows)

As we can see, double sorting split outperfoms new linear split in all the
presented cases. Difference in test2 and test3 is not so big as it is in
test1. It can be explained by fact that test2 and test3 are smaller than
test1.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
scripts.tar.gz application/x-gzip 1.9 KB
double-sorting-split-0.7.patch text/x-patch 26.2 KB
test_result.sql.gz application/x-gzip 986 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Lou Picciano 2011-10-05 13:24:29 Error building v9.1.1 (git) with python 3.2.2
Previous Message Pavel Stehule 2011-10-05 12:33:29 Re: [REVIEW] prepare plans of embedded sql on function start