Re: 3D Z-curve spatial index

From: Boris Muratshin <bmuratshin(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 3D Z-curve spatial index
Date: 2017-02-06 07:01:09
Message-ID: CAO+8Nm6yBMW2bqfL=Bugs0Z0xMe+CrMDrnYxOAGrUYV-nyakmw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks David,

I am really a novice in the community.
But thing I wrote about is not a patch to something but a standalone
extension.
And I've placed it on github (https://github.com/bmuratshin/zcurve/tree/3D)
+ attached.
The algorithm is not documented well because it is rather an experimental
work
and I am frequently changing the interfaces.

In the README.zcurve file you can find the instruction how to get the
numbers in the figures above,
and just in case I'll post it here:
------------------------------------------------------------------------------------------------------------------
To test SQL 3D Z-curve interface (on built and registered extension) you
need:
1) Create points table (psql):
create table test_points_3d (x integer,y integer, z integer);
2) Create test dataset:
2.1) Make gawk script, let it be 'mkdata.awk':
BEGIN{
for (i = 0; i < 100000000; i++)
{
x = int(1000000 * rand());
z = int(1000000 * rand());
print "("x",0,"z")";
}
}

2.2) Fill txt file with this script:
gawk -f mkdata.awk > data_3d.csv

3.3) copy it into the table (psql)
COPY test_points_3d from '/home/.../postgresql/contrib/zcurve/data_3d.csv';

3) Create index (psql):
create index zcurve_test_points_3d on test_points_3d(zcurve_num_from_xyz(x,
y, z));

4) Create test requests set:
4.1) Make gawk script, let it be 'mktest.awk':
BEGIN{
for (i = 0; i < 100000; i++)
{
x = 1000 * int(1000 * rand());
y = 0;
z = 1000 * int(1000 * rand());
#EXPLAIN (ANALYZE,BUFFERS)
print "select count(1) from
zcurve_3d_lookup_tidonly('zcurve_test_points_3d',
"x","y","z","x+1000","y+1000","z+1000");";
}
}
box 100X100X1000 gives us 100 points in average

2.2) Fill testing file with this script:
gawk -f mktest.awk > test_100.sql

5) For getting times, run:
date; psql -f test_100.sql ; date;
and divide the difference by 100 000

5) For getting reads,
5.1) insert 'EXPLAIN (ANALYZE,BUFFERS)' into select preparation in gawk
script:
5.2) run 'date; psql -f test_100.sql > 100.out ; date;'
5.3) prepare gawk script (let it be 'buf.awk')
BEGIN {
FS="=";
num = 0;
sum = 0;
sum2 = 0;
}
if (NF==3)
{
sum += $3;
sum2 += $2;
}
else if (NF==2)
{
if ($1==" Buffers: shared hit")
sum2 += $2;
else if ($1==" Buffers: read")
sum += $2;
}
num++;
END {
print sum/(num)" "sum2/(num)" "num;
}
5.4) run in over 100.out
gawk -f buf.awk 100.out
5.5) first number in its output is 'shared reads', second - 'shared hits'
------------------------------------------------------------------------------------------------------------------

Regards,
Boris

Attachment Content-Type Size
zcurve-3D.zip application/zip 28.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tsunakawa, Takayuki 2017-02-06 07:01:31 [RFC] Should "SHOW huge_pages" display the effective value "off" when the huge page is unavailable?
Previous Message Amit Khandekar 2017-02-06 06:45:05 Re: Parallel Append implementation