Re: Fix picksplit with nan values

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix picksplit with nan values
Date: 2014-02-01 03:50:22
Message-ID: 20140201035022.GH31141@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Where are we on this?

---------------------------------------------------------------------------

On Fri, Nov 8, 2013 at 01:38:28PM -0500, Tom Lane wrote:
> Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> > I wrote attached patch by following principles:
> > 1) NaN coordinates shouldn't crash or hang GiST.
> > 2) NaN coordinates should be processed in GiST index scan like in
> > sequential scan.
> > 3) NaN coordinates shouldn't lead to significant slowdown.
>
> I looked at this patch for awhile. It seems like there's still an awful
> lot of places in gistproc.c that are not worrying about NANs, and it's not
> clear to me that they don't need to. For instance, despite the changes in
> adjustBox(), it'll still be possible to have boxes with NAN boundaries if
> all the contained values are all-NAN boxes. It doesn't seem like
> gist_box_penalty will behave very sanely for that; it'll return a NAN
> penalty which seems unhelpful. The static box_penalty function doesn't
> work sanely for NANs either, and if it can return NAN then you also have
> to worry about NAN deltas in common_entry_cmp. And isn't it still
> possible for the Assert in gist_box_picksplit to fire?
>
> > That could be illustrated on following test-case:
>
> > create table test1 as (select point(random(), random()) as p from
> > generate_series(1,1000000));
> > create index test1_idx on test1 using gist(p);
> > create table temp as (select * from test1);
> > insert into temp (select point('nan'::float8, 'nan'::float8) from
> > generate_series(1,1000));
> > insert into temp (select point('nan'::float8, random()) from
> > generate_series(1,1000));
> > insert into temp (select point(random(), 'nan'::float8) from
> > generate_series(1,1000));
> > create table test2 as (select * from temp order by random());
> > create index test2_idx on test2 using gist(p);
> > drop table temp;
>
> I think this test case is unlikely to generate any all-NAN index entry
> boxes, because almost certainly the initial entries will be non-NANs, and
> you've got it set up to keep incoming NANs from adjusting the boundaries
> of an existing box. You'd get better code coverage if you started by
> inserting some NANs into an empty index.
>
> Also, as a stylistic matter, I thought your previous solution of
> copying float8_cmp_internal was a better idea than manually inlining it
> (sans comments!) in multiple places as this version does.
>
> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2014-02-01 04:35:45 Re: jsonb and nested hstore
Previous Message Bruce Momjian 2014-02-01 03:29:11 Re: pg_restore multiple --function options