From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|

To: | pgsql-hackers(at)postgreSQL(dot)org |

Subject: | numeric_normalize() is a few bricks shy of a load |

Date: | 2014-11-05 23:19:26 |

Message-ID: | 24216.1415229566@sss.pgh.pa.us |

Views: | Raw Message | Whole Thread | Download mbox | Resend email |

Thread: | |

Lists: | pgsql-hackers |

9.4 introduced a function numeric_normalize() whose charter is to

produce a string representing a numeric, such that two numerics' string

representations will compare equal if and only if the numeric values

compare equal. (This is used, so far, only by GIN indexes on JSONB.)

Thomas Fanghaenel pointed out to me that this function is utterly,

completely broken: it will reduce distinct values such as 1 and 1000

to the same string, and what's much worse for the GIN use-case,

it can reduce values that should compare equal to distinct strings.

Here's a test case demonstrating the latter assertion:

regression=# create table t1 (f1 jsonb);

CREATE TABLE

regression=# insert into t1 values('{"x": 12000}');

INSERT 0 1

regression=# insert into t1 values('{"x": 12000.00}');

INSERT 0 1

regression=# select * from t1 where f1 @> '{"x": 12000}';

f1

-----------------

{"x": 12000}

{"x": 12000.00}

(2 rows)

regression=# create index on t1 using gin (f1);

CREATE INDEX

regression=# set enable_seqscan TO 0;

SET

regression=# select * from t1 where f1 @> '{"x": 12000}';

f1

--------------

{"x": 12000}

(1 row)

regression=# select * from t1 where f1 @> '{"x": 12000.00}';

f1

-----------------

{"x": 12000.00}

(1 row)

Since JSONB GIN indexes are always considered lossy, the

reduction-to-same-string case is masked by index rechecks, resulting

only in an inefficient index that fails to distinguish keys it could

distinguish. However, the reduction-to-different-strings case results

in wrong answers, as illustrated above.

I think there's no choice but to fix this in 9.4. The only reason

it even needs discussion is that this would invalidate index entries

in beta testers' JSONB GIN indexes, such that queries would not find

entries that they did find before. I'm not sure we can do anything

about this except recommend that beta testers REINDEX such indexes

after updating to 9.4next.

Thoughts?

regards, tom lane

- Re: numeric_normalize() is a few bricks shy of a load at 2014-11-06 00:34:42 from Stephen Frost

From | Date | Subject | |
---|---|---|---|

Next Message | Andres Freund | 2014-11-05 23:19:58 | Re: WIP: dynahash replacement for buffer table |

Previous Message | Josh Berkus | 2014-11-05 23:15:26 | Re: Amazon Redshift |