Re: [SQL] querying with index on jsonb slower than standard column. Why?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: Tim Dudgeon <tdudgeon(dot)ml(at)gmail(dot)com>, pgsql-sql(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [SQL] querying with index on jsonb slower than standard column. Why?
Date: 2014-12-08 20:53:09
Message-ID: 3936.1418071989@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance pgsql-sql

I wrote:
> Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> writes:
>> Seems work_mem is the key:

> Fascinating. So there's some bad behavior in the lossy-bitmap stuff
> that's exposed by one case but not the other.

Meh. I was overthinking it. A bit of investigation with oprofile exposed
the true cause of the problem: whenever the bitmap goes lossy, we have to
execute the "recheck" condition for each tuple in the page(s) that the
bitmap has a lossy reference to. So in the fast case we are talking about

Recheck Cond: ((assay1_ic50 > 90::double precision) AND (assay2_ic50 < 10::double precision))

which involves little except pulling the float8 values out of the tuple
and executing float8gt and float8lt. In the slow case we have got

Recheck Cond: ((((data ->> 'assay1_ic50'::text))::double precision > 90::double precision) AND (((data ->> 'assay2_ic50'::text))::double precision < 10::double precision))

which means we have to pull the JSONB value out of the tuple, search
it to find the 'assay1_ic50' key, convert the associated value to text
(which is not exactly cheap because *the value is stored as a numeric*),
then reparse that text string into a float8, after which we can use
float8gt. And then probably do an equivalent amount of work on the way
to making the other comparison.

So this says nothing much about the lossy-bitmap code, and a lot about
how the JSONB code isn't very well optimized yet. In particular, the
decision not to provide an operator that could extract a numeric field
without conversion to text is looking pretty bad here.

For reference, the oprofile results down to the 1% level for
the jsonb query:

samples % symbol name
7646 8.1187 get_str_from_var
7055 7.4911 AllocSetAlloc
4447 4.7219 AllocSetCheck
4000 4.2473 BitmapHeapNext
3945 4.1889 lengthCompareJsonbStringValue
3713 3.9425 findJsonbValueFromContainer
3637 3.8618 ExecMakeFunctionResultNoSets
3624 3.8480 hash_search_with_hash_value
3452 3.6654 cstring_to_text
2993 3.1780 slot_deform_tuple
2566 2.7246 jsonb_object_field_text
2225 2.3625 palloc
2176 2.3105 heap_tuple_untoast_attr
1993 2.1162 AllocSetReset
1926 2.0451 findJsonbValueFromContainerLen
1846 1.9601 GetPrivateRefCountEntry
1563 1.6596 float8gt
1486 1.5779 float8in
1477 1.5683 InputFunctionCall
1365 1.4494 getJsonbOffset
1137 1.2073 slot_getattr
1083 1.1500 init_var_from_num
1058 1.1234 ExecEvalConst
1056 1.1213 float8_cmp_internal
1053 1.1181 cstring_to_text_with_len
1032 1.0958 text_to_cstring
988 1.0491 ExecClearTuple
969 1.0289 ResourceOwnerForgetBuffer

and for the other:

samples % symbol name
14010 12.1898 BitmapHeapNext
13479 11.7278 hash_search_with_hash_value
8201 7.1355 GetPrivateRefCountEntry
7524 6.5465 slot_deform_tuple
6091 5.2997 ExecMakeFunctionResultNoSets
4459 3.8797 ExecClearTuple
4456 3.8771 slot_getattr
3876 3.3724 ExecStoreTuple
3112 2.7077 ReleaseBuffer
3086 2.6851 float8_cmp_internal
2890 2.5145 ExecQual
2794 2.4310 HeapTupleSatisfiesMVCC
2737 2.3814 float8gt
2130 1.8533 ExecEvalScalarVarFast
2102 1.8289 IncrBufferRefCount
2100 1.8272 ResourceOwnerForgetBuffer
1896 1.6497 hash_any
1752 1.5244 ResourceOwnerRememberBuffer
1567 1.3634 DatumGetFloat8
1543 1.3425 ExecEvalConst
1486 1.2929 LWLockAcquire
1454 1.2651 _bt_checkkeys
1424 1.2390 check_stack_depth
1374 1.1955 ResourceOwnerEnlargeBuffers
1354 1.1781 pgstat_end_function_usage
1164 1.0128 tbm_iterate
1158 1.0076 CheckForSerializableConflictOut

Just to add insult to injury, this is only counting cycles in postgres
proper; it appears that in the jsonb case 30% of the overall runtime is
spent in strtod() :-(

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Adrian Klaver 2014-12-08 21:14:48 Re: querying with index on jsonb slower than standard column. Why?
Previous Message Tom Lane 2014-12-08 16:56:04 Re: querying with index on jsonb slower than standard column. Why?

Browse pgsql-sql by date

  From Date Subject
Next Message Adrian Klaver 2014-12-08 21:14:48 Re: querying with index on jsonb slower than standard column. Why?
Previous Message Tom Lane 2014-12-08 16:56:04 Re: querying with index on jsonb slower than standard column. Why?