RE: PostgreSQL 10.0 SELECT LIMIT performance problem

From: Mareks Kalnačs <Mareks(dot)Kalnacs(at)datakom(dot)lv>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Victor Yegorov <vyegorov(at)gmail(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>, Māris Rucis <Maris(dot)Rucis(at)datakom(dot)lv>, Pāvels Koržs <Pavels(dot)Korzs(at)datakom(dot)lv>
Subject: RE: PostgreSQL 10.0 SELECT LIMIT performance problem
Date: 2018-09-13 10:19:04
Message-ID: c081b34fc8584e4e80ddf0e99eabbba5@datakom.lv
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi!

My point is, why planned ignore table hits and, when joining tables, think that rows will be less than in reality:

Table1: Index Scan using idx_tvc_entity_sid_u on tvcis.tvc_entity e (cost=0.28..1856.72 rows=253 width=208) (actual time=0.091..13.944 rows=71 loops=1)

Table2: Index Scan using idx_oss_objects_sid on tvcis.oss_objects o (cost=0.29..16.84 rows=23 width=951) (actual time=0.009..0.033 rows=18 loops=71)

Join: Nested Loop (cost=0.57..6175.55 rows=9 width=951) (actual time=0.112..17.355 rows=1291 loops=1)

Why planned don’t take in account results from each table:

Join rows = Table1 rows * Table2 rows == 253 * 23 = 5819? In this case difference will be 22%, but planner got 9 rows what is 0.7% from actual result

As we see, actual rows almost equal with subquery row multiplication 71 * 18 ~= 1291

Mareks

-----Original Message-----
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Sent: trešdiena, 2018. gada 12. septembris 20:04
To: Victor Yegorov <vyegorov(at)gmail(dot)com>
Cc: Mareks Kalnačs <Mareks(dot)Kalnacs(at)datakom(dot)lv>; pgsql-bugs(at)postgresql(dot)org; Māris Rucis <Maris(dot)Rucis(at)datakom(dot)lv>
Subject: Re: PostgreSQL 10.0 SELECT LIMIT performance problem

Victor Yegorov <vyegorov(at)gmail(dot)com<mailto:vyegorov(at)gmail(dot)com>> writes:

> The fact, that planner is not accurate on the estimates of JSON

> internal keys is expected, PostgreSQL is not parsing JSON values when

> gathering stats.

> You cannot expect planner to be picky about all possible corner cases,

> it would make planning time enormously huge.

Right. The fact that it doesn't make the right guesses without help can't be considered to be a bug in all cases. These are engineering tradeoffs we have to make.

> That is the reason I outlined, that important keys should be extracted

> into plain columns.

If that seems infeasible from an application standpoint, another possibility is to make expression indexes on those important keys.

ANALYZE will gather stats on the values of indexed expressions, and then perhaps the planner will have enough info to make better decisions.

In the other case mentioned, where the problem is a poor guess about the selectivity of

where (

to_tsvector(

'simple'::regconfig,

a.ts_vector_fields

) @@ to_tsquery(

'simple',

' (decodedType5b6f0753ossAlarmsGenerator) '

)

)

it's the same problem: the planner has no stats that would let it figure out the selectivity. It can't reasonably extract an estimate on the fly --- if it did, you'd be complaining that planning time was too long. The only way to get reasonable behavior is to set things up so that ANALYZE will accumulate stats about the values of "to_tsvector('simple',a.ts_vector_fields)". You can arrange that either by extracting that into a column, or by making an index on it.

regards, tom lane

________________________________

Šis elektroniskā pasta sūtījums un tajā esošie dokumenti ir paredzēts tikai norādītajam adresātam(-iem). Tas satur konfidenciālu informāciju un var saturēt arī priviliģētu informāciju. Tā satura pilnīga vai daļēja nesankcionēta izpaušana, izmantošana vai tālāka izplatīšana jebkādā veidā ir aizliegta. Ja šis elektroniskā pasta sūtījums saņemts kļūdas dēļ, lūdzam sazināties ar sūtītāju, nosūtot atbildes e-pasta vēstuli, un izdzēst šo vēstuli. Drošības apsvērumu dēļ SIA Datakom var veikt elektroniskā pasta sūtījumu un tiem pievienoto dokumentu satura monitoringu. Ja vien šajā vēstulē vai pievienotajos dokumentus nav nepārprotami teikts pretējais, šis elektroniskā pasta sūtījums vai tam pievienotie dokumenti nav uzskatāmi par SIA Datakom piedāvājumu slēgt darījumu, apņemšanos vai gatavību uzņemties saistības. Sūtītājs: SIA Datakom, vienotais reģistrācijas nr: 40103142605, Juridiskā adrese: Vienības gatve 109, Rīga, LV-1058, Latvija

This electronic transmission and attached documents is intended only for the addressee(s). It contains confidential information and may also contain privileged information. If you are not an intended recipient, you may not use, read, retransmit, disseminate or take any action in reliance upon it. If this electronic transmission is received in error, please contact the sender by sending a reply e-mail and delete this message. SIA Datakom reserves the right to monitor all e-mail communications through its networks. Unless this letter or the accompanying documents clearly stated otherwise, by this electronic transmission does not constitute a contract offer, a contract amendment or an acceptance of a contract offer on behalf of the SIA Datakom. Sender: SIA Datakom, registration number: 40103142605, Latvia. Registered office: Vienibas gatve 109, Riga, LV-1058, Latvia

________________________________

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2018-09-13 10:45:21 BUG #15383: Join Filter cost estimation problem in 10.5
Previous Message Mareks Kalnačs 2018-09-13 10:03:49 RE: PostgreSQL 10.0 SELECT LIMIT performance problem