Re: hash as an search key and hash collision

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, i(dot)kurbangaliev(at)gmail(dot)com
Subject: Re: hash as an search key and hash collision
Date: 2020-06-19 23:33:39
Message-ID: CAKU4AWqPgk_u_q6G3XQmFqL-eOR4RqEMiWtoLzkX8j1K4h184Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jun 20, 2020 at 12:34 AM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> On Fri, Jun 19, 2020 at 04:24:01PM +0800, Andy Fan wrote:
> >I want to maintain an internal table which the primary key is sql_text and
> >planstmt::text, it is efficient since it both may be very long. So a
> >general
> >idea is to use sql_hash_value and plan_hash_value. Then we have to
> >handle the hash collision case. However I checked the codes both in
> >sr_plans[1]
> >and pg_stat_statements[2], both of them didn't handle such cases, IIUC.
> so
> >how can I understand this situation?
> >
>
> IIRC pg_stat_statements simply accepts the hash collision risk. This is
> what the docs say:
>
> In some cases, queries with visibly different texts might get merged
> into a single pg_stat_statements entry. Normally this will happen
> only for semantically equivalent queries, but there is a small
> chance of hash collisions causing unrelated queries to be merged
> into one entry. (This cannot happen for queries belonging to
> different users or databases, however.)
>
> The consequences of a hash collision are relatively harmless, enough to
> make it not worth the extra checks (e.g. because the SQL text may not be
> available in memory and would need to be read from the file).
>

I see. Thank you for this information, this does make sense.

I suppose sr_plan does the same thing, but I haven't checked.
>

sr_plans is used to map a sql hash value to a PlannedStmts, if hash
collisions
happen, it may execute a query B while the user wants to execute Query A.
this should be more sensitive than pg_stat_statements which doesn't require
exact data. I added Ildus who is the author of sr_plan to the cc list
in case he wants to take a look.

--
Best Regards
Andy Fan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2020-06-20 00:39:13 Re: min_safe_lsn column in pg_replication_slots view
Previous Message Thomas Munro 2020-06-19 22:32:47 Re: Cache relation sizes?