Re: How to get the 'ctid' from a record type?

From: Eric Ridge <eebbrr(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: How to get the 'ctid' from a record type?
Date: 2017-03-11 21:58:00
Message-ID: CANcm6wY4OFhD9s9j=Egt3Vrcm49_qMPVQj3nKDWp1gkE4YffNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 11, 2017 at 2:14 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2017-03-11 04:31:16 +0000, Eric Ridge wrote:
> > Well shoot. That kinda spoils my plans.
>
> I think you should elaborate on what you're trying to achieve -
> otherwise our advice will be affected by the recent, widely reported,
> crystal ball scarcity.
>

What I'm trying to do is port https://github.com/zombodb/zombodb to
Postgres 9.6+. It's an Access Method that stores full rows, encoded as
JSON, in Elasticsearch instead of in local storage. It was fairly
straightforward to do the mechanical work to convert it to use 9.6's new AM
API (which is very nice, btw!), but the fact that 9.6 also disallows
including system columns (specifically ctid) has me turned upside down.

With <9.6, I was able to cook-up a scheme where it was able to answer
queries from the remote Elasticsearch index even when Postgres decided to
plan a sequential scan. That hinged, mostly, on being able to create a
multi-column index where the first column was a function call that included
as an argument (among other things) the ctid system column.

The ability to answer sequential scans (and filters) using the remote ES
index is pretty important as the knowledge of how to do that exists in
Elasticsearch, not my custom operator function in Postgres.

Anyways, I've been trying to find a way to intuit the ctid system column
value with 9.6 and it's clear now that that just isn't possible. The
closest I got was digging through
ActivePortal->queryDesc->estate->es_tuple, but that only works when it's a
real tuple, not one that's virtual or minimal.

I'm pretty sure that I need to be implementing a Custom Scan Provider
instead, and I've been spending time with that API too. There's a pretty
steep learning curve for me, but I'll eventually get over that hump.

I could probably bore you with greater detail but basically, I want to take:
CREATE INDEX idxfoo ON table USING zombodb (zdb(table),
zdb_to_json(table)) WITH (url='http://remote.ip.addr:9200/');
SELECT * FROM table WHERE zdb(table) ==> 'some full text query' OR id =
42;

And have the "zdb(table) ==> 'some full text query'" bit be answered by my
extension, regardless of how PG wants to plan the query. While I was able
to hack something together for <9.6, I think that means a Custom Scan
Provider now?

eric

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2017-03-11 22:09:32 Preserving param location
Previous Message Joe Conway 2017-03-11 21:37:36 Re: [bug fix] dblink leaks unnamed connections