Re: schema advice for event stream with tagging and filtering

From: Chris Withers <chris(at)simplistix(dot)co(dot)uk>
To: Ilya Kazakevich <Ilya(dot)Kazakevich(at)JetBrains(dot)com>, 'Postgresql General' <pgsql-general(at)postgresql(dot)org>
Subject: Re: schema advice for event stream with tagging and filtering
Date: 2016-12-12 13:02:39
Message-ID: 58b0478c-1f58-792e-eadc-a63016580951@simplistix.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 16/08/2016 15:10, Ilya Kazakevich wrote:
>>>> An event is a row with a primary key along the lines of (colo, host,
>>>> category) and an associated set of tags, where each tag has a type
>>>> and a value
>>>> (eg: {"color": "red", "owner": "fred", "status": "open"...}).
>>>
>>> What about simple table with several columns and hstore field for tags?
>
> BTW, "pure SQL" approach here is to use separate tables: Tags(TagId, TagName) and TagValues(EventId,TagId,Value).

Well, maybe, but none of us wants to do that ;-)

> But in this case it will be painful to filter events by tag values directly, so only separate denormalized OLAP table should be used in "pure SQL":)

I don't understand the second half of this I'm afraid...

> PostgreSQL, however, supports key-value based hstore.

Right, but hstore only allows single values for each key, if I
understand correctly?

Okay, so that leaves me with a jsonb "tags" column with a gin index, but
I still have a couple of choices..

So, in order to best answer these types of queries:

>> - show me a list of tag types and the count of the number of events of that
>> type
>>
>> - show me all events that have tag1=x, tag2=y and does not have tag3

...which of the following is going to be most performant:

# SELECT '{"tag1":["v1", "v2", "v3"]}'::jsonb @> '{"tag1": ["v1"]}'::jsonb;
?column?
----------
t
(1 row)

# SELECT '[{"tag1":"v1"}, {"tag1": "v2"}, {"tag1": "v3"}]'::jsonb @>
'[{"tag1": "v1"}]'::jsonb;
?column?
----------
t
(1 row)

So, should I go for a tag name that maps to a list of values for that
tag, or should I go for a sequence of one-entry mappings of tag name to
tag value?

cheers,

Chris

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2016-12-12 14:33:21 Re: btree gist indices, null and open-ended tsranges
Previous Message Torsten Förtsch 2016-12-12 12:23:58 Re: WAL