From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Gene Selkov <selkovjr(at)gmail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: genomic locus |
Date: | 2017-12-25 14:19:46 |
Message-ID: | 8995e58b-80a9-7f8a-f552-a12d77550a74@sigaev.ru |
Lists: | pgsql-hackers |
> I think I can wrangle this type into GiST just by tweaking consistent(),
> union(), and picksplit(), if I manage to express my needs in C without breaking
> too many things. My first attempt segfaulted.
Actually, consistent() can determine the actual query data type from the
strategy number. See the examples in ltree and intarray.
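To illustrate the strategy-number point, here is a minimal standalone sketch in plain C. It does not use the PostgreSQL headers or the real consistent() signature; the names DemoLocus, demo_consistent, STRAT_OVERLAPS_LOCUS and STRAT_ON_CONTIG are hypothetical, and real strategy numbers would come from the operator class definition. It only shows the shape of the dispatch: the strategy number tells the function how to interpret an otherwise untyped query argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical strategy numbers, one per operator in an imagined opclass. */
enum
{
    STRAT_OVERLAPS_LOCUS = 1,   /* query is another locus */
    STRAT_ON_CONTIG = 2         /* query is a bare contig name */
};

/* Simplified, fixed-layout stand-in for the locus type (assumption). */
typedef struct DemoLocus
{
    const char *contig;
    int32_t     start;
    int32_t     end;
} DemoLocus;

/* Interpret the untyped query according to the strategy number. */
static bool
demo_consistent(const DemoLocus *key, const void *query, int strategy)
{
    assert(key != NULL && query != NULL);

    switch (strategy)
    {
        case STRAT_OVERLAPS_LOCUS:
        {
            const DemoLocus *q = query;

            /* same contig and overlapping closed intervals */
            return strcmp(key->contig, q->contig) == 0 &&
                   key->start <= q->end && q->start <= key->end;
        }
        case STRAT_ON_CONTIG:
        {
            const char *name = query;

            return strcmp(key->contig, name) == 0;
        }
        default:
            return false;       /* unknown strategy: no match */
    }
}
```

In a real opclass the same switch would presumably sit inside the GiST consistent function, with the query datum converted according to the strategy.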
> If all goes to plan, I will end up with an index tree partitioned by contig at
> the top level and geometrically down from there. That will be as close as I can
> get to an array of contig-specific indices, without having to store data in
> separate tables.
>
> What do you think of that?
I have some doubt that you can distinguish the root page, but it's possible to
distinguish leaf pages; intarray and tsearch do that.
Reading your plan, I found an idea for GIN: the key for GIN is a pair of (contig,
one genome position). So any search for an intersect operation will actually be
a range search from (contig, start) to (contig, end).
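A rough sketch of that key ordering, in plain C and without the GIN support-function API: keys are (contig, position) pairs sorted by contig first and position second, so an intersect query becomes a scan over one contiguous key range. The names LocusKey, locus_key_cmp, lower_bound and count_in_range are made up for illustration, and how a stored locus would be broken into position keys is left open here, as it is in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical index key: one genome position on one contig. */
typedef struct LocusKey
{
    const char *contig;
    int32_t     pos;
} LocusKey;

/* Order by contig name first, then by position. */
static int
locus_key_cmp(const LocusKey *a, const LocusKey *b)
{
    int c = strcmp(a->contig, b->contig);

    if (c != 0)
        return c;
    return (a->pos > b->pos) - (a->pos < b->pos);
}

/* First index i in sorted keys[0..n) with keys[i] >= probe. */
static size_t
lower_bound(const LocusKey *keys, size_t n, const LocusKey *probe)
{
    size_t lo = 0;
    size_t hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (locus_key_cmp(&keys[mid], probe) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Count keys in the closed range (contig, start) .. (contig, end). */
static size_t
count_in_range(const LocusKey *keys, size_t n,
               const char *contig, int32_t start, int32_t end)
{
    LocusKey lo_probe = {contig, start};
    LocusKey hi_probe = {contig, end};
    size_t   first;
    size_t   i;

    assert(keys != NULL && contig != NULL);

    first = lower_bound(keys, n, &lo_probe);
    for (i = first; i < n && locus_key_cmp(&keys[i], &hi_probe) <= 0; i++)
        ;                       /* walk forward until past (contig, end) */
    return i - first;
}
```

Because the contig is the leading part of the key, keys from other contigs never fall inside the scanned range.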
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> I have a low-level technical question. Because I can’t anticipate the maximum
> length of contig names (and do not want to waste space), I have made the new
> locus type a varlena, like this:
>
> #include "utils/varlena.h"
>
> typedef struct LOCUS
> {
> int32 l_len_; /* varlena header (do not touch directly!) */
> int32 start;
> int32 end;
> char contig[FLEXIBLE_ARRAY_MEMBER];
> } LOCUS;
>
> #define LOCUS_SIZE(str) (offsetof(LOCUS, contig) + sizeof(str))
sizeof? or strlen ?
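To make the difference concrete, a small standalone sketch (plain C, no PostgreSQL headers; SizeDemoLocus, locus_size_sizeof and locus_size_strlen are hypothetical names). When the macro argument is a char *, sizeof yields the size of the pointer, so the result does not depend on the contig name at all; strlen plus one byte for the terminator gives the size actually needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in with the same layout idea as the quoted LOCUS struct (assumption). */
typedef struct SizeDemoLocus
{
    int32_t len_header;     /* stand-in for the varlena header */
    int32_t start;
    int32_t end;
    char    contig[];       /* flexible array member */
} SizeDemoLocus;

/* What the quoted macro computes for a char * argument: pointer size. */
static size_t
locus_size_sizeof(const char *str)
{
    assert(str != NULL);
    return offsetof(SizeDemoLocus, contig) + sizeof(str);
}

/* strlen-based size, including one byte for the trailing '\0'. */
static size_t
locus_size_strlen(const char *str)
{
    assert(str != NULL);
    return offsetof(SizeDemoLocus, contig) + strlen(str) + 1;
}
```

With the sizeof variant a long contig name would be copied into a buffer sized for a pointer, which may explain memory errors of the kind mentioned earlier in the thread.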
>
> That flexible array member messes with me every time I need to copy it while
> deriving a new locus object from an existing one (or from a pair). What I ended
> up doing is this:
>
> LOCUS *l = PG_GETARG_LOCUS_P(0);
> LOCUS *new_locus;
> char *contig;
> int size;
> new_locus = (LOCUS *) palloc0(sizeof(*new_locus));
> contig = pstrdup(l->contig); // need this to determine the length of contig name at runtime
l->contig should be null-terminated for pstrdup, but if so, you don't need to
pstrdup() it - you could use l->contig directly below. BTW, LOCUS_SIZE should
add 1 byte for '\0' character in this case.
> size = LOCUS_SIZE(contig);
> SET_VARSIZE(new_locus, size);
> strcpy(new_locus->contig, contig);
>
> Is there a more direct way to clone a varlena structure (possibly assigning a
> differently-sized contig to it)? One that is also memory-safe?
Store the length of contig in the LOCUS struct.
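A minimal sketch of that suggestion, written as standalone C so it can be compiled outside the server. It uses malloc/calloc and a plain int32_t size field as stand-ins; in an actual extension one would presumably use palloc, SET_VARSIZE and VARSIZE instead. MockLocus, mock_locus_size, mock_locus_make and mock_locus_clone are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct MockLocus
{
    int32_t total_size;     /* stand-in for the varlena header */
    int32_t start;
    int32_t end;
    int32_t contig_len;     /* length of contig, excluding '\0' */
    char    contig[];       /* null-terminated contig name */
} MockLocus;

/* Bytes needed for a locus whose contig name has contig_len characters. */
static size_t
mock_locus_size(size_t contig_len)
{
    return offsetof(MockLocus, contig) + contig_len + 1;
}

/* Build a locus with a contig name of any length. */
static MockLocus *
mock_locus_make(const char *contig, int32_t start, int32_t end)
{
    size_t      len;
    size_t      size;
    MockLocus  *l;

    assert(contig != NULL);
    len = strlen(contig);
    size = mock_locus_size(len);

    l = calloc(1, size);    /* allocate the full size, not sizeof(*l) */
    if (l == NULL)
        return NULL;
    l->total_size = (int32_t) size;
    l->start = start;
    l->end = end;
    l->contig_len = (int32_t) len;
    memcpy(l->contig, contig, len + 1);     /* copy name and terminator */
    return l;
}

/* Exact clone: the stored size makes this a single memcpy. */
static MockLocus *
mock_locus_clone(const MockLocus *src)
{
    MockLocus *dst;

    assert(src != NULL);
    dst = malloc((size_t) src->total_size);
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, (size_t) src->total_size);
    return dst;
}
```

Deriving a locus with a differently sized contig then goes through mock_locus_make with the new name, so the allocation always matches the name being stored.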
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/