Re: genomic locus

From: Gene Selkov <selkovjr(at)gmail(dot)com>
To: obartunov(at)gmail(dot)com
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: genomic locus
Date: 2017-12-22 02:23:35
Message-ID: B7BE110F-AA0C-418F-96D8-F9470C5F0EE7@gmail.com
Lists: pgsql-hackers

Hi Oleg,

Great to hear from you. I wondered how many of the old-timers were still around.

> Why not use composite type ? For simple interval approach it's worked for us
> (see attached hdate.sql).

I have just begun looking at your hdate example. I see potentially useful stuff in it, but the first thing I noticed is that it is not fully equivalent to my problem. It looks like you only need to match intervals, while I need to match intervals and something else, ideally in a single operation. I attempted to explain that in my reply to Craig Ringer.

> If you need to specify distribution
> function,

Not in this case; there is no uncertainty associated with the loci; where there is uncertainty is in the existence of a feature called at a locus: is it real or is it a technogenic artifact? But that is a different problem for a later day.

> than it may be
> worth to see orion project http://orion.cs.purdue.edu/index.html
> 6 years ago we was thinking about implementation special UNCERTAINTY data type
> (http://www.sai.msu.su/~megera/postgres/talks/big_uncertain_data.pdf), but never
> started :( It'd be nice if you start this very interesting for science project.

I love uncertainty, and I’ve always wished I could make it computable. I also wish folks around me had the same appreciation for it. My job is to say yes or no where the data suggest maybe, or maybe not. Needless to say, I feel a bit exercised.

I am reading the info you provided with keen interest.

> btw, now you can use range data type, check
> https://wiki.postgresql.org/images/7/73/Range-types-pgopen-2012.pdf

Great stuff, I was not aware of it. I saw it in early development but did not know it had made it into core. I tried it (and will go and update a few kludgy apps where I had to use bad surrogates). It is not directly applicable to genomic loci because it would require additional constraints for intelligent matching. I want to go for complete encapsulation of constraints.

Part of the reason for such a perverse desire is that I use the database as a calculator — that is, I load some data in a one-off experiment and I literally type everything in psql while I muddle through. There is a limit on how much I can type and not screw things up beyond comprehension, so I want the query language to be as easy and interactive as possible. Having to drag along a set of additional constraints is not quite interactive and is error-prone.
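
To make the point about encapsulation concrete, here is a minimal sketch in Python rather than SQL. It assumes the "something else" that must match alongside the interval is a contig (chromosome) name; the `Locus` class, its fields, the `bare_overlap` helper, and the sample coordinates are all hypothetical and only illustrate the idea, not any existing type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """Hypothetical locus: a contig name plus a closed interval [start, end]."""
    contig: str
    start: int
    end: int

    def overlaps(self, other: "Locus") -> bool:
        # The contig check and the interval check happen in one operation,
        # so the caller cannot forget the extra constraint.
        return (
            self.contig == other.contig
            and self.start <= other.end
            and other.start <= self.end
        )


def bare_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    # What a plain interval comparison sees: coordinates only, no contig.
    return a_start <= b_end and b_start <= a_end


# Made-up sample loci (assumed names and coordinates).
a = Locus("chr1", 100, 200)
b = Locus("chr2", 150, 250)
c = Locus("chr1", 180, 300)

print(a.overlaps(c))                                 # same contig, intervals intersect
print(a.overlaps(b))                                 # different contigs
print(bare_overlap(a.start, a.end, b.start, b.end))  # coordinates alone intersect
```

With a bare interval, the contig comparison has to be typed by hand next to every overlap test; with the encapsulated type it comes along for free, which is the kind of interactivity I am after.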

Regards,

—Gene
