Re: New Copy Formats - avro/orc/parquet

From: Nicolas Paris <niparisco(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: New Copy Formats - avro/orc/parquet
Date: 2018-02-11 20:00:12
Message-ID: 20180211200012.2agrfocyaf42td5v@gmail.com
Lists: pgsql-general

> > That is true, but the question is how significant the overhead is. If
> > it's 50% then reducing it would make perfect sense. If it's 1% then no
> > one is going to be bothered by it.
>
> I think it's pretty clear that it's going to be way, way more than
> 1%.

Good news, but I'm not sure I understand why.

> It's trivial to construct cases where input parsing / output
> formatting takes the majority of the time.

Binary -> ORC
       ^
       |
PROGRAM parsing/output formatting on the fly

> And a lot of that you're going to be able to avoid with binary formats.

Still, the diagram above shows both a parsing and a formatting step, doesn't it?
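
To make the question concrete, here is a minimal sketch (Python, not taken from this thread) of what the parsing half of such a PROGRAM might look like when it reads a COPY stream in binary format. The helper names read_exact and iter_copy_binary are hypothetical. The layout it assumes is the documented COPY BINARY one: an 11-byte signature, a 32-bit flags word, a 32-bit header extension length, then tuples made of a 16-bit field count and length-prefixed fields, ended by a field count of -1. It assumes no OID column and leaves the field bytes undecoded; writing ORC is not shown.

```python
import io
import struct

# 11-byte signature that starts every COPY ... (FORMAT binary) stream
SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def read_exact(stream, n):
    """Read exactly n bytes or fail (hypothetical helper)."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("truncated COPY binary stream")
    return data


def iter_copy_binary(stream):
    """Yield each row as a list of raw field bytes (None for NULL)."""
    if read_exact(stream, 11) != SIGNATURE:
        raise ValueError("not a COPY binary stream")
    flags, ext_len = struct.unpack("!iI", read_exact(stream, 8))
    if flags & (1 << 16):
        # Assumption of this sketch: streams with OIDs are not handled
        raise ValueError("OID column not supported in this sketch")
    read_exact(stream, ext_len)  # skip the header extension area
    while True:
        (nfields,) = struct.unpack("!h", read_exact(stream, 2))
        if nfields == -1:  # file trailer
            return
        row = []
        for _ in range(nfields):
            (length,) = struct.unpack("!i", read_exact(stream, 4))
            # A length of -1 marks NULL; otherwise the field bytes follow
            row.append(None if length == -1 else read_exact(stream, length))
        yield row
```

A script like this could be fed from something along the lines of COPY t TO PROGRAM '...' WITH (FORMAT binary), reading from sys.stdin.buffer. The parsing step is still there, but it only reads length prefixes and slices bytes; there is no text escaping or number-to-string conversion to undo, which may be where the saving comes from.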
