PGSpider extension is newly released

Posted on 2021-08-30 by Toshiba
Related Open Source

We have just released the PGSpider extension (pgspider_ext).

It is an extension for building a high-performance SQL cluster engine for distributed big data.
PGSpider enables PostgreSQL to access a number of data sources through Foreign Data Wrappers (FDW) and to retrieve the distributed data vertically.

The main features are:

  • Node partitioned table
    Users can easily retrieve records from multiple tables on different data sources with a single SQL statement.
    Suppose there are 2 data sources that hold the following records:

    SELECT * FROM t1_node1; -- @node1
       i | t
     ----+---
      10 | a
      11 | b
    (2 rows)

    SELECT * FROM t1_node2; -- @node2
       i | t
     ----+---
      20 | c
      21 | d
    (2 rows)

PGSpider can collect these records into a single result with a node identifier column, like:

    SELECT * FROM t1;
      i | t | node
    ----+---+-------
     10 | a | node1
     11 | b | node1
     20 | c | node2
     21 | d | node2
    (4 rows)

  • Parallel processing
    PGSpider can fetch results from data sources in parallel.

  • Pushdown
    PGSpider can push down WHERE clauses and aggregate functions to the data sources.
    Whether an expression can be pushed down depends on the FDW of each data source.
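
As a rough illustration of the node partitioned table and pushdown features above, the following plain PostgreSQL sketch reproduces the same result without PGSpider. It is only a conceptual model, not the pgspider_ext setup procedure: the two temporary tables are local stand-ins for the tables on node1 and node2, and the view t1 is a hypothetical substitute for the table that PGSpider would expose.

```sql
-- Local stand-ins for the remote tables (assumption: in a real
-- deployment these live on separate data sources behind an FDW).
CREATE TEMP TABLE t1_node1 (i int, t text);
CREATE TEMP TABLE t1_node2 (i int, t text);
INSERT INTO t1_node1 VALUES (10, 'a'), (11, 'b');
INSERT INTO t1_node2 VALUES (20, 'c'), (21, 'd');

-- Hypothetical stand-in for the combined table: every row is tagged
-- with the identifier of the node it came from.
CREATE TEMP VIEW t1 AS
    SELECT i, t, 'node1'::text AS node FROM t1_node1
    UNION ALL
    SELECT i, t, 'node2'::text AS node FROM t1_node2;

-- Returns the 4 rows shown above, each with its node identifier.
SELECT * FROM t1 ORDER BY node, i;

-- A filtered aggregate over the combined table ...
SELECT count(*) FROM t1 WHERE i > 10;

-- ... gives the same answer as letting each source filter and count
-- its own rows and merging only the partial counts. This is the idea
-- behind pushdown: less data has to travel from the sources.
SELECT sum(c) AS count
FROM (
    SELECT count(*) AS c FROM t1_node1 WHERE i > 10
    UNION ALL
    SELECT count(*) AS c FROM t1_node2 WHERE i > 10
) AS per_node;
-- Both queries return 3 for the sample rows (11, 20 and 21).
```

With PGSpider the user writes only the query against t1. Whether the WHERE clause and the aggregate are actually evaluated on the data source side depends on the FDW in use, as noted above.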


The PGSpider extension is developed by Toshiba Software Engineering & Technology Center.
Source repository: https://github.com/pgspider/pgspider_ext

Best Regards,
Mototaka Kanematsu