Implementing a FDW
Multicorn provides a simple interface for writing foreign data wrappers: the
multicorn.ForeignDataWrapper
interface.
Implementing a foreign data wrapper is as simple as inheriting from
multicorn.ForeignDataWrapper
and implemening the execute
method.
What are we trying to achieve?
Supposing we want to implement a foreign data wrapper which only returns a set
of 20 rows, containing in each column the name of the column itself
concatenated with the number of the line.
The goal of this tutorial is to be able to execute this:
CREATE FOREIGN TABLE constanttable (
test character varying,
test2 character varying
) server multicorn_srv options (
wrapper 'myfdw.ConstantForeignDataWrapper'
)
SELECT * from constanttable;
And obtain this as a result:
test | test2
---------|----------
test 0 | test2 0
test 1 | test2 1
test 2 | test2 2
test 3 | test2 3
test 4 | test2 4
test 5 | test2 5
test 6 | test2 6
test 7 | test2 7
test 8 | test2 8
test 9 | test2 9
test 10 | test2 10
test 11 | test2 11
test 12 | test2 12
test 13 | test2 13
test 14 | test2 14
test 15 | test2 15
test 16 | test2 16
test 17 | test2 17
test 18 | test2 18
test 19 | test2 19
(20 lignes)
How do we do that?
The fdw described above is pretty simple, implementing it should be easy!
First things first, we have to create a new python module.
This can be achieved with the most simple setup.py
file:
import subprocess from setuptools import setup, find_packages, Extension
setup(
name='myfdw',
version='0.0.1',
author='Ronan Dunklau',
license='Postgresql',
packages=['myfdw']
)
But let’s see the whole code. To be usable with the above CREATE FOREIGN
TABLE
statement, this module should be named myfdw.
from multicorn import ForeignDataWrapper
class ConstantForeignDataWrapper(ForeignDataWrapper):
def __init__(self, options, columns):
super(ConstantForeignDataWrapper, self).__init__(options, columns)
self.columns = columns
def execute(self, quals, columns):
for index in range(20):
line = {}
for column_name in self.columns:
line[column_name] = '%s %s' % (column_name, index)
yield line
You should have the following directory structure:
.
|-- myfdw/
| `-- __init__.py
`-- setup.py
To install it, just run python setup.py install
, and the file will be copied
to your global python installation, which should be the one your PostgreSQL
instance is using.
And that’s it! You just created your first foreign data wrapper. But let’s look
a bit more thoroughly to the class…
The first thing to do (although optional, since you can implement the interface
via duck-typing), is to import the base class and subclass it:
from multicorn import ForeignDataWrapper
class ConstantForeignDataWrapper(ForeignDataWrapper):
The init method must accept two arguments:
options
: A dictionary of options given in the OPTIONS
clause of the CREATE
FOREIGN TABLE
statement, minus the wrapper option.
columns
A mapping of the columns names given during the table creation,
associated to their types. Ex: {'test': 'character varying'}
.
Our access point do not need any options, thus we will only need to keep a
reference to the columns:
def __init__(self, options, columns):
super(ConstantForeignDataWrapper, self).__init__(options, columns)
self.columns = columns
The execute method is the core of the API. It is called with a list of Qual
objects, and a list column names, which we will ignore for now but more on that
later.
This method must return an iterable of the resulting lines. Each line can be
either a list containing an item by column, or a dictonary mappning the column
names to their value.
For this example, we chose to build a dictionary. Each column contains the
concatenation of the column name and the line index.
def execute(self, quals):
for index in range(20):
line = {}
for column_name in self.columns:
line[column_name] = '%s %s' % (column_name, index)
yield line
And that’s it !
Write API
Since PostgreSQL 9.3, foreign data wrappers can implement a write API.
In multicorn, this involves defining which column will be used as a primary key
(mandatory) and implementing the following methods at your discretion:
def insert(self, new_values)
def update(self, rowid, new_values)
def delete(self, rowid)
Defining the primary key
Which column is used as primary key is defined by the property rowid_column
.
In order to support the write API, you will have to implement this property to
return a column name, e.g.
@property
def rowid_column(self):
return 'id'
Without implementing this property any attempt to use the write API methods will
raise an exception.
Inserts
The insert
method receives the new_values
argument, which is a dictionary of
column names to values. This method should returns a dictionary which will be
used in case the INSERT
has a RETURNING
clause, e.g.
INSERT INTO my_foreign_table VALUES (some_value) RETURNING *;
In other words: The returned dictionary should be a mapping from column names to
values.
Updates
The update
method behaves like the insert
method with regards to output.
However, as input it receives an additional argument: the rowid
. This is the
value of the column named primary key by the rowid_column
property, and thus
defines which row should be updated.
Deletes
The delete
method receives only the rowid
argument, and should always return
nothing.
Transactions
If you want to handle transaction hooks, you can implement the following methods:
def commit(self)
def rollback(self)
def pre_commit(self)
The pre_commit
method will be called just before the local transaction
commits. You can raise an exception here to abort the current transaction were
your remote commit to fail.
The commit
method will be called just at commit time, while the rollback
method will be called whenever the local transaction is rollbacked.
Optimizations
As was noted in the code comments, the execute methods accept a quals
argument. This argument is a list of quals object, which are defined in
multicorn/__init__.py
.
A Qual object defines a simple condition wich can be used by the foreign data
wrapper to restrict the number of the results. The Qual class defines three
instance’s attributes:
field_name
: the name of the column concerned by the condition.
operator
: the name of the operator.
value
: the value expressed in the condition.
Let’s suppose we write the following query:
SELECT * from constanttable where test = 'test 2' and test2 like '%3%';
The method execute would be called with the following quals:
[Qual('test', '=', 'test 2'), Qual('test', '~~', '3')]
Now you can use this information to reduce the set of results to return to the
PostgreSQL server.
You don’t have to enforce those quals, PostgreSQL will check them
anyway. It’s nonetheless useful to reduce the amount of results you fetch over
the network, for example.
Similarly, the columns argument contains the list of needed columns. You can
use this information to reduce the amount of data that has to be fetched.
For example, the following query:
select test, test2 from constanttable;
would result in the following columns argument:
Once again, if you return more than these columns, everything should be fine.
Parameterized paths
The python FDW implementor can affect the planner by implementing the
get_path_keys
and get_rel_size
methods.
def get_rel_size(self, quals, columns):
This method must return a tuple of the form (expected_number_of_row
,
expected_mean_width_of_a_row
(in bytes)).
The quals and columns arguments can be used to compute those estimates.
For example, the imapfdw
computes a huge width whenever the payload column is
requested.
This method must return a list of tuple of the form (column_name
,
expected_number_of_row
).
The expected_number_of_row
must be computed as if a where column_name =
some_value
filter were applied.
This helps the planner to estimate parameterized paths cost, and change the
plan accordingly.
For example, informing the planner that a filter on a column may return exactly
one row, instead of the full billion, may help it on deciding to use a
nested-loop instead of a full sequential scan.
Error reporting
In the
multicorn.utils
module lies a simple utility function, log_to_postgres
.
This function is mapped to the PostgreSQL function erreport
.
It accepts three arguments:
Foreign Data Wrapper lifecycle
The foreign data wrapper associated to a table is instantiated on a per-process
basis, and it happens when the first query is run against it.
Usually, PostgreSQL server processes are spawned on a per-connection basis.
During the life time of a server process, the instance is cached. That means
that if you have to keep references to resources such as connections, you
should establish them in the __init__
method and cache them as instance
attributes.