Hi,
On 2020-12-17 09:43:30 +0300, Önder Kalacı wrote:
> The above part can be considered the core of the logic, executed per tuple.
> As far as I can see, it has two downsides.
>
> First, calling `expression_planner()` for every tuple can be quite
> expensive. I created a sample table, loaded data and ran a quick benchmark
> to see its effect. I attached the very simple script that I used to
> reproduce the issue on my laptop. I'm pretty sure you can find nicer ways
> of doing similar perf tests, just sharing as a reference.
>
> The idea of the test is to add a WHERE clause to a table, but none of the
> tuples are filtered out. They just go through this code-path and send it to
> the remote node.
>
> #rows Patched | Master
> 1M 00:00:25.067536 | 00:00:16.633988
> 10M 00:04:50.770791 | 00:02:40.945358
>
>
> So, it seems a significant overhead to me. What do you think?
That seems almost prohibitively expensive. I think at the very least
some of this work would need to be done in a cached manner, e.g. via
get_rel_sync_entry().
> Secondly, probably more importantly, allowing any operator is as dangerous
> as allowing any function as users can create/overload operator(s).
That's not safe, indeed. It's not even just create/overloading
operators, as far as I can tell the expression can contain just plain
function calls.
The issue also isn't primarily that the user can overload functions,
it's that logical decoding is a limited environment, and not everything
is safe to do within. You e.g. only catalog tables can be
accessed. Therefore I don't think we can allow arbitrary expressions.
> The other problematic area was the performance, as calling
> `expression_planner()` for every tuple can be very expensive. To avoid
> that, it might be considered to ask users to provide a function instead of
> a free form WHERE clause, such that if the function returns true, the tuple
> is sent. The allowed functions need to be immutable SQL functions with bool
> return type. As we can parse the SQL functions, we should be able to allow
> only functions that rely on the above mentioned procs. We can apply as many
> restrictions (such as no modification query) as possible. For example, see
> below:
> ```
I don't think that would get us very far.
From a safety aspect: A function's body can be changed by the user at
any time, therefore we cannot rely on analyses of the function's body.
From a performance POV: SQL functions are planned at every invocation,
so that'd not buy us much either.
I think what you would have to do instead is to ensure that the
expression is "simple enough", and then process it into a cheaply
executable format in get_rel_sync_entry(). I'd suggest that in the first
version you just allow a simple ANDed list of 'foo.bar op constant'
expressions.
Does that make sense?
Greetings,
Andres Freund