Re: pglogical_output - a general purpose logical decoding output plugin

From: Craig Ringer
Subject: Re: pglogical_output - a general purpose logical decoding output plugin
Date:
Msg-id: CAMsr+YECDGHG_WMfu8Vi8J3GdOuV=nUincYNWryb26d93ngtEg@mail.gmail.com
In reply to: Re: pglogical_output - a general purpose logical decoding output plugin (Andres Freund <andres@anarazel.de>)
Responses: Re: pglogical_output - a general purpose logical decoding output plugin (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On 29 January 2016 at 18:16, Andres Freund <andres@anarazel.de> wrote:
Hi,

so, I'm reviewing the output of:

Thank you very much for the review.
 
> +             pglogical_output_plhooks \

I'm doubtful we want these plhooks. You aren't allowed to access normal
(non-user-catalog) tables in output plugins. That seems too much to
expose to a plpgsql function, imo.

You're right. We've got no way to make sure the user sticks to things that're reasonably safe.

The intent of that module was to allow people to write row and origin filters in plpgsql, to serve as an example of how to implement hooks, and to provide a tool usable in testing pglogical_output from pg_regress.

An example can be in C, since it's not safe to do it in plpgsql as you noted. A few toy functions will be sufficient for test use.
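For instance, something along these lines would do for pg_regress. A sketch only: it assumes the hooks header installs as "pglogical_output/hooks.h" and that the hooks struct is named PGLogicalHooks; the row filter args struct is the one quoted further down.

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/lsyscache.h"
    #include "utils/rel.h"

    #include "pglogical_output/hooks.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(test_hooks_setup);

    /* Forward only rows from relations in the "public" namespace */
    static bool
    test_row_filter(struct PGLogicalRowFilterArgs *args)
    {
        char *nspname = get_namespace_name(RelationGetNamespace(args->changed_rel));

        return nspname != NULL && strcmp(nspname, "public") == 0;
    }

    /* Named by the client as the hooks setup function; it gets passed
     * the hooks struct to fill in, per the current calling convention. */
    Datum
    test_hooks_setup(PG_FUNCTION_ARGS)
    {
        struct PGLogicalHooks *hooks = (struct PGLogicalHooks *) PG_GETARG_POINTER(0);

        hooks->row_filter_hook = test_row_filter;
        PG_RETURN_VOID();
    }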

As for allowing users to flexibly filter, I'm starting to think that hooks in pglogical_output aren't really the best long-term option. They're needed for now, but for 9.7+ we should look at whether it's practical to separate the policy of what gets forwarded from the mechanics of how it gets sent. pglogical_output currently just farms part of the logical decoding hook out to its own hooks, but it wouldn't have to do that if logical decoding let you plug in policy on what you send separately from how you send it, either via catalogs or via plugin functions separate from the output plugin.

(Kinda off topic, though, and I think we need the hooks for now, just not the plpgsql implementation).

 
> +++ b/contrib/pglogical_output/README.md

I don't think we have markdown in postgres so far - so let's just keep the
current content and remove the .md

I'm halfway through turning it all into SGML anyway. I just got sidetracked by other work. I'd be just as happy to leave it as markdown but figured SGML would likely be required.
 

> +==== Table metadata header
> +
> +|===
> +|*Message*|*Type/Size*|*Notes*
> +
> +|Message type|signed char|Literal ‘**R**’ (0x52)
> +|flags|uint8| * 0-6: Reserved, client _must_ ERROR if set and not recognised.
> +|relidentifier|uint32|Arbitrary relation id, unique for this upstream. In practice this will probably be the upstream table’s oid, but the downstream can’t assume anything.
> +|nspnamelength|uint8|Length of namespace name
> +|nspname|signed char[nspnamelength]|Relation namespace (null terminated)
> +|relnamelength|uint8|Length of relation name
> +|relname|char[relname]|Relation name (null terminated)
> +|attrs block|signed char|Literal: ‘**A**’ (0x41)
> +|natts|uint16|number of attributes
> +|[fields]|[composite]|Sequence of ‘natts’ column metadata blocks, each of which begins with a column delimiter followed by zero or more column metadata blocks, each with the same column metadata block header.

That's a fairly high overhead. Hm.

Yeah, and it shows in Oleksandr's measurements. However, that's a metadata message that is sent only infrequently if you enable relation metadata caching, which is really necessary to get reasonable performance on anything but the simplest workloads; it's only optional because it makes it easier to write and test a client without it first.
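To make the framing concrete, downstream parsing of that message is mechanical. A sketch only: the Cursor type and the read_* helpers are hypothetical cursor routines that advance through the message buffer, and it assumes multi-byte fields arrive in network byte order.

    typedef struct TableMeta
    {
        uint32      relidentifier;  /* upstream's arbitrary relation id */
        char       *nspname;
        char       *relname;
        uint16      natts;
    } TableMeta;

    static void
    parse_table_metadata(Cursor *c, TableMeta *meta)
    {
        uint8       flags;

        if (read_uint8(c) != 'R')
            elog(ERROR, "expected table metadata message");

        flags = read_uint8(c);
        if (flags & 0x7F)           /* bits 0-6 are reserved */
            elog(ERROR, "unrecognised flag bits in table metadata message");

        meta->relidentifier = read_uint32(c);
        meta->nspname = read_cstring(c, read_uint8(c));  /* NUL-terminated */
        meta->relname = read_cstring(c, read_uint8(c));

        if (read_uint8(c) != 'A')
            elog(ERROR, "expected attributes block");
        meta->natts = read_uint16(c);

        /* ...followed by 'natts' column metadata blocks... */
    }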
 
> +== JSON protocol
> +
> +If `proto_format` is set to `json` then the output plugin will emit JSON
> +instead of the custom binary protocol. JSON support is intended mainly for
> +debugging and diagnostics.
> +

I'm fairly strongly opposed to including two formats in one output
plugin. I think the demand for being able to look into the binary
protocol should instead be satisfied by having a function that "expands"
the binary data returned into something easier to understand.

Per our discussion yesterday I think I agree with you on that now.

My thinking is that we should patch pg_recvlogical to be able to load a decoder plugin. Then extract the protocol decoding support from pglogical into a separately usable library that can be loaded by pg_recvlogical, pglogical downstream, and by SQL-level debug/test helper functions.

pg_recvlogical won't be able to decode binary or internal format field values, but you can simply not request that they be sent.
 
> +                     case PARAM_BINARY_BASETYPES_MAJOR_VERSION:
> +                             val = get_param_value(elem, false, OUTPUT_PARAM_TYPE_UINT32);
> +                             data->client_binary_basetypes_major_version = DatumGetUInt32(val);
> +                             break;

Why is the major version tied to basetypes (by name)? Seems more
generally useful.

I found naming that param rather awkward.

The idea is that we can rely on the Pg major version only for types defined in core.  It's mostly safe for built-in extensions in that few if any people ever upgrade them, but it's not strictly correct even there. Most of them (hstore, etc) don't expose their own versions so it's hard to know what to do about them.

What I want(ed?) to do is let a downstream enumerate the extensions it has and the version(s) it knows they're compatible with for send/recv and internal formats. But designing that was going to be hairy in the time available, and I think falling back on text representation for contribs and extensions is the safest choice. A solid way to express an extension compatibility matrix seemed rather too much to bite off as well as everything else.



> +                     case PARAM_RELMETA_CACHE_SIZE:
> +                             val = get_param_value(elem, false, OUTPUT_PARAM_TYPE_INT32);
> +                             data->client_relmeta_cache_size = DatumGetInt32(val);
> +                             break;

I'm not convinced this a) should be optional b) should have a size
limit. Please argue for that choice. And how the client should e.g. know
about evictions in that cache.

I don't think sending metadata with every row makes sense, so it shouldn't be optional in that sense. It was optional mostly because the downstream didn't initially support it and I thought it'd be useful to allow people to write simpler clients.

The cache logic is really quite simple though, so I'm happy enough to make it required.

Re the size limit, my thinking was that there are (unfortunately) real-world apps that create and drop tables in production, that have tens of thousands or more schemas each with hundreds or more tables, and so on. They're awful and I wish people wouldn't do that, but they do. Rather than expecting the downstream to cope with an unknown and unbounded cache size, being able to limit the number of tables for which it's required to retain metadata seemed possibly useful. I'm far from wedded to the idea, since it requires the upstream to maintain a metadata cache LRU (which it doesn't yet) and send cache eviction messages to the downstream. Cutting that whole idea out would simplify things; if we kept it, the upstream side might look roughly like the sketch below.
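A sketch only, assuming data->client_relmeta_cache_size is the cap, an lru_node is added to the cache entry, and a (so far nonexistent) send_cache_eviction_message() protocol message:

    #include "lib/ilist.h"

    typedef struct RelMetaCacheEntry
    {
        Oid         relid;
        bool        is_cached;      /* client has current metadata */
        dlist_node  lru_node;       /* most recently used at the head */
    } RelMetaCacheEntry;

    static dlist_head relmeta_lru = DLIST_STATIC_INIT(relmeta_lru);
    static int  relmeta_cache_used = 0;

    /* On a cache hit, dlist_delete + dlist_push_head refreshes recency. */
    static void
    relmeta_cache_add(PGLogicalOutputData *data, RelMetaCacheEntry *entry)
    {
        if (relmeta_cache_used >= data->client_relmeta_cache_size)
        {
            /* Evict the least recently used entry and tell the client */
            RelMetaCacheEntry *victim =
                dlist_container(RelMetaCacheEntry, lru_node,
                                dlist_tail_node(&relmeta_lru));

            victim->is_cached = false;
            dlist_delete(&victim->lru_node);
            relmeta_cache_used--;
            send_cache_eviction_message(victim->relid);  /* hypothetical */
        }
        dlist_push_head(&relmeta_lru, &entry->lru_node);
        relmeta_cache_used++;
    }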




> --- /dev/null
> +++ b/contrib/pglogical_output/pglogical_config.h
> @@ -0,0 +1,55 @@
> +#ifndef PG_LOGICAL_CONFIG_H
> +#define PG_LOGICAL_CONFIG_H
> +
> +#ifndef PG_VERSION_NUM
> +#error <postgres.h> must be included first
> +#endif

Huh?

It's a stray that should've been part of pglogical/compat.h (which will presumably get cut for inclusion in contrib anyway).
 
> +#include "nodes/pg_list.h"
> +#include "pglogical_output.h"
> +
> +inline static bool
> +server_float4_byval(void)
> +{
> +#ifdef USE_FLOAT4_BYVAL
> +     return true;
> +#else
> +     return false;
> +#endif
> +}
> +
> +inline static bool
> +server_float8_byval(void)
> +{
> +#ifdef USE_FLOAT8_BYVAL
> +     return true;
> +#else
> +     return false;
> +#endif
> +}
> +
> +inline static bool
> +server_integer_datetimes(void)
> +{
> +#ifdef USE_INTEGER_DATETIMES
> +     return true;
> +#else
> +     return false;
> +#endif
> +}
> +
> +inline static bool
> +server_bigendian(void)
> +{
> +#ifdef WORDS_BIGENDIAN
> +     return true;
> +#else
> +     return false;
> +#endif
> +}

Not convinced these should exist, and even more so exposed in a header.

Yeah. The info's needed in both pglogical_config.c and pglogical_output.c but that can probably be avoided.
 
> +/*
> + * Returns Oid of the hooks function specified in funcname.
> + *
> + * Error is thrown if function doesn't exist or doen't return correct datatype
> + * or is volatile.
> + */
> +static Oid
> +get_hooks_function_oid(List *funcname)
> +{
> +     Oid                     funcid;
> +     Oid                     funcargtypes[1];
> +
> +     funcargtypes[0] = INTERNALOID;
> +
> +     /* find the the function */
> +     funcid = LookupFuncName(funcname, 1, funcargtypes, false);
> +
> +     /* Validate that the function returns void */
> +     if (get_func_rettype(funcid) != VOIDOID)
> +     {
> +             ereport(ERROR,
> +                             (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> +                              errmsg("function %s must return void",
> +                                             NameListToString(funcname))));
> +     }

Hm, this seems easy to poke holes into. I mean you later use it like:

> +     if (data->hooks_setup_funcname != NIL)
> +     {
> +             hooks_func = get_hooks_function_oid(data->hooks_setup_funcname);
> +
> +             old_ctxt = MemoryContextSwitchTo(data->hooks_mctxt);
> +             (void) OidFunctionCall1(hooks_func, PointerGetDatum(&data->hooks));
> +             MemoryContextSwitchTo(old_ctxt);

e.g. you basically assume the function does something reasonable
with those types. Why don't we instead create a 'plogical_hooks' return
type, and have the function return that?

I want to avoid requiring any extension to be loaded for the output plugin, to just have it provide the mechanism for transporting data changes and not start adding types, user catalogs, etc.

That's still OK if the signature in pg_proc is 'returns internal', I just don't want anything visible from SQL.

My other reasons for this approach have been obsoleted now that we're installing the pglogical/hooks.h header in Pg's server includes. Originally I was afraid extensions would have to make a *copy* of the hooks struct definition in their own headers. In that case we'd be able to add entries only strictly at the end of the struct and could only use it by palloc0'ing it and passing a pointer. Of course, I didn't think of the case where the hook defining extension had the bigger of the two definitions...

So yeah. Happy to just return 'internal', DatumGetPointer it and cast to a pointer to our hooks struct.
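Roughly like this, as a sketch (assuming data->hooks becomes a pointer and the installed header names the struct PGLogicalHooks; my_hooks_setup and my_row_filter are illustrative names):

    /* SQL declaration: CREATE FUNCTION my_hooks_setup(internal)
     * RETURNS internal ... LANGUAGE c; keeping an 'internal' arg keeps
     * it uncallable from SQL. */
    PG_FUNCTION_INFO_V1(my_hooks_setup);

    Datum
    my_hooks_setup(PG_FUNCTION_ARGS)
    {
        struct PGLogicalHooks *hooks = palloc0(sizeof(struct PGLogicalHooks));

        hooks->row_filter_hook = my_row_filter;  /* defined elsewhere */
        PG_RETURN_POINTER(hooks);
    }

    /* ...and on the output plugin side, replacing the void call: */
    old_ctxt = MemoryContextSwitchTo(data->hooks_mctxt);
    data->hooks = (struct PGLogicalHooks *)
        DatumGetPointer(OidFunctionCall1(hooks_func, PointerGetDatum(data)));
    MemoryContextSwitchTo(old_ctxt);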
 
> +     if (func_volatile(funcid) == PROVOLATILE_VOLATILE)
> +     {
> +             ereport(ERROR,
> +                             (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> +                              errmsg("function %s must not be VOLATILE",
> +                                             NameListToString(funcname))));
> +     }

Hm, not sure what that's supposed to achieve. You could argue for
requiring the function to be immutable (i.e. not stable or volatile),
but I'm not sure what that'd achieve.

It's a stupid holdover from the function's use elsewhere, and entirely my fault.
 
> +             old_ctxt = MemoryContextSwitchTo(data->hooks_mctxt);
> +             (void) (*data->hooks.startup_hook)(&args);
> +             MemoryContextSwitchTo(old_ctxt);

What is the hooks memory context intended to achieve? It's apparently
never reset. Normally output plugin callbacks are called in more
short-lived memory contexts, for good reason, to avoid leaks....

Mainly to make memory allocated by and used by hooks more visible when debugging, rather than showing it conflated with the main logical decoding context, while still giving hooks somewhere to store state that they may need across the decoding session.

Not going to argue strongly for it. Just seemed like something I'd regret not having when trying to figure out why the decoding context was massive for no apparent reason later...

> +bool
> +call_row_filter_hook(PGLogicalOutputData *data, ReorderBufferTXN *txn,
> +             Relation rel, ReorderBufferChange *change)
> +{
> +     struct  PGLogicalRowFilterArgs hook_args;
> +     MemoryContext old_ctxt;
> +     bool ret = true;
> +
> +     if (data->hooks.row_filter_hook != NULL)
> +     {
> +             hook_args.change_type = change->action;
> +             hook_args.private_data = data->hooks.hooks_private_data;
> +             hook_args.changed_rel = rel;
> +             hook_args.change = change;
> +
> +             elog(DEBUG3, "calling pglogical row filter hook");
> +
> +             old_ctxt = MemoryContextSwitchTo(data->hooks_mctxt);
> +             ret = (*data->hooks.row_filter_hook)(&hook_args);

Why aren't we passing txn to the filter? ISTM it'd be better to
basically reuse/extend the signature of the original change
callback.

Yeah, probably.

I somewhat wish the original callbacks used struct arguments. Otherwise you land up with fun #ifdef's when supporting multiple Pg versions and it's hard to add new params. Quite annoying when dealing with extensions. OTOH it's presumably faster to use the usual C calling convention.
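For the sake of argument, something shaped like this (purely hypothetical, not something I'm proposing now):

    typedef struct ChangeCallbackArgs
    {
        int         version;        /* bumped when fields are appended */
        ReorderBufferTXN *txn;
        Relation    rel;
        ReorderBufferChange *change;
        /* new fields go strictly at the end */
    } ChangeCallbackArgs;

    typedef void (*change_cb_struct) (LogicalDecodingContext *ctx,
                                      ChangeCallbackArgs *args);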
 
> +/* These must be available to pg_dlsym() */

No, the following don't? And they aren't, since they're static functions.
_PG_init and _PG_output_plugin_init need to be, but that's it.

Yeah. Total thinko.
 
> +/*
> + * COMMIT callback
> + */
> +void
> +pg_decode_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> +                                      XLogRecPtr commit_lsn)
> +{

Missing static's?

Yes.
 

> +     /*
> +      * Nobody keeps pointers to entries in this hash table around so
> +      * it's safe to directly HASH_REMOVE the entries as soon as they are
> +      * invalidated. Finding them and flagging them invalid then removing
> +      * them lazily might save some memory churn for tables that get
> +      * repeatedly invalidated and re-sent, but it dodesn't seem worth
> +      * doing.
> +      *
> +      * Getting invalidations for relations that aren't in the table is
> +      * entirely normal, since there's no way to unregister for an
> +      * invalidation event. So we don't care if it's found or not.
> +      */
> +     (void) hash_search(RelMetaCache, &relid, HASH_REMOVE, NULL);
> + }

So, I don't buy this, like at all. The cache entry is passed to
functions, while we call output functions and such. Which in turn can
cause cache invalidations to be processed.

That was a misunderstanding on my part. I hadn't realised that cache invalidations could be fired so easily by apparently innocuous actions, and had assumed incorrectly that we'd get invalidations only outside the scope of a decoding callback, not during one.

Clearly need to fix this with the usual invalid flag based approach. Which in turn makes me agree with your proposal yesterday about adding a generic mechanism for extensions to register their interest in invalidations on tables, attach data to them, and not worry about the details of managing the hash table correctly etc.
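i.e. roughly the following, assuming an is_valid flag is added to PGLRelMetaCacheEntry (a sketch only):

    /* The relcache invalidation callback only flags the entry, since it
     * can fire while the entry is still in use by a decoding callback. */
    static void
    relmeta_cache_invalidation_cb(Datum arg, Oid relid)
    {
        struct PGLRelMetaCacheEntry *entry;

        entry = hash_search(RelMetaCache, &relid, HASH_FIND, NULL);
        if (entry != NULL)
            entry->is_valid = false;
    }

    /* ...and at lookup time, in the decoding callback: */
        entry = hash_search(RelMetaCache, &relid, HASH_ENTER, &found);
        if (!found || !entry->is_valid)
        {
            /* (re)send the 'R' table metadata message, then: */
            entry->is_valid = true;
            entry->is_cached = true;
        }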
 
> +struct PGLRelMetaCacheEntry
> +{
> +     Oid relid;
> +     /* Does the client have this relation cached? */
> +     bool is_cached;
> +     /* Field for API plugin use, must be alloc'd in decoding context */
> +     void *api_private;
> +};

I don't see how api_private can safely be used. At the very least it
needs a lot more documentation about memory lifetime rules and
such. Afaics we'd just forever leak memory atm.

Yeah. It's pretty much just wrong, and since I don't have a compelling reason for it I'm happy enough for it to just go away. Doing it correctly would pretty much require a callback to be registered for freeing it, and ... meh.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
