Re: The plan for FDW-based sharding

From Robert Haas
Subject Re: The plan for FDW-based sharding
Date
Msg-id CA+TgmoacxkktW-y-tCJnTG7N3OwkTdo1eziAuU+w9=oObyOO4w@mail.gmail.com
In reply to Re: The plan for FDW-based sharding  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: The plan for FDW-based sharding  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
On Fri, Mar 4, 2016 at 11:17 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> If FDW-based sharding works, I'm happy enough, I have no horse in this race.
> If it doesn't work I don't much care either. What I'm worried about is if it
> works like partitioning using inheritance works - horribly badly, but just
> well enough that it's served as an effective barrier to doing anything
> better.
>
> That's what I want to prevent. Sharding that only-just-works and then stops
> us getting anything better into core.

That's a reasonable worry.  Thanks for articulating it so clearly.
I've thought about that issue and I admit it's both real and serious,
but I've sort of taken the attitude of saying, well, I don't know how
to solve that problem, but there's so much other important work that
needs to be done before we get to the point where that's the blocker
that solving that problem doesn't seem like the most important thing
right now.

The sharding discussion we had in Vienna convinced me that, in the
long run, having PostgreSQL servers talk to other PostgreSQL servers
only using SQL is not going to be a winner.  I believe Postgres-XL has
already done something about that; I think it is passing plans around
directly.  So you could look at that and say - ha, the FDW approach is
a dead end!  But from my point of view, the important thing about the
FDW interface is that it provides a pluggable interface to the
planner.  We can now push down joins and sorts; hopefully soon we will
be able to push down aggregates and limits and so on.  That's the hard
part.  The deparsing code that turns the plan we want to execute into
an SQL query that can be shipped over the wire is a detail.
Serializing some other on-the-wire representation of what we want the
remote side to do is small potatoes compared to having all of the
logic that lets you decide, in the first instance, what you want the
remote side to do.  I can imagine, in the long term, adding a new
sub-protocol (probably mediated via COPY BOTH) that uses a different
and more expressive on-the-wire representation.
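
To illustrate why the deparsing step is the easy part, here is a toy
sketch in Python.  It is not PostgreSQL code: the Scan, Join and Sort
classes and the deparse() function are hypothetical stand-ins, and the
plan is assumed to have already been chosen by the planner.  Once the
decision about what to push down has been made, rendering it as SQL is
a short tree walk.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Scan:
    table: str                 # remote table name (with optional alias)


@dataclass
class Join:
    left: "Plan"
    right: "Plan"
    condition: str             # join qual, already rendered as text


@dataclass
class Sort:
    child: "Plan"
    keys: List[str]            # ORDER BY columns


Plan = Union[Scan, Join, Sort]


def deparse_from(plan: Plan) -> str:
    """Render the FROM clause for a scan or a tree of joins."""
    if isinstance(plan, Scan):
        return plan.table
    if isinstance(plan, Join):
        return (f"({deparse_from(plan.left)} JOIN {deparse_from(plan.right)}"
                f" ON {plan.condition})")
    raise ValueError("a Sort may only appear at the top of this toy plan")


def deparse(plan: Plan) -> str:
    """Turn the pushed-down plan into one SQL string for the remote server."""
    if isinstance(plan, Sort):
        return (f"SELECT * FROM {deparse_from(plan.child)}"
                f" ORDER BY {', '.join(plan.keys)}")
    return f"SELECT * FROM {deparse_from(plan)}"


# Hypothetical plan: a join and a sort that were both pushed to the remote side.
plan = Sort(Join(Scan("orders o"), Scan("customers c"), "o.cust_id = c.id"),
            ["c.id"])
print(deparse(plan))
```

Swapping deparse() for a function that emits some other wire format
would leave the plan tree, and the logic that built it, untouched.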

Another foreseeable problem with the FDW approach is that you might
want to have a hash-partitioned table where there are multiple copies
of each piece of data and they are spread out across the shards and you
can add and remove shards and the data automatically rebalances.
Table inheritance (or table partitioning) + postgres_fdw doesn't sound
so great in this situation: when you rebalance, you need to change the
partitioning constraints, which requires a full table lock on every
node, and the whole thing seems likely to end up being somewhat
annoyingly manual and overly constrained by locking.  But I'd
say two things about that.  The first is that I honestly think that
this would be a pretty nice problem to have.  If we had things working
well enough that this was the kind of problem we were trying to
tackle, we'd be light-years ahead of where we are today.  Sure,
everybody hates table inheritance, but I don't think it's right to say
that partitioning work is blocked because table inheritance exists: I
think the problem is that getting true table partitioning correct is
*hard*.  And Amit Langote is working on that and hopefully we will get
there, but it's not an easy problem.  I don't think sharding is an
easy problem either, and I think getting to a point where ease-of-use
is our big limiting factor would actually be better than the current
scenario where "it doesn't work at all" is the limiting factor.  I
don't want that to *block* other approaches, BUT I also think that
anybody who tries to start over from scratch and ignore all the good
work that has been done in FDW-land is not going to have a very fun
time.
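
For a concrete picture of the "multiple copies, add and remove shards,
data rebalances" scenario, here is a minimal Python sketch.  It uses
rendezvous (highest-random-weight) hashing, which is my choice for
illustration only; nothing above says that is how it would or should be
done in PostgreSQL.  The shard names, the key format and the default
of two copies are all assumptions.

```python
import hashlib
from typing import List


def score(shard: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (shard, key) pair."""
    digest = hashlib.sha256(f"{shard}/{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def placement(key: str, shards: List[str], copies: int = 2) -> List[str]:
    """Return the `copies` shards that should hold `key`, highest score first."""
    return sorted(shards, key=lambda s: score(s, key), reverse=True)[:copies]


def moved_keys(keys: List[str], before: List[str], after: List[str]) -> int:
    """Count keys whose replica set changes when the shard list changes."""
    return sum(set(placement(k, before)) != set(placement(k, after))
               for k in keys)


# Hypothetical cluster: four shards, then a fifth one is added.
keys = [f"row-{i}" for i in range(1000)]
old = ["shard1", "shard2", "shard3", "shard4"]
new = old + ["shard5"]
print(moved_keys(keys, old, new), "of", len(keys), "keys need a copy moved")
```

With this scheme a key is affected by the new shard only if the new
shard scores among its top two, so most placements stay where they
are.  The sketch says nothing about the locking problem described
above; it only shows that the placement arithmetic is not the hard
part.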

The second thing I want to say about this problem is that I don't want
to presume that it's not a *solvable* problem.  Just because we use
the FDW technology as a base doesn't mean we can't invent new and
quite different stuff along the way.  One idea I've been toying with
is trying to create some notion of a "distributed" table.  This would
be a new relkind.  You'd have a single relation at the SQL level, not
an inheritance hierarchy, but under the hood the data would be spread
across a bunch of remote servers using the FDW interface.  So then you
reuse all of the query planner work and other enhancements that have
been put into the FDW stuff, but you'd present a much cleaner user
interface.  Or, maybe better, you could create a new FDW,
sharding_fdw, that works like postgres_fdw except that instead of
putting the data on one particular foreign server, it spreads the data
out across multiple servers and manages the sharding process under the
hood.  That would, again, let you reuse a lot of the work that's been
done to improve the FDW infrastructure while creating something
significantly more powerful than what postgres_fdw is today.  I don't
know, I don't have any ideas about this.  I think your concern is
valid, and I share it.  But I just fundamentally believe that it's
better to enhance what we have than to start inventing totally new
abstractions.  The FDW API is *really* powerful, and getting more
powerful, and I just have a very hard time believing that starting
over will be better.  Somebody can do that if they like and I'm not
gonna get in the way, but if it's got problems that could have been
avoided by basing that same work on the FDW stuff we've already got, I
do plan to point that out.
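
As a rough picture of the "single relation on top, many servers
underneath" idea, here is a toy Python sketch.  ToyServer and
DistributedTable are hypothetical names, the servers are plain
in-memory dictionaries rather than anything reached through the FDW
API, and the crc32-modulo routing rule is an assumption chosen for
brevity.

```python
import zlib
from typing import Any, Dict, Iterator, List, Optional, Tuple


class ToyServer:
    """Stand-in for one foreign server; a real one would be reached over the wire."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, Any] = {}


class DistributedTable:
    """One relation from the caller's point of view, many servers underneath."""

    def __init__(self, servers: List[ToyServer]) -> None:
        self.servers = servers

    def _server_for(self, key: str) -> ToyServer:
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        return self.servers[zlib.crc32(key.encode()) % len(self.servers)]

    def insert(self, key: str, value: Any) -> None:
        self._server_for(key).rows[key] = value

    def lookup(self, key: str) -> Optional[Any]:
        # A lookup by sharding key touches exactly one server.
        return self._server_for(key).rows.get(key)

    def scan(self) -> Iterator[Tuple[str, Any]]:
        # A full scan has to fan out to every server.
        for server in self.servers:
            yield from server.rows.items()


table = DistributedTable([ToyServer(f"node{i}") for i in range(3)])
for i in range(9):
    table.insert(f"k{i}", i * i)
print(table.lookup("k4"), sorted(v for _, v in table.scan()))
```

The caller never names a server, which is the cleaner user interface
being argued for.  Plain modulo routing like this would move most keys
whenever the number of servers changes, so a real design would need a
smarter placement rule.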

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


