Обсуждение: A query planner that learns

Поиск
Список
Период
Сортировка

A query planner that learns

От
Scott Marlowe
Дата:
While all the talk of a hinting system over in hackers and perform is
good, and I have a few queries that could live with a simple hint system
pop up now and again, I keep thinking that a query planner that learns
from its mistakes over time is far more desirable.

Is it reasonable or possible for the system to have a way to look at
query plans it's run and look for obvious mistakes its made, like being
off by a factor of 10 or more in estimations, and slowly learn to apply
its own hints?

Seems to me that would be far more useful than my having to babysit the
queries that are running slow and come up with hints to have the
database do what I want.

I already log slow queries and review them once a week by running them
with explain analyze and adjust what little I can, like stats targets
and such.

It seems to me the first logical step would be having the ability to
flip a switch and when the postmaster hits a slow query, it saves both
the query that ran long, as well as the output of explain or explain
analyze or some bastardized version missing some of the inner timing
info.  Even just saving the parts of the plan where the planner thought
it would get 1 row and got instead 350,000 and was using a nested loop
to join would be VERY useful.  I could see something like that
eventually evolving into a self tuning system.

Well, I'm busy learning to be an Oracle DBA right now, so I can't do
it.  But it would be a very cool project for the next college student
who shows up looking for one.

Re: A query planner that learns

От
"Jim C. Nasby"
Дата:
On Thu, Oct 12, 2006 at 03:31:50PM -0500, Scott Marlowe wrote:
> While all the talk of a hinting system over in hackers and perform is
> good, and I have a few queries that could live with a simple hint system
> pop up now and again, I keep thinking that a query planner that learns
> from its mistakes over time is far more desirable.
>
> Is it reasonable or possible for the system to have a way to look at
> query plans it's run and look for obvious mistakes its made, like being
> off by a factor of 10 or more in estimations, and slowly learn to apply
> its own hints?
>
> Seems to me that would be far more useful than my having to babysit the
> queries that are running slow and come up with hints to have the
> database do what I want.
>
> I already log slow queries and review them once a week by running them
> with explain analyze and adjust what little I can, like stats targets
> and such.
>
> It seems to me the first logical step would be having the ability to
> flip a switch and when the postmaster hits a slow query, it saves both
> the query that ran long, as well as the output of explain or explain
> analyze or some bastardized version missing some of the inner timing
> info.  Even just saving the parts of the plan where the planner thought
> it would get 1 row and got instead 350,000 and was using a nested loop
> to join would be VERY useful.  I could see something like that
> eventually evolving into a self tuning system.

Saves it and then... does what? That's the whole key...

> Well, I'm busy learning to be an Oracle DBA right now, so I can't do
> it.  But it would be a very cool project for the next college student
> who shows up looking for one.

Why? There's a huge demand for PostgreSQL experts out there... or is
this for a current job?
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: A query planner that learns

От
Scott Marlowe
Дата:
On Thu, 2006-10-12 at 17:14, Jim C. Nasby wrote:
> On Thu, Oct 12, 2006 at 03:31:50PM -0500, Scott Marlowe wrote:
> > While all the talk of a hinting system over in hackers and perform is
> > good, and I have a few queries that could live with a simple hint system
> > pop up now and again, I keep thinking that a query planner that learns
> > from its mistakes over time is far more desirable.
> >
> > Is it reasonable or possible for the system to have a way to look at
> > query plans it's run and look for obvious mistakes its made, like being
> > off by a factor of 10 or more in estimations, and slowly learn to apply
> > its own hints?
> >
> > Seems to me that would be far more useful than my having to babysit the
> > queries that are running slow and come up with hints to have the
> > database do what I want.
> >
> > I already log slow queries and review them once a week by running them
> > with explain analyze and adjust what little I can, like stats targets
> > and such.
> >
> > It seems to me the first logical step would be having the ability to
> > flip a switch and when the postmaster hits a slow query, it saves both
> > the query that ran long, as well as the output of explain or explain
> > analyze or some bastardized version missing some of the inner timing
> > info.  Even just saving the parts of the plan where the planner thought
> > it would get 1 row and got instead 350,000 and was using a nested loop
> > to join would be VERY useful.  I could see something like that
> > eventually evolving into a self tuning system.
>
> Saves it and then... does what? That's the whole key...

It's meant as a first step.  I could certainly use a daily report on
which queries had bad plans so I'd know which ones to investigate
without having to run them each myself in explain analyze.  Again, my
point was to do it incrementally.  This is something someone could do
now, and someone could build on later.

To start with, it does nothing.  Just saves it for the DBA to look at.
Later, it could feed any number of the different hinting systems people
have been proposing.

It may well be that by first looking at the data collected from problems
queries, the solution for how to adjust the planner becomes more
obvious.

> > Well, I'm busy learning to be an Oracle DBA right now, so I can't do
> > it.  But it would be a very cool project for the next college student
> > who shows up looking for one.
>
> Why? There's a huge demand for PostgreSQL experts out there... or is
> this for a current job?

Long story.  I do get to do lots of pgsql stuff.  But right now I'm
learning Oracle as well, cause we use both DBs.  It's just that since I
know pgsql pretty well, and know oracle hardly at all, Oracle is taking
up lots more of my time.

Re: A query planner that learns

От
Erik Jones
Дата:
Forgive me if I'm way off here as I'm not all that familiar with the
internals of postgres, but isn't this what the genetic query optimizer
discussed the one of the manual's appendixes is supposed to do.  Or, is
that more about using the the known costs of atomic operations that make
up a query plan, i.e. more of a bottom up approach than what you're
discussing.  If it's the latter, then it sounds like what you're looking
for is a classifier system.  See <a
href="http://en.wikipedia.org/wiki/Learning_classifier_system">this</a>
wikepedia article for a short description of them along with a couple of
reference links, you can google for many more.

Scott Marlowe wrote:
> On Thu, 2006-10-12 at 17:14, Jim C. Nasby wrote:
>
>> On Thu, Oct 12, 2006 at 03:31:50PM -0500, Scott Marlowe wrote:
>>
>>> While all the talk of a hinting system over in hackers and perform is
>>> good, and I have a few queries that could live with a simple hint system
>>> pop up now and again, I keep thinking that a query planner that learns
>>> from its mistakes over time is far more desirable.
>>>
>>> Is it reasonable or possible for the system to have a way to look at
>>> query plans it's run and look for obvious mistakes its made, like being
>>> off by a factor of 10 or more in estimations, and slowly learn to apply
>>> its own hints?
>>>
>>> Seems to me that would be far more useful than my having to babysit the
>>> queries that are running slow and come up with hints to have the
>>> database do what I want.
>>>
>>> I already log slow queries and review them once a week by running them
>>> with explain analyze and adjust what little I can, like stats targets
>>> and such.
>>>
>>> It seems to me the first logical step would be having the ability to
>>> flip a switch and when the postmaster hits a slow query, it saves both
>>> the query that ran long, as well as the output of explain or explain
>>> analyze or some bastardized version missing some of the inner timing
>>> info.  Even just saving the parts of the plan where the planner thought
>>> it would get 1 row and got instead 350,000 and was using a nested loop
>>> to join would be VERY useful.  I could see something like that
>>> eventually evolving into a self tuning system.
>>>
>>
>> Saves it and then... does what? That's the whole key...
>>
>
> It's meant as a first step.  I could certainly use a daily report on
> which queries had bad plans so I'd know which ones to investigate
> without having to run them each myself in explain analyze.  Again, my
> point was to do it incrementally.  This is something someone could do
> now, and someone could build on later.
>
> To start with, it does nothing.  Just saves it for the DBA to look at.
> Later, it could feed any number of the different hinting systems people
> have been proposing.
>
> It may well be that by first looking at the data collected from problems
> queries, the solution for how to adjust the planner becomes more
> obvious.
>
>
>>> Well, I'm busy learning to be an Oracle DBA right now, so I can't do
>>> it.  But it would be a very cool project for the next college student
>>> who shows up looking for one.
>>>
>> Why? There's a huge demand for PostgreSQL experts out there... or is
>> this for a current job?
>>
>
> Long story.  I do get to do lots of pgsql stuff.  But right now I'm
> learning Oracle as well, cause we use both DBs.  It's just that since I
> know pgsql pretty well, and know oracle hardly at all, Oracle is taking
> up lots more of my time.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
>


--
erik jones <erik@myemma.com>
software development
emma(r)


Re: A query planner that learns

От
"John D. Burger"
Дата:
Erik Jones wrote:

> Forgive me if I'm way off here as I'm not all that familiar with
> the internals of postgres, but isn't this what the genetic query
> optimizer discussed the one of the manual's appendixes is supposed
> to do.

No - it's not an "optimizer" in that sense.  When there are a small
enough set of tables involved, the planner uses a dynamic programming
algorithm to explore the entire space of all possible plans.  But the
space grows exponentially (I think) with the number of tables - when
this would take too long, the planner switches to a genetic algorithm
approach, which explores a small fraction of the plan space, in a
guided manner.

But with both approaches, the planner is just using the static
statistics gathered by ANALYZE to estimate the cost of each candidate
plan, and these statistics are based on sampling your data - they may
be wrong, or at least misleading.  (In particular, the statistic for
total number of unique values is frequently =way= off, per a recent
thread here.  I have been reading about this, idly thinking about how
to improve the estimate.)

The idea of a learning planner, I suppose, would be one that examines
cases where these statistics lead to very misguided expectations.
The simplest version of a "learning" planner could simply bump up the
statistics targets on certain columns.  A slightly more sophisticated
idea would be for some of the statistics to optionally use parametric
modeling (this column is a Gaussian, let's estimate the mean and
variance, this one is a Beta distribution ...).  Then the smarter
planner could spend some cycles applying more sophisticated
statistical modeling to problematic tables/columns.

- John D. Burger
   MITRE


Re: A query planner that learns

От
AgentM
Дата:
On Oct 13, 2006, at 11:47 , John D. Burger wrote:

> Erik Jones wrote:
>
>> Forgive me if I'm way off here as I'm not all that familiar with
>> the internals of postgres, but isn't this what the genetic query
>> optimizer discussed the one of the manual's appendixes is supposed
>> to do.
>
> No - it's not an "optimizer" in that sense.  When there are a small
> enough set of tables involved, the planner uses a dynamic
> programming algorithm to explore the entire space of all possible
> plans.  But the space grows exponentially (I think) with the number
> of tables - when this would take too long, the planner switches to
> a genetic algorithm approach, which explores a small fraction of
> the plan space, in a guided manner.
>
> But with both approaches, the planner is just using the static
> statistics gathered by ANALYZE to estimate the cost of each
> candidate plan, and these statistics are based on sampling your
> data - they may be wrong, or at least misleading.  (In particular,
> the statistic for total number of unique values is frequently =way=
> off, per a recent thread here.  I have been reading about this,
> idly thinking about how to improve the estimate.)
>
> The idea of a learning planner, I suppose, would be one that
> examines cases where these statistics lead to very misguided
> expectations.  The simplest version of a "learning" planner could
> simply bump up the statistics targets on certain columns.  A
> slightly more sophisticated idea would be for some of the
> statistics to optionally use parametric modeling (this column is a
> Gaussian, let's estimate the mean and variance, this one is a Beta
> distribution ...).  Then the smarter planner could spend some
> cycles applying more sophisticated statistical modeling to
> problematic tables/columns.

One simple first step would be to run an ANALYZE whenever a
sequential scan is executed. Is there a reason not to do this? It
could be controlled by a GUC variable in case someone wants
repeatable plans.

Further down the line, statistics could be collected during the
execution of any query- updating histograms on delete and update, as
well.

-M

Re: A query planner that learns

От
"Jim C. Nasby"
Дата:
On Thu, Oct 12, 2006 at 05:39:20PM -0500, Scott Marlowe wrote:
> > > It seems to me the first logical step would be having the ability to
> > > flip a switch and when the postmaster hits a slow query, it saves both
> > > the query that ran long, as well as the output of explain or explain
> > > analyze or some bastardized version missing some of the inner timing
> > > info.  Even just saving the parts of the plan where the planner thought
> > > it would get 1 row and got instead 350,000 and was using a nested loop
> > > to join would be VERY useful.  I could see something like that
> > > eventually evolving into a self tuning system.
> >
> > Saves it and then... does what? That's the whole key...
>
> It's meant as a first step.  I could certainly use a daily report on
> which queries had bad plans so I'd know which ones to investigate
> without having to run them each myself in explain analyze.  Again, my
> point was to do it incrementally.  This is something someone could do
> now, and someone could build on later.
>
> To start with, it does nothing.  Just saves it for the DBA to look at.
> Later, it could feed any number of the different hinting systems people
> have been proposing.
>
> It may well be that by first looking at the data collected from problems
> queries, the solution for how to adjust the planner becomes more
> obvious.

Yeah, that would be useful to have. The problem I see is storing that
info in a format that's actually useful... and I'm thinking that a
logfile doesn't qualify since you can't really query it.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: A query planner that learns

От
"Jim C. Nasby"
Дата:
On Fri, Oct 13, 2006 at 11:53:15AM -0400, AgentM wrote:
> One simple first step would be to run an ANALYZE whenever a
> sequential scan is executed. Is there a reason not to do this? It

Yes. You want a seqscan on a small (couple pages) table, and ANALYZE has
a very high overhead on some platforms.

Just recording the query plan and actual vs estimated rowcounts would be
a good start, though. And useful to DBA's, provided you had some means
to query against it.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: A query planner that learns

От
Scott Marlowe
Дата:
On Fri, 2006-10-13 at 12:48, Jim C. Nasby wrote:
> On Thu, Oct 12, 2006 at 05:39:20PM -0500, Scott Marlowe wrote:
> > > > It seems to me the first logical step would be having the ability to
> > > > flip a switch and when the postmaster hits a slow query, it saves both
> > > > the query that ran long, as well as the output of explain or explain
> > > > analyze or some bastardized version missing some of the inner timing
> > > > info.  Even just saving the parts of the plan where the planner thought
> > > > it would get 1 row and got instead 350,000 and was using a nested loop
> > > > to join would be VERY useful.  I could see something like that
> > > > eventually evolving into a self tuning system.
> > >
> > > Saves it and then... does what? That's the whole key...
> >
> > It's meant as a first step.  I could certainly use a daily report on
> > which queries had bad plans so I'd know which ones to investigate
> > without having to run them each myself in explain analyze.  Again, my
> > point was to do it incrementally.  This is something someone could do
> > now, and someone could build on later.
> >
> > To start with, it does nothing.  Just saves it for the DBA to look at.
> > Later, it could feed any number of the different hinting systems people
> > have been proposing.
> >
> > It may well be that by first looking at the data collected from problems
> > queries, the solution for how to adjust the planner becomes more
> > obvious.
>
> Yeah, that would be useful to have. The problem I see is storing that
> info in a format that's actually useful... and I'm thinking that a
> logfile doesn't qualify since you can't really query it.

grep / sed / awk can do amazing things to a text file.

I'd actually recommend URL encoding (or something like that) so they'd
be single lines, then you could grep for certain things and feed the
lines to a simple de-encoder.

We do it with our log files at work and can search through some fairly
large files for the exact entry we need fairly quickly.

Re: A query planner that learns

От
"Guy Rouillier"
Дата:
Jim C. Nasby wrote:

> Just recording the query plan and actual vs estimated rowcounts would
> be a good start, though. And useful to DBA's, provided you had some
> means to query against it.

If the DBMS stores the incoming query (or some parsed version of it) and
identifies frequency of usage over time, it can then spend spare cycles
more deeply analyzing frequently used queries.  Many DB servers have
usage patterns just like end-user workstations (not all, I realize):
they are busy for predictable periods of the day and fairly idle at
other times.  To provide acceptable response times, DBMSs don't have the
luxury of analyzing numerous query paths when queries are received.  But
they could do this offline so that the next time it sees the same query,
it can use a better-optimized plan.

Storing the query itself is probably a better idea than storing the
plan, since the plan may change over time.

--
Guy Rouillier


Re: A query planner that learns

От
Ron Johnson
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/13/06 10:47, John D. Burger wrote:
> Erik Jones wrote:
>
[snip]
> But with both approaches, the planner is just using the static
> statistics gathered by ANALYZE to estimate the cost of each candidate
> plan, and these statistics are based on sampling your data - they may be
> wrong, or at least misleading.  (In particular, the statistic for total
> number of unique values is frequently =way= off, per a recent thread
> here.  I have been reading about this, idly thinking about how to
> improve the estimate.)

What about an ANALYZE FULL <table>, reading every record in the
table, and ever node in every index, storing in pg_statistic (or
some new, similar table) such items as the AVG and STD of the number
of records in each page, and b-tree depth, keys per node, records
per key and per segment?  Maybe even "average distance between pages
in the tablespace".

This would let the optimizer know things like "the value of the
field which is first segment of an index (and which is the only part
of the index in the WHERE clause) describes 75% of the rows in the
table, and the records are all packed in tightly in the pages, and
the pages are close together", so the optimizer could decide "a
table scan would be much more efficient".

In some ways, this would be similar in functionality to the existing
histogram created by ANALYZE, but would provide a slightly different
picture.

- --
Ron Johnson, Jr.
Jefferson LA  USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (GNU/Linux)

iD8DBQFFMcu5S9HxQb37XmcRAvVJAJ0VFfEoxwrKn15VqPaZz54SNY4tPACg47zB
r3hZ+HqHE/1bCJK/xNZzNRE=
=OP9+
-----END PGP SIGNATURE-----

Re: A query planner that learns

От
Alban Hertroys
Дата:
Jim C. Nasby wrote:
> On Thu, Oct 12, 2006 at 05:39:20PM -0500, Scott Marlowe wrote:
>> It may well be that by first looking at the data collected from problems
>> queries, the solution for how to adjust the planner becomes more
>> obvious.
>
> Yeah, that would be useful to have. The problem I see is storing that
> info in a format that's actually useful... and I'm thinking that a
> logfile doesn't qualify since you can't really query it.

You'd need something that contains the query plan (obviously), but also
all relevant information about the underlying data model and data
distribution. Some meta-data, like the version of PostgreSQL, is
probably required as well.

The current statistics contain some of this information, but from
reading this list I know that that's rarely enough information to
determine an error made by the planner.

Regards,
--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //

Re: A query planner that learns

От
Jochem van Dieten
Дата:
Scott Marlowe wrote:
> While all the talk of a hinting system over in hackers and perform is
> good, and I have a few queries that could live with a simple hint system
> pop up now and again, I keep thinking that a query planner that learns
> from its mistakes over time is far more desirable.
>
> Is it reasonable or possible for the system to have a way to look at
> query plans it's run and look for obvious mistakes its made, like being
> off by a factor of 10 or more in estimations, and slowly learn to apply
> its own hints?

Technically it is very feasible. But I think you might want to check US
Patent 6,763,359 before you start writing any code.


> It seems to me the first logical step would be having the ability to
> flip a switch and when the postmaster hits a slow query, it saves both
> the query that ran long, as well as the output of explain or explain
> analyze or some bastardized version missing some of the inner timing
> info.  Even just saving the parts of the plan where the planner thought
> it would get 1 row and got instead 350,000 and was using a nested loop
> to join would be VERY useful.  I could see something like that
> eventually evolving into a self tuning system.

I think it would be a good start if we can specify a
log_selectivity_error_threshold and if estimates are more then that
factor off, the query, parameters and planner estimates get logged for
later analysis. That would be driven entirely by selectivity estimates
and not (estimated) cost since cost is influenced by outside factors
such as other processes competing for resources. If a system for
statistical hints emerges from the current discussion we would indeed
have the input to start tuning the selectivity estimations.

Jochem


Re: A query planner that learns

От
Alvaro Herrera
Дата:
Jochem van Dieten wrote:
> Scott Marlowe wrote:
> >While all the talk of a hinting system over in hackers and perform is
> >good, and I have a few queries that could live with a simple hint system
> >pop up now and again, I keep thinking that a query planner that learns
> >from its mistakes over time is far more desirable.
> >
> >Is it reasonable or possible for the system to have a way to look at
> >query plans it's run and look for obvious mistakes its made, like being
> >off by a factor of 10 or more in estimations, and slowly learn to apply
> >its own hints?
>
> Technically it is very feasible. But I think you might want to check US
> Patent 6,763,359 before you start writing any code.

I think it would be a very good idea if you guys stopped looking at the
US patent database.  It does no good to anyone.  There's no way we can
avoid stomping on a patent or another -- there are patents for everything.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: A query planner that learns

От
"John D. Burger"
Дата:
Jochem van Dieten wrote:

> I think you might want to check US Patent 6,763,359 before you
> start writing any code.

http://tinyurl.com/yzjdve

- John D. Burger
   MITRE


Re: A query planner that learns

От
Madison Kelly
Дата:
Alvaro Herrera wrote:
> Jochem van Dieten wrote:
>> Scott Marlowe wrote:
>>> While all the talk of a hinting system over in hackers and perform is
>>> good, and I have a few queries that could live with a simple hint system
>>> pop up now and again, I keep thinking that a query planner that learns
>> >from its mistakes over time is far more desirable.
>>> Is it reasonable or possible for the system to have a way to look at
>>> query plans it's run and look for obvious mistakes its made, like being
>>> off by a factor of 10 or more in estimations, and slowly learn to apply
>>> its own hints?
>> Technically it is very feasible. But I think you might want to check US
>> Patent 6,763,359 before you start writing any code.
>
> I think it would be a very good idea if you guys stopped looking at the
> US patent database.  It does no good to anyone.  There's no way we can
> avoid stomping on a patent or another -- there are patents for everything.
>

Hasn't IBM release a pile of it's patents for use (or at least stated
they won't sue) to OSS projects? If so, is this patent covered by that
"amnesty"?

Simply ignoring patents because "there is a patent for everything" is a
recipe for disaster. Companies like MS are running out of ways to tear
open OSS and they are certainly not above (below?) suing the heck out of
OSS projects for patent infringement.

What's needed is reform in the USPO. Call you congress (wo)man and
complain, but don't flaunt the law; you will lose.

Madi

Re: A query planner that learns

От
Alvaro Herrera
Дата:
Madison Kelly wrote:
> Alvaro Herrera wrote:
> >Jochem van Dieten wrote:
> >>Scott Marlowe wrote:
> >>>While all the talk of a hinting system over in hackers and perform is
> >>>good, and I have a few queries that could live with a simple hint system
> >>>pop up now and again, I keep thinking that a query planner that learns
> >>>from its mistakes over time is far more desirable.
> >>>Is it reasonable or possible for the system to have a way to look at
> >>>query plans it's run and look for obvious mistakes its made, like being
> >>>off by a factor of 10 or more in estimations, and slowly learn to apply
> >>>its own hints?
> >>Technically it is very feasible. But I think you might want to check US
> >>Patent 6,763,359 before you start writing any code.
> >
> >I think it would be a very good idea if you guys stopped looking at the
> >US patent database.  It does no good to anyone.  There's no way we can
> >avoid stomping on a patent or another -- there are patents for everything.
>
> Hasn't IBM release a pile of it's patents for use (or at least stated
> they won't sue) to OSS projects? If so, is this patent covered by that
> "amnesty"?

This is useless as a policy, because we have plenty of companies basing
their proprietary code on PostgreSQL, which wouldn't be subject to the
grant (EnterpriseDB, Command Prompt, Fujitsu, SRA).  We do support them.


> Simply ignoring patents because "there is a patent for everything" is a
> recipe for disaster. Companies like MS are running out of ways to tear
> open OSS and they are certainly not above (below?) suing the heck out of
> OSS projects for patent infringement.

It has been said that unknowingly infringing a patent is much less
problematic than knowingly doing same.  We don't have the manpower to
implement the whole Postgres without infringing a single patent, so the
best approach is to refrain from researching possible patents applying
to us in the first place.

If people comes here and points at patents that we infringe or may
infringe, it will cause much more (useless) work for hackers which then
have to search alternative ways of doing the same things.


> What's needed is reform in the USPO. Call you congress (wo)man and
> complain, but don't flaunt the law; you will lose.

I agree.  However, I am not an US inhabitant in the first place, and
bless my parents for that.  Heck, I was even denied a visa -- twice.
Please do us all a favor and write to your congresspeople.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: A query planner that learns

От
Madison Kelly
Дата:
Alvaro Herrera wrote:
>> Hasn't IBM release a pile of it's patents for use (or at least stated
>> they won't sue) to OSS projects? If so, is this patent covered by that
>> "amnesty"?
>
> This is useless as a policy, because we have plenty of companies basing
> their proprietary code on PostgreSQL, which wouldn't be subject to the
> grant (EnterpriseDB, Command Prompt, Fujitsu, SRA).  We do support them.

More specifically, it is then up to the 3rd party (non-OSS) developers
to clear the patents. It's not the PgSQL team's problem in this case
(assuming it's the case at all).

>> Simply ignoring patents because "there is a patent for everything" is a
>> recipe for disaster. Companies like MS are running out of ways to tear
>> open OSS and they are certainly not above (below?) suing the heck out of
>> OSS projects for patent infringement.
>
> It has been said that unknowingly infringing a patent is much less
> problematic than knowingly doing same.  We don't have the manpower to
> implement the whole Postgres without infringing a single patent, so the
> best approach is to refrain from researching possible patents applying
> to us in the first place.
>
> If people comes here and points at patents that we infringe or may
> infringe, it will cause much more (useless) work for hackers which then
> have to search alternative ways of doing the same things.

"Plausible Deniability" and all that jazz. There is another truism
though; "Ignorance of the law is no excuse". Besides, claiming ignorance
doesn't keep you out of the courts in the first place. The people who
would attack OSS applications generally have very, very deep pockets and
can run a project out of money before the trial was over. They could do
that non-the-less (SCO, hello?) but I still suggest NOT tempting fate.

It's unfortunate that this legal mess causes the developers more
headaches than they need, but it still can't be ignored, imho.

>> What's needed is reform in the USPO. Call you congress (wo)man and
>> complain, but don't flaunt the law; you will lose.
>
> I agree.  However, I am not an US inhabitant in the first place, and
> bless my parents for that.  Heck, I was even denied a visa -- twice.
> Please do us all a favor and write to your congresspeople.

Heh, I'm not an American either, so I can't do anything but shake my
head (and be equally glad that my own personal OSS program is here in
Canada).

American industry wonders why they are losing so many IT positions...
It's such a difficult and unfriendly environment there for anyone but
the biggest companies. Far too litigious.

Madi

Re: A query planner that learns

От
AgentM
Дата:
On Oct 16, 2006, at 16:17 , Madison Kelly wrote:

> Alvaro Herrera wrote:
>> Jochem van Dieten wrote:
>>> Scott Marlowe wrote:
>>>> While all the talk of a hinting system over in hackers and
>>>> perform is
>>>> good, and I have a few queries that could live with a simple
>>>> hint system
>>>> pop up now and again, I keep thinking that a query planner that
>>>> learns
>>> >from its mistakes over time is far more desirable.
>>>> Is it reasonable or possible for the system to have a way to
>>>> look at
>>>> query plans it's run and look for obvious mistakes its made,
>>>> like being
>>>> off by a factor of 10 or more in estimations, and slowly learn
>>>> to apply
>>>> its own hints?
>>> Technically it is very feasible. But I think you might want to
>>> check US Patent 6,763,359 before you start writing any code.
>> I think it would be a very good idea if you guys stopped looking
>> at the
>> US patent database.  It does no good to anyone.  There's no way we
>> can
>> avoid stomping on a patent or another -- there are patents for
>> everything.
>
> Hasn't IBM release a pile of it's patents for use (or at least
> stated they won't sue) to OSS projects? If so, is this patent
> covered by that "amnesty"?
>
> Simply ignoring patents because "there is a patent for everything"
> is a recipe for disaster. Companies like MS are running out of ways
> to tear open OSS and they are certainly not above (below?) suing
> the heck out of OSS projects for patent infringement.
>
> What's needed is reform in the USPO. Call you congress (wo)man and
> complain, but don't flaunt the law; you will lose.

Alvaro's advice is sound. If the patent holder can prove that a
developer looked at a patent (for example, from an email in a mailing
list archive) and the project proceeded with the implementation
regardless, malice can been shown and "damages" can be substantially
higher. You're screwed either way but your safest bet is to never
look at patents.

Disclaimer: I am not a lawyer- I don't even like lawyers.

-M

Re: A query planner that learns

От
Madison Kelly
Дата:
AgentM wrote:
> Alvaro's advice is sound. If the patent holder can prove that a
> developer looked at a patent (for example, from an email in a mailing
> list archive) and the project proceeded with the implementation
> regardless, malice can been shown and "damages" can be substantially
> higher. You're screwed either way but your safest bet is to never look
> at patents.
>
> Disclaimer: I am not a lawyer- I don't even like lawyers.


Nor am I a lawyer, but I still hold that hoping "ignorance" will be a
decent defense is very, very risky. In the end I am not a pgSQL
developer so it isn't in my hands either way.

Madi

Re: A query planner that learns

От
"Brian Mathis"
Дата:
On 10/17/06, Madison Kelly <linux@alteeve.com> wrote:
AgentM wrote:
> Alvaro's advice is sound. If the patent holder can prove that a
> developer looked at a patent (for example, from an email in a mailing
> list archive) and the project proceeded with the implementation
> regardless, malice can been shown and "damages" can be substantially
> higher. You're screwed either way but your safest bet is to never look
> at patents.
>
> Disclaimer: I am not a lawyer- I don't even like lawyers.


Nor am I a lawyer, but I still hold that hoping "ignorance" will be a
decent defense is very, very risky. In the end I am not a pgSQL
developer so it isn't in my hands either way.

Madi



I also am NAL, but I know enough about the patent system (in the US) to know that ignorance *IS* a defense.  If you are ignorant of the patent, you only have to pay the damages.  If you knew about the patent and did it anyway, you have to pay *triple* damages.  Ignorance will save you lots of money.

You may not like it, but that's the way it is.

Re: A query planner that learns

От
Madison Kelly
Дата:
Brian Mathis wrote:
> I also am NAL, but I know enough about the patent system (in the US) to
> know that ignorance *IS* a defense.  If you are ignorant of the patent,
> you only have to pay the damages.  If you knew about the patent and did
> it anyway, you have to pay *triple* damages.  Ignorance will save you
> lots of money.
>
> You may not like it, but that's the way it is.
>

I got that part. :) If you _do_ end up in court, plausible deniability
helps.

My position though is that it is better, in the long term, to be aware
of the patents and take the time to work around them so that *no*
damages need to be paid. Or, as might be that chance in this case, to
get a written "okay" from the patent holder for the use of the methods
protected by the patent in a given program.

Colour me funny, but wouldn't staying out of the courts in the first
place not be the best option?

Madi

Re: A query planner that learns

От
"Brian Mathis"
Дата:
On 10/17/06, Madison Kelly <linux@alteeve.com> wrote:
Brian Mathis wrote:
> I also am NAL, but I know enough about the patent system (in the US) to
> know that ignorance *IS* a defense.  If you are ignorant of the patent,
> you only have to pay the damages.  If you knew about the patent and did
> it anyway, you have to pay *triple* damages.  Ignorance will save you
> lots of money.
>
> You may not like it, but that's the way it is.
>

I got that part. :) If you _do_ end up in court, plausible deniability
helps.

My position though is that it is better, in the long term, to be aware
of the patents and take the time to work around them so that *no*
damages need to be paid. Or, as might be that chance in this case, to
get a written "okay" from the patent holder for the use of the methods
protected by the patent in a given program.

Colour me funny, but wouldn't staying out of the courts in the first
place not be the best option?

Madi

Yes, good idea :)

Re: A query planner that learns

От
Scott Ribe
Дата:
> Colour me funny, but wouldn't staying out of the courts in the first
> place not be the best option?

Yes, however some people feel that given the way the patent office is
spewing huge quantities of patents, many on old well-known techniques, and
the the absurd difficulty of reading patent claims, and the deliberately
vague language used in the hope of broadening the scope later, that it is
actually not possible to keep track of them, therefore not possible to avoid
infringement with any certainty. So, better to fall back to the 2nd-best
position...

It's a difficult question to answer because there are 0 good answers.

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: A query planner that learns

От
Alvaro Herrera
Дата:
Madison Kelly wrote:
> Brian Mathis wrote:
> >I also am NAL, but I know enough about the patent system (in the US) to
> >know that ignorance *IS* a defense.  If you are ignorant of the patent,
> >you only have to pay the damages.  If you knew about the patent and did
> >it anyway, you have to pay *triple* damages.  Ignorance will save you
> >lots of money.
> >
> >You may not like it, but that's the way it is.
> >
>
> I got that part. :) If you _do_ end up in court, plausible deniability
> helps.
>
> My position though is that it is better, in the long term, to be aware
> of the patents and take the time to work around them so that *no*
> damages need to be paid. Or, as might be that chance in this case, to
> get a written "okay" from the patent holder for the use of the methods
> protected by the patent in a given program.
>
> Colour me funny, but wouldn't staying out of the courts in the first
> place not be the best option?

Yeah.  I invite you to do all the extra (useless) development work
required.  But please do not charge other people with it.  Whoever
investigates patents and lets pgsql-hackers know about them, is charging
the Postgres community with that work.  We sure don't need it.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: A query planner that learns

От
Madison Kelly
Дата:
Alvaro Herrera wrote:
> Yeah.  I invite you to do all the extra (useless) development work
> required.  But please do not charge other people with it.  Whoever
> investigates patents and lets pgsql-hackers know about them, is charging
> the Postgres community with that work.  We sure don't need it.

As a developer (different project that uses pgsql), I am no more happy
about the current status of the USPO than you are. I think it's a giant
cess-pool of greed, ignorance and waste... BUT... it is currently the
law in the USA.

How fast would Oracle, Microsoft or others jump on a chance to tear
PostgreSQL apart by dragging it through the courts and crushing it under
any fines (regardless of the amount)? If that happened, *all* of the
pgsql-hacker's time would be wasted, never mind the countless other
projects that rely on PgSQL.

As Scott said, there are 0 good answers to this problem. If PgSQL is
going to be active in the US then it has to play by the (bent and
twisted) rules. That is why I started my comments with "call your
congress(wo)man". Simply ignoring that which you don't like is just far
to risky with the sharks in our waters.

It's extra work, and it's unfair to the developers who already put their
free time into this program, but IMHO it's the only safe way to go.
Again though, not being a pgsql-hacker my comments here are just that.

Madi

Re: A query planner that learns

От
AgentM
Дата:
On Oct 17, 2006, at 10:46 , Madison Kelly wrote:

> Brian Mathis wrote:
>> I also am NAL, but I know enough about the patent system (in the
>> US) to know that ignorance *IS* a defense.  If you are ignorant of
>> the patent, you only have to pay the damages.  If you knew about
>> the patent and did it anyway, you have to pay *triple* damages.
>> Ignorance will save you lots of money.
>> You may not like it, but that's the way it is.
>
> I got that part. :) If you _do_ end up in court, plausible
> deniability helps.
>
> My position though is that it is better, in the long term, to be
> aware of the patents and take the time to work around them so that
> *no* damages need to be paid. Or, as might be that chance in this
> case, to get a written "okay" from the patent holder for the use of
> the methods protected by the patent in a given program.
>
> Colour me funny, but wouldn't staying out of the courts in the
> first place not be the best option?

That would be a nice, but naïve, approach. It is likely that (without
admitting any guilt by any party) postgresql already steps on some
patents. In fact, any project you can think of likely steps on some
patents. There are patents on network communication: the "getting a
message from a server to client" sort of thing.

If you spent the next twenty years searching through patents and
creating patches for postgresql to circumvent the patents, not only
would you turn postgresql into a shriveled raisin of its current
self, you would be exposing postgresql to greater damages than if you
had never looked at the patents.

***Please- for the safety of everyone on these lists- do not mention
or link to any specific patents. This mailing list has a publicly-
accessible archive which could be used against me or you (at least in
the USA).***

Best regards,
M

Re: A query planner that learns

От
Rafal Pietrak
Дата:
On Tue, 2006-10-17 at 10:24 -0400, Madison Kelly wrote:
> Nor am I a lawyer, but I still hold that hoping "ignorance" will be a
> decent defense is very, very risky. In the end I am not a pgSQL
> developer so it isn't in my hands either way.

If I may.

The "hoping, ignorance will save you" line of defence is in fact very
very risky.

But "civil disobidience" is not. Well, it may be risky, but it is valid.

There is already strong support for the opinion (e.g. European battle
against software patents), that current patent/copyright regulations,
are devastating to inovations. And in fact contradict their 'intended
goal' - the stimulation of inovations.

At this point, there is no other way but to ignore such legislation - to
the point where we can: like, by not reading patent files. Be warrned,
when the law strikes back, you loose.

Calming down a little, I'd add, that this particular piece of 16'th
century law does not quite fit todays reality.

Just my 2c .... and I admit, it's not on the subject in subject :)

--
-R