Обсуждение: Efficiency vs. code bloat for SELECT wrappers

Поиск
Список
Период
Сортировка

Efficiency vs. code bloat for SELECT wrappers

От
Colin Wetherbee
Дата:
Greetings.

I am working on a PostgreSQL-backed mod_perl web application that's just
in its infancy.

Let's say I have a users table that holds about 15 columns of data about
each user.

If I write one Perl sub for each operation on the table (e.g. one that
gets the username and password hash, another that gets the last name and
first name, etc.), there will be a whole lot of subs, each of which
performs one very specific task.

If I write one larger Perl sub that grabs the whole row, and then I deal
with the contents of the row in Perl, ignoring columns as I please, it
will require fewer subs and, in turn, imply cleaner code.

My concern is that I don't know what efficiency I would be forfeiting on
the PostgreSQL side of the application by always querying entire rows if
my transaction occurs entirely within a single table.  Of course, I
would want to handle more complicated JOINs and other more intensive
operations on the PostgreSQL side.

I don't think the application will ever store a tuple larger than about
512 bytes in any table, so the network speed wouldn't really come into play.

Thanks.

Colin

Re: Efficiency vs. code bloat for SELECT wrappers

От
Tom Lane
Дата:
Colin Wetherbee <cww@denterprises.org> writes:
> Let's say I have a users table that holds about 15 columns of data about
> each user.

> If I write one Perl sub for each operation on the table (e.g. one that
> gets the username and password hash, another that gets the last name and
> first name, etc.), there will be a whole lot of subs, each of which
> performs one very specific task.

> If I write one larger Perl sub that grabs the whole row, and then I deal
> with the contents of the row in Perl, ignoring columns as I please, it
> will require fewer subs and, in turn, imply cleaner code.

> My concern is that I don't know what efficiency I would be forfeiting on
> the PostgreSQL side of the application by always querying entire rows if
> my transaction occurs entirely within a single table.

Not nearly as much as you would lose anytime you perform two independent
queries to fetch different fields of the same row.  What you really need
to worry about here is making sure you only fetch the row once
regardless of which field(s) you want out of it.  It's not clear to me
whether your second design concept handles that, but if it does then
I think it'd be fine.

The only case where custom field sets might be important is if you have
fields that are wide enough to potentially get TOASTed (ie more than a
kilobyte or so apiece).  Then it'd be worth the trouble to not fetch
those when you don't need them.  But that apparently isn't the case
with this table.

            regards, tom lane

Re: Efficiency vs. code bloat for SELECT wrappers

От
Colin Wetherbee
Дата:
Tom Lane wrote:
> Colin Wetherbee <cww@denterprises.org> writes:
>> Let's say I have a users table that holds about 15 columns of data about
>> each user.
>
>> If I write one Perl sub for each operation on the table (e.g. one that
>> gets the username and password hash, another that gets the last name and
>> first name, etc.), there will be a whole lot of subs, each of which
>> performs one very specific task.
>
>> If I write one larger Perl sub that grabs the whole row, and then I deal
>> with the contents of the row in Perl, ignoring columns as I please, it
>> will require fewer subs and, in turn, imply cleaner code.
>
>> My concern is that I don't know what efficiency I would be forfeiting on
>> the PostgreSQL side of the application by always querying entire rows if
>> my transaction occurs entirely within a single table.
>
> Not nearly as much as you would lose anytime you perform two independent
> queries to fetch different fields of the same row.  What you really need
> to worry about here is making sure you only fetch the row once
> regardless of which field(s) you want out of it.  It's not clear to me
> whether your second design concept handles that, but if it does then
> I think it'd be fine.

Yes, the second design concept would handle that.

> The only case where custom field sets might be important is if you have
> fields that are wide enough to potentially get TOASTed (ie more than a
> kilobyte or so apiece).  Then it'd be worth the trouble to not fetch
> those when you don't need them.  But that apparently isn't the case
> with this table.

Sounds good.

Thanks.

Colin

Re: Efficiency vs. code bloat for SELECT wrappers

От
Colin Wetherbee
Дата:
Sam Mason wrote:
> On Sun, Dec 16, 2007 at 06:31:56PM -0500, Colin Wetherbee wrote:
>> If I write one Perl sub for each operation on the table (e.g. one that
>> gets the username and password hash, another that gets the last name and
>> first name, etc.), there will be a whole lot of subs, each of which
>> performs one very specific task.
>>
>> If I write one larger Perl sub that grabs the whole row, and then I deal
>> with the contents of the row in Perl, ignoring columns as I please, it
>> will require fewer subs and, in turn, imply cleaner code.
>
> It sounds as though you're just treating the database as a relatively
> dumb datastore.  They can be used as this, and Toms comments are as
> always good, but relational databases come into their own when you're
> writing more complex queries.  When I'm writing my code I tend to put
> the SQL statements directly in with the rest of the code, abstracting
> away from the database tends to make things more complicated than they
> need to be.

Because I know Perl a whole lot better than SQL, PostgreSQL, and even
the Perl DBI, I'm always inclined to wrap the database stuff in a nice
little package and forget about it.  This method has worked well for me
in the past, but the project I'm starting is much bigger in terms of
database use than anything else I've written.

You say you write SQL directly in your application code, rather than
writing task-oriented wrappers.  I like that idea, but I can see that
getting really messy.  What is your opinion on how it affects code
maintenance and things like that?

Thanks.

Colin

Re: Efficiency vs. code bloat for SELECT wrappers

От
Sam Mason
Дата:
On Mon, Dec 17, 2007 at 12:49:46PM -0500, Colin Wetherbee wrote:
> Because I know Perl a whole lot better than SQL, PostgreSQL, and even
> the Perl DBI, I'm always inclined to wrap the database stuff in a nice
> little package and forget about it.  This method has worked well for me
> in the past, but the project I'm starting is much bigger in terms of
> database use than anything else I've written.

SQL allows you to express some sorts of problems very succinctly and
efficiently, the problem is that you need some experience of SQL before
knowing when is best to use imperative vs relational tools.

Without knowing what sort of thing you're trying to do I'm unable to
give any appropriate examples.  I'd definitely say that spending time
getting to know databases has made me much more productive in total.
Mainly because with relational databases you just don't have to worry
about lots of the, mostly distracting, details that you do in imperative
languages.

> You say you write SQL directly in your application code, rather than
> writing task-oriented wrappers.  I like that idea, but I can see that
> getting really messy.  What is your opinion on how it affects code
> maintenance and things like that?

It's normally very manageable, but it'll depend on where you like to
put your machinery.  I seem to be relying more and more on (database)
server side code and most of the UI code just tends to do some initial
simple input validation and then hand all the data over to the database.
My UI code then takes any results back from the database and feeds it
into the appropriate places.  I'd guess that maybe 5 to 10% of the lines
in my programs contain some SQL---i.e. two or three lines of SQL for
every twenty or thirty lines.  I don't think I'd save much, if any,
lines in total if I tried to put some database abstraction layer in the
middle.  I can say with reasonable certainty that it would be much more
complicated to keep everything working as efficiently.

Luckily I've been able to design most of the programs I work on as
relatively simple layers over a database, I'm not sure if you're able to
work like this.


  Sam

Re: Efficiency vs. code bloat for SELECT wrappers

От
Ted Byers
Дата:
--- Colin Wetherbee <cww@denterprises.org> wrote:

> Sam Mason wrote:
> > On Sun, Dec 16, 2007 at 06:31:56PM -0500, Colin
> Wetherbee wrote:
> >> If I write one Perl sub for each operation on the
> table (e.g. one that
> >> gets the username and password hash, another that
> gets the last name and
> >> first name, etc.), there will be a whole lot of
> subs, each of which
> >> performs one very specific task.
> >>
Right. First rule of software engineering is keep
functions as small as possible, focussed on one thing
wherever practicable.  It doesn't matter if the
language is Perl or C++ or Java, or a stored procedure
in an RDBMS.  One can always create additional driver
functions that use the elemental simple functions to
do more complex tasks (bearing in mind the
complexities that will inevitably arise in multiple
user situations).

> >> If I write one larger Perl sub that grabs the
> whole row, and then I deal
> >> with the contents of the row in Perl, ignoring
> columns as I please, it
> >> will require fewer subs and, in turn, imply
> cleaner code.
> >
Define "cleaner code."  The more data, and the more
complex that data, the more code you have to write,
regardless of whether that is in one function or
several.  Either way, done badly, can be a problem for
both maintenance and performance.

> > It sounds as though you're just treating the
> database as a relatively
> > dumb datastore.  They can be used as this, and
> Toms comments are as
> > always good, but relational databases come into
> their own when you're
> > writing more complex queries.  When I'm writing my
> code I tend to put
> > the SQL statements directly in with the rest of
> the code, abstracting
> > away from the database tends to make things more
> complicated than they
> > need to be.
>
> Because I know Perl a whole lot better than SQL,
> PostgreSQL, and even
> the Perl DBI, I'm always inclined to wrap the
> database stuff in a nice
> little package and forget about it.  This method has
> worked well for me
> in the past, but the project I'm starting is much
> bigger in terms of
> database use than anything else I've written.
>
I routinely keep my SQL code distinct from my Perl,
java or C++ code.  When a client program needs to do
something with the database, then either a child
process executes a script I have written, if the
client program doesn't need to do anything with data
drawn from the database, or I have all the SQL code in
one or more stored procedures, and use the appropriate
client interface to invoke the stored procedure(s).
Whether the SQL is in a specific script or in a stored
procedure, my SQL code is kept distinct from the
client code, regardles of the language I have used for
that.  I find this even MORE useful as my projects get
bigger.

> You say you write SQL directly in your application
> code, rather than
> writing task-oriented wrappers.  I like that idea,
> but I can see that
> getting really messy.  What is your opinion on how
> it affects code
> maintenance and things like that?
>
From what I have seen, even in small, almost trivial,
client programs, I find this gets messy real quick.
I, therefore, hate the idea of mixing SQL in with
client code (it is all, in my view, application code).
 I like the use of middleware objects since, if well
designed, they can make developing the code required
all the simpler.  It is only when badly done that an
abstraction leads to complicated code that is a
nightmare to maintain; worse if it is inadequately
documented.  The whole purpose of abstraction, whether
one is using objected oriented development of
middleware, or generic programming, or structured
programming, or functional programming, is to analyze
a complex problem into more manageable parts.  The
parts themselves become easier to code (witness java
beans of various kinds, such as backing beans - my
beans get their data either from the database or from
the user interface - in either case, they make
interaction between a web based interface and the
database back end MUCH simpler to code), and the
relationships among the parts are easier to
understand.  Each kind of abstraction has its place.
It is up to the analyst or architect to figure out how
many layers and what abstractions are appropriate for
a given project.

HTH

Ted

Re: Efficiency vs. code bloat for SELECT wrappers

От
Colin Wetherbee
Дата:
Sam Mason wrote:
> Luckily I've been able to design most of the programs I work on as
> relatively simple layers over a database, I'm not sure if you're able to
> work like this.

I'm not at liberty to divulge much of the application concept, but
consider, if you will, an application like Gmail or any other web-based
mail service.

I'm venturing a guess that the database code in Gmail scripts isn't
overly-complicated, and there are probably a handful of "task
categories" that get executed by the application.  Folder operations
might be lumped into one category, and SMTP operations into another, for
example.  Each category probably has a few variations, like retrieving
an email with or without full headers.  Overall, though, I would wager
that the front-end, UI-type stuff in Gmail is much more complicated than
the database code, especially with all the Javascript it uses (though, I
suspect most of that is relatively static code).

This is roughly the distribution of code I'm implementing: lots of web
stuff with only a few database hits per page, most of which are SELECT
queries.

So, really, I don't think my application would be considered to be
"relatively simple layers over a database", since the UI part will be so
full-featured.  I doubt I would ever see 5-10% of the lines accessing
the database in this application.  A better estimate would probably be
around 1% or 1.5%.

My guess, having written this, is that your approach might be more
useful for applications that rely heavily on interaction with a
database.  I'd appreciate any more comments you have on this, though.

Colin

Re: Efficiency vs. code bloat for SELECT wrappers

От
Colin Wetherbee
Дата:
Ted Byers wrote:
> --- Colin Wetherbee <cww@denterprises.org> wrote:
>
>> Sam Mason wrote:
>>> On Sun, Dec 16, 2007 at 06:31:56PM -0500, Colin
>> Wetherbee wrote:
>>>> If I write one Perl sub for each operation on the
>> table (e.g. one that
>>>> gets the username and password hash, another that
>> gets the last name and
>>>> first name, etc.), there will be a whole lot of
>> subs, each of which
>>>> performs one very specific task.
>>>>
> Right. First rule of software engineering is keep
> functions as small as possible, focussed on one thing
> wherever practicable.  It doesn't matter if the
> language is Perl or C++ or Java, or a stored procedure
> in an RDBMS.  One can always create additional driver
> functions that use the elemental simple functions to
> do more complex tasks (bearing in mind the
> complexities that will inevitably arise in multiple
> user situations).

I've programmed in this way for years, basically ever since I learned
object-oriented programming.  I find it "cleaner" to keep functional
elements separate and access them sequentially from larger, more
broadly-focused functions.

>>>> If I write one larger Perl sub that grabs the
>> whole row, and then I deal
>>>> with the contents of the row in Perl, ignoring
>> columns as I please, it
>>>> will require fewer subs and, in turn, imply
>> cleaner code.
> Define "cleaner code."  The more data, and the more
> complex that data, the more code you have to write,
> regardless of whether that is in one function or
> several.  Either way, done badly, can be a problem for
> both maintenance and performance.

See above.

> I routinely keep my SQL code distinct from my Perl,
> java or C++ code.  When a client program needs to do
> something with the database, then either a child
> process executes a script I have written, if the
> client program doesn't need to do anything with data
> drawn from the database, or I have all the SQL code in
> one or more stored procedures, and use the appropriate
> client interface to invoke the stored procedure(s).
> Whether the SQL is in a specific script or in a stored
> procedure, my SQL code is kept distinct from the
> client code, regardles of the language I have used for
> that.  I find this even MORE useful as my projects get
> bigger.

This seems like quite a departure from Sam's recommendation.  Now, I'm torn!

Colin

Re: Efficiency vs. code bloat for SELECT wrappers

От
Sam Mason
Дата:
On Mon, Dec 17, 2007 at 01:29:10PM -0500, Ted Byers wrote:
> I routinely keep my SQL code distinct from my Perl,
> java or C++ code.  When a client program needs to do
> something with the database, then either a child
> process executes a script I have written, if the
> client program doesn't need to do anything with data
> drawn from the database, or I have all the SQL code in
> one or more stored procedures, and use the appropriate
> client interface to invoke the stored procedure(s).
> Whether the SQL is in a specific script or in a stored
> procedure, my SQL code is kept distinct from the
> client code, regardles of the language I have used for
> that.  I find this even MORE useful as my projects get
> bigger.

Humm, this discussion is horribly domain specific.  Abstractions are the
bread and butter of programming and designing them appropriately makes
the difference between things turning into a mess later on or not.  The
only solid rules I stick to is that when I'm working to solve a problem
I've not solved before, I will get it (i.e. the way I go about solving
the problem, getting the right answer is generally pretty easy) wrong
the first time and probably only get it reasonable the third or fourth
time.

Luckily, big bits of software take a while to accumulate so it's
possible to do the naive thing first and then, when you know how things
work, better abstractions can be found and incorporated.  For example,
I'd be tempted to replace your "get password hash" with a "verify
password" procedure that actually hashes the password client side and
then sends it over to the database for actual validation.  However, you
may actually care what the password hash is, at which point the hashing
should be broken out and revealed to the appropriate balance of the
software.

> Each kind of abstraction has its place.
> It is up to the analyst or architect to figure out how
> many layers and what abstractions are appropriate for
> a given project.

Indeed.  But to be able to choose a suitable set of abstractions, it
helps for the designer to know the ins and outs of the tools being
used.  I was trying to encourage Colin to look at using databases
in a different way, different styles of programming suit different
applications and hiding the database can have detrimental effects as
well as positive effects.


  Sam

Re: Efficiency vs. code bloat for SELECT wrappers

От
Ted Byers
Дата:
--- Sam Mason <sam@samason.me.uk> wrote:

> On Mon, Dec 17, 2007 at 01:29:10PM -0500, Ted Byers
> wrote:
> > I routinely keep my SQL code distinct from my
> Perl,
> > java or C++ code.  When a client program needs to
> do
> > something with the database, then either a child
> > process executes a script I have written, if the
> > client program doesn't need to do anything with
> data
> > drawn from the database, or I have all the SQL
> code in
> > one or more stored procedures, and use the
> appropriate
> > client interface to invoke the stored
> procedure(s).
> > Whether the SQL is in a specific script or in a
> stored
> > procedure, my SQL code is kept distinct from the
> > client code, regardles of the language I have used
> for
> > that.  I find this even MORE useful as my projects
> get
> > bigger.
>
> Humm, this discussion is horribly domain specific.
> Abstractions are the
> bread and butter of programming and designing them
> appropriately makes
> the difference between things turning into a mess
> later on or not.  The
> only solid rules I stick to is that when I'm working
> to solve a problem
> I've not solved before, I will get it (i.e. the way
> I go about solving
> the problem, getting the right answer is generally
> pretty easy) wrong
> the first time and probably only get it reasonable
> the third or fourth
> time.
>

Right.  But there is a quicker way.  I like to work
with people who have experience that is different from
mine, expecting they will have seen a different suite
of problems and found solutions I have come to rely
on.  In that way, in discussing my current problems, I
can learn something new, and that much faster than
trial and error.  Of course, there are always test or
proof of concept programs, where I test ideas.  For
example, I have routinely been advised that left joins
are faster than correlated subqueries.  In the past
month, I have found two problems requiring either a
correlated subquery or a left join, and in one case
the correlated subquery was more than ten times faster
while in the other the left join was about twice as
fast.  In both cases, the results returned by the two
approaches were identical, but there were significant
differences in performance; most astonishing in the
one case that proved to be quite different than
expected.

> > Each kind of abstraction has its place.
> > It is up to the analyst or architect to figure out
> how
> > many layers and what abstractions are appropriate
> for
> > a given project.
>
> Indeed.  But to be able to choose a suitable set of
> abstractions, it
> helps for the designer to know the ins and outs of
> the tools being
> used.  I was trying to encourage Colin to look at
> using databases
> in a different way, different styles of programming
> suit different
> applications and hiding the database can have
> detrimental effects as
> well as positive effects.
>
I find the opportunity to look at problems in a
different light priceless.

But some practices generally turn out to be counter
productive.  Producing overly long functions, or
mixing code involving different languages into the
same file, often leads to an unmaintainable mess.
More often than not, such practices are a consequence
of poor design.  But even here, there is no hard and
fast rule, since some algorithms, such as numeric
quadrature or QR factorization of general real
matrices can not be written in a short, simple
function although they logically ought to be one
function in the simplest cases; but even these admit
enhancements that warrant more interesting data
structures and ancillary functions.  But for these
algorithms, which admittedly have nothing to do with
databases, the top consideration is the speed of
provably correct code.  If that means it is too
complex for junior or even intermediate programmers,
so be it.  I have seen and used library code for
number crunching that only a specialist in numeric
methods have a hope of easily understanding, and been
glad I didn't have to write those functions myself.
;-)

For Colin's situation, what I would have suggested
would be a simple extension of what he normally does,
and that is look at an additional layer that gets the
data from the database as efficiently as possible, and
cache it in that layer, providing it to the UI layer
as required.  Similarly, perhaps in a different layer,
get user data from the interface and cache it, sending
it to the database back end at the most appropriate
time in the most efficient way.  And perhaps with all
this, some form of database connection pooling;
although this latter option will depend on how many
concurrent users there may be in the worse case
scenario (or maybe that is the best case scenario
since it would mean that there is significant demand
for what the app does ;-).

I haven't attempted this in Perl, but it is dirt
simple in Java/J2EE.  In a web application, based,
e.g. on Tomcat, it is easy to set this up with as many
layers as one can justify, especially with an IDE like
NetBeans6.  Not having thought about how to do this
using perl, I wouldn't know what to advise regarding
how to put this into practice if one wants to stick
with perl.

HTH

Ted

Re: Efficiency vs. code bloat for SELECT wrappers

От
Colin Wetherbee
Дата:
Colin Wetherbee wrote:
> My guess, having written this, is that your approach might be more
> useful for applications that rely heavily on interaction with a
> database.  I'd appreciate any more comments you have on this, though.

Tom, Sam, and Ted (a lovely assortment of three-letter names), thank you
very much for your input on my situation.  You've given me quite a lot
to consider.

For now, I will stick with Tom's and Ted's recommendations for a while,
but I'll also keep Sam's ideas in mind if this application becomes more
heavily reliant on the database.

Colin