Обсуждение: Image File System Question

Поиск
Список
Период
Сортировка

Image File System Question

От
vishal saberwal
Дата:
hi,

My server is Postgres 8.0.1 on fedora core2.
My clients are remote and interface with my server using .NET GUI.

We are trying to store many images/icons/audio/video clips in our system.
We expect many of these. The way we are doing it is using Hierarchical File System.

I understand we need to limit the size of these directories by controlling number of resources in each directory.

The way i want to let anone access these resources is only through stored procedures rather than direct downloading.
I am trying to create stored procedure/function API for the same. These functions will check for the permissions for the user and/or the file,
check the location from the schema and then would "stream" it out for the GUI to use.

The question is:
(1) How do i use the database stored procedures/functions as a tunnel for just streaming the data rather than storing it in database?
That is, a function that given a Image ID for example, will read the location from the table and then just go to the location and stream out the bits.
(2) Am i right in saying that it can't solely be done iwth plpgsql but would need somem c/c++ api.
(3) Has anyone here done something like this and can share with me how he/she implemented this.

I did do my homework of googling for something like this but may be my search skills were not strong enough to find some substantial information/HOW TOs or examples.

thanks,
vish
(vishal saberwal)

Re: Image File System Question

От
Richard Huxton
Дата:
vishal saberwal wrote:
> hi,
>
> My server is Postgres 8.0.1 on fedora core2.
> My clients are remote and interface with my server using .NET GUI.
>
> We are trying to store many images/icons/audio/video clips in our system.
> We expect many of these. The way we are doing it is using Hierarchical
> File System.

HFS is an old Apple Macintosh filesystem - an unlikely choice. You'll
probably find it's ext3.

> I understand we need to limit the size of these directories by
> controlling number of resources in each directory.

Less important nowadays than it used to be, but usually done via hashing
the filename. So, file 123456.gif is stored in 1/2/3/123456.gif

> The way i want to let anone access these resources is only through
> stored procedures rather than direct downloading.

Why?
What are the benefits of this system?

> I am trying to create stored procedure/function API for the same. These
> functions will check for the permissions for the user and/or the file,
> check the location from the schema and then would "stream" it out for
> the GUI to use.

Why not just set up a webserver and get it to authenticate to your database?

> The question is:
> (1) How do i use the database stored procedures/functions as a tunnel
> for just streaming the data rather than storing it in database?
> That is, a function that given a Image ID for example, will read the
> location from the table and then just go to the location and stream out
> the bits.
> (2) Am i right in saying that it can't solely be done iwth plpgsql but
> would need somem c/c++ api.

Any of the "untrusted" languages (which of course includes "C"). File
access will be different in each of course - pick whichever you are most
familiar with. The key differences between a "trusted" and "untrusted"
version of a language are:
1. Untrusted languages can access the rest of the system
2. Functions in untrusted languages can only be added by a superuser.

> (3) Has anyone here done something like this and can share with me how
> he/she implemented this.
>
> I did do my homework of googling for something like this but may be my
> search skills were not strong enough to find some substantial
> information/HOW TOs or examples.

Well, there is the "procedural languages" section of the manuals. It
might also be worth checking on pgfoundry to see if there is anything
useful there.

--
   Richard Huxton
   Archonet Ltd

Re: Image File System Question

От
vishal saberwal
Дата:
thanks for your response,

We are kind of jailing (may be hiding would be a better term) resources behind the database/Stored procedures and GUI needs to have a feel as if the data is comming from database.
Its a requirement for the project that any communication of resources and data be done through stored procedures/function calls.
What other options do you suggest.

Did try finding something about HFS for ext3/xfs but in vain. The information i found was the conversion between these file systems.

There are not just the Users but different devices and applications as clients that would talk to the database and may not require the web servers.

I would appreciate if someone can share if they have done sommething similar or if someone can point me to the right resource.

pgfoundry does talk about something with Npgsql but thats a different layer. Some devices and applications may not go through that layer.

Thanks,
vish

On 11/4/05, Richard Huxton <dev@archonet.com> wrote:
vishal saberwal wrote:
> hi,
>
> My server is Postgres 8.0.1 on fedora core2.
> My clients are remote and interface with my server using .NET GUI.
>
> We are trying to store many images/icons/audio/video clips in our system.
> We expect many of these. The way we are doing it is using Hierarchical
> File System.

HFS is an old Apple Macintosh filesystem - an unlikely choice. You'll
probably find it's ext3.

> I understand we need to limit the size of these directories by
> controlling number of resources in each directory.

Less important nowadays than it used to be, but usually done via hashing
the filename. So, file 123456.gif is stored in 1/2/3/123456.gif

> The way i want to let anone access these resources is only through
> stored procedures rather than direct downloading.

Why?
What are the benefits of this system?

> I am trying to create stored procedure/function API for the same. These
> functions will check for the permissions for the user and/or the file,
> check the location from the schema and then would "stream" it out for
> the GUI to use.

Why not just set up a webserver and get it to authenticate to your database?

> The question is:
> (1) How do i use the database stored procedures/functions as a tunnel
> for just streaming the data rather than storing it in database?
> That is, a function that given a Image ID for example, will read the
> location from the table and then just go to the location and stream out
> the bits.
> (2) Am i right in saying that it can't solely be done iwth plpgsql but
> would need somem c/c++ api.

Any of the "untrusted" languages (which of course includes "C"). File
access will be different in each of course - pick whichever you are most
familiar with. The key differences between a "trusted" and "untrusted"
version of a language are:
1. Untrusted languages can access the rest of the system
2. Functions in untrusted languages can only be added by a superuser.

> (3) Has anyone here done something like this and can share with me how
> he/she implemented this.
>
> I did do my homework of googling for something like this but may be my
> search skills were not strong enough to find some substantial
> information/HOW TOs or examples.

Well, there is the "procedural languages" section of the manuals. It
might also be worth checking on pgfoundry to see if there is anything
useful there.

--
   Richard Huxton
   Archonet Ltd

Re: Image File System Question

От
Douglas McNaught
Дата:
vishal saberwal <vishalsaberwal@gmail.com> writes:

> Did try finding something about HFS for ext3/xfs but in vain. The
> information i found was the conversion between these file systems.

What exactly do you mean by HFS?  All modern filesystems are
hierarchical.

-Doug

Re: Image File System Question

От
vishal saberwal
Дата:
HFS is the filesystem richard was tlking about. I am not familiar with this file system but i guessed it was HFS+ that he was talking about.
yes, i understand all modern filesystems are Hirearchical file systems.

thanks,
vish

On 11/4/05, Douglas McNaught <doug@mcnaught.org> wrote:
vishal saberwal <vishalsaberwal@gmail.com> writes:

> Did try finding something about HFS for ext3/xfs but in vain. The
> information i found was the conversion between these file systems.

What exactly do you mean by HFS?  All modern filesystems are
hierarchical.

-Doug

Re: Image File System Question

От
Richard Huxton
Дата:
vishal saberwal wrote:
> HFS is the filesystem richard was tlking about. I am not familiar with
> this file system but i guessed it was HFS+ that he was talking about.
> yes, i understand all modern filesystems are Hirearchical file systems.

Ah, you said (H)ierarchical (F)ile (S)ystem in the original message
(note the capitals). Since I'm a long-term Macintosh owner, I took it as
a reference to the Apple-Mac filesystem. In fact you just meant a
filesystem (since they are all hierarchical as Doug said).

--
   Richard Huxton
   Archonet Ltd

Re: Image File System Question

От
Douglas McNaught
Дата:
vishal saberwal <vishalsaberwal@gmail.com> writes:

> HFS is the filesystem richard was tlking about. I am not familiar with
> this file system but i guessed it was HFS+ that he was talking about.
> yes, i understand all modern filesystems are Hirearchical file systems.

Oh, OK--reading your original message, you just want to store the data
files in a hierarchical arrangement of some sort.

-Doug

Re: Image File System Question

От
Csaba Nagy
Дата:
See my comments below.

On Fri, 2005-11-04 at 15:24, vishal saberwal wrote:
> thanks for your response,
>
> We are kind of jailing (may be hiding would be a better term)
> resources behind the database/Stored procedures and GUI needs to have
> a feel as if the data is comming from database.
> Its a requirement for the project that any communication of resources
> and data be done through stored procedures/function calls.
> What other options do you suggest.

Hmm... this sounds like an application server is what you need. The app
server is the interface to the user, and should get the data from the
file system or data base or whatever else. Now some data base vendors
would argue that you can do all what you need with the DB and put all
code into it, but I guess that's just not true (there are more efficient
ways to store some data than a data base, and a web server will
definitely serve you files faster than any DB), and an efficient setup
will need a separate application server in front of the data base. For
one thing, it is a lot easier to cluster/scale the app server than the
data base, and you can use it to integrate multiple data sources. IMHO,
using the data base as an app server is just a bad choice.

> Did try finding something about HFS for ext3/xfs but in vain. The
> information i found was the conversion between these file systems.
>
> There are not just the Users but different devices and applications as
> clients that would talk to the database and may not require the web
> servers.

If you're coding your clients, then you can code them to access any
given API, including HTTP, or even some proprietary API/protocol you
design. HTTP is not made only for humans, and I guess is as easy to
write code which accesses data through HTTP as it would writing code
accessing any of the postgres client APIs. Or maybe it's not that easy,
but not a lot harder.

[snip]

HTH,
Csaba.



Re: Image File System Question

От
vishal saberwal
Дата:
thanks for all your prompt reponses but i am still not clear with the way to solve the problem.
I am sorry if i wasn't clear with my problem the first time.

My problem isn't file system and the way i lay it out.
What i need to know is a way to stream out external resource files via  stored procedure calls.

Steps:
(1) User calls stored procedure ReadImage(ImageID).
(2) Stored procedure reads the Image table to find the location (field) for the given ID.
(3) Now this stored procedure ("HOW TO?") opens the file.
(4) Then it "Streams" it out to the User (as if it was a BLOB).
(5) For User its just like a lo_export but he has no clue if its stored in database or a seperate resource file system.

Resource file system can be accessed only through the database, where we will check for permissions if we need to (though i know i can implement permissions in the file systems as well).

What is the best way to do Steps (3) and (4) above?

thanks,
vish
(vishal saberwal)

On 11/4/05, Douglas McNaught <doug@mcnaught.org> wrote:
vishal saberwal <vishalsaberwal@gmail.com> writes:

> HFS is the filesystem richard was tlking about. I am not familiar with
> this file system but i guessed it was HFS+ that he was talking about.
> yes, i understand all modern filesystems are Hirearchical file systems.

Oh, OK--reading your original message, you just want to store the data
files in a hierarchical arrangement of some sort.

-Doug

Re: Image File System Question

От
Jan Peterson
Дата:
We have a similar functional requirement to Vishal's, but with an
added twist.  Currently, we are utilizing the pg_largeobject table to
store context data delivered through our java application.  We are
finding, however, that very large pg_largeobject tables degrade our
performance in other areas (primarily VACUUM) due to I/O limitations
of our hardware (an appliance-like device).  We've been experimenting
with customized functions that provide similar access mechanisms as
are available with large object support.  Here are some examples of
what we've been experimenting with:

CREATE OR REPLACE FUNCTION fileread(varchar, varchar, int8, int8)
  RETURNS bytea AS
$BODY$
    my ($type, $hash, $offset, $length) = @_;
    my $file = $_SHARED{filename}->($type, $hash);
    my $data;
    my $fh;

    # some argument validation code removed here

    if (! open($fh, "< $file")) {
        elog(ERROR, "unable to open $file: $!");
    }
    binmode $fh;

    if ((defined($offset)) and ($offset > 0)) {
        if (! sysseek($fh, $offset, 0)) {
            elog(ERROR, "can't seek to pos $offset in $file: $!");
        }
    }

    my $numread = sysread($fh, $data, $length);
    if (! defined($numread)) {
        elog(ERROR, "unable to read $file: $!");
    }

    close($fh);
    return undef if $numread == 0;

    # here is where it gets ugly due to the way we have to munge
    # the data coming back from perl -> postgres

    $data =~ s/([\0\\\'])/sprintf("\\\\%03o",ord($1))/ge;
    return($data);
$BODY$
  LANGUAGE 'plperlu' VOLATILE;

There is also a function that sets up the %_SHARED hash to hold some
utility functions (filename is one of them, whose job it is to convert
the hash ... a.k.a. filename ... to a full path containing
sub-directories).  A similar function filewrite() is also used to
handling paging the data into the filesystem.

Problems we've encountered with this mechanism are: the evil
substitution required to quote the bytea value being returned from the
function (because we potentially have binary data including null
characters, single quotes, and backslashes), the lack of persistance
of %_SHARED (fixed in 8.1, I think, but we're using 8.0.2 + some local
patches), and the limitations of the underlying filesystem (many of
our objects are < 4k in size, but the linux ext3 filesystem we're
using has no support for storing multiple fragments or tail fragments
in a single data block, so the minimum file size on the filesystem is
4k, which kills us).

We're looking now at re-writing this stuff in C and storing the tail
fragment < 4k of the data stream directly in a bytea column in our
main table.  Still no idea if this will really solve our VACUUM
problems, but it's the best thing we've been able to come up with so
far.  Another drawback, of course, is the lack of transactional
security of this externally stored data (what if we have a statement
that does a select deletefile('type', 'hash'); and then it needs to
roll back?  answer: we're hosed).  I'd be happy to hear any
suggestions for solutions to the above problems.

        -jan-
--
Jan L. Peterson
<jan.l.peterson@gmail.com>