Thread: Re: [HACKERS] Binary cursor header changed from 20 to 16 Bytes?

Re: [HACKERS] Binary cursor header changed from 20 to 16 Bytes?

From
"G. Anthony Reina"
Date:
Tom Lane wrote:

> There is *no* header overhead for binary data as far as libpq or the
> FE/BE protocol is concerned; what you get from PQgetvalue() is just
> a pointer to whatever the backend's internal representation of the
> data type is.  It's certainly possible for particular data types to
> change representation from time to time, though I didn't recall anyone
> planning such a thing for 6.5.  What data type is the column you're
> retrieving, anyway?  (I'm guessing float4 array, perhaps?)  What kind
> of platform is the backend running on?
>
>                         regards, tom lane

Right on the money. The column being retrieved is a float4 array. I am running
the backend on a Red Hat Linux 6.0 machine (Pentium II / 400 MHz / 512 Meg RAM /
128 Meg Shared buffers). The clients are all SGI machines (O2, Impact, and
Indy).

-Tony




Re: [HACKERS] Binary cursor header changed from 20 to 16 Bytes?

From
Bruce Momjian
Date:
> Tom Lane wrote:
> 
> > There is *no* header overhead for binary data as far as libpq or the
> > FE/BE protocol is concerned; what you get from PQgetvalue() is just
> > a pointer to whatever the backend's internal representation of the
> > data type is.  It's certainly possible for particular data types to
> > change representation from time to time, though I didn't recall anyone
> > planning such a thing for 6.5.  What data type is the column you're
> > retrieving, anyway?  (I'm guessing float4 array, perhaps?)  What kind
> > of platform is the backend running on?
> >
> >                         regards, tom lane
> 
> Right on the money. The column being retrieved is a float4 array. I am running
> the backend on a Red Hat Linux 6.0 machine (Pentium II / 400 MHz / 512 Meg RAM /
> 128 Meg Shared buffers). The clients are all SGI machines (O2, Impact, and
> Indy).
> 

I don't think you can do binary cursors across architectures.  The
internal formats for most types are different, though you may be able to
get away with string fields and int if the endian is the same.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: [HACKERS] Binary cursor header changed from 20 to 16 Bytes?

From
Tom Lane
Date:
"G. Anthony Reina" <reina@nsi.edu> writes:
> Tom Lane wrote:
>> There is *no* header overhead for binary data as far as libpq or the
>> FE/BE protocol is concerned; what you get from PQgetvalue() is just
>> a pointer to whatever the backend's internal representation of the
>> data type is.  It's certainly possible for particular data types to
>> change representation from time to time, though I didn't recall anyone
>> planning such a thing for 6.5.  What data type is the column you're
>> retrieving, anyway?  (I'm guessing float4 array, perhaps?)  What kind
>> of platform is the backend running on?
>> 
>> regards, tom lane

> Right on the money. The column being retrieved is a float4 array. I am
> running the backend on a Red Hat Linux 6.0 machine (Pentium II / 400
> MHz / 512 Meg RAM / 128 Meg Shared buffers). The clients are all SGI
> machines (O2, Impact, and Indy).

OK, I think I see what is going on here.  If you look at the
declarations for arrays in src/include/utils/array.h, the overhead for
a one-dimensional array is 5 * sizeof(int) (total object size, #dims,
flags word, lo bound, hi bound) rounded up to the next MAXALIGN()
boundary to ensure that the array element data is aligned safely.

I was not quite right about the libpq presentation of a binary cursor
being the same as the backend's internal representation.  Actually,
it seems that the length word is stripped off --- all variable-size
datatypes are required to start with a word that is the total object
size, and what you get from libpq is a pointer to the word after that.

So, what you should have been measuring was 4*sizeof(int) plus
the array alignment padding, if any.

6.4 had some hardwired assumptions about compiler alignment behavior,
which were giving us grief on platforms that didn't conform, so as of
6.5 we determine the actual alignment properties of the
compiler/platform during configure.  The old code used to be aligning
the array overhead to a multiple of 8 whether that was appropriate or
not, but the new code is only aligning to the largest alignment multiple
actually observed on the target platform.  Evidently that's just 4 on
your system.

Bottom line: yes, the change from 20 to 16 is likely to persist.
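
The arithmetic above can be sketched in a few lines of C. This is only an illustration of the rounding described in this message, not code from the backend: the helper names `align_up` and `client_array_header` are hypothetical, and a 4-byte `int` is assumed, as on the platforms discussed here.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/*
 * Array overhead seen by a libpq client for a one-dimensional array:
 * five 4-byte words (size, #dims, flags, lo bound, hi bound) rounded up
 * to max_align, minus the length word that libpq strips off.
 */
static size_t client_array_header(size_t max_align)
{
    const size_t word = 4;  /* assumption: sizeof(int) == 4 */
    return align_up(5 * word, max_align) - word;
}
```

With an alignment of 8 (the old hardwired behavior) this gives 24 - 4 = 20 bytes; with an alignment of 4 it gives 20 - 4 = 16 bytes, which matches the change reported in the subject line.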

I think there is actually a bug in the way libpq is doing this, because
it is allocating space for the stored varlena object minus the total-
size word.  This means that any internal alignment assumptions are *not*
being respected --- for example, in a machine that does need MAXALIGN
of 8, the client-side representation of an array object will fail to
have its array members aligned at a multiple-of-8 address.  libpq ought
to allocate space for and store the whole varlena object including
length word, the same as it is in the backend, so that internal fields
of the varlena will have the same alignment as in the backend.  Will put
this on my todo list.

> The clients are all SGI machines (O2, Impact, and Indy).

You realize, of course, that using a binary cursor in a cross-platform
environment is a fairly dangerous thing to do.  Any cross-machine
discrepancies in data format or alignment become your problem...
        regards, tom lane


Re: [HACKERS] Binary cursor header changed from 20 to 16 Bytes?

From
"G. Anthony Reina"
Date:
Bruce Momjian wrote:

> I don't think you can do binary cursors across architectures.  The
> internal formats for most types are different, though you may be able to
> get away with string fields and int if the endian is the same.
>

No, it works just fine. All you have to do is to swap the endian format (Linux Intel
is little endian; SGI is big endian). We've been using this approach since Postgres
6.3.

-Tony




Re: [HACKERS] Binary cursor header changed from 20 to 16 Bytes?

From
Bruce Momjian
Date:
> Bruce Momjian wrote:
> 
> > I don't think you can do binary cursors across architectures.  The
> > internal formats for most types are different, though you may be able to
> > get away with string fields and int if the endian is the same.
> >
> 
> No, it works just fine. All you have to do is to swap the endian format (Linux Intel
> is little endian; SGI is big endian). We've been using this approach since Postgres
> 6.3.
> 

What doesn't work?  Floats?  Alignment problems?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: [HACKERS] Binary cursor header changed from 20 to 16 Bytes?

From
"G. Anthony Reina"
Date:
Tom Lane wrote:

>  Will put this on my todo list.
>
> > The clients are all SGI machines (O2, Impact, and Indy).
>
> You realize, of course, that using a binary cursor in a cross-platform
> environment is a fairly dangerous thing to do.  Any cross-machine
> discrepancies in data format or alignment become your problem...
>
>                         regards, tom lane

Thanks Tom. I just wanted to make sure the subject was brought up to help
others in case they had been racking their brains on the problem.

As I wrote to Bruce, the cross-architecture setup seems to work just fine as
long as you make sure to swap the byte order of the data. So it looks like you
can do something that was not in the original planning. Another kudos
for the database architecture!

-Tony




Re: [HACKERS] Binary cursor header changed from 20 to 16 Bytes?

From
"G. Anthony Reina"
Date:
Bruce Momjian wrote:

>
> > No, it works just fine. All you have to do is to swap the endian format (Linux Intel
> > is little endian; SGI is big endian). We've been using this approach since Postgres
> > 6.3.
> >
>
> What doesn't work?  Floats?  Alignment problems?
>

The only thing that seems to have problems is when you select multiple variables. For
this case, you have to put all of your arrays at the end.

e.g.

        sprintf(data_string, "DECLARE data_cursor BINARY CURSOR "
                "FOR SELECT repetition, cycle, time_instants FROM %s_proc WHERE "
                "subject = '%s' and arm = '%s' and rep = %s and cycle = %s",
                task_name, subject_name[subject], arm_name[arm],
                repetition_name[i], cycle_name[j]);

        res = PQexec(conn, data_string);
        if (PQresultStatus(res) != PGRES_COMMAND_OK) {
            printf("\n\nERROR issuing command ... %s\n", data_string);
            exit_nicely(conn);
        }
        PQclear(res);

        sprintf(data_string, "FETCH ALL IN data_cursor");
        res = PQexec(conn, data_string);
        if (PQresultStatus(res) != PGRES_TUPLES_OK) {
            printf("\n\nERROR issuing command ... %s\n", data_string);
            exit_nicely(conn);
        }

        /* Move binary-transferred data to desired variable float array */
        memmove(bin_time, (PQgetvalue(res, 0, 2)),
                (number_of_bins + 1) * sizeof(float));
        PQclear(res);

        switch_endians_4bytes(bin_time, number_of_bins + 1);

        res = PQexec(conn, "CLOSE data_cursor");
        PQclear(res);
        res = PQexec(conn, "END");
        PQclear(res);
 


So in the above case, I can get the repetition (single int value), cycle (single int
value), and time_instants (variable array of float values) out as a binary cursor. But
need to put the variable array at the end to make it work correctly. In this case, I
don't need to offset by 16 bytes to get the 2nd and 3rd column (cycles and
time_instants); I only need to do this for the 1st column (repetition).

My switch_endians_4bytes looks like this:

void switch_endians_4bytes(int *temp_array, int size_of_array)
{
    short int test_endianess_word = 0x0001;
    char *test_endianess_byte = (char *) &test_endianess_word;
    int i;
    int temp_int;
    char *temp_char, byte0, byte1, byte2, byte3;

    if (test_endianess_byte[0] == BIG_ENDIAN) {
        for (i = 0; i < size_of_array; i++) {
            temp_int = temp_array[i];
            temp_char = (char *) (&temp_int);
            byte0 = *temp_char;
            byte1 = *(++temp_char);
            byte2 = *(++temp_char);
            byte3 = *(++temp_char);
            temp_char = (char *) (&temp_int);
            *temp_char = byte3;
            *(++temp_char) = byte2;
            *(++temp_char) = byte1;
            *(++temp_char) = byte0;
            temp_array[i] = temp_int;
        }
    }
}


where BIG_ENDIAN is defined as 0. Because I test the machine at run-time for its
endianness, I can run this on both of my platforms and it will either swap or not swap
depending on the need (assuming that the server is on a little-endian machine).

-Tony
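
The same run-time check and 4-byte swap can be written with fixed-width types. The following is a minimal sketch, not the code used in this thread: the names `swap32`, `host_is_little_endian`, and `le32_to_host` are hypothetical, and it assumes, as above, that the server sends little-endian data and that elements are exactly 4 bytes wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of one 32-bit word. */
static uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Return 1 if this host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/*
 * Convert count 4-byte elements in place from little-endian (assumed
 * server order) to host order. memcpy is used for every access, so buf
 * may hold int or float data and need not be 4-byte aligned.
 */
static void le32_to_host(void *buf, size_t count)
{
    unsigned char *p = buf;
    size_t i;

    assert(buf != NULL || count == 0);
    if (host_is_little_endian())
        return;                 /* already in host order */
    for (i = 0; i < count; i++) {
        uint32_t w;

        memcpy(&w, p + 4 * i, sizeof w);
        w = swap32(w);
        memcpy(p + 4 * i, &w, sizeof w);
    }
}
```

Taking a `void *` avoids passing a float array through an `int *` parameter, and going through `memcpy` sidesteps both aliasing and alignment concerns on the client.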




Re: [HACKERS] Binary cursor header changed from 20 to 16 Bytes?

From
Tom Lane
Date:
"G. Anthony Reina" <reina@nsi.edu> writes:
> The only thing that seems to have problems is when you select multiple
> variables. For this case, you have to put all of your arrays at the
> end.

That doesn't make a lot of sense to me either.  What happens if you
don't?

> I don't need to offset by 16 bytes to get the 2nd and 3rd column (cycles and
> time_instants); I only need to do this for the 1st column (repetition).

Right, there'd not be any array overhead for non-array datatypes...
        regards, tom lane


Re: [HACKERS] Binary cursor header changed from 20 to 16 Bytes?

From
"G. Anthony Reina"
Date:
Tom Lane wrote:

> "G. Anthony Reina" <reina@nsi.edu> writes:
> > The only thing that seems to have problems is when you select multiple
> > variables. For this case, you have to put all of your arrays at the
> > end.
>
> That doesn't make a lot of sense to me either.  What happens if you
> don't?
>
   It comes back as "gibberish". But we haven't really experimented with what
the gibberish is (e.g. alignment off, etc). Once we figured out the trick about
putting the arrays at the end, we stopped fooling with it. It would be a nice
little experiment since it appears that this kind of thing isn't frequently done.
   Anyone else out there using a binary cursor between two different computer
architectures?

>
> > I don't need to offset by 16 bytes to get the 2nd and 3rd column (cycles and
> > time_instants); I only need to do this for the 1st column (repetition).
>

Sorry, I misspoke, but you interpreted it correctly anyway. The 1st and 2nd columns
(just single ints) don't need the 16-byte offset, just the 3rd column (variable
array). We've tried this with both int and float variable arrays and it works
fine.


-Tony
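
Skipping the array overhead for the array column can be sketched as below. This is an illustration under stated assumptions, not the code used in this thread: the helper name `copy_array_elements` is hypothetical, elements are assumed to be 4 bytes wide, and the header size is passed in by the caller (16 bytes in the 6.5 setup described here) rather than hardwired.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy count 4-byte elements out of a binary array value, skipping
 * header_bytes of array overhead at the front. memcpy is used so the
 * source pointer does not have to be aligned.
 * Returns 0 on success, -1 if the value is too short.
 */
static int copy_array_elements(const char *value, size_t value_len,
                               size_t header_bytes, void *out, size_t count)
{
    assert(value != NULL && out != NULL);
    if (value_len < header_bytes)
        return -1;
    if ((value_len - header_bytes) / 4 < count)
        return -1;
    memcpy(out, value + header_bytes, count * 4);
    return 0;
}
```

With libpq, `value` would come from `PQgetvalue(res, 0, 2)` and `value_len` from `PQgetlength(res, 0, 2)`, and the copy has to happen before `PQclear(res)`. The copied elements are still in the server's byte order, so the swap described earlier in the thread is still needed on a big-endian client.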