Обсуждение: Convert from hex to string

Поиск

Список

Период

Сортировка

Convert from hex to string

От

Yuriy Rusinov

Дата:

25 ноября 2015 г., 18:47:12

Hello, Colleagues !

I have to transform string, encoded to hexadecimal to text, but if I try to
select encode('Qt is great!', 'hex'); I receive valid and correct results

517420697320677265617421

but if I try to select decode ('517420697320677265617421', 'hex'), I
receive the same string, such as
'\x517420697320677265617421, which way I have to do for valid convert
to/from hexadecimal ?

Thanks for attention.
--
Best regards,
Sincerely yours,
Yuriy Rusinov.

Re: Convert from hex to string

От

Adrian Klaver

Дата:

25 ноября 2015 г., 19:25:52

On 11/25/2015 07:47 AM, Yuriy Rusinov wrote:
> Hello, Colleagues !
>
> I have to transform string, encoded to hexadecimal to text, but if I try to
> select encode('Qt is great!', 'hex'); I receive valid and correct results
>
> 517420697320677265617421
>
> but if I try to select decode ('517420697320677265617421', 'hex'), I
> receive the same string, such as
> '\x517420697320677265617421, which way I have to do for valid convert
> to/from hexadecimal ?

aklaver@test=> select encode('Qt is great!', 'hex');
           encode
--------------------------
  517420697320677265617421
(1 row)

aklaver@test=> select decode ('517420697320677265617421', 'hex');
            decode
----------------------------
  \x517420697320677265617421
(1 row)


aklaver@test=> select convert_from('\x517420697320677265617421', 'UTF8');
  convert_from
--------------
  Qt is great!
(1 row)


>
> Thanks for attention.
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Convert from hex to string

От

Francisco Olarte

Дата:

25 ноября 2015 г., 19:56:46

Hello Yuriy...

On Wed, Nov 25, 2015 at 4:47 PM, Yuriy Rusinov <yrusinov@gmail.com> wrote:
> I have to transform string, encoded to hexadecimal to text, but if I try to
> select encode('Qt is great!', 'hex'); I receive valid and correct results
>
> 517420697320677265617421
>
> but if I try to select decode ('517420697320677265617421', 'hex'), I
> receive the same string, such as
> '\x517420697320677265617421, which way I have to do for valid convert
> to/from hexadecimal ?

I seem to recall having answered this a very short time ago, but maybe
it was in the spanish list.

decode/encode are for converting bytes to a string. You need to
convert the string to bytes in a controlled way first ( bear in mind
there are implicit conversions ).

What you want is, given a text:

1.- Convert it to a bytea, in a controlled encoding: convert_to(string
text, dest_encoding name) => bytea
2.- Then encode the bytes in hex: encode(data bytea, format text) => text

then, to revert it you:

3.- Decode the hex string to bytes: decode(string text, format text) => bytea
4.- Convert the bytea, in a controlled encoding, to text:
convert_from(string bytea, src_encoding name) => text

As you see, they are nicelly paired. I see another response which just
does encode , decode+convert_from. This works because the database
does implicit conversions, but I would not recomend it. I cannot try
it because all my databases are UTF-8 but I feel Adrians example would
not work if your database encoding is NOT UTF-8 ( and you use any char
outside of ascii range ).

Look at the docs of the functions, section 9.4 table 9.7 int the 9.4.5 manual.

If you do it this way, you can also choose the encoding, ie, if you
know your data is latin1 you can convert from/to it and save a few
bytes, or you can convert to/from utf8 an insure you can represent
anything. Then you can encode/decode the bytes in whatever sutis you,
hex, as in yuour eample or base64 if you need to save a few bytes.

Types are there for a reason.

Francisco Olarte.

Re: Convert from hex to string

От

Adrian Klaver

Дата:

25 ноября 2015 г., 20:22:15

On 11/25/2015 08:56 AM, Francisco Olarte wrote:
> Hello Yuriy...
>
> On Wed, Nov 25, 2015 at 4:47 PM, Yuriy Rusinov <yrusinov@gmail.com> wrote:
>> I have to transform string, encoded to hexadecimal to text, but if I try to
>> select encode('Qt is great!', 'hex'); I receive valid and correct results
>>
>> 517420697320677265617421
>>
>> but if I try to select decode ('517420697320677265617421', 'hex'), I
>> receive the same string, such as
>> '\x517420697320677265617421, which way I have to do for valid convert
>> to/from hexadecimal ?
>
> I seem to recall having answered this a very short time ago, but maybe
> it was in the spanish list.
>
> decode/encode are for converting bytes to a string. You need to
> convert the string to bytes in a controlled way first ( bear in mind
> there are implicit conversions ).
>
> What you want is, given a text:
>
> 1.- Convert it to a bytea, in a controlled encoding: convert_to(string
> text, dest_encoding name) => bytea
> 2.- Then encode the bytes in hex: encode(data bytea, format text) => text
>
> then, to revert it you:
>
> 3.- Decode the hex string to bytes: decode(string text, format text) => bytea

Can't this be shortened to:

aklaver@test=> select convert_to( 'é', 'latin1');
  convert_to
------------
  \xe9
(1 row)

aklaver@test=> select convert_from( '\xe9', 'latin1');
  convert_from
--------------
  é

since convert_to() outputs bytea and convert_from() accepts bytea?

> 4.- Convert the bytea, in a controlled encoding, to text:
> convert_from(string bytea, src_encoding name) => text
>
> As you see, they are nicelly paired. I see another response which just
> does encode , decode+convert_from. This works because the database
> does implicit conversions, but I would not recomend it. I cannot try
> it because all my databases are UTF-8 but I feel Adrians example would
> not work if your database encoding is NOT UTF-8 ( and you use any char
> outside of ascii range ).

If you are doing all this in the same database I am not sure how the
above applies?

Would you not just use the database encoding for the src_encoding?

>
> Look at the docs of the functions, section 9.4 table 9.7 int the 9.4.5 manual.
>
> If you do it this way, you can also choose the encoding, ie, if you
> know your data is latin1 you can convert from/to it and save a few
> bytes, or you can convert to/from utf8 an insure you can represent
> anything. Then you can encode/decode the bytes in whatever sutis you,
> hex, as in yuour eample or base64 if you need to save a few bytes.
>
> Types are there for a reason.
>
> Francisco Olarte.
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Convert from hex to string

От

Adrian Klaver

Дата:

25 ноября 2015 г., 21:27:23

On 11/25/2015 08:56 AM, Francisco Olarte wrote:
> Hello Yuriy...
>
> On Wed, Nov 25, 2015 at 4:47 PM, Yuriy Rusinov <yrusinov@gmail.com> wrote:
>> I have to transform string, encoded to hexadecimal to text, but if I try to
>> select encode('Qt is great!', 'hex'); I receive valid and correct results
>>
>> 517420697320677265617421
>>
>> but if I try to select decode ('517420697320677265617421', 'hex'), I
>> receive the same string, such as
>> '\x517420697320677265617421, which way I have to do for valid convert
>> to/from hexadecimal ?
>
> I seem to recall having answered this a very short time ago, but maybe
> it was in the spanish list.
>
> decode/encode are for converting bytes to a string. You need to
> convert the string to bytes in a controlled way first ( bear in mind
> there are implicit conversions ).
>
> What you want is, given a text:
>
> 1.- Convert it to a bytea, in a controlled encoding: convert_to(string
> text, dest_encoding name) => bytea
> 2.- Then encode the bytes in hex: encode(data bytea, format text) => text
>
> then, to revert it you:
>
> 3.- Decode the hex string to bytes: decode(string text, format text) => bytea
> 4.- Convert the bytea, in a controlled encoding, to text:
> convert_from(string bytea, src_encoding name) => text
>
> As you see, they are nicelly paired. I see another response which just
> does encode , decode+convert_from. This works because the database
> does implicit conversions, but I would not recomend it. I cannot try
> it because all my databases are UTF-8 but I feel Adrians example would
> not work if your database encoding is NOT UTF-8 ( and you use any char
> outside of ascii range ).

To follow up:

postgres@postgres=# CREATE DATABASE latin1_db ENCODING 'LATIN1' LC_CTYPE
'C' LC_COLLATE 'C' TEMPLATE template0;
CREATE DATABASE

postgres@latin1_db=# \l latin1_db
                            List of databases
    Name    |  Owner   | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+---------+-------+-------------------
  latin1_db | postgres | LATIN1   | C       | C     |


postgres@postgres=# \c latin1_db
You are now connected to database "latin1_db" as user "postgres".

postgres@latin1_db=# \encoding
UTF8

postgres@latin1_db=# select encode('é', 'hex');
  encode
--------
  e9
(1 row)

postgres@latin1_db=# select decode('e9', 'hex');
  decode
--------
  \xe9
(1 row)

postgres@latin1_db=# select convert_from('\xe9', 'latin1');
  convert_from
--------------
  é
(1 row)

>
> Look at the docs of the functions, section 9.4 table 9.7 int the 9.4.5 manual.
>
> If you do it this way, you can also choose the encoding, ie, if you
> know your data is latin1 you can convert from/to it and save a few
> bytes, or you can convert to/from utf8 an insure you can represent
> anything. Then you can encode/decode the bytes in whatever sutis you,
> hex, as in yuour eample or base64 if you need to save a few bytes.
>
> Types are there for a reason.
>
> Francisco Olarte.
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Convert from hex to string

От

Francisco Olarte

Дата:

25 ноября 2015 г., 21:30:56

On Wed, Nov 25, 2015 at 6:22 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:
>> 1.- Convert it to a bytea, in a controlled encoding: convert_to(string
>> text, dest_encoding name) => bytea
>> 2.- Then encode the bytes in hex: encode(data bytea, format text) => text
>>
>> then, to revert it you:
>>
>> 3.- Decode the hex string to bytes: decode(string text, format text) =>
>> bytea
>
>
> Can't this be shortened to:
>
> aklaver@test=> select convert_to( 'é', 'latin1');
>  convert_to
> ------------
>  \xe9
> (1 row)
>
> aklaver@test=> select convert_from( '\xe9', 'latin1');
>  convert_from
> --------------
>  é
>
> since convert_to() outputs bytea and convert_from() accepts bytea?

Yes, but he originally asked to convert to hex, I assumed a hex string.

Note you are using hex because this is th default encoding for bytea
output in the database, as the database needs to convert everything to
text to send it to psql to display it and psql sends text which the
database needs to convert to operate.

But if you are using something like, say, java, you would need to bind
convert_to output and convert_from input to a byte[] ( although maybe
jdbc is sending/receivine hex strings in the client encoding, the wire
protocol is transparent to you ), which you can then print however you
like, if you want to bind String with hex encoded data you nide the
encode/decode steps.

>> As you see, they are nicelly paired. I see another response which just
>> does encode , decode+convert_from. This works because the database
>> does implicit conversions, but I would not recomend it. I cannot try
>> it because all my databases are UTF-8 but I feel Adrians example would
>> not work if your database encoding is NOT UTF-8 ( and you use any char
>> outside of ascii range ).
>
> If you are doing all this in the same database I am not sure how the above
> applies?

You explicitly used convert_from with UTF8, if the database was latin
1 or ebcdic you would have an encoding mismatch, as the text output
routines will convert the input text to bytea using that encoding.

> Would you not just use the database encoding for the src_encoding?

I do not quite understand the question, if you want your
encode-decode-convert_from towork, yes, you use the database encoding
in convert from, but I did not see a 'show client encoding' or similar
thing in your sample.

Anyway, I was assuming the hex conversion was needed for something
more complex than just pasting the data, for that anything can go,
including a \g | perl -pe 'unpack H*'. So I tried to show how the data
flows without relying on any implicit conversion, the
convert_to+encode => decode+convert_from works in any client encoding,
even in a thing like ebcdic.

Francisco Olarte.

Re: Convert from hex to string

От

Francisco Olarte

Дата:

25 ноября 2015 г., 21:49:53

Mail pingpong day. ;-)

On Wed, Nov 25, 2015 at 7:27 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:

> postgres@latin1_db=# \encoding
> UTF8

This does not matter, as you are sending/receiving hex data, and the
encoding done when sending query results to you gets reverted when you
send queries back to the server.

> postgres@latin1_db=# select convert_from('\xe9', 'latin1');
>  convert_from
> --------------
>  é
> (1 row)

This DOES matter, you've had to change the query for it to work, you
had to look up ( or know beforehand ) the database encoding and change
it accordingly. . I mean, when you do encode('text',....) what you are
really asking for is encode( implicit_text_to_bytea_conversion(text),
....), so you save one step, but them you have to change your query to
the 'latin1' encoding you discovered. This is what I meant, you had to
look at the database properties. But if you do an explicit
convert_form with any encoding capable of representing all your data,
like utf-8 which can represent anything,  the database encoding does
not matter. And it should not, proper code should work with any
database encoding. Even more, I can do encode(convert_to(utf8)) in a
latin1 database conecting with any encoding, then send the hex to
convert_from(decode(),utf8) to an ebcdic database use another encoding
( Of course I may need to transcode the hex, but not to the ebcdic,
but to the client encoding been used in teh second case ), ant it
would work as long as all the characters In the source data are
representable in the destination database ( if they are not you are
out luck in any scheme you may think off ).

You cannot encode generically an string to hex unless you define an
encoding. How do you encode '1', "31" or "F1" ? or maybe "0031" or
"3100"? You can do a quick hack, but normally what you want is first
to encode a sequence of characters to a sequence of bytes and then
hex-encode that, as nearly everybody uses the same conversion for
hex-encoding a byte sequence. This means you can have a '0' in a
ebcdic database, transform it to to [0x30] byte array, encode this as
"30" and then transform the later to 00 30 00 10 because you are using
UTF16-BE wire encoding. Encoding is tricky enough without relying on
implicit convertion or on a character being the same as a byte.

Francisco Olarte.

Re: Convert from hex to string

От

Yuriy Rusinov

Дата:

25 ноября 2015 г., 22:18:38

Thanks a lot.

All works fine.

On Wed, Nov 25, 2015 at 9:49 PM, Francisco Olarte <folarte@peoplecall.com> wrote:

Mail pingpong day. ;-)

On Wed, Nov 25, 2015 at 7:27 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:

> postgres@latin1_db=# \encoding
> UTF8

This does not matter, as you are sending/receiving hex data, and the
encoding done when sending query results to you gets reverted when you
send queries back to the server.

> postgres@latin1_db=# select convert_from('\xe9', 'latin1');
> convert_from
> --------------
> é
> (1 row)

This DOES matter, you've had to change the query for it to work, you
had to look up ( or know beforehand ) the database encoding and change
it accordingly. . I mean, when you do encode('text',....) what you are
really asking for is encode( implicit_text_to_bytea_conversion(text),
....), so you save one step, but them you have to change your query to
the 'latin1' encoding you discovered. This is what I meant, you had to
look at the database properties. But if you do an explicit
convert_form with any encoding capable of representing all your data,
like utf-8 which can represent anything, the database encoding does
not matter. And it should not, proper code should work with any
database encoding. Even more, I can do encode(convert_to(utf8)) in a
latin1 database conecting with any encoding, then send the hex to
convert_from(decode(),utf8) to an ebcdic database use another encoding
( Of course I may need to transcode the hex, but not to the ebcdic,
but to the client encoding been used in teh second case ), ant it
would work as long as all the characters In the source data are
representable in the destination database ( if they are not you are
out luck in any scheme you may think off ).

You cannot encode generically an string to hex unless you define an
encoding. How do you encode '1', "31" or "F1" ? or maybe "0031" or
"3100"? You can do a quick hack, but normally what you want is first
to encode a sequence of characters to a sequence of bytes and then
hex-encode that, as nearly everybody uses the same conversion for
hex-encoding a byte sequence. This means you can have a '0' in a
ebcdic database, transform it to to [0x30] byte array, encode this as
"30" and then transform the later to 00 30 00 10 because you are using
UTF16-BE wire encoding. Encoding is tricky enough without relying on
implicit convertion or on a character being the same as a byte.

Francisco Olarte.

Best regards,
Sincerely yours,
Yuriy Rusinov.

Re: Convert from hex to string

От

Adrian Klaver

Дата:

25 ноября 2015 г., 23:33:06

On 11/25/2015 10:49 AM, Francisco Olarte wrote:
> Mail pingpong day. ;-)
>
> On Wed, Nov 25, 2015 at 7:27 PM, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
>
>> postgres@latin1_db=# \encoding
>> UTF8
>
> This does not matter, as you are sending/receiving hex data, and the
> encoding done when sending query results to you gets reverted when you
> send queries back to the server.
>
>
>> postgres@latin1_db=# select convert_from('\xe9', 'latin1');
>>   convert_from
>> --------------
>>   é
>> (1 row)
>
> This DOES matter, you've had to change the query for it to work, you
> had to look up ( or know beforehand ) the database encoding and change
> it accordingly. . I mean, when you do encode('text',....) what you are
> really asking for is encode( implicit_text_to_bytea_conversion(text),
> ....), so you save one step, but them you have to change your query to
> the 'latin1' encoding you discovered. This is what I meant, you had to
> look at the database properties. But if you do an explicit
> convert_form with any encoding capable of representing all your data,
> like utf-8 which can represent anything,  the database encoding does
> not matter. And it should not, proper code should work with any
> database encoding. Even more, I can do encode(convert_to(utf8)) in a
> latin1 database conecting with any encoding, then send the hex to
> convert_from(decode(),utf8) to an ebcdic database use another encoding
> ( Of course I may need to transcode the hex, but not to the ebcdic,
> but to the client encoding been used in teh second case ), ant it
> would work as long as all the characters In the source data are
> representable in the destination database ( if they are not you are
> out luck in any scheme you may think off ).


I will grant you that working with encodings is like working with
timestamps, explicit is better. The thing I am having a problem with is
how not knowing the context of the bytea value is different in the
implicit vs explicit case:

create table hex_test(hex_fld bytea);

aklaver@latin1_db=> select convert_to('é', 'utf8');
  convert_to
------------
  \xc3a9
(1 row)

aklaver@latin1_db=> insert into hex_test values ('\xc3a9');


aklaver@latin1_db=> select encode('é', 'hex');
  encode
--------
  e9
(1 row)

aklaver@latin1_db=> select decode('e9', 'hex');
  decode
--------
  \xe9
(1 row)


aklaver@latin1_db=> insert into hex_test values ('\xe9');

aklaver@latin1_db=> select * from hex_test ;
  hex_fld
---------
  \xc3a9
  \xe9
(2 rows)


aklaver@latin1_db=> select convert_from(hex_fld, 'latin1') from hex_test ;
  convert_from
--------------
  Ã©
  é
(2 rows)

aklaver@latin1_db=> select convert_from(hex_fld, 'utf8') from hex_test ;
ERROR:  invalid byte sequence for encoding "UTF8": 0xe9

Granted the above is contrived and bound to fail, but the point is you
need to know what created the bytea however it got there. Now if you are
in charge of both ends of the process, then the above is your own fault.
Otherwise, you are down to detective work on what encoding was used
whether it was implicit or explicit. As the OP was working in a single
context I am not seeing the issue in making use of that context to do
the heavy lifting. For the use cases that you show I agree that a
defined convert_to/encode/decode/convert_from chain is a best practice
and something I had not really thought out, so thanks.

>
> You cannot encode generically an string to hex unless you define an
> encoding. How do you encode '1', "31" or "F1" ? or maybe "0031" or
> "3100"? You can do a quick hack, but normally what you want is first
> to encode a sequence of characters to a sequence of bytes and then
> hex-encode that, as nearly everybody uses the same conversion for
> hex-encoding a byte sequence. This means you can have a '0' in a
> ebcdic database, transform it to to [0x30] byte array, encode this as
> "30" and then transform the later to 00 30 00 10 because you are using
> UTF16-BE wire encoding. Encoding is tricky enough without relying on
> implicit convertion or on a character being the same as a byte.
>
> Francisco Olarte.
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Convert from hex to string

От

Francisco Olarte

Дата:

26 ноября 2015 г., 10:13:46

Hi Adrian:

On Wed, Nov 25, 2015 at 9:33 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:
> I will grant you that working with encodings is like working with
> timestamps, explicit is better. The thing I am having a problem with is how
> not knowing the context of the bytea value is different in the implicit vs
> explicit case:

> Granted the above is contrived and bound to fail, but the point is you need
> to know what created the bytea however it got there. Now if you are in
> charge of both ends of the process, then the above is your own fault.
> Otherwise, you are down to detective work on what encoding was used whether
> it was implicit or explicit. As the OP was working in a single context I am
> not seeing the issue in making use of that context to do the heavy lifting.
> For the use cases that you show I agree that a defined
> convert_to/encode/decode/convert_from chain is a best practice and something
> I had not really thought out, so thanks.

Well, I tend to think as a programmer. So I was thinking of the
scenario where you want to get some text from the database and move it
around, and you control all the code. What I was trying to point is
that using explicit all the character sets that matter are in the
database, so I can move it around freely, as I'm the one defining the
queries, while in the implicit case I have to know, or get from the
catalogs, the encoding of the database. I do not know what context the
OP was working and wanted to point he was mixing types. Postgres has a
lot of them, specially to/from text, and I've found the hard way that
lots of implicit conversions are great for one shot programs or
interactive tests, but relying on implicit type conversions for real
production code, put in a source, causes a lot of problems. In a
single session case you can even use implicit conversion + encode and
then paste the result into a convert_from adding quotes and x and it's
going to work, but if you write down that in code you are going to be
confused if something fails later, things like:

cdrs=> select encode('Año','hex');
  encode
----------
 41c3b16f
cdrs=> select convert_from('\x41c3b16f', 'UTF-8');
 convert_from
--------------
 Año
(1 row)

seem like you are converting back and forth, but then:

cdrs=> select convert_from(encode('Año','hex'), 'UTF-8');
ERROR:  function convert_from(text, unknown) does not exist
LINE 1: select convert_from(encode('Año','hex'), 'UTF-8');

And also, the encode() example will give different results depending
on database encoding. Using explicit charsets and correct types frees
me from it. After all, to make the full round trip a
covert-to+encode+decode+convert-from is needed, and making it explciti
makes things easier.

After all, not everybody knows that all values in the database pass
through a conversion to/from text to be sent to / received from psql,
and that you can use it if you just want the hex digits in the current
database encoding:

cdrs=> select 'Año'::bytea;
   bytea
------------
 \x41c3b16f
(1 row)

But I doubt using cast instead of encode can be recomended.

And for the heavy lifting, you are just saving some keystrokes, which
IMO is a cheap price to pay for having a nearly self documenting
conversion chain.

Anyway, I think people should be told to respect types, and people
should be teach that strings are sequences of characters, that to do
what people think is 'hex encoding' ( two digits per *byte*, no
delimiter ), you need to first transform the string to bytes, then
hex-encode that. Nearly all the encoding problems I see is because
people thing strings are sequences of bytes, which they ceased to be
when multibyte encodings where detected ( and even without them, in
Java and I think some of the windows NT API Strings where sequences of
16 bits thingos, first UCS-2. then UTF16. Once people stop trying to
encode/decode strings directly normally they problems vanish.

Francisco Olarte.

Re: Convert from hex to string

От

Adrian Klaver

Дата:

26 ноября 2015 г., 18:48:04

On 11/25/2015 11:12 PM, Francisco Olarte wrote:
> Hi Adrian:
>
> On Wed, Nov 25, 2015 at 9:33 PM, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
>> I will grant you that working with encodings is like working with
>> timestamps, explicit is better. The thing I am having a problem with is how
>> not knowing the context of the bytea value is different in the implicit vs
>> explicit case:
>
>> Granted the above is contrived and bound to fail, but the point is you need
>> to know what created the bytea however it got there. Now if you are in
>> charge of both ends of the process, then the above is your own fault.
>> Otherwise, you are down to detective work on what encoding was used whether
>> it was implicit or explicit. As the OP was working in a single context I am
>> not seeing the issue in making use of that context to do the heavy lifting.
>> For the use cases that you show I agree that a defined
>> convert_to/encode/decode/convert_from chain is a best practice and something
>> I had not really thought out, so thanks.
>
> Well, I tend to think as a programmer. So I was thinking of the
> scenario where you want to get some text from the database and move it
> around, and you control all the code. What I was trying to point is
> that using explicit all the character sets that matter are in the
> database, so I can move it around freely, as I'm the one defining the
> queries, while in the implicit case I have to know, or get from the
> catalogs, the encoding of the database. I do not know what context the
> OP was working and wanted to point he was mixing types. Postgres has a
> lot of them, specially to/from text, and I've found the hard way that
> lots of implicit conversions are great for one shot programs or
> interactive tests, but relying on implicit type conversions for real
> production code, put in a source, causes a lot of problems. In a
> single session case you can even use implicit conversion + encode and
> then paste the result into a convert_from adding quotes and x and it's
> going to work, but if you write down that in code you are going to be
> confused if something fails later, things like:
>
> cdrs=> select encode('Año','hex');
>    encode
> ----------
>   41c3b16f
> cdrs=> select convert_from('\x41c3b16f', 'UTF-8');
>   convert_from
> --------------
>   Año
> (1 row)
>
> seem like you are converting back and forth, but then:
>
> cdrs=> select convert_from(encode('Año','hex'), 'UTF-8');
> ERROR:  function convert_from(text, unknown) does not exist
> LINE 1: select convert_from(encode('Año','hex'), 'UTF-8');
>
> And also, the encode() example will give different results depending
> on database encoding. Using explicit charsets and correct types frees
> me from it. After all, to make the full round trip a
> covert-to+encode+decode+convert-from is needed, and making it explciti
> makes things easier.
>
>
> After all, not everybody knows that all values in the database pass
> through a conversion to/from text to be sent to / received from psql,
> and that you can use it if you just want the hex digits in the current
> database encoding:
>
> cdrs=> select 'Año'::bytea;
>     bytea
> ------------
>   \x41c3b16f
> (1 row)
>
> But I doubt using cast instead of encode can be recomended.
>
> And for the heavy lifting, you are just saving some keystrokes, which
> IMO is a cheap price to pay for having a nearly self documenting
> conversion chain.
>
> Anyway, I think people should be told to respect types, and people
> should be teach that strings are sequences of characters, that to do
> what people think is 'hex encoding' ( two digits per *byte*, no
> delimiter ), you need to first transform the string to bytes, then
> hex-encode that. Nearly all the encoding problems I see is because
> people thing strings are sequences of bytes, which they ceased to be
> when multibyte encodings where detected ( and even without them, in
> Java and I think some of the windows NT API Strings where sequences of
> 16 bits thingos, first UCS-2. then UTF16. Once people stop trying to
> encode/decode strings directly normally they problems vanish.

I am going to go over the above some more, but in the mean time thanks
for the detailed explanations of a subject I am weak on. I am getting
into this in dealing with the Python 2 --> 3 changes in its string/bytes
handling. Wish the Python developers had listened to the apart about
respecting types and not made str mean two entirely different things in
2 vs 3:)

>
> Francisco Olarte.
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Convert from hex to string

От

Jim Nasby

Дата:

07 декабря 2015 г., 03:47:20

On 11/26/15 1:12 AM, Francisco Olarte wrote:
> Anyway, I think people should be told to respect types, and people
> should be teach that strings are sequences of characters, that to do
> what people think is 'hex encoding' ( two digits per*byte*, no
> delimiter ), you need to first transform the string to bytes, then
> hex-encode that.

BTW, it wouldn't be that difficult to create a hex data type that worked
like bytea but accepted only hex strings. Likewise for a base64 type.
Only downside is you'd probably have to do it in C, because no other
language can handle cstring and I don't think there's any built-in
conversion functions with the correct parameters.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Convert from hex to string