unicode:bom_to_encoding/1

Detects the UTF byte order mark (BOM) at the start of a binary

Usage:

bom_to_encoding(Bin) -> {Encoding, Length}

Internal implementation:

-spec bom_to_encoding(Bin) -> {Encoding, Length} when
      Bin :: binary(),
      Encoding ::  'latin1' | 'utf8'
                 | {'utf16', endian()}
                 | {'utf32', endian()},
      Length :: non_neg_integer().

bom_to_encoding(<<239,187,191,_/binary>>) ->
    {utf8,3};
bom_to_encoding(<<0,0,254,255,_/binary>>) ->
    {{utf32,big},4};
bom_to_encoding(<<255,254,0,0,_/binary>>) ->
    {{utf32,little},4};
bom_to_encoding(<<254,255,_/binary>>) ->
    {{utf16,big},2};
bom_to_encoding(<<255,254,_/binary>>) ->
    {{utf16,little},2};
bom_to_encoding(Bin) when is_binary(Bin) ->
    {latin1,0}.

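Checks whether the binary Bin begins with a UTF byte order mark (BOM). Returns the encoding that the BOM indicates together with the length of the BOM in bytes.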

unicode:bom_to_encoding(<<239,187,191>>).    % {utf8,3}
unicode:bom_to_encoding(<<254,255>>).        % {{utf16,big},2}
unicode:bom_to_encoding(<<255,254,0,0>>).    % {{utf32,little},4}
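
The mapping between BOM bytes and encodings can be checked against the inverse function unicode:encoding_to_bom/1. The following is a minimal sketch; the module name bom_roundtrip and the function check/0 are hypothetical names chosen for illustration.

```erlang
-module(bom_roundtrip).
-export([check/0]).

%% Hypothetical demo: the BOM generated for each encoding is detected
%% as that same encoding, with Length equal to the size of the BOM.
check() ->
    Encodings = [utf8, {utf16,big}, {utf16,little},
                 {utf32,big}, {utf32,little}, latin1],
    lists:all(
      fun(Encoding) ->
              Bom = unicode:encoding_to_bom(Encoding),
              unicode:bom_to_encoding(Bom) =:= {Encoding, byte_size(Bom)}
      end,
      Encodings).
```

For latin1 the generated BOM is the empty binary, so the round trip yields {latin1,0}. Note that the UTF-32 little-endian BOM starts with the same two bytes as the UTF-16 little-endian BOM, so a binary starting with 255,254,0,0 is reported as {utf32,little}.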

If no byte order mark is found, {latin1,0} is returned.

unicode:bom_to_encoding(<<"abc">>).          % {latin1,0}
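
Because the returned Length is 0 when there is no BOM, the same code path can handle binaries with and without one. The sketch below drops the BOM and decodes the remainder with unicode:characters_to_list/2; the module name bom_demo and the function decode/1 are hypothetical names, not part of the unicode module.

```erlang
-module(bom_demo).
-export([decode/1]).

%% Hypothetical helper: detect the BOM, skip it, and decode the rest
%% of the binary using the detected encoding.
decode(Bin) when is_binary(Bin) ->
    {Encoding, Length} = unicode:bom_to_encoding(Bin),
    %% Length is 0 when no BOM is present, so nothing is skipped.
    <<_Bom:Length/binary, Rest/binary>> = Bin,
    %% May also return {error, ...} or {incomplete, ...} on bad input.
    unicode:characters_to_list(Rest, Encoding).
```

For example, bom_demo:decode(<<239,187,191,"abc">>) and bom_demo:decode(<<"abc">>) both return "abc". The latin1 fallback is only a default: a UTF-8 file saved without a BOM would be decoded as Latin-1 by this sketch.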

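The following reads the first bytes of the file test.txt, detects the encoding from its BOM, and sets that encoding on the file's I/O device: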

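{ok, File} = file:open("test.txt", [read, binary]),
%% Read enough bytes to cover the longest BOM (4 bytes for UTF-32).
{ok, Bin} = file:read(File, 4),
{Encoding, Length} = unicode:bom_to_encoding(Bin),
%% Move back to just past the BOM so that no content bytes are skipped.
{ok, Length} = file:position(File, Length),
io:setopts(File, [{encoding, Encoding}]).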
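
The snippet above assumes the file contains at least one byte: file:read/2 returns eof for an empty file, and the match on {ok, Bin} would then fail. A minimal sketch of a wrapper that tolerates this case follows; the module name bom_file, the function open/1, and the three-element return tuple are assumptions made for illustration.

```erlang
-module(bom_file).
-export([open/1]).

%% Hypothetical wrapper: open Path for reading, detect a BOM if present,
%% position the file just after it, and set the device encoding.
open(Path) ->
    case file:open(Path, [read, binary]) of
        {ok, File} ->
            %% An empty file yields eof; treat it as having no BOM.
            Head = case file:read(File, 4) of
                       {ok, Bin} -> Bin;
                       eof -> <<>>
                   end,
            {Encoding, Length} = unicode:bom_to_encoding(Head),
            {ok, Length} = file:position(File, Length),
            ok = io:setopts(File, [{encoding, Encoding}]),
            {ok, File, Encoding};
        {error, _Reason} = Error ->
            Error
    end.
```

Subsequent reads through the io module, such as io:get_line(File, ''), then start at the first character after the BOM. A read error from file:read/2 is not handled in this sketch and would raise a case_clause error.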