How to Detect the Encoding of a Text File with Python

Created: November-22, 2018

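Python has a useful package, chardet, that helps detect the encoding used in a file. No program can say with 100% certainty which encoding a file uses, which is why chardet reports the encoding it considers most probable, together with a confidence value. chardet recognizes a range of encodings.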
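To see why detection can only ever be a best guess, here is a minimal sketch using only the standard library. The sample word and the two candidate encodings are picked purely for illustration. The same bytes decode without error under both encodings, so the bytes alone cannot prove which one was intended.

```python
# The same bytes are valid text in more than one single-byte encoding.
raw = "café".encode("latin-1")  # b'caf\xe9'

# Illustrative candidates: both map every byte used here to some character.
candidates = ["latin-1", "cp1251"]
decoded = {name: raw.decode(name) for name in candidates}

for name, text in decoded.items():
    # !a escapes non-ASCII characters so the output is safe on any terminal
    print(f"{name}: {text!a}")
```

Both decodes succeed but give different text, and a detector has to choose between them using statistics about which result looks more like real language.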

You can install chardet with pip:

pip install chardet

After that, you can use chardet from the command line:

% chardetect somefile someotherfile
somefile: windows-1252 with confidence 0.5
someotherfile: ascii with confidence 1.0

Or from Python:

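import chardet

# `file` is the path of the file to inspect.
# Open in binary mode: chardet.detect() expects raw bytes, not decoded text.
with open(file, "rb") as f:
    rawdata = f.read()
result = chardet.detect(rawdata)  # dict with 'encoding' and 'confidence' keys
charenc = result['encoding']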
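Once you have the result, you will usually want to decode the bytes with it. Below is a minimal sketch of one way to do that, not the only one. The helper names `decode_with_detection` and `read_text`, the 0.5 confidence threshold, and the UTF-8 fallback are all assumptions made for this example. It covers the cases where chardet reports no encoding (`None`), reports low confidence, or returns a name that Python has no codec for.

```python
def decode_with_detection(rawdata, result, min_confidence=0.5, fallback="utf-8"):
    """Decode bytes using a chardet-style result dict, with a fallback.

    `result` is expected to look like the return value of chardet.detect():
    a dict with 'encoding' (str or None) and 'confidence' (float) keys.
    The threshold and fallback are illustrative defaults, not chardet settings.
    """
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    if encoding is None or confidence < min_confidence:
        encoding = fallback
    try:
        # errors="replace" avoids an exception if the guess turns out wrong
        return rawdata.decode(encoding, errors="replace"), encoding
    except LookupError:
        # The detected name is not a codec Python knows about
        return rawdata.decode(fallback, errors="replace"), fallback


def read_text(path):
    """Hypothetical convenience wrapper: detect a file's encoding, then decode it."""
    import chardet  # third-party; imported here so the helper above works without it

    with open(path, "rb") as f:
        rawdata = f.read()
    return decode_with_detection(rawdata, chardet.detect(rawdata))
```

For example, `text, used = read_text("somefile")` returns the decoded text along with the encoding that was actually applied, so you can log it or check it.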