Charset Normalizer

A library that helps you read text in an unknown charset encoding. Motivated by chardet, I’m trying to resolve the issue by taking another approach. All IANA character set names for which the Python core library provides codecs are supported.



This library aims to help you find the encoding that best suits the content. It DOES NOT try to uncover the originating encoding; in fact, this program does not care about it.

By originating, we mean the exact encoding that was used to encode a text file.


my_byte_str = 'Bonjour, je suis à la recherche d\'une aide sur les étoiles'.encode('cp1252')

We ARE NOT looking for cp1252 BUT FOR "Bonjour, je suis à la recherche d'une aide sur les étoiles". Because of this:

my_byte_str.decode('cp1252') == my_byte_str.decode('cp1256') == my_byte_str.decode('cp1258') == my_byte_str.decode('iso8859_14')
# Evaluates to True

Each of these encodings decodes my_byte_str to exactly the same text, so none of them is a wrong answer. This is where this library differs from others: there is no specific probe per encoding table.
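The claim above can be checked with the standard library alone. A minimal sketch: decode the same cp1252 bytes with several candidate codecs and group the codecs by the text they produce. The helper name group_by_output and the candidate list (including cp1251 as a contrasting codec) are illustrative assumptions, not part of the library.

```python
def group_by_output(data: bytes, candidates: list) -> dict:
    """Map each distinct decoded text to the list of codecs that produce it.

    Codecs that cannot decode `data` at all are skipped.
    """
    groups = {}
    for codec in candidates:
        try:
            text = data.decode(codec)
        except UnicodeDecodeError:
            continue  # e.g. utf_8 rejects the 0xE0 byte used here for "à"
        groups.setdefault(text, []).append(codec)
    return groups


my_byte_str = "Bonjour, je suis à la recherche d'une aide sur les étoiles".encode("cp1252")
candidates = ["utf_8", "cp1252", "cp1256", "cp1258", "iso8859_14", "cp1251"]

for text, codecs in group_by_output(my_byte_str, candidates).items():
    print(codecs, "->", text)
```

The four code pages from the example fall into a single group, cp1251 forms a second group (Cyrillic letters appear where "à" and "é" were), and utf_8 is skipped because it cannot decode these bytes.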


  • Encoding detection on a stream, bytes or file.
  • Transpose any encoded content to Unicode as best we can.
  • Detect spoken language in text.
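To illustrate the "as best we can" idea, here is a deliberately naive sketch: decode the bytes with each candidate codec, score each result with a made-up chaos measure, and keep the least chaotic one. The functions chaos_score and best_guess, and their heuristic (counting control or unassigned characters plus words that mix letters from several scripts), are hypothetical and are not charset_normalizer's algorithm. The library's CLI reports both a Chaos and a Coherence value; this sketch only roughly imitates the former.

```python
import unicodedata
from typing import List, Optional, Tuple


def _script(ch: str) -> str:
    # Approximate a letter's script by the first word of its Unicode name,
    # e.g. "LATIN SMALL LETTER E WITH ACUTE" -> "LATIN".
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]


def chaos_score(text: str) -> int:
    """Hypothetical mess measure: lower means cleaner-looking text."""
    # Control, unassigned and private-use characters rarely appear in real text.
    noisy = sum(
        1
        for ch in text
        if ch not in "\t\n\r" and unicodedata.category(ch) in ("Cc", "Cn", "Co")
    )
    # Mojibake often mixes scripts inside one word, e.g. Cyrillic "й" + "toiles".
    mixed_words = sum(
        1
        for word in text.split()
        if len({_script(ch) for ch in word if ch.isalpha()}) > 1
    )
    return noisy + mixed_words


def best_guess(data: bytes, candidates: List[str]) -> Optional[Tuple[str, str]]:
    """Return (codec, text) with the lowest chaos score; ties keep list order."""
    best = None
    for codec in candidates:
        try:
            text = data.decode(codec)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this codec
        score = chaos_score(text)
        if best is None or score < best[0]:
            best = (score, codec, text)
    return None if best is None else (best[1], best[2])


sample = "Bonjour, je suis à la recherche d'une aide sur les étoiles".encode("cp1252")
codec, text = best_guess(sample, ["utf_8", "cp1251", "iso8859_7", "cp1252"])
print(codec)  # cp1252
```

cp1252 is listed last on purpose, so the score rather than the list order decides. Such a heuristic is easy to fool: Cyrillic text mis-decoded as cp1252 turns into accented Latin letters that it would not flag, which is why a language-aware measure such as coherence is also needed.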


Using PyPI

pip install charset_normalizer

Basic Usage


This package comes with a CLI:

usage: normalizer [-h] [--verbose] [--normalize] [--replace] [--force]
                  file [file ...]

normalizer ./data/

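|       Filename       | Encoding | Language |             Alphabets              | Chaos | Coherence |
|----------------------|----------|----------|------------------------------------|-------|-----------|
| data/ |  cp1252  |  French  | Basic Latin and Latin-1 Supplement | 0.0 % |  84.924 % |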


Just print out normalized text

from charset_normalizer import CharsetNormalizerMatches as CnM

Normalize any text file

from charset_normalizer import CharsetNormalizerMatches as CnM

try:
    CnM.normalize('./') # should write to disk my_subtitle-***.srt
except IOError as e:
    print('Sadly, we are unable to perform charset normalization.', str(e))
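Assuming the encoding is already known, what normalizing amounts to can be sketched with the standard library: read the raw bytes, decode them, and write a UTF-8 copy beside the original. The helper write_utf8_copy and its <stem>.normalized<suffix> naming are hypothetical; they are not the behavior or file naming of CnM.normalize.

```python
import tempfile
from pathlib import Path


def write_utf8_copy(path: str, source_encoding: str) -> Path:
    """Hypothetical helper: write a UTF-8 copy of `path` next to it."""
    src = Path(path)
    # Raises UnicodeDecodeError if the bytes do not fit `source_encoding`.
    text = src.read_bytes().decode(source_encoding)
    # Assumed naming for this sketch: my_subtitle.srt -> my_subtitle.normalized.srt
    dst = src.with_name(f"{src.stem}.normalized{src.suffix}")
    dst.write_text(text, encoding="utf-8")
    return dst


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "my_subtitle.srt"
    original.write_bytes("Bonjour, les étoiles".encode("cp1252"))
    try:
        copy = write_utf8_copy(str(original), "cp1252")
        print(copy.name)  # my_subtitle.normalized.srt
    except (OSError, UnicodeDecodeError) as e:
        print('Sadly, we are unable to perform charset normalization.', str(e))
```

In Python 3, IOError is an alias of OSError. The sketch also catches UnicodeDecodeError, because a wrong source_encoding surfaces as that rather than as an I/O error.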

Upgrade your code without effort

from charset_normalizer import detect

The detect function imported above behaves the same as the one from chardet.
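Being a drop-in replacement means callers only rely on the dictionary that detect returns, with "encoding", "language" and "confidence" keys as in chardet. A minimal sketch of such caller code follows. decode_bytes and fake_detect are hypothetical names; fake_detect only mimics that dictionary shape so the snippet runs without either library installed. In real code you would pass the detect function imported above.

```python
from typing import Any, Callable, Dict


def decode_bytes(data: bytes, detect: Callable[[bytes], Dict[str, Any]]) -> str:
    """Decode `data` with whatever encoding a chardet-style `detect` reports."""
    encoding = detect(data).get("encoding")
    if encoding is None:
        # chardet-style detectors may report None when they cannot decide.
        raise ValueError("could not detect an encoding")
    return data.decode(encoding)


def fake_detect(data: bytes) -> Dict[str, Any]:
    """Hypothetical stand-in returning the same keys as chardet's detect()."""
    try:
        data.decode("utf-8")
        return {"encoding": "utf-8", "language": "", "confidence": 1.0}
    except UnicodeDecodeError:
        return {"encoding": "cp1252", "language": "", "confidence": 0.5}


print(decode_bytes("étoiles".encode("cp1252"), fake_detect))  # étoiles
```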