Admin Tool
Editing file: universaldetector.cpython-38.opt-1.pyc
Source path recorded in the bytecode: /usr/lib/python3.8/site-packages/pip/_vendor/chardet/universaldetector.py

Module docstring:

    Module containing the UniversalDetector detector class, which is the primary
    class a user of ``chardet`` should use.

    :author: Mark Pilgrim (initial port to Python)
    :author: Shy Shalom (original C code)
    :author: Dan Blanchard (major refactoring for 3.0)
    :author: Ian Cordasco

Names imported from sibling modules: CharSetGroupProber, InputState, LanguageFilter, ProbingState, EscCharSetProber, Latin1Prober, MBCSGroupProber, SBCSGroupProber.

Class ``UniversalDetector``, docstring:

    The ``UniversalDetector`` class underlies the ``chardet.detect`` function
    and coordinates all of the different charset probers.

    To get a ``dict`` containing an encoding and its confidence, you can simply
    run:

    .. code::

        u = UniversalDetector()
        u.feed(some_bytes)
        u.close()
        detected = u.result

ISO-8859 to Windows code-page names stored on the class:

- iso-8859-1: Windows-1252
- iso-8859-2: Windows-1250
- iso-8859-5: Windows-1251
- iso-8859-6: Windows-1256
- iso-8859-7: Windows-1253
- iso-8859-8: Windows-1255
- iso-8859-9: Windows-1254
- iso-8859-13: Windows-1257

``__init__`` sets the attributes ``_esc_charset_prober``, ``_charset_probers``, ``result``, ``done``, ``_got_data``, ``_input_state``, ``_last_char``, ``lang_filter``, ``logger`` (from ``logging.getLogger(__name__)``) and ``_has_win_bytes``, then calls ``reset``.

``reset`` docstring:

    Reset the UniversalDetector and all of its probers back to their initial
    states. This is called by ``__init__``, so you only need to call this
    directly in between analyses of different documents.
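The ISO-8859 to Windows code-page names that appear in the strings above suggest a final remapping step. The following is a minimal sketch of such a lookup, not the library's own code. The helper name `remap_iso_to_windows` is hypothetical, and the 0x80-0x9F byte range is an assumption: the corresponding pattern is not legible in the dump, and the range is chosen here because those bytes are C1 control codes in ISO-8859 but printable characters in the Windows code pages.

```python
import re

# Name pairs copied from the strings visible in the dump.
ISO_WIN_MAP = {
    "iso-8859-1": "Windows-1252",
    "iso-8859-2": "Windows-1250",
    "iso-8859-5": "Windows-1251",
    "iso-8859-6": "Windows-1256",
    "iso-8859-7": "Windows-1253",
    "iso-8859-8": "Windows-1255",
    "iso-8859-9": "Windows-1254",
    "iso-8859-13": "Windows-1257",
}

# Assumption: bytes 0x80-0x9F signal a Windows code page rather than ISO-8859.
WIN_BYTES = re.compile(rb"[\x80-\x9f]")


def remap_iso_to_windows(encoding: str, data: bytes) -> str:
    """Hypothetical helper: swap an ISO-8859 name for its Windows
    counterpart when the data contains bytes in the 0x80-0x9F range."""
    if WIN_BYTES.search(data):
        # Unknown names fall through unchanged.
        return ISO_WIN_MAP.get(encoding.lower(), encoding)
    return encoding
```

For example, `remap_iso_to_windows("ISO-8859-1", b"\x93quoted\x94")` yields `"Windows-1252"`, while input without any byte in that range keeps the name it was given.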
``reset`` restores ``result`` to a ``dict`` with the keys ``encoding``, ``confidence`` and ``language``, and sets the input state to ``InputState.PURE_ASCII``.

``feed(byte_str)`` docstring:

    Takes a chunk of a document and feeds it through all of the relevant
    charset probers.

    After calling ``feed``, you can check the value of the ``done`` attribute
    to see if you need to continue feeding the ``UniversalDetector`` more data,
    or if it has made a prediction (in the ``result`` attribute).

    .. note::

        You should always call ``close`` when you're done feeding in your
        document if ``done`` is not already ``True``.

Byte-order-mark checks and encoding names visible in ``feed``, in order:

- ``codecs.BOM_UTF8``: UTF-8-SIG
- ``codecs.BOM_UTF32_LE`` or ``codecs.BOM_UTF32_BE``: UTF-32
- (byte signature not legible): X-ISO-10646-UCS-4-3412
- (byte signature not legible): X-ISO-10646-UCS-4-2143
- ``codecs.BOM_LE`` or ``codecs.BOM_BE``: UTF-16

Other names referenced in ``feed``: ``HIGH_BYTE_DETECTOR``, ``ESC_DETECTOR``, ``WIN_BYTE_DETECTOR``, ``InputState.HIGH_BYTE``, ``InputState.ESC_ASCII``, ``ProbingState.FOUND_IT``, ``LanguageFilter.NON_CJK``.

``close`` docstring:

    Stop analyzing the current document and come up with a final prediction.

    :returns: The ``result`` attribute, a ``dict`` with the keys `encoding`,
              `confidence`, and `language`.

String constants visible in ``close``: "no data received!", "ascii"
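The byte-order-mark checks named in `feed` can be illustrated with the standard `codecs` constants alone. This is a rough sketch rather than the library's implementation: `sniff_bom` and `_BOM_TABLE` are hypothetical names, and the two X-ISO-10646-UCS-4 cases are left out because their byte signatures are not legible in the dump.

```python
import codecs
from typing import Optional

# Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the
# UTF-16 LE BOM (FF FE), so UTF-32 has to be tested first.
_BOM_TABLE = [
    ((codecs.BOM_UTF8,), "UTF-8-SIG"),
    ((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE), "UTF-32"),
    ((codecs.BOM_LE, codecs.BOM_BE), "UTF-16"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Hypothetical helper: return an encoding name if ``data`` starts
    with a known byte-order mark, otherwise ``None``."""
    for boms, name in _BOM_TABLE:
        # bytes.startswith accepts a tuple of candidate prefixes.
        if data.startswith(boms):
            return name
    return None
```

A caller would run this once on the first chunk of a document and fall back to statistical probing only when it returns `None`.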