reuse.extract module
Utilities related to the extraction of REUSE information out of files.
- reuse.extract.get_encoding_module() → ModuleType
Get the module used to detect the encodings of files.
- reuse.extract.set_encoding_module(name: Literal['magic', 'charset_normalizer', 'chardet']) → ModuleType
Set the module used to detect the encodings of files, and return the module.
- reuse.extract.CHUNK_SIZE = 65536
Default chunk size for reading files.
- reuse.extract.LINE_SIZE = 1024
Default line size for reading files.
- reuse.extract.HEURISTICS_CHUNK_SIZE = 2048
Default chunk size used to heuristically detect file type, encoding, et cetera.
- class reuse.extract.FilterBlock(text: str, in_ignore_block: bool)
Bases: NamedTuple
A simple tuple that holds a block of text, and whether that block of text is in an ignore block.
- reuse.extract.filter_ignore_block(text: str, in_ignore_block: bool = False) → FilterBlock
Filter out blocks beginning with REUSE_IGNORE_START and ending with REUSE_IGNORE_END to remove lines that should not be treated as copyright and licensing information.
- Parameters:
text – The text out of which the ignore blocks must be filtered.
in_ignore_block – Whether the text is already in an ignore block. This is useful when you parse subsequent chunks of text, and one chunk does not close the ignore block.
- Returns:
A FilterBlock tuple that contains the filtered text and a boolean that signals whether the ignore block is still open.
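The chunk-to-chunk state handling described above can be illustrated with a small stand-alone sketch. It does not import reuse and is not the library's implementation: the names `Block` and `filter_sketch` are hypothetical, and the marker strings are assumed values standing in for REUSE_IGNORE_START and REUSE_IGNORE_END.

```python
from typing import NamedTuple

# Assumed marker values; the library refers to them as
# REUSE_IGNORE_START and REUSE_IGNORE_END.
START = "REUSE-IgnoreStart"
END = "REUSE-IgnoreEnd"


class Block(NamedTuple):
    """Hypothetical stand-in for FilterBlock."""

    text: str
    in_ignore_block: bool


def filter_sketch(text: str, in_ignore_block: bool = False) -> Block:
    """Drop everything between START and END, carrying state across calls."""
    kept = []
    pos = 0
    while pos < len(text):
        if in_ignore_block:
            end = text.find(END, pos)
            if end == -1:
                # The block is not closed in this chunk: drop the rest.
                pos = len(text)
            else:
                pos = end + len(END)
                in_ignore_block = False
        else:
            start = text.find(START, pos)
            if start == -1:
                kept.append(text[pos:])
                pos = len(text)
            else:
                kept.append(text[pos:start])
                pos = start + len(START)
                in_ignore_block = True
    return Block("".join(kept), in_ignore_block)


# The first chunk opens an ignore block without closing it ...
first = filter_sketch("a\n# REUSE-IgnoreStart\nb\n")
# ... so the returned flag is passed on when filtering the next chunk.
second = filter_sketch("c\n# REUSE-IgnoreEnd\nd\n", first.in_ignore_block)
```

Passing the returned flag into the next call is what keeps an ignore block that spans two chunks from leaking its contents into the filtered text.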
- reuse.extract.extract_reuse_info(text: str) → ReuseInfo
Extract REUSE information from a multi-line text block.
- Raises:
ExpressionError – if an SPDX expression could not be parsed.
ParseError – if an SPDX expression could not be parsed.
- reuse.extract.detect_encoding(chunk: bytes) → str | None
Find the encoding of the bytes chunk, and return it as a normalised name. See encodings.normalize_encoding(). If no encoding could be found, return None.
If the chunk is empty or the encoding of the chunk is ASCII, 'utf_8' is returned.
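The return contract above can be illustrated with a stand-alone sketch. It is a deliberately naive stand-in, not the library's detection logic: `detect_encoding_sketch` is a hypothetical name, and instead of consulting a detection module it only tries to decode the chunk as UTF-8.

```python
import encodings
from typing import Optional


def detect_encoding_sketch(chunk: bytes) -> Optional[str]:
    """Return a normalised encoding name, or None if nothing fits."""
    try:
        # Empty and pure-ASCII input both decode as ASCII and are
        # reported as 'utf_8', matching the documented behaviour.
        chunk.decode("ascii")
        return "utf_8"
    except UnicodeDecodeError:
        pass
    try:
        chunk.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: this sketch knows no other encodings.
        return None
    # normalize_encoding() replaces punctuation such as "-" with "_".
    return encodings.normalize_encoding("utf-8")
```

A chunk that is cut in the middle of a multi-byte character would make this sketch return None, which is one reason a real detector needs more than a trial decode.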
- reuse.extract.detect_newline(chunk: bytes, encoding: str = 'ascii') → str
Return one of '\n', '\r' or '\r\n' depending on the line endings used in chunk. Return os.linesep if there are no line endings.
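A minimal sketch of this kind of detection follows. It is an illustration only: `detect_newline_sketch` is a hypothetical name, and the fixed priority order used to resolve chunks with mixed line endings is an assumption, not something the documentation states.

```python
import os


def detect_newline_sketch(chunk: bytes, encoding: str = "ascii") -> str:
    """Guess the newline style of chunk, falling back to os.linesep."""
    # "\r\n" is tested first so that it is not mistaken for a bare
    # "\r" or "\n". The order of the remaining two is an assumption.
    for newline in ("\r\n", "\n", "\r"):
        if newline.encode(encoding) in chunk:
            return newline
    return os.linesep
```

Encoding the candidate newline before searching is what makes the encoding parameter matter, although this sketch is only meant for ASCII-compatible encodings.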
- reuse.extract.reuse_info_of_file(fp: BinaryIO, chunk_size: int = 65536, line_size: int = 1024) → ReuseInfo
Read from fp to extract REUSE information. It is read in chunks of chunk_size, additionally reading up to line_size until the next newline.
This function decodes the binary data into UTF-8 and removes REUSE ignore blocks before attempting to extract the REUSE information.
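The reading strategy described above, a fixed-size read followed by a bounded read to the next newline, can be sketched as follows. This is an assumption-laden illustration, not the library's code: `iter_line_aligned_chunks` is a hypothetical helper, and decoding, ignore-block filtering, and extraction are left out.

```python
import io
from typing import BinaryIO, Iterator


def iter_line_aligned_chunks(
    fp: BinaryIO, chunk_size: int = 65536, line_size: int = 1024
) -> Iterator[bytes]:
    """Yield chunks of about chunk_size bytes that end on a line boundary."""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        # Read on until the next newline, but at most line_size bytes,
        # so that a very long line cannot grow the chunk without bound.
        chunk += fp.readline(line_size)
        yield chunk


# Small sizes are used here only to make the behaviour visible.
chunks = list(
    iter_line_aligned_chunks(io.BytesIO(b"aaaa\nbbbb\ncccc\n"), chunk_size=6)
)
capped = list(
    iter_line_aligned_chunks(
        io.BytesIO(b"x" * 10 + b"\n"), chunk_size=4, line_size=2
    )
)
```

Ending each chunk on a newline where possible keeps a single line from being split across two chunks, while the line_size cap bounds the extra read for files with very long lines.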