reuse.extract module

Utilities related to the extraction of REUSE information out of files.

reuse.extract.get_encoding_module() -> ModuleType

Get the module used to detect the encodings of files.

reuse.extract.set_encoding_module(name: Literal['magic', 'charset_normalizer', 'chardet']) -> ModuleType

Set the module used to detect the encodings of files, and return the module.

reuse.extract.CHUNK_SIZE = 65536

Default chunk size for reading files.

reuse.extract.LINE_SIZE = 1024

Default line size for reading files.

reuse.extract.HEURISTICS_CHUNK_SIZE = 2048

Default chunk size used to heuristically detect file type, encoding, et cetera.

class reuse.extract.FilterBlock(text: str, in_ignore_block: bool)

Bases: NamedTuple

A simple tuple that holds a block of text, and whether that block of text is in an ignore block.

text: str

Alias for field number 0

in_ignore_block: bool

Alias for field number 1
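Because FilterBlock is a NamedTuple, its fields can be read by name, by position, or by unpacking. The class below is a local stand-in that mirrors the documented fields so the snippet runs without importing reuse; it is a sketch, not the library's definition.

```python
from typing import NamedTuple


class FilterBlock(NamedTuple):
    """Local stand-in with the same two fields as documented above."""

    text: str
    in_ignore_block: bool


block = FilterBlock("SPDX-License-Identifier: MIT\n", False)

# Field 0 is `text`, field 1 is `in_ignore_block`.
by_name = (block.text, block.in_ignore_block)
by_index = (block[0], block[1])

# Tuple unpacking works as for any tuple.
text, still_open = block
```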

reuse.extract.filter_ignore_block(text: str, in_ignore_block: bool = False) -> FilterBlock

Filter out blocks beginning with REUSE_IGNORE_START and ending with REUSE_IGNORE_END to remove lines that should not be treated as copyright and licensing information.

Parameters:
  • text – The text out of which the ignore blocks must be filtered.

  • in_ignore_block – Whether the text is already in an ignore block. This is useful when you parse subsequent chunks of text, and one chunk does not close the ignore block.

Returns:

A FilterBlock tuple that contains the filtered text and a boolean that signals whether the ignore block is still open.
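The way the in_ignore_block flag carries state from one chunk to the next can be sketched as follows. This is a minimal re-implementation for illustration only, not the library's code: the function name filter_ignore_block_sketch is hypothetical, and the marker strings are assumed values for the REUSE_IGNORE_START and REUSE_IGNORE_END constants named above.

```python
from typing import NamedTuple

# Assumed marker values; the text above only names the constants.
REUSE_IGNORE_START = "REUSE-IgnoreStart"
REUSE_IGNORE_END = "REUSE-IgnoreEnd"


class FilterBlock(NamedTuple):
    text: str
    in_ignore_block: bool


def filter_ignore_block_sketch(
    text: str, in_ignore_block: bool = False
) -> FilterBlock:
    """Drop everything between the start and end markers (hypothetical)."""
    kept = []
    pos = 0
    while pos < len(text):
        if in_ignore_block:
            end = text.find(REUSE_IGNORE_END, pos)
            if end == -1:
                # The block is not closed in this chunk: discard the rest
                # and report that the block is still open.
                pos = len(text)
            else:
                pos = end + len(REUSE_IGNORE_END)
                in_ignore_block = False
        else:
            start = text.find(REUSE_IGNORE_START, pos)
            if start == -1:
                kept.append(text[pos:])
                pos = len(text)
            else:
                kept.append(text[pos:start])
                pos = start + len(REUSE_IGNORE_START)
                in_ignore_block = True
    return FilterBlock("".join(kept), in_ignore_block)


# Two consecutive chunks; the first one leaves the ignore block open.
first = filter_ignore_block_sketch("a\nREUSE-IgnoreStart\nb\n")
second = filter_ignore_block_sketch(
    "c\nREUSE-IgnoreEnd\nd\n", first.in_ignore_block
)
```

Passing first.in_ignore_block into the second call is what lets the second chunk begin inside the still-open block.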

reuse.extract.extract_reuse_info(text: str) -> ReuseInfo

Extract REUSE information from a multi-line text block.

Raises:
  • ExpressionError – if an SPDX expression could not be parsed.

  • ParseError – if an SPDX expression could not be parsed.

reuse.extract.detect_encoding(chunk: bytes) -> str | None

Find the encoding of the bytes chunk and return it as a normalised name (see encodings.normalize_encoding()). If no encoding could be found, return None.

If the chunk is empty or the encoding of the chunk is ASCII, 'utf_8' is returned.
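To illustrate what a normalised name looks like and how the empty and ASCII cases map to 'utf_8', here is a small sketch built on the standard library's encodings.normalize_encoding(). The helper normalise_detected is hypothetical and not part of reuse.extract; the lower-casing step and the list of ASCII aliases are assumptions, and the raw name stands in for whatever the configured detection module reports.

```python
import encodings
from typing import Optional

# Assumed spellings under which a detector might report plain ASCII.
_ASCII_ALIASES = {"ascii", "us_ascii"}


def normalise_detected(chunk: bytes, raw_name: Optional[str]) -> Optional[str]:
    """Map a detector's raw encoding name to a normalised one (sketch)."""
    if not chunk:
        # Empty input: treated as UTF-8, as described above.
        return "utf_8"
    if raw_name is None:
        # The detector found nothing.
        return None
    # normalize_encoding() collapses punctuation such as '-' into '_'.
    name = encodings.normalize_encoding(raw_name).lower()
    if name in _ASCII_ALIASES:
        # ASCII is a subset of UTF-8, so report UTF-8 instead.
        return "utf_8"
    return name
```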

reuse.extract.detect_newline(chunk: bytes, encoding: str = 'ascii') -> str

Return one of '\n', '\r' or '\r\n' depending on the line endings used in chunk. Return os.linesep if there are no line endings.
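The documented behaviour can be approximated with a few substring checks. The function detect_newline_sketch below is a hypothetical stand-in written from the description above, not the library's implementation; it assumes an encoding without a byte order mark, such as ASCII or UTF-8, and the order in which the candidates are tested is also an assumption.

```python
import os


def detect_newline_sketch(chunk: bytes, encoding: str = "ascii") -> str:
    """Guess the line ending used in `chunk` (illustrative sketch)."""
    # Test '\r\n' first: a chunk that uses it also contains a bare
    # '\n' and a bare '\r' as substrings.
    for newline in ("\r\n", "\n", "\r"):
        if newline.encode(encoding) in chunk:
            return newline
    # No line ending at all: fall back to the platform default.
    return os.linesep
```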

reuse.extract.reuse_info_of_file(fp: BinaryIO, chunk_size: int = 65536, line_size: int = 1024) -> ReuseInfo

Read from fp to extract REUSE information. The file is read in chunks of chunk_size bytes; after each chunk, up to line_size additional bytes are read until the next newline.

This function decodes the binary data into UTF-8 and removes REUSE ignore blocks before attempting to extract the REUSE information.
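The chunked reading strategy can be sketched with read() followed by a bounded readline(). The generator iter_chunks is a hypothetical helper, not part of reuse.extract, and only demonstrates how extending each chunk to the next newline keeps a line from being split across two chunks when the remainder of that line fits within line_size.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(
    fp: BinaryIO, chunk_size: int = 65536, line_size: int = 1024
) -> Iterator[bytes]:
    """Yield chunks of roughly chunk_size bytes, extended to a newline."""
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:
            break
        # Read on until the next newline, but at most line_size bytes.
        chunk += fp.readline(line_size)
        yield chunk


data = b"abcdef\nghij\nkl"
chunks = list(iter_chunks(io.BytesIO(data), chunk_size=4, line_size=10))
```

With these deliberately tiny sizes, each chunk except the last ends on a newline, and joining the chunks reproduces the original bytes.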

reuse.extract.contains_reuse_info(text: str) -> bool

Return whether the text contains REUSE information.