idmtools.utils.hashing module

Fast hash of Python objects.

class idmtools.utils.hashing.Hasher(hash_name='md5')

Bases: _Pickler

A subclass of Pickler that hashes objects rather than pickling them.

__init__(hash_name='md5')

Initialize the hasher.

Parameters: hash_name – Hash type to use. Defaults to md5.

hash(obj, return_digest=True)

Hash an object.

Parameters:

obj – Object to hash
return_digest – Whether to return the digest

Returns:

None if return_digest is False; otherwise, the hash digest
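The pickle-then-hash idea behind Hasher can be sketched with the standard library alone. This is a minimal sketch, not the idmtools implementation. The class name SketchHasher is hypothetical. The sketch assumes the hasher pickles the object into an in-memory buffer with the pure-Python pickle._Pickler and then passes the bytes to hashlib.new(hash_name).

```python
import hashlib
import io
import pickle


class SketchHasher(pickle._Pickler):
    """Hypothetical hasher: pickle into a buffer, then digest the bytes."""

    def __init__(self, hash_name="md5"):
        self.stream = io.BytesIO()
        # Fixed protocol so the pickled bytes do not depend on the default.
        super().__init__(self.stream, protocol=4)
        self._hash = hashlib.new(hash_name)

    def hash(self, obj, return_digest=True):
        # Intended for a single call per instance: the buffer is not reset.
        self.dump(obj)
        self._hash.update(self.stream.getvalue())
        if return_digest:
            return self._hash.hexdigest()
        return None
```

Calling SketchHasher().hash([1, 2, 3]) twice yields the same 32-character MD5 hex string. SketchHasher("sha1") yields a 40-character digest instead.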

save(obj)

Save an object for hashing.

Parameters: obj – Object to save.
Returns: None

memoize(obj)

Disable memoization for strings so that the hash depends on their value rather than their object identity.
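Memoization matters here because the standard pickler writes a short back-reference the second time it meets the same object. As a result, equal strings produce different bytes depending on whether they are one object or two. Below is a minimal sketch of the override, assuming a subclass of the pure-Python pickle._Pickler. ValueHasher and value_hash are hypothetical names.

```python
import hashlib
import io
import pickle


class ValueHasher(pickle._Pickler):
    """Hypothetical pickler whose output depends on string values, not identity."""

    def memoize(self, obj):
        # Never record strings in the memo, so each occurrence is written in
        # full instead of as a back-reference to an earlier copy.
        if isinstance(obj, str):
            return
        super().memoize(obj)


def value_hash(obj, hash_name="md5"):
    # Pickle into an in-memory buffer, then digest the bytes.
    buffer = io.BytesIO()
    ValueHasher(buffer, protocol=4).dump(obj)
    return hashlib.new(hash_name, buffer.getvalue()).hexdigest()
```

Consider two equal strings built at runtime, such as "".join(["ab", "cd"]) evaluated twice. With plain pickle.dumps, [a, a] and [a, b] pickle differently. With value_hash, they hash identically.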

save_set(set_items)

Save a set for hashing.

Parameters: set_items – Items of the set to save
Returns: None
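One way to make set hashing independent of iteration order is to write the elements in a canonical order. This is a minimal sketch, assuming a pure-Python pickle._Pickler subclass. The names SetSortingHasher and stable_set_hash are hypothetical. Sorting by each element's pickled bytes is an illustrative choice, not necessarily what idmtools does.

```python
import hashlib
import io
import pickle


class SetSortingHasher(pickle._Pickler):
    """Hypothetical pickler that writes set elements in a canonical order."""

    # _Pickler picks save methods from a class-level dispatch table, so an
    # override only takes effect if it is registered in a copy of that table.
    dispatch = pickle._Pickler.dispatch.copy()

    def save_set(self, set_items):
        # Sort by each element's pickled bytes so mixed-type elements still order.
        ordered = sorted(set_items, key=lambda item: pickle.dumps(item, protocol=4))
        # Emit the same set(list) reduction that pickle uses for protocols < 4.
        self.save_reduce(set, (ordered,), obj=set_items)

    dispatch[set] = save_set


def stable_set_hash(obj, hash_name="md5"):
    # Pickle into an in-memory buffer, then digest the bytes.
    buffer = io.BytesIO()
    SetSortingHasher(buffer, protocol=4).dump(obj)
    return hashlib.new(hash_name, buffer.getvalue()).hexdigest()
```

Registering the override in a copied dispatch table is necessary. Simply defining save_set on the subclass would be ignored, because _Pickler looks up the handler for set in its dispatch dict.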

idmtools.utils.hashing.hash_obj(obj, hash_name='md5')

Quickly calculate a hash that uniquely identifies a Python object.

Parameters:

obj – Object to hash
hash_name – The hashing algorithm to use. ‘md5’ is faster; ‘sha1’ is considered safer.
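For comparison, a simplified stand-in for hash_obj can pickle the object with the standard pickle.dumps and digest the bytes with the chosen algorithm. This is a hypothetical sketch: hash_obj_sketch is not part of idmtools. Because it uses the default pickler, it does not disable string memoization the way Hasher does.

```python
import hashlib
import pickle


def hash_obj_sketch(obj, hash_name="md5"):
    # Reject names hashlib cannot construct before doing any work.
    if hash_name not in hashlib.algorithms_available:
        raise ValueError(f"unsupported hash algorithm: {hash_name!r}")
    # Pickle with a fixed protocol so the bytes do not vary with the default.
    data = pickle.dumps(obj, protocol=4)
    return hashlib.new(hash_name, data).hexdigest()
```

With md5, the result is a 32-character hex string. With sha1, it is 40 characters.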

idmtools.utils.hashing.ignore_fields_in_dataclass_on_pickle(item)

Ignore certain fields when pickling dataclasses.

Parameters: item – Item to pickle
Returns: State of the item to pickle
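A dataclass can route its pickled state through a helper like this one so that volatile fields do not affect the hash. This is a hypothetical sketch. The metadata key pickle_ignore, the helper name ignore_fields_sketch, and the Task class are illustrative assumptions, not idmtools conventions.

```python
from dataclasses import dataclass, field, fields


def ignore_fields_sketch(item):
    """Return a copy of item's state without fields flagged in their metadata."""
    state = dict(item.__dict__)
    for dc_field in fields(item):
        # Hypothetical flag: drop fields declared with metadata={"pickle_ignore": True}.
        if dc_field.metadata.get("pickle_ignore", False):
            state.pop(dc_field.name, None)
    return state


@dataclass
class Task:
    command: str
    # Runtime cache that should not change the hash of the task.
    cache: dict = field(default_factory=dict, metadata={"pickle_ignore": True})

    def __getstate__(self):
        # pickle calls __getstate__ to obtain the state it serializes.
        return ignore_fields_sketch(self)
```

Two Task instances with the same command but different caches both serialize the state {'command': 'run'}, so they pickle and hash identically. An unpickled Task would lack the cache attribute unless a __setstate__ restores it.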

idmtools.utils.hashing.calculate_md5(filename: str, chunk_size: int = 8192) → str

Calculate the MD5 hash of a file.

Parameters:

filename – Name of the file to calculate the MD5 hash for
chunk_size – Size of each chunk read from the file

Returns:

MD5 hash as a string
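Chunked reading keeps memory use bounded by chunk_size regardless of file size. This is a minimal sketch with hashlib. It assumes the file is opened in binary mode and that the result is the hexadecimal digest. calculate_md5_sketch is a hypothetical name, not the idmtools function.

```python
import hashlib
import os
import tempfile


def calculate_md5_sketch(filename: str, chunk_size: int = 8192) -> str:
    md5 = hashlib.md5()
    # Binary mode: hash the raw bytes, one chunk at a time.
    with open(filename, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


if __name__ == "__main__":
    # Demo: hash a small temporary file, reading 4 bytes per chunk.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello world")
    try:
        print(calculate_md5_sketch(tmp.name, chunk_size=4))
    finally:
        os.remove(tmp.name)
```

The chunked result matches hashlib.md5(data).hexdigest() computed over the whole file contents at once.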

idmtools.utils.hashing.calculate_md5_stream(stream: BytesIO | BinaryIO, chunk_size: int = 8192, hash_type: str = 'md5', file_hash=None)

Calculate the hash of a stream (MD5 by default).

Parameters:

stream – Binary stream to read
chunk_size – Number of bytes read per chunk
hash_type – Name of the hash function to use. Defaults to md5
file_hash – File hash

Returns:

Hash of the stream (MD5 unless hash_type specifies otherwise)
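The stream variant can be sketched the same way. This sketch assumes, without confirmation from the source, that file_hash is an existing hashlib object to keep updating and that hash_type is a name accepted by hashlib.new. calculate_md5_stream_sketch is a hypothetical name.

```python
import hashlib
import io
from typing import BinaryIO, Union


def calculate_md5_stream_sketch(
    stream: Union[io.BytesIO, BinaryIO],
    chunk_size: int = 8192,
    hash_type: str = "md5",
    file_hash=None,
) -> str:
    # Assumption: file_hash, when given, is a hashlib object to keep updating.
    if file_hash is None:
        file_hash = hashlib.new(hash_type)
    # read() returns b"" at end of stream, which stops the iteration.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        file_hash.update(chunk)
    return file_hash.hexdigest()
```

Under this assumption, passing the same hashlib object as file_hash for several streams yields one digest over their concatenated contents. This works because hexdigest() does not finalize a hashlib object.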