idmtools.utils.hashing module

Fast hash of Python objects.

Copyright 2021, Bill & Melinda Gates Foundation. All rights reserved.

class idmtools.utils.hashing.Hasher(hash_name='md5')

Bases: _Pickler

A subclass of the pickler that hashes objects rather than pickling them.
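The idea of hashing through a pickler can be sketched with the standard library alone. The class below is a hypothetical stand-in (`SketchHasher` is not part of idmtools) and assumes only what is stated above: a `_Pickler` subclass whose serialized byte stream is fed to a `hashlib` digest instead of being stored. Its digests are not expected to match those produced by `Hasher`.

```python
import hashlib
import io
import pickle


class SketchHasher(pickle._Pickler):
    """Hypothetical stand-in, not the idmtools implementation."""

    def __init__(self, hash_name="md5"):
        # Pickle into an in-memory buffer rather than a file.
        self.stream = io.BytesIO()
        super().__init__(self.stream, protocol=pickle.HIGHEST_PROTOCOL)
        self._hash = hashlib.new(hash_name)

    def hash(self, obj, return_digest=True):
        # Serialize the object, then feed the resulting bytes to the digest.
        self.dump(obj)
        self._hash.update(self.stream.getvalue())
        return self._hash.hexdigest() if return_digest else None


# Two equal objects hashed by fresh hashers give the same digest.
digest = SketchHasher().hash({"a": 1, "b": [1, 2, 3]})
same = SketchHasher().hash({"a": 1, "b": [1, 2, 3]})
```

A fresh hasher is created per object here because the buffer and the pickler memo both accumulate state across calls.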

__init__(hash_name='md5')

Initialize our hasher.

Parameters:

hash_name – Name of the hash algorithm to use. Defaults to 'md5'.

hash(obj, return_digest=True)

Hash an object.

Parameters:
  • obj – Object to hash

  • return_digest – Whether to return the digest

Returns:

The hash digest, or None if return_digest is False

save(obj)

Save an object to hash.

Parameters:

obj – Object to save.

Returns:

None

memoize(obj)

Disable memoization for strings so hashing happens on value and not reference.
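Why this matters can be shown with plain `pickle`: the pickler memoizes objects by identity, so a list holding the same string object twice serializes differently from a list holding two equal but distinct strings. The sketch below demonstrates the difference and then overrides `memoize` on a hypothetical `NoStringMemoPickler` (an illustrative name, not the idmtools class) to remove it.

```python
import io
import pickle

# Two equal strings that are distinct objects (built at runtime).
a = "".join(["hello ", "world"])
b = "".join(["hello ", "world"])

# Default pickling memoizes by object identity, so the second `a`
# becomes a back-reference while `b` is written out in full.
by_reference = pickle.dumps([a, a])
by_value = pickle.dumps([a, b])


class NoStringMemoPickler(pickle._Pickler):
    """Hypothetical pickler that never memoizes str or bytes."""

    def memoize(self, obj):
        if isinstance(obj, (str, bytes)):
            return  # skip the memo so strings are always written by value
        super().memoize(obj)


def dump_without_string_memo(obj):
    buffer = io.BytesIO()
    NoStringMemoPickler(buffer, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)
    return buffer.getvalue()


# With the override, both lists serialize to identical bytes.
identical = dump_without_string_memo([a, a]) == dump_without_string_memo([a, b])
```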

save_set(set_items)

Save a set for hashing.

Parameters:

set_items – Set items

Returns:

None

idmtools.utils.hashing.hash_obj(obj, hash_name='md5')

Quickly calculate a hash that uniquely identifies a Python object.

Parameters:
  • obj – Object to hash

  • hash_name – The hashing algorithm to use. ‘md5’ is faster; ‘sha1’ is considered safer.
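As a rough illustration of the `hash_name` trade-off, the sketch below hashes the pickled bytes of an object with either algorithm. `hash_obj_sketch` is a hypothetical approximation built on `pickle.dumps` and `hashlib`; it is not the idmtools implementation and its digests will generally differ from those of `hash_obj`.

```python
import hashlib
import pickle


def hash_obj_sketch(obj, hash_name="md5"):
    """Hypothetical approximation: digest of the object's pickled bytes."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return hashlib.new(hash_name, payload).hexdigest()


# md5 yields a 128-bit digest, sha1 a 160-bit digest.
md5_digest = hash_obj_sketch(("experiment", 42))
sha1_digest = hash_obj_sketch(("experiment", 42), hash_name="sha1")
```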

idmtools.utils.hashing.ignore_fields_in_dataclass_on_pickle(item)

Ignore certain fields for pickling on dataclasses.

Parameters:

item – Item to pickle

Returns:

State of item to pickle
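One way such a helper could work is sketched below. This is an assumption, not the idmtools implementation: the `pickle_ignore` metadata key, the `state_without_ignored_fields` helper, and the `Job` class are all hypothetical names chosen for illustration. The helper builds the state a `__getstate__` method would hand to pickle, leaving out any flagged field.

```python
from dataclasses import dataclass, field, fields


def state_without_ignored_fields(item):
    """Return the instance state minus fields flagged for exclusion.

    The "pickle_ignore" metadata key is an assumption made for this sketch.
    """
    ignored = {f.name for f in fields(item) if f.metadata.get("pickle_ignore")}
    return {k: v for k, v in vars(item).items() if k not in ignored}


@dataclass
class Job:  # hypothetical example class
    name: str
    # Runtime-only handle that should not be pickled.
    connection: object = field(default=None, metadata={"pickle_ignore": True})

    def __getstate__(self):
        # pickle calls this to obtain the state to serialize.
        return state_without_ignored_fields(self)


# A lambda cannot be pickled, but it never reaches the pickled state.
state = Job("sim", connection=lambda: None).__getstate__()
```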

idmtools.utils.hashing.calculate_md5(filename: str, chunk_size: int = 8192) → str

Calculate the MD5 hash of a file.

Parameters:
  • filename – Filename to calculate the MD5 for

  • chunk_size – Size of each chunk read from the file

Returns:

MD5 digest as a string
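Reading in chunks keeps memory use constant regardless of file size. A minimal sketch of that technique follows; `md5_of_file` is a hypothetical stand-in that mirrors the signature above, and returning a hex digest is an assumption.

```python
import hashlib
import os
import tempfile


def md5_of_file(filename: str, chunk_size: int = 8192) -> str:
    """Hypothetical stand-in for calculate_md5: hash a file chunk by chunk."""
    digest = hashlib.md5()
    with open(filename, "rb") as handle:
        # iter() with a sentinel stops at the empty bytes returned at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage: write a small temporary file, hash it, then clean up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
checksum = md5_of_file(tmp.name)
os.remove(tmp.name)
```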

idmtools.utils.hashing.calculate_md5_stream(stream: BytesIO | BinaryIO, chunk_size: int = 8192, hash_type: str = 'md5', file_hash=None)

Calculate the hash of a stream (MD5 by default).

Parameters:
  • chunk_size – Size of each chunk read from the stream

  • stream – Binary stream to hash

  • hash_type – Hash function

  • file_hash – File hash

Returns:

Hash of the stream (MD5 by default)
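The same chunked approach applies to any binary stream. The sketch below is a hypothetical stand-in (`hash_of_stream`), not the idmtools implementation. It assumes that `file_hash` is an existing hash object to continue updating and that the result is a hex digest; neither is confirmed by the description above.

```python
import hashlib
from io import BytesIO
from typing import BinaryIO, Union


def hash_of_stream(stream: Union[BytesIO, BinaryIO], chunk_size: int = 8192,
                   hash_type: str = "md5", file_hash=None) -> str:
    """Hypothetical stand-in for calculate_md5_stream."""
    # Assumption: a caller-supplied hash object is reused if given.
    if file_hash is None:
        file_hash = hashlib.new(hash_type)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes signals end of stream
            break
        file_hash.update(chunk)
    return file_hash.hexdigest()


# Usage: the chunk size does not affect the result.
payload = b"x" * 20000
default_digest = hash_of_stream(BytesIO(payload))
small_chunks = hash_of_stream(BytesIO(payload), chunk_size=7)
```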