API Reference

DataDirectory

class datatc.data_directory.DataDirectory(path, contents=None, magic_data_interface=<datatc.data_interface.MagicDataInterfaceBase object>)

Manages saving, loading, and viewing data files within a specific data path.

latest()

Return the latest data file or directory, as determined alphabetically.

Return type

Union[DataDirectory, DataFile]
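
Because the ordering is alphabetical, "latest" matches chronological order only when names begin with a sortable timestamp. The sketch below illustrates that rule with plain strings. It is not datatc's implementation, and the helper `latest_name` and the file names are hypothetical.

```python
def latest_name(names):
    """Return the name that sorts last alphabetically (hypothetical helper)."""
    if not names:
        raise ValueError("no entries to choose from")
    return max(names)


# ISO-style date prefixes sort chronologically when compared as strings.
names = [
    "2021-01-05_features.csv",
    "2021-02-01_features.csv",
    "2020-12-31_features.csv",
]
print(latest_name(names))
```

Unpadded counters do not sort the same way: under string comparison "run_9" sorts after "run_10", so zero-padded or date-based prefixes are the safer naming choice.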

classmethod list_projects()

List all data directories previously registered via register_project.

Return type

None

classmethod load(hint)

Shortcut for load_project.

classmethod load_project(hint)

Create a DataDirectory from a project hint previously registered via register_project.

ls(full=False)

Print the contents of the data directory. Defaults to printing all subdirectories, but not all files.

Parameters

full – Whether to print all files.

Return type

None

mkdir(dir_name)

Create a new directory within the current directory.

Parameters

dir_name (str) – Name for the new directory.

Return type

None

classmethod register_project(project_hint, project_path)

Register a hint for a project data directory so that it can be easily reloaded via load(hint).

Return type

None
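
The register-then-load pattern can be pictured as a small persistent mapping from hint to path. The sketch below is an assumption-laden illustration using a JSON file. The class `ProjectRegistry`, its methods, and the storage format are hypothetical and say nothing about how datatc actually stores registered projects.

```python
import json
import tempfile
from pathlib import Path


class ProjectRegistry:
    """Hypothetical hint -> path registry persisted as a JSON file."""

    def __init__(self, registry_file):
        self.registry_file = Path(registry_file)

    def _read(self):
        # An absent file simply means nothing has been registered yet.
        if self.registry_file.exists():
            return json.loads(self.registry_file.read_text())
        return {}

    def register(self, hint, path):
        projects = self._read()
        projects[hint] = str(path)
        self.registry_file.write_text(json.dumps(projects, indent=2))

    def resolve(self, hint):
        projects = self._read()
        if hint not in projects:
            raise KeyError(f"no project registered under {hint!r}")
        return Path(projects[hint])


with tempfile.TemporaryDirectory() as tmp:
    registry = ProjectRegistry(Path(tmp) / "projects.json")
    registry.register("churn", "/data/churn")
    # A fresh instance sees the same mapping because it is read from disk.
    print(ProjectRegistry(Path(tmp) / "projects.json").resolve("churn"))
```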

save(data, file_name, **kwargs)

Save a data object within the data directory.

Parameters
  • data (Any) – data object to save.

  • file_name (str) – file name for the saved object, including file extension. The file extension is used to determine the file type and method for saving the data.

  • **kwargs – Remaining args are passed to the data interface save function.

Return type

None
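
Choosing the save method from the file extension amounts to a dispatch table keyed on the suffix, with extra keyword arguments forwarded to the chosen saver. The sketch below shows the idea with the standard library only. The names `SAVERS`, `save_json`, `save_csv`, and `save_by_extension` are hypothetical and are not part of datatc.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_json(data, path, **kwargs):
    with open(path, "w") as f:
        json.dump(data, f, **kwargs)


def save_csv(rows, path, **kwargs):
    with open(path, "w", newline="") as f:
        csv.writer(f, **kwargs).writerows(rows)


# Hypothetical registry: file extension -> save function.
SAVERS = {".json": save_json, ".csv": save_csv}


def save_by_extension(data, directory, file_name, **kwargs):
    """Pick a saver from the extension and forward extra kwargs to it."""
    suffix = Path(file_name).suffix
    if suffix not in SAVERS:
        raise ValueError(f"no saver registered for extension {suffix!r}")
    target = Path(directory) / file_name
    SAVERS[suffix](data, target, **kwargs)
    return target  # returned only so this example can read the file back


with tempfile.TemporaryDirectory() as tmp:
    path = save_by_extension({"a": 1}, tmp, "config.json", indent=2)
    print(path.read_text())
```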

select(hint)

Return the DataDirectory or DataFile from self.contents that matches the hint. If more than one file matches the hint, select the file whose type matches the hint exactly. Otherwise raise an error and display all matches.

Parameters

hint (str) – string to use to search for a file within the directory.

Raises
  • FileNotFoundError – if no file can be found in the data directory that matches the hint.

  • ValueError – if more than one file is found in the data directory that matches the hint.

Return type

Union[DataDirectory, DataFile]
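
The matching and error rules described for select can be sketched over a plain list of names. Substring matching and the extension-based tie-break are assumptions made for illustration, and `select_name` is a hypothetical helper rather than datatc's code.

```python
from pathlib import Path


def select_name(names, hint):
    """Return the single name matching hint, mirroring the documented errors."""
    # Assumption: a name matches when it contains the hint as a substring.
    matches = [n for n in names if hint in n]
    if not matches:
        raise FileNotFoundError(f"nothing matches hint {hint!r}")
    if len(matches) > 1:
        # Assumed tie-break: prefer the entry whose extension equals the hint.
        exact = [n for n in matches if Path(n).suffix.lstrip(".") == hint]
        if len(exact) != 1:
            raise ValueError(f"hint {hint!r} is ambiguous: {matches}")
        return exact[0]
    return matches[0]


names = ["raw_csv_dump.json", "features.csv", "model.pkl"]
print(select_name(names, "csv"))    # two substring matches, one exact type
print(select_name(names, "model"))  # single match
```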

class datatc.data_directory.SelfAwareDataDirectory(path, contents=None)

Subclass of DataDirectory that manages interacting with the file expression of SelfAwareData.

get_info()

Get metadata about the SelfAwareData object.

Return type

Dict[str, str]

load(data_interface_hint=None, load_function=True, **kwargs)

Load a saved data transformer: the data and the function that generated it.

Parameters
  • data_interface_hint (Optional[str]) – file extension indicating the data interface to use to load the file.

  • load_function (bool) – Whether to load the transformation function of the SelfAwareData object. Specify False if the current environment does not support the dependencies of the transformation function.

  • **kwargs – Remaining args are passed to the data interface load function.

Return type

SelfAwareData

DataFile

class datatc.data_directory.DataFile(path, contents=None)

load(data_interface_hint=None, **kwargs)

Load a data file.

Parameters
  • data_interface_hint – file extension indicating the data interface to use to load the file.

  • **kwargs – Remaining args are passed to the data interface load function.

Return type

Any

SelfAwareData

class datatc.self_aware_data.SelfAwareData(data, metadata=None)

A wrapper around a dataset that also contains the code that generated the data. SelfAwareData can re-run its transformation steps on a new dataset.

classmethod load(file_path, data_interface_hint=None, **kwargs)

Load a SelfAwareData object.

Parameters
  • file_path (str) – Path to the SelfAwareData (SAD) to load.

  • data_interface_hint – Hint for which kind of data interface to use to load the data (file extension).

Example Usage:

>>> sad = SelfAwareData.load('~/project/data/sad_dir__2021-01-01_12-00__standard_features.csv')

Return type

SelfAwareData

classmethod load_from_file(file_path, **kwargs)

Create a SelfAwareData object with an initial SourceFileTransformStep.

Parameters

file_path (str) – path to a standard file (not already a SelfAwareData)

Returns: SelfAwareData with a TransformSequence containing a SourceFileTransformStep pointing to file_path

Return type

SelfAwareData

print_steps()

Print the code of the transformation steps that generated the data.

rerun(data)

Rerun the same transformation function that generated this SelfAwareData on a new data object.

Parameters

data – New data object on which to rerun the transformation.

Return type

Any

save(file_path, **kwargs)

Save a SelfAwareData object.

Parameters

file_path (str) – Path for where to save the SAD, including file extension.

Returns: Path to the saved SAD

Return type

Path

transform(transformer_func, tag='', enforce_clean_git=True, get_git_hash_from=None, **kwargs)

Transform a SelfAwareData, generating a new SelfAwareData object.

Parameters
  • transformer_func (Callable) – Transform function to apply to data.

  • tag (str) – (optional) short description of the transform for reference

  • enforce_clean_git – Whether to only allow the save to proceed if the working state of the git directory is clean.

  • get_git_hash_from (Optional[Any]) – Locally installed module from which to get git information. Use this arg if transformer_func is defined outside of a module tracked by git.

Returns: New SelfAwareData object containing the transformed data.

Return type

SelfAwareData
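
The relationship between transform and rerun can be pictured as a wrapper that records each function it applies and can replay those steps on new input. The sketch below is a conceptual stand-in, not datatc's implementation. The class `TrackedData` is hypothetical, git tracking and saving are omitted, and forwarding `**kwargs` to the transform function is an assumption.

```python
class TrackedData:
    """Hypothetical wrapper that remembers the transforms applied to its data."""

    def __init__(self, data, steps=None):
        self.data = data
        self.steps = list(steps or [])  # one (tag, function, kwargs) per step

    def transform(self, func, tag="", **kwargs):
        # Return a new wrapper; the original object is left untouched.
        new_steps = self.steps + [(tag, func, kwargs)]
        return TrackedData(func(self.data, **kwargs), new_steps)

    def rerun(self, data):
        # Replay every recorded step, in order, on a fresh input.
        for _tag, func, kwargs in self.steps:
            data = func(data, **kwargs)
        return data


def scale(values, factor=1):
    return [v * factor for v in values]


raw = TrackedData([1, 2, 3])
scaled = raw.transform(scale, tag="scaled", factor=10)
print(scaled.data)           # [10, 20, 30]
print(scaled.rerun([4, 5]))  # [40, 50]
```

Returning a new object from each transform keeps the source data intact, which is what makes replaying the same steps on another dataset straightforward.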

MagicDataInterface

datatc.data_interface.MagicDataInterface

alias of <datatc.data_interface.MagicDataInterfaceBase object>