API Reference
DataDirectory
- class datatc.data_directory.DataDirectory(path, contents=None, magic_data_interface=<datatc.data_interface.MagicDataInterfaceBase object>)
Manages saving, loading, and viewing data files within a specific data path.
- latest()
Return the latest data file or directory, as determined alphabetically.
- Return type
Union[DataDirectory, DataFile]
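To make "latest, as determined alphabetically" concrete, the sketch below picks the entry whose name sorts last as a plain string. It is a standalone illustration, not datatc's implementation: the helper `latest_by_name` and the file names are hypothetical.

```python
from typing import Iterable


def latest_by_name(names: Iterable[str]) -> str:
    """Return the name that sorts last alphabetically (hypothetical helper)."""
    names = list(names)
    if not names:
        raise ValueError("no entries to choose from")
    # max() on strings compares them character by character.
    return max(names)


# Zero-padded, ISO-style date prefixes sort chronologically as plain strings.
entries = [
    "2021-01-05_features.csv",
    "2021-02-01_features.csv",
    "2021-01-20_features.csv",
]
newest = latest_by_name(entries)

# Unpadded counters do not sort numerically: "run_9" sorts after "run_10".
pitfall = latest_by_name(["run_9", "run_10"])
```

Under this assumption, names that begin with a zero-padded date or timestamp are the ones for which "alphabetically latest" and "most recent" coincide.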
- classmethod list_projects()
List all data directories previously registered via register_project.
- Return type
None
- classmethod load(hint)
Shortcut for load_project.
- classmethod load_project(hint)
Create a DataDirectory from a project hint previously registered via register_project.
- ls(full=False)
Print the contents of the data directory. Defaults to printing all subdirectories, but not all files.
- Parameters
full – Whether to print all files.
- Return type
None
- mkdir(dir_name)
Create a new directory within the current directory.
- Parameters
dir_name (str) – Name for the new directory.
- Return type
None
- classmethod register_project(project_hint, project_path)
Register a hint for a project data directory so that it can be easily reloaded via load(hint).
- Return type
None
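The relationship between register_project, list_projects, and load(hint) can be pictured as a mapping from short hints to paths. The sketch below is a hypothetical in-memory stand-in, assuming nothing about how datatc actually stores registered projects; the class `ProjectRegistry`, its method names, and the example paths are all invented for illustration, and persistence between sessions is deliberately left out.

```python
from pathlib import Path
from typing import Dict


class ProjectRegistry:
    """Hypothetical in-memory registry mapping project hints to data paths."""

    def __init__(self) -> None:
        self._projects: Dict[str, Path] = {}

    def register(self, hint: str, path: str) -> None:
        # Store the path under a short, memorable hint.
        self._projects[hint] = Path(path)

    def resolve(self, hint: str) -> Path:
        # Look the hint up again, failing loudly if it was never registered.
        if hint not in self._projects:
            raise KeyError(f"no project registered under hint {hint!r}")
        return self._projects[hint]

    def list_projects(self) -> None:
        # Print every registered hint with its path, sorted by hint.
        for hint, path in sorted(self._projects.items()):
            print(f"{hint}: {path}")


registry = ProjectRegistry()
registry.register("churn", "/tmp/churn/data")
registry.register("pricing", "/tmp/pricing/data")
resolved = registry.resolve("churn")
```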
- save(data, file_name, **kwargs)
Save a data object within the data directory.
- Parameters
data (Any) – data object to save.
file_name (str) – file name for the saved object, including file extension. The file extension is used to determine the file type and method for saving the data.
**kwargs – Remaining args are passed to the data interface save function.
- Return type
None
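The idea that the file extension selects the save method can be sketched as a lookup table keyed by suffix. This is an illustrative stand-in rather than datatc's data interface code: the `SAVERS` table, the two saver functions, and the free function `save` are hypothetical, and only JSON and plain text are handled.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def _save_json(data: Any, path: Path, **kwargs: Any) -> None:
    # Extra keyword arguments are forwarded to json.dumps (e.g. indent=2).
    path.write_text(json.dumps(data, **kwargs))


def _save_text(data: Any, path: Path, **kwargs: Any) -> None:
    # Extra keyword arguments are forwarded to Path.write_text (e.g. encoding).
    path.write_text(str(data), **kwargs)


# Hypothetical registry: file extension -> function that knows how to save it.
SAVERS: Dict[str, Callable[..., None]] = {".json": _save_json, ".txt": _save_text}


def save(data: Any, directory: Path, file_name: str, **kwargs: Any) -> Path:
    """Save data under directory, choosing the saver from the file extension."""
    suffix = Path(file_name).suffix.lower()
    if suffix not in SAVERS:
        raise ValueError(f"no saver registered for extension {suffix!r}")
    target = directory / file_name
    SAVERS[suffix](data, target, **kwargs)
    return target


with tempfile.TemporaryDirectory() as tmp:
    saved = save({"a": 1}, Path(tmp), "config.json", indent=2)
    round_trip = json.loads(saved.read_text())
```

The practical consequence is that file_name must carry an extension the interface recognizes; in this sketch an unknown extension is rejected before anything is written.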
- select(hint)
Return the DataDirectory or DataFile from self.contents that matches the hint. If more than one file matches the hint, select the one whose file type matches the hint exactly. Otherwise, raise an error and display all matches.
- Parameters
hint (str) – string to use to search for a file within the directory.
- Raises
FileNotFoundError – if no file can be found in the data directory that matches the hint.
ValueError – if more than one file is found in the data directory that matches the hint.
- Return type
Union[DataDirectory, DataFile]
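The matching rules described for select can be sketched over a plain list of names. This is one reading of the description, not datatc's code: it assumes a hint matches when it is a substring of the name, and that "type matches the hint exactly" means the file extension equals the hint. The function and the example names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def select(names: Iterable[str], hint: str) -> str:
    """Pick the single entry matching hint, mirroring the documented errors."""
    matches: List[str] = [name for name in names if hint in name]
    if not matches:
        raise FileNotFoundError(f"no entry matches hint {hint!r}")
    if len(matches) == 1:
        return matches[0]
    # Tie-break (assumed): prefer the entry whose extension equals the hint.
    exact = [name for name in matches if Path(name).suffix.lstrip(".") == hint]
    if len(exact) == 1:
        return exact[0]
    raise ValueError(f"hint {hint!r} is ambiguous; matches: {matches}")


contents = ["raw.csv", "csv_notes.txt", "features.parquet"]

# Two names contain "csv", but only raw.csv has it as its file type.
chosen = select(contents, "csv")
```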
- class datatc.data_directory.SelfAwareDataDirectory(path, contents=None)
Subclass of DataDirectory that manages interacting with the file expression of SelfAwareData.
- get_info()
Get metadata about the SelfAwareData object.
- Return type
Dict[str, str]
- load(data_interface_hint=None, load_function=True, **kwargs)
Load a saved data transformer: the data and the function that generated it.
- Parameters
data_interface_hint (Optional[str]) – file extension indicating the data interface to use to load the file.
load_function (bool) – Whether to load the transformation function of the SelfAwareData object. Specify False if the current environment does not support the dependencies of the transformation function.
**kwargs – Remaining args are passed to the data interface load function.
- Return type
DataFile
- class datatc.data_directory.DataFile(path, contents=None)
- load(data_interface_hint=None, **kwargs)
Load a data file.
- Parameters
data_interface_hint – file extension indicating the data interface to use to load the file.
**kwargs – Remaining args are passed to the data interface load function.
- Return type
Any
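A data_interface_hint is most useful when a file's name does not reveal its format. The sketch below illustrates that idea only; it does not use datatc, and the `LOADERS` table, the free function `load`, and the file `payload.dat` are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Optional

# Hypothetical registry: extension (without the dot) -> loader function.
LOADERS: Dict[str, Callable[..., Any]] = {
    "json": lambda path, **kwargs: json.loads(path.read_text(), **kwargs),
    "txt": lambda path, **kwargs: path.read_text(**kwargs),
}


def load(path: Path, data_interface_hint: Optional[str] = None, **kwargs: Any) -> Any:
    """Load path, using the hint instead of the file's own extension if given."""
    key = data_interface_hint or path.suffix.lstrip(".")
    if key not in LOADERS:
        raise ValueError(f"no loader registered for {key!r}")
    return LOADERS[key](path, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    odd_name = Path(tmp) / "payload.dat"
    odd_name.write_text('{"rows": 3}')
    # The ".dat" extension is unknown here, so the hint names the format.
    loaded = load(odd_name, data_interface_hint="json")
```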
SelfAwareData
- class datatc.self_aware_data.SelfAwareData(data, metadata=None)
A wrapper around a dataset that also contains the code that generated the data. SelfAwareData can re-run its transformation steps on a new dataset.
- classmethod load(file_path, data_interface_hint=None, **kwargs)
Load a SelfAwareData object.
- Parameters
file_path (str) – Path to the SAD to load.
data_interface_hint – Hint for which kind of data interface to use to load the data (file extension).
- Example Usage:
>>> sad = SelfAwareData.load('~/project/data/sad_dir__2021-01-01_12-00__standard_features.csv')
- Return type
SelfAwareData
- classmethod load_from_file(file_path, **kwargs)
Create a SelfAwareData object with an initial SourceFileTransformStep.
- Parameters
file_path (str) – path to a standard file (not already a SelfAwareData).
Returns: SelfAwareData with a TransformSequence containing a SourceFileTransformStep pointing to file_path
- Return type
SelfAwareData
- print_steps()
Print the code of the transformation steps that generated the data.
- rerun(data)
Rerun the same transformation function that generated this SelfAwareData on a new data object.
- Parameters
data – new data object to run the transformation on.
- Return type
Any
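The idea behind rerun, keeping the functions that produced a dataset so they can be replayed on new input, can be sketched with a small stand-in class. This is a conceptual illustration only: `RecordedData` is hypothetical, keeps steps in memory, and does none of the saving, source-code capture, or git tracking that SelfAwareData describes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class RecordedData:
    """Hypothetical stand-in: a dataset plus the functions that produced it."""

    data: Any
    steps: List[Callable[[Any], Any]] = field(default_factory=list)

    def transform(self, func: Callable[[Any], Any]) -> "RecordedData":
        # Apply func and return a new object that remembers it as a step.
        return RecordedData(func(self.data), self.steps + [func])

    def rerun(self, new_data: Any) -> Any:
        # Replay every recorded step, in order, on the new input.
        for step in self.steps:
            new_data = step(new_data)
        return new_data


def drop_negatives(values: List[int]) -> List[int]:
    return [v for v in values if v >= 0]


def double(values: List[int]) -> List[int]:
    return [v * 2 for v in values]


original = RecordedData([1, -2, 3]).transform(drop_negatives).transform(double)
replayed = original.rerun([-1, 5])
```

Here `original.data` is `[2, 6]`, and replaying the same two steps on `[-1, 5]` gives `[10]`.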
- save(file_path, **kwargs)
Save a SelfAwareData object.
- Parameters
file_path (str) – Path for where to save the SAD, including file extension.
Returns: Path to the saved SAD
- Return type
Path
- transform(transformer_func, tag='', enforce_clean_git=True, get_git_hash_from=None, **kwargs)
Transform a SelfAwareData, generating a new SelfAwareData object.
- Parameters
transformer_func (Callable) – Transform function to apply to data.
tag (str) – (optional) short description of the transform for reference.
enforce_clean_git – Whether to only allow the save to proceed if the working state of the git directory is clean.
get_git_hash_from (Optional[Any]) – Locally installed module from which to get git information. Use this arg if transformer_func is defined outside of a module tracked by git.
Returns: new transform directory name, for adding to contents dict.
- Return type
MagicDataInterface
- datatc.data_interface.MagicDataInterface
alias of <datatc.data_interface.MagicDataInterfaceBase object>