combustion.data

Saving and Loading

combustion.data.save_hdf5(dataset, path, num_shards=None, shard_size=None, verbose=True)[source]

Saves the contents of the dataset to one or more HDF5 files.

Serialization is performed as follows:
  1. Dataset partitions are determined if num_shards or shard_size is given. By default, a single file containing the entire dataset is produced.

  2. Examples are read by iterating over the dataset and are written to disk. For multiple shards, a shard index is added to the filename given in path.

  3. Attributes accessible via vars(self) are attached as HDF5 attributes, allowing instance variables to be restored on load. Tensors are not saved this way, as all attributes should be small.

Note

Serialization requires the h5py library. See http://docs.h5py.org/en/stable/index.html for more details.

Note

When saving multiple shards, the file created at path will be created from an h5py.VirtualSource. See Virtual Dataset for more details.

Parameters
  • dataset (Dataset) – The dataset to save.

  • path (str) – The filepath to save to, e.g. foo/bar.h5.

  • num_shards (int, optional) – If given, num_shards files will be created, each containing 1 / num_shards of the dataset. Exclusive with shard_size. Must be a positive int.

  • shard_size (int, optional) – If given, multiple files will be created such that each file contains shard_size examples. Exclusive with num_shards. Must be a positive int.

  • verbose (bool, optional) – If False, do not print progress updates during saving.

Return type

None
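The partitioning in step 1 can be sketched as follows. This is a hypothetical helper (plan_shards is not part of combustion) illustrating the documented semantics: with neither option given a single file holds everything, and num_shards / shard_size are mutually exclusive.

```python
import math

def plan_shards(num_examples, num_shards=None, shard_size=None):
    """Hypothetical sketch of the shard-partitioning step.

    Returns a list of shard lengths covering all num_examples examples.
    """
    if num_shards is not None and shard_size is not None:
        raise ValueError("num_shards and shard_size are mutually exclusive")
    if num_shards is None and shard_size is None:
        return [num_examples]  # default: one file containing the entire dataset
    if num_shards is None:
        # derive a shard count so each shard holds at most shard_size examples
        num_shards = math.ceil(num_examples / shard_size)
    base, extra = divmod(num_examples, num_shards)
    # spread the remainder over the first `extra` shards
    return [base + (1 if i < extra else 0) for i in range(num_shards)]
```

For example, plan_shards(10, shard_size=4) yields three shards whose lengths sum to 10.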

combustion.data.save_torch(dataset, path, prefix='example_', verbose=True)[source]

Saves the contents of the dataset to multiple files using torch.save().

Note

This is less elegant than HDF5 serialization, but it is a thread-safe alternative.

Parameters
  • dataset (Dataset) – The dataset to save.

  • path (str) – The filepath to save to, e.g. foo/bar.

  • prefix (str, optional) – A prefix to prepend to each .pth filename. Output files will be of the form {path}/{prefix}{index}.pth

  • verbose (bool, optional) – If False, do not print progress updates during saving.

Return type

None

class combustion.data.SerializeMixin[source]

Mixin enabling serialization of a map-style or iterable-style dataset to disk in HDF5 or Torch file format.

static load(path, fmt=None, transform=None, target_transform=None, **kwargs)[source]

Loads the contents of a dataset previously saved with save(), returning a HDF5Dataset.

Warning

Using HDF5 in a parallel / multithreaded manner poses additional challenges that have not yet been overcome. As such, using a HDF5Dataset with torch.utils.data.DataLoader when num_workers > 1 will yield incorrect data. In situations where multiple workers will be used, prefer saving with fmt="torch". See Parallel HDF5 for more details.

Note

Loading HDF5 files requires the h5py library. See http://docs.h5py.org/en/stable/index.html for more details.

Note

Dataset attributes are preserved when loading a HDF5 file, but not a Torch file.

Parameters
  • path (str) – The filepath to load from. See HDF5Dataset.load() for more details.

  • fmt (str, optional) – The expected type of data to load. By default the data type is inferred from the file extensions found in path. HDF5 files are matched by the .h5 extension, and Torch files are matched by the .pth extension. If a mix of hdf5 and pth files are present in path, fmt can be used to ensure only the desired file types are loaded.

  • transform (callable, optional) – A transform to be applied to the data tensor. See HDF5Dataset for more details.

  • target_transform (callable, optional) – A transform to be applied to the label tensor. See HDF5Dataset for more details.

  • **kwargs – Forwarded to the constructors for HDF5Dataset or TorchDataset, depending on what dataset is constructed.

Return type

combustion.data.serialize.HDF5Dataset
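The extension-based inference described for fmt might look like the following hypothetical sketch (infer_format is not part of combustion; it only illustrates the documented matching rules):

```python
import os

def infer_format(path, fmt=None):
    """Hypothetical sketch of load()'s format inference.

    An explicit fmt wins; otherwise the extension decides:
    .h5 -> HDF5, .pth -> Torch.
    """
    if fmt is not None:
        return fmt
    ext = os.path.splitext(path)[1]
    if ext == ".h5":
        return "hdf5"
    if ext == ".pth":
        return "torch"
    raise ValueError(f"could not infer a format from {path!r}")
```

Passing fmt explicitly is the documented way to disambiguate when a path contains a mix of .h5 and .pth files.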

save(path, fmt='hdf5', num_shards=None, shard_size=None, prefix='example_', verbose=True)[source]

Saves the contents of the dataset to disk. See save_hdf5() and save_torch() respectively for more information on how saving functions for HDF5 or Torch files.

Note

Serialization requires the h5py library. See http://docs.h5py.org/en/stable/index.html for more details.

Parameters
  • path (str) – The filepath to save to. Ex foo/bar.h5

  • fmt (str, optional) – The format to save in. Should be one of "hdf5", "torch".

  • num_shards (int, optional) – If given, num_shards files will be created, each containing 1 / num_shards of the dataset. Exclusive with shard_size. Must be a positive int. Only has an effect when fmt is "hdf5".

  • shard_size (int, optional) – If given, multiple files will be created such that each file contains shard_size examples. Exclusive with num_shards. Must be a positive int. Only has an effect when fmt is "hdf5".

  • prefix (str, optional) – Passed to save_torch() if fmt is "torch"

  • verbose (bool, optional) – If False, do not print progress updates during saving.

Return type

None
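The parameter descriptions above imply a simple routing rule: the sharding options only apply to the HDF5 backend, and the filename prefix only to the Torch backend. A hypothetical sketch of that routing (select_backend_kwargs is illustrative, not combustion's implementation):

```python
def select_backend_kwargs(fmt, num_shards=None, shard_size=None,
                          prefix="example_"):
    """Hypothetical sketch of save()'s argument routing.

    Returns the keyword arguments forwarded to save_hdf5() or
    save_torch(), depending on fmt.
    """
    if fmt == "hdf5":
        # sharding only has an effect for HDF5 output
        return {"num_shards": num_shards, "shard_size": shard_size}
    if fmt == "torch":
        # the filename prefix only has an effect for Torch output
        return {"prefix": prefix}
    raise ValueError(f'fmt should be one of "hdf5", "torch", got {fmt!r}')
```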

class combustion.data.HDF5Dataset(path, transform=None, target_transform=None)[source]

Dataset used to read from HDF5 files. See SerializeMixin for more details

Note

Requires the h5py library. See http://docs.h5py.org/en/stable/index.html for more details.

Note

This class is intended for use with HDF5 files produced by Combustion’s save methods. It may work with other HDF5 files, but this has not been verified yet.

Parameters
  • path (str) – The filepath to load from. When loading a sharded dataset, path should point to the virtual dataset master file. Ex "foo/bar.h5"

  • transform (optional, callable) – Transform to be applied to data tensors.

  • target_transform (optional, callable) – Transform to be applied to label tensors. If given, the loaded dataset must produce label tensors.
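How the two transforms act on a loaded example can be illustrated with a small hypothetical helper (apply_transforms is not part of combustion; it assumes examples are (data, label) pairs):

```python
def apply_transforms(example, transform=None, target_transform=None):
    """Hypothetical sketch: transform acts on the data tensor,
    target_transform on the label tensor; either may be omitted."""
    data, label = example
    if transform is not None:
        data = transform(data)
    if target_transform is not None:
        label = target_transform(label)
    return data, label
```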

class combustion.data.TorchDataset(path, transform=None, target_transform=None, pattern='*.pth')[source]

Dataset used to read serialized examples in torch format. See SerializeMixin for more details.

Parameters
  • path (str) – The path to the saved dataset. Note that unlike HDF5Dataset, path is a directory rather than a file.

  • transform (optional, callable) – Transform to be applied to data tensors.

  • target_transform (optional, callable) – Transform to be applied to label tensors. If given, the loaded dataset must produce label tensors.

  • pattern (optional, str) – Pattern of filenames to match.

Window Operations

class combustion.data.Window(before=0, after=0)[source]

Helper to apply a window over an iterable or set of indices.

Parameters
  • before (int, optional) – The number of prior elements to include in the window.

  • after (int, optional) – The number of following elements to include in the window.

estimate_size(num_frames)[source]

Given a number of examples in the un-windowed input, estimate the number of examples in the windowed result.

Parameters

num_frames (int) – The number of frames in the un-windowed dataset.

Returns

Estimated number of frames in the windowed output.

Return type

int
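Under the assumption that only complete windows are kept (frames too close to either end cannot serve as window centers), the estimate might be computed as follows. This is a sketch of one plausible rule, not combustion's implementation:

```python
def estimate_size(num_frames, before=0, after=0):
    """Hypothetical sketch, assuming only complete windows are kept:
    the first `before` and last `after` frames cannot be centers."""
    return max(0, num_frames - before - after)
```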

abstract indices(pos)[source]

Given an index pos, return a tuple of indices that are part of the window centered at pos.

Parameters

pos (int) – The index of the window center

Returns

A tuple of ints giving the indices of a window centered at pos.

Return type

Tuple[int, ...]

class combustion.data.DenseWindow(before=0, after=0)[source]

Helper to apply a dense window over an iterable or set of indices. A dense window includes all indices from center-before to center+after. For a window that includes only frames (center-before, center, center+after), see SparseWindow.

Parameters
  • before (int, optional) – The number of prior elements to include in the window.

  • after (int, optional) – The number of following elements to include in the window.

indices(pos)[source]

Given an index pos, return a tuple of indices that are part of the window centered at pos.

Parameters

pos (int) – The index of the window center

Returns

A tuple of ints giving the indices of a window centered at pos.

Return type

Tuple[int, ...]
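Based on the description above, a dense window's indices can be sketched in plain Python (dense_indices is a hypothetical stand-in for DenseWindow.indices(), not combustion's code):

```python
def dense_indices(pos, before=0, after=0):
    """Hypothetical sketch of a dense window: every index from
    pos - before through pos + after, inclusive."""
    return tuple(range(pos - before, pos + after + 1))
```

So a window with before=2, after=2 centered at 5 covers indices 3 through 7.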

class combustion.data.SparseWindow(before=0, after=0)[source]

Helper to apply a sparse window over an iterable or set of indices. A sparse window includes only the frames (center-before, center, center+after). For a window that includes all indices from center-before to center+after, see DenseWindow.

Parameters
  • before (int, optional) – The number of prior elements to include in the window.

  • after (int, optional) – The number of following elements to include in the window.

indices(pos)[source]

Given an index pos, return a tuple of indices that are part of the window centered at pos.

Parameters

pos (int) – The index of the window center

Returns

A tuple of ints giving the indices of a window centered at pos.

Return type

Tuple[int, ...]
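Likewise, a sparse window's indices can be sketched as follows (sparse_indices is a hypothetical stand-in for SparseWindow.indices(); deduplication when before or after is 0 is an assumption of this sketch):

```python
def sparse_indices(pos, before=0, after=0):
    """Hypothetical sketch of a sparse window: only the two endpoints
    and the center, deduplicated in case before or after is 0."""
    return tuple(dict.fromkeys((pos - before, pos, pos + after)))
```

With before=2, after=2 centered at 5, only indices 3, 5, and 7 are included, in contrast to the dense window's 3 through 7.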
