combustion.data
Saving and Loading
combustion.data.save_hdf5(dataset, path, num_shards=None, shard_size=None, verbose=True)

Saves the contents of the dataset to one or more HDF5 files.
Serialization is performed as follows:

- Dataset partitions are determined, if required, by num_shards or shard_size. By default, only a single file containing the entire dataset is produced.
- Examples are read by iterating over the dataset and are written to disk. For multiple shards, a shard index is added to the filename given in path.
- Attributes accessible by vars(self) are attached as HDF5 attributes, allowing instance variables to be restored on loading. Tensors are not saved this way, as all attributes should be small.
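As a rough illustration of the attribute rule above, the sketch below collects a dataset's small instance variables from vars() and skips everything else. The helper collect_small_attrs and the set of "small" types are hypothetical choices for this example, not combustion's actual filter. With h5py, each resulting pair could then be written through an open file's attrs mapping (f.attrs[key] = value).

```python
from typing import Any, Dict

# Assumed definition of a "small" attribute for this sketch; anything else
# (tensors, lists, nested objects) is skipped rather than stored as an attr.
SMALL_TYPES = (bool, int, float, str, bytes)


def collect_small_attrs(obj: Any) -> Dict[str, Any]:
    """Hypothetical helper: return instance variables safe to store as HDF5 attrs."""
    return {key: value for key, value in vars(obj).items() if isinstance(value, SMALL_TYPES)}


class _DemoDataset:
    def __init__(self):
        self.split = "train"
        self.scale = 0.5
        self.examples = [1, 2, 3]  # stands in for tensor data; not collected


attrs = collect_small_attrs(_DemoDataset())
```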
Note: Serialization requires the h5py library. See http://docs.h5py.org/en/stable/index.html for more details.

Note: When saving multiple shards, the file at path is created from an h5py.VirtualSource. See Virtual Dataset for more details.

Parameters:
- dataset (Dataset) – The dataset to save.
- path (str) – The filepath to save to, e.g. foo/bar.h5.
- num_shards (int, optional) – If given, num_shards files will be created, each containing 1 / num_shards of the dataset. Mutually exclusive with shard_size. Must be a positive int.
- shard_size (int, optional) – If given, multiple files will be created such that each file contains shard_size examples. Mutually exclusive with num_shards. Must be a positive int.
- verbose (bool, optional) – If False, do not print progress updates during saving.
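The sharding rules for num_shards and shard_size can be sketched as a pure function that plans which example indices go to which file. This is a hypothetical helper, not combustion's implementation: the even split for num_shards and the "_<index>" filename suffix are assumptions, since the text above only says that a shard index is added to the filename.

```python
import os
from typing import List, Optional, Tuple


def plan_shards(num_examples: int, path: str, num_shards: Optional[int] = None,
                shard_size: Optional[int] = None) -> List[Tuple[str, range]]:
    """Hypothetical helper: map each output file to the example indices it holds."""
    if num_shards is not None and shard_size is not None:
        raise ValueError("num_shards and shard_size are mutually exclusive")
    for name, value in (("num_shards", num_shards), ("shard_size", shard_size)):
        if value is not None and (not isinstance(value, int) or value <= 0):
            raise ValueError(f"{name} must be a positive int")

    # Default: a single file containing the entire dataset.
    if num_shards is None and shard_size is None:
        return [(path, range(num_examples))]

    if num_shards is not None:
        # Even split: shard sizes differ by at most one example.
        bounds = [(i * num_examples // num_shards, (i + 1) * num_examples // num_shards)
                  for i in range(num_shards)]
    else:
        # Fixed-size shards; the last one may hold fewer examples.
        bounds = [(start, min(start + shard_size, num_examples))
                  for start in range(0, num_examples, shard_size)]

    # Assumed naming scheme: foo/bar.h5 -> foo/bar_0.h5, foo/bar_1.h5, ...
    root, ext = os.path.splitext(path)
    return [(f"{root}_{i}{ext}", range(start, stop)) for i, (start, stop) in enumerate(bounds)]


print(plan_shards(10, "foo/bar.h5", num_shards=3))
# [('foo/bar_0.h5', range(0, 3)), ('foo/bar_1.h5', range(3, 6)), ('foo/bar_2.h5', range(6, 10))]
```

The even split always yields exactly num_shards files; a ceil-sized split (for example 10 examples with num_shards=6, giving shards of 2) would produce fewer files than requested.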
combustion.data.save_torch(dataset, path, prefix='example_', verbose=True)

Saves the contents of the dataset to multiple files using torch.save().

Note: This is less elegant than HDF5 serialization, but is a thread-safe alternative.
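The one-file-per-example layout behind save_torch() can be sketched without PyTorch by substituting pickle for torch.save(). The helper name save_per_example and the "{prefix}{index}.pth" file naming are assumptions for illustration; only the prefix default and the .pth extension come from this page. Writing each example to its own file means concurrent readers never share a single open file, which is consistent with this format being described as the thread-safe alternative.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Iterable, List


def save_per_example(dataset: Iterable[Any], path: str, prefix: str = "example_") -> List[Path]:
    """Hypothetical sketch: write every example to its own file under `path`.

    pickle stands in for torch.save() so the sketch runs without PyTorch.
    """
    out_dir = Path(path)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, example in enumerate(dataset):
        target = out_dir / f"{prefix}{index}.pth"  # assumed naming scheme
        with open(target, "wb") as f:
            pickle.dump(example, f)
        written.append(target)
    return written


# Usage: two (data, label) examples produce two files.
with tempfile.TemporaryDirectory() as tmp:
    files = save_per_example([([1.0, 2.0], 0), ([3.0, 4.0], 1)], tmp)
    names = [p.name for p in files]
    with open(files[1], "rb") as f:
        second = pickle.load(f)
```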
class combustion.data.SerializeMixin

Mixin to enable serialization of a map-style or iterable-style dataset to disk in HDF5 or Torch file format.
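To serialize both dataset styles, a mixin has to walk examples either through __iter__ (iterable-style) or through __len__ and __getitem__ (map-style). The sketch below shows that dispatch in a hypothetical ExampleIterMixin; it is not combustion's SerializeMixin, whose internals are not described here.

```python
from typing import Any, Iterator


class ExampleIterMixin:
    """Hypothetical mixin: iterate over the examples of map- or iterable-style datasets."""

    def iter_examples(self) -> Iterator[Any]:
        if hasattr(self, "__iter__"):
            # Iterable-style: the dataset defines its own iteration order.
            yield from iter(self)
        else:
            # Map-style: walk indices 0 .. len - 1 through __getitem__.
            for index in range(len(self)):
                yield self[index]


class Squares(ExampleIterMixin):
    """Map-style demo dataset."""

    def __len__(self) -> int:
        return 4

    def __getitem__(self, index: int) -> int:
        return index * index


class Letters(ExampleIterMixin):
    """Iterable-style demo dataset."""

    def __iter__(self):
        return iter("abc")
```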
static load(path, fmt=None, transform=None, target_transform=None, **kwargs)

Loads the contents of a dataset previously saved with save(), returning an HDF5Dataset (or, when Torch files are loaded, a TorchDataset).

Warning:
Using HDF5 in a parallel or multithreaded manner poses additional challenges that have not yet been overcome. As such, using an HDF5Dataset with torch.utils.data.DataLoader when num_workers > 1 will yield incorrect data. In situations where multiple threads will be used, prefer saving with fmt="torch". See Parallel HDF5 for more details.

Note: Loading HDF5 files requires the h5py library. See http://docs.h5py.org/en/stable/index.html for more details.

Note: Dataset attributes are preserved when loading an HDF5 file, but not a Torch file.
Parameters:
- path (str) – The filepath to load from. See HDF5Dataset.load() for more details.
- fmt (str, optional) – The expected type of data to load. By default, the data type is inferred from the file extensions found in path: HDF5 files are matched by the .h5 extension, and Torch files by the .pth extension. If a mix of .h5 and .pth files is present in path, fmt can be used to ensure only the desired file types are loaded.
- transform (callable, optional) – A transform to be applied to the data tensor. See HDF5Dataset for more details.
- target_transform (callable, optional) – A transform to be applied to the label tensor. See HDF5Dataset for more details.
- **kwargs – Forwarded to the constructor of HDF5Dataset or TorchDataset, depending on which dataset is constructed.

Return type: combustion.data.serialize.HDF5Dataset
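The extension rule for fmt can be expressed as a small function over the filenames found in path. This hypothetical infer_format helper uses the .h5 and .pth mappings described above; raising ValueError when no format or both formats are found (and fmt is not given) is an assumed policy, since the text only says fmt can be used to pick one.

```python
from pathlib import PurePath
from typing import Iterable, Optional

# Extension-to-format mapping described for SerializeMixin.load().
EXTENSIONS = {".h5": "hdf5", ".pth": "torch"}


def infer_format(filenames: Iterable[str], fmt: Optional[str] = None) -> str:
    """Hypothetical helper: pick "hdf5" or "torch" from a list of filenames."""
    suffixes = (PurePath(name).suffix for name in filenames)
    found = {EXTENSIONS[s] for s in suffixes if s in EXTENSIONS}
    if fmt is not None:
        # An explicit fmt restricts loading to that file type.
        if fmt not in found:
            raise ValueError(f"no files of format {fmt!r} found")
        return fmt
    if len(found) != 1:
        # Assumed policy: refuse to guess when zero or both formats are present.
        raise ValueError(f"cannot infer format from {sorted(found)}; pass fmt explicitly")
    return found.pop()
```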
save(path, fmt='hdf5', num_shards=None, shard_size=None, prefix='example_', verbose=True)

Saves the contents of the dataset to disk. See save_hdf5() and save_torch() for more information on how HDF5 and Torch files, respectively, are saved.

Note: Serialization requires the h5py library. See http://docs.h5py.org/en/stable/index.html for more details.
Parameters:
- path (str) – The filepath to save to, e.g. foo/bar.h5.
- fmt (str, optional) – The format to save in. Should be one of "hdf5" or "torch".
- num_shards (int, optional) – If given, num_shards files will be created, each containing 1 / num_shards of the dataset. Mutually exclusive with shard_size. Must be a positive int. Only has an effect when fmt is "hdf5".
- shard_size (int, optional) – If given, multiple files will be created such that each file contains shard_size examples. Mutually exclusive with num_shards. Must be a positive int. Only has an effect when fmt is "hdf5".
- prefix (str, optional) – Passed to save_torch() if fmt is "torch".
- verbose (bool, optional) – If False, do not print progress updates during saving.
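The parameter notes above imply a simple routing rule: sharding options only matter for "hdf5", and prefix only for "torch". The hypothetical route_save function below makes that rule explicit by returning the target function name and the keyword arguments it would receive. It does not call combustion, and raising ValueError for any other fmt value is an assumption.

```python
from typing import Any, Dict, Optional, Tuple


def route_save(fmt: str = "hdf5", num_shards: Optional[int] = None,
               shard_size: Optional[int] = None, prefix: str = "example_",
               verbose: bool = True) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical sketch of how save() could forward its arguments."""
    if fmt == "hdf5":
        # Sharding options reach save_hdf5; prefix is ignored.
        return "save_hdf5", {"num_shards": num_shards, "shard_size": shard_size, "verbose": verbose}
    if fmt == "torch":
        # prefix reaches save_torch; sharding options are ignored.
        return "save_torch", {"prefix": prefix, "verbose": verbose}
    raise ValueError(f"fmt must be 'hdf5' or 'torch', got {fmt!r}")
```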
class combustion.data.HDF5Dataset(path, transform=None, target_transform=None)

Dataset used to read from HDF5 files. See SerializeMixin for more details.

Note: Requires the h5py library. See http://docs.h5py.org/en/stable/index.html for more details.

Note: This class is intended for use with HDF5 files produced by Combustion's save methods. It may work with other HDF5 files, but this has not been verified yet.
Parameters:
- path (str) – The filepath to load from. When loading a sharded dataset, path should point to the virtual dataset master file, e.g. "foo/bar.h5".
- transform (callable, optional) – Transform to be applied to data tensors.
- target_transform (callable, optional) – Transform to be applied to label tensors. If given, the loaded dataset must produce
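The sketch below shows where transform and target_transform typically sit in a map-style dataset's __getitem__: the first is applied to the data, the second to the label. InMemoryPairs is a hypothetical stand-in that holds examples in a list instead of reading HDF5, and the (data, label) example layout is an assumption.

```python
from typing import Any, Callable, List, Optional, Tuple


class InMemoryPairs:
    """Hypothetical stand-in for a loaded dataset of (data, label) examples."""

    def __init__(self, examples: List[Tuple[Any, Any]],
                 transform: Optional[Callable] = None,
                 target_transform: Optional[Callable] = None):
        self.examples = examples
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self) -> int:
        return len(self.examples)

    def __getitem__(self, index: int) -> Tuple[Any, Any]:
        data, label = self.examples[index]
        if self.transform is not None:
            data = self.transform(data)  # applied to the data only
        if self.target_transform is not None:
            label = self.target_transform(label)  # applied to the label only
        return data, label


ds = InMemoryPairs([(1.0, 0), (2.0, 1)], transform=lambda x: x * 10, target_transform=str)
```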
class combustion.data.TorchDataset(path, transform=None, target_transform=None, pattern='*.pth')

Dataset used to read serialized examples in Torch format. See SerializeMixin for more details.

Parameters:
- path (str) – The path to the saved dataset. Note that, unlike HDF5Dataset, path is a directory rather than a file.
- transform (callable, optional) – Transform to be applied to data tensors.
- target_transform (callable, optional) – Transform to be applied to label tensors. If given, the loaded dataset must produce
- pattern (str, optional) – Pattern of filenames to match.
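Since path is a directory and pattern filters its contents, discovering examples amounts to a glob over that directory, treating pattern as a glob pattern (as the '*.pth' default suggests). find_example_files is a hypothetical helper, and whether TorchDataset sorts its matches, and how, is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List


def find_example_files(directory: str, pattern: str = "*.pth") -> List[Path]:
    """Hypothetical helper: files in `directory` matching a glob `pattern`, sorted by name."""
    return sorted(Path(directory).glob(pattern))


# Usage: only the .pth files are matched; other files are ignored.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("example_1.pth", "example_0.pth", "notes.txt"):
        (Path(tmp) / name).touch()
    matched = [p.name for p in find_example_files(tmp)]
```

Plain name sorting is lexicographic, so example_10.pth would sort before example_2.pth; a loader that depends on example order would need a numeric sort key.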
Window Operations
class combustion.data.Window(before=0, after=0)

Helper to apply a window over an iterable or set of indices.
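One way to picture applying a window over an iterable is a bounded deque that slides along the input and emits a tuple once it holds before + 1 + after items. This sliding_window function is a hypothetical sketch rather than combustion's Window; in particular, skipping centers near the edges that lack full context is an assumed policy, since edge handling is not described here.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def sliding_window(items: Iterable[T], before: int = 0, after: int = 0) -> Iterator[Tuple[T, ...]]:
    """Hypothetical sketch: yield each center item together with its neighbors."""
    size = before + after + 1
    # A deque with maxlen drops the oldest item once it is full.
    buffer = deque(maxlen=size)
    for item in items:
        buffer.append(item)
        if len(buffer) == size:
            yield tuple(buffer)
```

For example, list(sliding_window("abcde", before=1, after=1)) yields three windows centered on "b", "c" and "d".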
class combustion.data.DenseWindow(before=0, after=0)

Helper to apply a dense window over an iterable or set of indices. A dense window includes all indices from center-before to center+after. For a window that includes only the frames (center-before, center, center+after), see SparseWindow.
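For a given center, a dense window covers every index from center-before to center+after inclusive, which maps directly onto range(). The dense_indices helper is hypothetical and does no clipping at sequence boundaries.

```python
from typing import List


def dense_indices(center: int, before: int = 0, after: int = 0) -> List[int]:
    """Hypothetical helper: all indices from center - before to center + after, inclusive."""
    # range() excludes its stop value, hence the + 1.
    return list(range(center - before, center + after + 1))


print(dense_indices(5, before=2, after=1))  # [3, 4, 5, 6]
```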
class combustion.data.SparseWindow(before=0, after=0)

Helper to apply a sparse window over an iterable or set of indices. A sparse window includes only the frames (center-before, center, center+after). For a window that includes all indices from center-before to center+after, see DenseWindow.
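A sparse window keeps at most three frames per center. For center 5 with before=2 and after=2, a dense window would cover indices 3 through 7, while a sparse one keeps only 3, 5 and 7. The sparse_indices helper below is hypothetical, and collapsing duplicate indices when before or after is 0 is an assumption.

```python
from typing import List


def sparse_indices(center: int, before: int = 0, after: int = 0) -> List[int]:
    """Hypothetical helper: only center - before, center, and center + after."""
    # A set collapses duplicates when before or after is 0 (an assumed policy).
    return sorted({center - before, center, center + after})


print(sparse_indices(5, before=2, after=2))  # [3, 5, 7]
```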