mass_automation.formula package
Submodules
mass_automation.formula.augmentations module
- class mass_automation.formula.augmentations.Add(delta)
Bases: object
Adds a uniformly sampled intensity shift.
- class mass_automation.formula.augmentations.AugmentationWrapper(*args)
Bases: object
Wraps a set of augmentations.
- class mass_automation.formula.augmentations.DroppingAug(threshold, probability)
Bases: object
Drops a subspectrum with a certain probability.
- class mass_automation.formula.augmentations.RandomNoise(sigma)
Bases: object
Adds random noise to the subspectrum.
- class mass_automation.formula.augmentations.Scale(delta)
Bases: object
Scales the subspectrum randomly.
- class mass_automation.formula.augmentations.Shift(delta)
Bases: object
Shifts subspectra; the shift is sampled uniformly from delta.
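The classes above are documented only by their constructor arguments, so the following is a minimal sketch of the general pattern rather than the package's implementation. It assumes each augmentation is a callable applied to a list of intensities and that the wrapper applies them in order. The names `AddSketch`, `ScaleSketch`, and `WrapperSketch`, as well as the sampling intervals, are hypothetical.

```python
import random


class AddSketch:
    """Hypothetical stand-in for Add: shifts every intensity by one offset
    drawn uniformly from [-delta, delta] (the interval is an assumption)."""

    def __init__(self, delta, rng=None):
        self.delta = delta
        self.rng = rng or random.Random()

    def __call__(self, intensities):
        offset = self.rng.uniform(-self.delta, self.delta)
        return [value + offset for value in intensities]


class ScaleSketch:
    """Hypothetical stand-in for Scale: multiplies every intensity by one
    factor drawn uniformly from [1 - delta, 1 + delta] (assumed interval)."""

    def __init__(self, delta, rng=None):
        self.delta = delta
        self.rng = rng or random.Random()

    def __call__(self, intensities):
        factor = self.rng.uniform(1.0 - self.delta, 1.0 + self.delta)
        return [value * factor for value in intensities]


class WrapperSketch:
    """Hypothetical stand-in for AugmentationWrapper: applies each
    augmentation in the order it was passed."""

    def __init__(self, *augmentations):
        self.augmentations = augmentations

    def __call__(self, intensities):
        for augmentation in self.augmentations:
            intensities = augmentation(intensities)
        return intensities


rng = random.Random(0)  # seeded so the sketch is reproducible
pipeline = WrapperSketch(AddSketch(0.1, rng), ScaleSketch(0.2, rng))
augmented = pipeline([1.0, 0.5, 0.25])
```

Passing the random generator in explicitly keeps the sketch reproducible; whether the real classes expose a seed is not stated in this reference.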
mass_automation.formula.check_formula module
- mass_automation.formula.check_formula.add_new_peak(spectrum, delta, vector, teor_vector, dist_error, distance=1)
Adds the best peak’s mass and intensity to the arrays of registered masses and intensities, respectively.
- Parameters:
spectrum (Spectrum) – The spectrum from which the function gets peak candidates and chooses the best one to add to the resulting vector.
delta (float) – The distance between the last mass in the vector of masses and the center of the slice in which get_peak_candidates searches for peaks.
vector (np.ndarray) – The 2D array whose first array is the vector of masses, to which the function adds the best mass, and whose second array is the vector of intensities, to which the function adds the best intensity.
teor_vector (np.array) – The theoretical vector of intensities, which best_peak uses to compute the cosine distance.
dist_error (float) – The allowed error in the distance between peaks; it sets the radius of the vicinity of the second and each subsequent peak.
distance (int) – The parameter used in the peak-finding algorithm (default is 1).
- Returns:
np.ndarray
The 2D numpy array of masses and intensities, with the best chosen mass and intensity added last, and an indicator of whether the peak was matched.
- mass_automation.formula.check_formula.best_peak(pos_peak_masses, pos_peak_ints, teor_vector, ints_vector)
Chooses the best peak to add by computing the cosine distance between the theoretical vector of intensities and the vector of intensities extended with the candidate peak.
- Parameters:
pos_peak_masses (np.array) – The array of candidate peak masses.
pos_peak_ints (np.array) – The array of candidate peak intensities.
teor_vector (np.array) – The theoretical vector of intensities, which the function uses to compute the cosine distance.
ints_vector (np.array) – The vector of previously registered intensities. By appending a candidate peak intensity to ints_vector, the function gets a new candidate intensity vector and computes the cosine distance between it and the theoretical vector.
- Returns:
Tuple[float, float]
The tuple contains the best peak’s mass and the best peak’s intensity.
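The selection rule described for best_peak can be sketched in plain Python. This is an illustration under assumptions, not the package's code: the helper names `cosine_distance` and `pick_best_peak` are hypothetical, and comparing against a prefix of the theoretical vector of matching length is an assumption about how the two vectors are aligned.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v); 0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # convention chosen for this sketch
    return 1.0 - dot / (norm_u * norm_v)


def pick_best_peak(candidate_masses, candidate_ints, theoretical, registered_ints):
    """Hypothetical helper: append each candidate intensity to the registered
    intensities and keep the candidate whose extended vector is closest to
    the theoretical one. Expects at least one candidate."""
    target = theoretical[: len(registered_ints) + 1]
    best = None
    for mass, intensity in zip(candidate_masses, candidate_ints):
        trial = list(registered_ints) + [intensity]
        dist = cosine_distance(trial, target)
        if best is None or dist < best[0]:
            best = (dist, mass, intensity)
    return best[1], best[2]


# One registered peak (intensity 100) and two candidates for the second peak.
mass, intensity = pick_best_peak(
    candidate_masses=[101.003, 101.010],
    candidate_ints=[48.0, 5.0],
    theoretical=[1.0, 0.5, 0.1],
    registered_ints=[100.0],
)
```

Because cosine distance ignores overall scale, the candidate with intensity 48.0 wins here: the ratio 100:48 is closer to the theoretical 1:0.5 than 100:5 is.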
- mass_automation.formula.check_formula.check_presence(spectrum, formula, cal_error=0.006, dist_error=0.003, distance=50, max_peaks=5)
Calculates the cosine distance between the peaks of the theoretical isotope distribution and the peaks found in their confidence intervals.
- Parameters:
spectrum (Spectrum) – The spectrum in which the algorithm tries to detect the substance.
formula (Formula) – The Formula object of the substance.
cal_error (float) – The radius of the first peak’s vicinity (default is 0.006).
dist_error (float) – The allowed error in the distance between peaks; it sets the radius of the vicinity of the second and each subsequent peak (default is 0.003).
distance (int) – The parameter used in the peak-finding algorithm (default is 50).
max_peaks (int) – Limits the number of possible first peaks (default is 5).
- Returns:
Tuple[float, np.ndarray, np.ndarray, float, float]
The cosine distance, the arrays of possible masses and intensities, the percentage of matched peaks, and the mass error (in ppm).
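Two quantities in this entry have simple arithmetic behind them: the vicinity radius around a theoretical peak and the mass error in ppm. The sketch below uses the standard ppm definition; the helper names are hypothetical, and whether check_presence reports a signed or absolute error is not stated here, so the signed form is an assumption.

```python
def in_vicinity(mass, center, radius):
    """True if `mass` lies within `radius` of `center` (same mass units)."""
    return abs(mass - center) <= radius


def mass_error_ppm(measured_mass, theoretical_mass):
    """Signed relative mass error in parts per million."""
    return (measured_mass - theoretical_mass) / theoretical_mass * 1e6


# A peak measured at 500.005 against a theoretical mass of 500.000 falls
# inside the default first-peak radius (cal_error=0.006) and is off by
# roughly 10 ppm.
inside = in_vicinity(500.005, 500.000, 0.006)
error = mass_error_ppm(500.005, 500.000)
```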
- mass_automation.formula.check_formula.del_isotopologues(masses: ndarray, peaks: ndarray, return_ids: Optional[bool] = False) → Union[Tuple[ndarray, ndarray], List[int]]
Deletes low-intensity isotopologues.
- Parameters:
masses (np.ndarray) – The array of masses from the isotopic distribution.
peaks (np.ndarray) – The array of intensities from the isotopic distribution.
return_ids (bool) – If True, return the ids of the most intense peaks.
- Returns:
Two numpy arrays with the new masses and intensities without the isotopologues, or just the ids, depending on the return_ids parameter.
- Return type:
Union[Tuple[np.ndarray, np.ndarray], List[int]]
- mass_automation.formula.check_formula.get_peak_candidates(spectrum, peak_mass, error, distance=1)
Finds peaks around the given mass.
- Parameters:
spectrum (Spectrum) – The spectrum in which the function tries to find peaks.
peak_mass (float) – The given mass, the center of the slice in which the function searches for peaks.
error (float) – The radius of the slice in which the function searches.
distance (int) – The parameter used in the peak-finding algorithm (default is 1).
- Returns:
Tuple[np.ndarray, np.ndarray, np.ndarray, bool]
The first element of the tuple is the array of peak masses, the second is the array of peak intensities, and the third is the array of peak indices in the slice array. If the function finds no peaks, it returns the given mass, the median intensity in the slice, and numpy.array([0]) as the array of indices. The fourth element indicates whether the function found at least one peak in the spectrum.
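The return contract described above, including the no-peak fallback, can be illustrated with a small stand-alone sketch. It is not the package's algorithm: `peak_candidates_sketch` is a hypothetical name, it takes plain lists instead of a Spectrum, it detects strict local maxima only, and it ignores the distance parameter.

```python
import statistics


def peak_candidates_sketch(masses, intensities, peak_mass, error):
    """Find strict local maxima with |mass - peak_mass| <= error.

    Returns (peak_masses, peak_ints, peak_indices, found); the indices
    refer to positions inside the slice, as in the documented contract.
    """
    window = [i for i, m in enumerate(masses) if abs(m - peak_mass) <= error]
    slice_masses = [masses[i] for i in window]
    slice_ints = [intensities[i] for i in window]

    peak_ids = [
        k
        for k in range(1, len(slice_ints) - 1)
        if slice_ints[k] > max(slice_ints[k - 1], slice_ints[k + 1])
    ]
    if not peak_ids:
        # Documented fallback: the given mass, the median intensity, index 0.
        median = statistics.median(slice_ints) if slice_ints else 0.0
        return [peak_mass], [median], [0], False

    return (
        [slice_masses[k] for k in peak_ids],
        [slice_ints[k] for k in peak_ids],
        peak_ids,
        True,
    )


masses = [99.996, 99.998, 100.000, 100.002, 100.004]
found = peak_candidates_sketch(masses, [1.0, 4.0, 9.0, 3.0, 1.0], 100.001, 0.004)
flat = peak_candidates_sketch(masses, [2.0, 2.0, 2.0, 2.0, 2.0], 100.001, 0.004)
```

The boolean in the last position lets a caller tell a real peak apart from the fallback values, which is presumably how add_new_peak derives its "matched" indicator.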
mass_automation.formula.data module
- class mass_automation.formula.data.OriginalFormulaDataset(formulas: List[Formula], formula_converter: Callable, representations: List[str], augmentations: Optional[Callable], is_classifier: bool, **kwargs)
Bases: Dataset
Constructs the dataset for model training.
- class mass_automation.formula.data.PadSequence
Bases: object
Prepares a sequence as an input for the LSTM network.
- class mass_automation.formula.data.PadSequenceConstant(first_x)
Bases: object
Prepares a sequence as an input for the MLP.
mass_automation.formula.determination module
mass_automation.formula.model module
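The two padding helpers are described only by their purpose, so the sketch below shows the two usual padding strategies in plain Python rather than the classes' real behaviour. The function names are hypothetical, and reading first_x as a fixed target length is an assumption.

```python
def pad_to_longest(batch, pad_value=0.0):
    """Pad variable-length sequences to the longest one in the batch and
    return the original lengths, which recurrent models typically need."""
    lengths = [len(seq) for seq in batch]
    width = max(lengths)
    padded = [list(seq) + [pad_value] * (width - len(seq)) for seq in batch]
    return padded, lengths


def pad_to_constant(batch, size, pad_value=0.0):
    """Truncate or pad every sequence to exactly `size` items, giving the
    fixed-width input a fully connected network expects."""
    return [(list(seq) + [pad_value] * size)[:size] for seq in batch]


batch = [[1.0, 2.0, 3.0], [4.0]]
padded, lengths = pad_to_longest(batch)
fixed = pad_to_constant(batch, 2)
```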
- class mass_automation.formula.model.LSTM(lstm_in_size=16, lstm_hidden_size=128, lstm_num_layers=1, lstm_bidirectional=False, lstm_dropout=0.5, decoder_hidden_size=128, activation=True, loss='MSE', opt='Adam', lr=0.0002, **kwargs)
Bases: LightningModule
- configure_optimizers()
Choose what optimizers and learning-rate schedulers to use in your optimization. Normally you’d need one. But in the case of GANs or similar you might have multiple.
- Returns:
Any of these 6 options.
Single optimizer.
List or Tuple - List of optimizers.
Two lists - The first list has multiple optimizers, the second a list of LR schedulers (or lr_dict).
Dictionary, with an ‘optimizer’ key, and (optionally) a ‘lr_scheduler’ key whose value is a single LR scheduler or lr_dict.
Tuple of dictionaries as described, with an optional ‘frequency’ key.
None - Fit will run without any optimizer.
Note
The ‘frequency’ value is an int corresponding to the number of sequential batches optimized with the specific optimizer. It should be given to none or to all of the optimizers. There is a difference between passing multiple optimizers in a list, and passing multiple optimizers in dictionaries with a frequency of 1: In the former case, all optimizers will operate on the given batch in each optimization step. In the latter, only one optimizer will operate on the given batch at every step.
The lr_dict is a dictionary which contains the scheduler and its associated configuration. The default configuration is shown below.
    {
        'scheduler': lr_scheduler,   # The LR scheduler instance (required)
        'interval': 'epoch',         # The unit of the scheduler's step size
        'frequency': 1,              # The frequency of the scheduler
        'reduce_on_plateau': False,  # For ReduceLROnPlateau scheduler
        'monitor': 'val_loss',       # Metric for ReduceLROnPlateau to monitor
        'strict': True,              # Whether to crash the training if `monitor` is not found
        'name': None,                # Custom name for LearningRateMonitor to use
    }
Only the scheduler key is required; the rest will be set to the defaults above.
Examples:
    from torch.optim import Adam
    from torch.optim.lr_scheduler import CosineAnnealingLR, ExponentialLR

    # most cases
    def configure_optimizers(self):
        opt = Adam(self.parameters(), lr=1e-3)
        return opt

    # multiple optimizer case (e.g.: GAN)
    def configure_optimizers(self):
        generator_opt = Adam(self.model_gen.parameters(), lr=0.01)
        discriminator_opt = Adam(self.model_disc.parameters(), lr=0.02)
        return generator_opt, discriminator_opt

    # example with learning rate schedulers
    def configure_optimizers(self):
        generator_opt = Adam(self.model_gen.parameters(), lr=0.01)
        discriminator_opt = Adam(self.model_disc.parameters(), lr=0.02)
        discriminator_sched = CosineAnnealingLR(discriminator_opt, T_max=10)
        return [generator_opt, discriminator_opt], [discriminator_sched]

    # example with step-based learning rate schedulers
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_disc.parameters(), lr=0.02)
        gen_sched = {'scheduler': ExponentialLR(gen_opt, 0.99),
                     'interval': 'step'}  # called after each training step
        dis_sched = CosineAnnealingLR(dis_opt, T_max=10)  # called every epoch
        return [gen_opt, dis_opt], [gen_sched, dis_sched]

    # example with optimizer frequencies
    # see training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1
    # https://arxiv.org/abs/1704.00028
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_disc.parameters(), lr=0.02)
        n_critic = 5
        return (
            {'optimizer': dis_opt, 'frequency': n_critic},
            {'optimizer': gen_opt, 'frequency': 1}
        )
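The note on the 'frequency' key can be pictured with a small plain-Python sketch. It is not Lightning's code: `optimizer_schedule` is a hypothetical helper, and it assumes the optimizers take turns in the order listed, each handling `frequency` consecutive batches before the cycle repeats.

```python
def optimizer_schedule(frequencies, num_batches):
    """Return, for each batch index, the index of the optimizer that steps."""
    # Expand the frequencies into one cycle, e.g. [5, 1] -> [0, 0, 0, 0, 0, 1].
    cycle = [idx for idx, freq in enumerate(frequencies) for _ in range(freq)]
    return [cycle[batch % len(cycle)] for batch in range(num_batches)]


# With n_critic = 5 the first optimizer handles five batches,
# then the second handles one, and the pattern repeats.
schedule = optimizer_schedule([5, 1], 8)
```

Passing the same two optimizers in a plain list would instead let both of them operate on every batch, which is the difference the note describes.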
Note
Some things to know:
Lightning calls .backward() and .step() on each optimizer and learning rate scheduler as needed.
If you use 16-bit precision (precision=16), Lightning will automatically handle the optimizers for you.
If you use multiple optimizers, training_step() will have an additional optimizer_idx parameter.
If you use LBFGS, Lightning handles the closure function automatically for you.
If you use multiple optimizers, gradients will be calculated only for the parameters of the current optimizer at each training step.
If you need to control how often those optimizers step or override the default .step() schedule, override the optimizer_step() hook.
If you only want to call a learning rate scheduler every x steps or epochs, or want to monitor a custom metric, you can specify these in a lr_dict:
    {
        'scheduler': lr_scheduler,
        'interval': 'step',  # or 'epoch'
        'monitor': 'val_f1',
        'frequency': x,
    }
- forward(x)
Same as torch.nn.Module.forward(), however in Lightning you want this to define the operations you want to use for prediction (i.e.: on a server or as a feature extractor).
Normally you’d call self() from your training_step() method. This makes it easy to write a complex system for training with the outputs you’d want in a prediction setting.
You may also find the auto_move_data() decorator useful when using the module outside Lightning in a production setting.
- Parameters:
*args – Whatever you decide to pass into the forward method.
**kwargs – Keyword arguments are also possible.
- Returns:
Predicted output
Examples:
    # example if we were using this model as a feature extractor
    def forward(self, x):
        feature_maps = self.convnet(x)
        return feature_maps

    def training_step(self, batch, batch_idx):
        x, y = batch
        feature_maps = self(x)
        logits = self.classifier(feature_maps)
        # ...
        return loss

    # splitting it this way allows model to be used as a feature extractor
    model = MyModelAbove()
    inputs = server.get_request()
    results = model(inputs)
    server.write_results(results)

    # -------------
    # This is in stark contrast to torch.nn.Module where normally you would have this:
    def forward(self, batch):
        x, y = batch
        feature_maps = self.convnet(x)
        logits = self.classifier(feature_maps)
        return logits
- test_epoch_end(outputs)
Called at the end of a test epoch with the output of all test steps.
    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)
- Parameters:
outputs – List of outputs you defined in test_step_end(), or if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
None
Note
If you didn’t define a test_step(), this won’t be called.
Examples
With a single dataloader:
    def test_epoch_end(self, outputs):
        # do something with the outputs of all test batches
        all_test_preds = outputs  # list of the values returned by test_step
        some_result = calc_all_results(all_test_preds)
        self.log('some_result', some_result)
With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each test step for that dataloader.
    def test_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            for test_step_out in dataloader_outputs:
                # do something
                final_value += test_step_out

        self.log('final_metric', final_value)
- test_step(x, batch_id)
Operates on a single batch of data from the test set. In this step you’d normally generate examples or calculate anything of interest such as accuracy.
    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)
- Parameters:
batch (Tensor | (Tensor, …) | [Tensor, …]) – The output of your DataLoader. A tensor, tuple or list.
batch_idx (int) – The index of this batch.
dataloader_idx (int) – The index of the dataloader that produced this batch (only if multiple test dataloaders used).
- Returns:
Any of:
Any object or value
None - Testing will skip to the next batch
    # if you have one test dataloader:
    def test_step(self, batch, batch_idx): ...

    # if you have multiple test dataloaders:
    def test_step(self, batch, batch_idx, dataloader_idx): ...
Examples:
    # CASE 1: A single test dataset
    def test_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image('example_images', grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        test_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({'test_loss': loss, 'test_acc': test_acc})
If you pass in multiple test dataloaders, test_step() will have an additional argument.
    # CASE 2: multiple test dataloaders
    def test_step(self, batch, batch_idx, dataloader_idx):
        # dataloader_idx tells you which dataset this is.
        ...
Note
If you don’t need to test you don’t need to implement this method.
Note
When the test_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of the test epoch, the model goes back to training mode and gradients are enabled.
- train_epoch_end(outputs)
- training: bool
- training_step(x, batch_id)
Here you compute and return the training loss and some additional metrics for e.g. the progress bar or logger.
- Parameters:
batch (Tensor | (Tensor, …) | [Tensor, …]) – The output of your DataLoader. A tensor, tuple or list.
batch_idx (int) – Integer displaying index of this batch
optimizer_idx (int) – When using multiple optimizers, this argument will also be present.
hiddens (Tensor) – Passed in if pytorch_lightning.trainer.trainer.Trainer.truncated_bptt_steps > 0.
- Returns:
Any of:
Tensor - The loss tensor
dict - A dictionary. Can include any keys, but must include the key 'loss'
None - Training will skip to the next batch
Note
Returning None is currently not supported for multi-GPU or TPU, or with 16-bit precision enabled.
In this step you’d normally do the forward pass and calculate the loss for a batch. You can also do fancier things like multiple forward passes or something model specific.
Example:
    def training_step(self, batch, batch_idx):
        x, y, z = batch
        out = self.encoder(x)
        loss = self.loss(out, x)
        return loss
If you define multiple optimizers, this step will be called with an additional optimizer_idx parameter.
    # Multiple optimizers (e.g.: GANs)
    def training_step(self, batch, batch_idx, optimizer_idx):
        if optimizer_idx == 0:
            # do training_step with encoder
            ...
        if optimizer_idx == 1:
            # do training_step with decoder
            ...
If you add truncated back propagation through time you will also get an additional argument with the hidden states of the previous step.
    # Truncated back-propagation through time
    def training_step(self, batch, batch_idx, hiddens):
        # hiddens are the hidden states from the previous truncated backprop step
        ...
        out, hiddens = self.lstm(data, hiddens)
        ...
        return {'loss': loss, 'hiddens': hiddens}
Note
The loss value shown in the progress bar is smoothed (averaged) over the last values, so it differs from the actual loss returned in train/validation step.
- validation_epoch_end(outputs)
Called at the end of the validation epoch with the outputs of all validation steps.
    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)
- Parameters:
outputs – List of outputs you defined in validation_step(), or if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
None
Note
If you didn’t define a validation_step(), this won’t be called.
Examples
With a single dataloader:
    def validation_epoch_end(self, val_step_outputs):
        for out in val_step_outputs:
            # do something
            ...
With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each validation step for that dataloader.
    def validation_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            for val_step_out in dataloader_outputs:
                # do something
                final_value += val_step_out

        self.log('final_metric', final_value)
- validation_step(x, batch_id)
Operates on a single batch of data from the validation set. In this step you might generate examples or calculate anything of interest like accuracy.
    # the pseudocode for these calls
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        val_outs.append(out)
    validation_epoch_end(val_outs)
- Parameters:
batch (Tensor | (Tensor, …) | [Tensor, …]) – The output of your DataLoader. A tensor, tuple or list.
batch_idx (int) – The index of this batch
dataloader_idx (int) – The index of the dataloader that produced this batch (only if multiple val dataloaders used)
- Returns:
Any of:
Any object or value
None - Validation will skip to the next batch
    # pseudocode of order
    val_outs = []
    for val_batch in val_data:
        out = validation_step(val_batch)
        if defined('validation_step_end'):
            out = validation_step_end(out)
        val_outs.append(out)
    val_outs = validation_epoch_end(val_outs)

    # if you have one val dataloader:
    def validation_step(self, batch, batch_idx): ...

    # if you have multiple val dataloaders:
    def validation_step(self, batch, batch_idx, dataloader_idx): ...
Examples:
    # CASE 1: A single validation dataset
    def validation_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image('example_images', grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        val_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({'val_loss': loss, 'val_acc': val_acc})
If you pass in multiple val dataloaders, validation_step() will have an additional argument.
    # CASE 2: multiple validation dataloaders
    def validation_step(self, batch, batch_idx, dataloader_idx):
        # dataloader_idx tells you which dataset this is.
        ...
Note
If you don’t need to validate you don’t need to implement this method.
Note
When the validation_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of validation, the model goes back to training mode and gradients are enabled.
- class mass_automation.formula.model.LinearWithHidden(in_size, hidden_size, out_size, activation=True, dropout=True)
Bases: Module
- forward(spectra)
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool
- class mass_automation.formula.model.MLP(in_size=100, hidden_size=50, activation=True, loss='MSE', opt='Adam', lr=0.0002, **kwargs)
Bases: LightningModule
- configure_optimizers()
Choose what optimizers and learning-rate schedulers to use in your optimization. Normally you’d need one. But in the case of GANs or similar you might have multiple.
- Returns:
Any of these 6 options.
Single optimizer.
List or Tuple - List of optimizers.
Two lists - The first list has multiple optimizers, the second a list of LR schedulers (or lr_dict).
Dictionary, with an ‘optimizer’ key, and (optionally) a ‘lr_scheduler’ key whose value is a single LR scheduler or lr_dict.
Tuple of dictionaries as described, with an optional ‘frequency’ key.
None - Fit will run without any optimizer.
Note
The ‘frequency’ value is an int corresponding to the number of sequential batches optimized with the specific optimizer. It should be given to none or to all of the optimizers. There is a difference between passing multiple optimizers in a list, and passing multiple optimizers in dictionaries with a frequency of 1: In the former case, all optimizers will operate on the given batch in each optimization step. In the latter, only one optimizer will operate on the given batch at every step.
The lr_dict is a dictionary which contains the scheduler and its associated configuration. The default configuration is shown below.
    {
        'scheduler': lr_scheduler,   # The LR scheduler instance (required)
        'interval': 'epoch',         # The unit of the scheduler's step size
        'frequency': 1,              # The frequency of the scheduler
        'reduce_on_plateau': False,  # For ReduceLROnPlateau scheduler
        'monitor': 'val_loss',       # Metric for ReduceLROnPlateau to monitor
        'strict': True,              # Whether to crash the training if `monitor` is not found
        'name': None,                # Custom name for LearningRateMonitor to use
    }
Only the scheduler key is required; the rest will be set to the defaults above.
Examples:
    from torch.optim import Adam
    from torch.optim.lr_scheduler import CosineAnnealingLR, ExponentialLR

    # most cases
    def configure_optimizers(self):
        opt = Adam(self.parameters(), lr=1e-3)
        return opt

    # multiple optimizer case (e.g.: GAN)
    def configure_optimizers(self):
        generator_opt = Adam(self.model_gen.parameters(), lr=0.01)
        discriminator_opt = Adam(self.model_disc.parameters(), lr=0.02)
        return generator_opt, discriminator_opt

    # example with learning rate schedulers
    def configure_optimizers(self):
        generator_opt = Adam(self.model_gen.parameters(), lr=0.01)
        discriminator_opt = Adam(self.model_disc.parameters(), lr=0.02)
        discriminator_sched = CosineAnnealingLR(discriminator_opt, T_max=10)
        return [generator_opt, discriminator_opt], [discriminator_sched]

    # example with step-based learning rate schedulers
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_disc.parameters(), lr=0.02)
        gen_sched = {'scheduler': ExponentialLR(gen_opt, 0.99),
                     'interval': 'step'}  # called after each training step
        dis_sched = CosineAnnealingLR(dis_opt, T_max=10)  # called every epoch
        return [gen_opt, dis_opt], [gen_sched, dis_sched]

    # example with optimizer frequencies
    # see training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1
    # https://arxiv.org/abs/1704.00028
    def configure_optimizers(self):
        gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
        dis_opt = Adam(self.model_disc.parameters(), lr=0.02)
        n_critic = 5
        return (
            {'optimizer': dis_opt, 'frequency': n_critic},
            {'optimizer': gen_opt, 'frequency': 1}
        )
Note
Some things to know:
Lightning calls .backward() and .step() on each optimizer and learning rate scheduler as needed.
If you use 16-bit precision (precision=16), Lightning will automatically handle the optimizers for you.
If you use multiple optimizers, training_step() will have an additional optimizer_idx parameter.
If you use LBFGS, Lightning handles the closure function automatically for you.
If you use multiple optimizers, gradients will be calculated only for the parameters of the current optimizer at each training step.
If you need to control how often those optimizers step or override the default .step() schedule, override the optimizer_step() hook.
If you only want to call a learning rate scheduler every x steps or epochs, or want to monitor a custom metric, you can specify these in a lr_dict:
    {
        'scheduler': lr_scheduler,
        'interval': 'step',  # or 'epoch'
        'monitor': 'val_f1',
        'frequency': x,
    }
- forward(x)
Same as torch.nn.Module.forward(), however in Lightning you want this to define the operations you want to use for prediction (i.e.: on a server or as a feature extractor).
Normally you’d call self() from your training_step() method. This makes it easy to write a complex system for training with the outputs you’d want in a prediction setting.
You may also find the auto_move_data() decorator useful when using the module outside Lightning in a production setting.
- Parameters:
*args – Whatever you decide to pass into the forward method.
**kwargs – Keyword arguments are also possible.
- Returns:
Predicted output
Examples:
    # example if we were using this model as a feature extractor
    def forward(self, x):
        feature_maps = self.convnet(x)
        return feature_maps

    def training_step(self, batch, batch_idx):
        x, y = batch
        feature_maps = self(x)
        logits = self.classifier(feature_maps)
        # ...
        return loss

    # splitting it this way allows model to be used as a feature extractor
    model = MyModelAbove()
    inputs = server.get_request()
    results = model(inputs)
    server.write_results(results)

    # -------------
    # This is in stark contrast to torch.nn.Module where normally you would have this:
    def forward(self, batch):
        x, y = batch
        feature_maps = self.convnet(x)
        logits = self.classifier(feature_maps)
        return logits
- test_epoch_end(outputs)
Called at the end of a test epoch with the output of all test steps.
    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)
- Parameters:
outputs – List of outputs you defined in test_step_end(), or if there are multiple dataloaders, a list containing a list of outputs for each dataloader.
- Returns:
None
Note
If you didn’t define a test_step(), this won’t be called.
Examples
With a single dataloader:
    def test_epoch_end(self, outputs):
        # do something with the outputs of all test batches
        all_test_preds = outputs  # list of the values returned by test_step
        some_result = calc_all_results(all_test_preds)
        self.log('some_result', some_result)
With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each test step for that dataloader.
    def test_epoch_end(self, outputs):
        final_value = 0
        for dataloader_outputs in outputs:
            for test_step_out in dataloader_outputs:
                # do something
                final_value += test_step_out

        self.log('final_metric', final_value)
- test_step(x, batch_id)
Operates on a single batch of data from the test set. In this step you’d normally generate examples or calculate anything of interest such as accuracy.
    # the pseudocode for these calls
    test_outs = []
    for test_batch in test_data:
        out = test_step(test_batch)
        test_outs.append(out)
    test_epoch_end(test_outs)
- Parameters:
batch (Tensor | (Tensor, …) | [Tensor, …]) – The output of your DataLoader. A tensor, tuple or list.
batch_idx (int) – The index of this batch.
dataloader_idx (int) – The index of the dataloader that produced this batch (only if multiple test dataloaders used).
- Returns:
Any of:
Any object or value
None - Testing will skip to the next batch
    # if you have one test dataloader:
    def test_step(self, batch, batch_idx): ...

    # if you have multiple test dataloaders:
    def test_step(self, batch, batch_idx, dataloader_idx): ...
Examples:
    # CASE 1: A single test dataset
    def test_step(self, batch, batch_idx):
        x, y = batch

        # implement your own
        out = self(x)
        loss = self.loss(out, y)

        # log 6 example images
        # or generated text... or whatever
        sample_imgs = x[:6]
        grid = torchvision.utils.make_grid(sample_imgs)
        self.logger.experiment.add_image('example_images', grid, 0)

        # calculate acc
        labels_hat = torch.argmax(out, dim=1)
        test_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

        # log the outputs!
        self.log_dict({'test_loss': loss, 'test_acc': test_acc})
If you pass in multiple test dataloaders, test_step() will have an additional argument.
    # CASE 2: multiple test dataloaders
    def test_step(self, batch, batch_idx, dataloader_idx):
        # dataloader_idx tells you which dataset this is.
        ...
Note
If you don’t need to test you don’t need to implement this method.
Note
When the test_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of the test epoch, the model goes back to training mode and gradients are enabled.
- train_epoch_end(outputs)
- training: bool
- training_step(x, batch_id)
Here you compute and return the training loss and some additional metrics for e.g. the progress bar or logger.
- Parameters:
batch (Tensor | (Tensor, …) | [Tensor, …]) – The output of your DataLoader. A tensor, tuple or list.
batch_idx (int) – Integer displaying index of this batch
optimizer_idx (int) – When using multiple optimizers, this argument will also be present.
hiddens (Tensor) – Passed in if pytorch_lightning.trainer.trainer.Trainer.truncated_bptt_steps > 0.
- Returns:
Any of:
Tensor - The loss tensor
dict - A dictionary. Can include any keys, but must include the key 'loss'
None - Training will skip to the next batch
Note
Returning None is currently not supported for multi-GPU or TPU, or with 16-bit precision enabled.
In this step you’d normally do the forward pass and calculate the loss for a batch. You can also do fancier things like multiple forward passes or something model specific.
Example:
    def training_step(self, batch, batch_idx):
        x, y, z = batch
        out = self.encoder(x)
        loss = self.loss(out, x)
        return loss
If you define multiple optimizers, this step will be called with an additional optimizer_idx parameter.
    # Multiple optimizers (e.g.: GANs)
    def training_step(self, batch, batch_idx, optimizer_idx):
        if optimizer_idx == 0:
            # do training_step with encoder
            ...
        if optimizer_idx == 1:
            # do training_step with decoder
            ...
If you add truncated back propagation through time you will also get an additional argument with the hidden states of the previous step.
    # Truncated back-propagation through time
    def training_step(self, batch, batch_idx, hiddens):
        # hiddens are the hidden states from the previous truncated backprop step
        ...
        out, hiddens = self.lstm(data, hiddens)
        ...
        return {'loss': loss, 'hiddens': hiddens}
Note
The loss value shown in the progress bar is smoothed (averaged) over the last values, so it differs from the actual loss returned in train/validation step.
- validation_epoch_end(outputs)#
Called at the end of the validation epoch with the outputs of all validation steps.
# the pseudocode for these calls val_outs = [] for val_batch in val_data: out = validation_step(val_batch) val_outs.append(out) validation_epoch_end(val_outs)
- Parameters:
outputs – List of outputs you defined in
validation_step()
, or if there are multiple dataloaders, a list containing a list of outputs for each dataloader.- Returns:
None
Note
If you didn’t define a
validation_step()
, this won’t be called.Examples
With a single dataloader:
def validation_epoch_end(self, val_step_outputs): for out in val_step_outputs: # do something
With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each validation step for that dataloader.
def validation_epoch_end(self, outputs):
    for dataloader_output_result in outputs:
        dataloader_outs = dataloader_output_result.dataloader_i_outputs

    self.log('final_metric', final_value)
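The list-of-lists layout can be illustrated without Lightning. The following is a minimal sketch, assuming each validation step returned a dict with a `val_loss` float. Both that key and the helper name `mean_per_dataloader` are hypothetical and not part of the library.

```python
from statistics import mean
from typing import Dict, List


def mean_per_dataloader(
    outputs: List[List[Dict[str, float]]], key: str = "val_loss"
) -> List[float]:
    """Average `key` over the step outputs of each dataloader."""
    # Outer list: one entry per dataloader.
    # Inner list: one entry per validation step of that dataloader.
    return [mean(step[key] for step in dataloader_outs) for dataloader_outs in outputs]


outputs = [
    [{"val_loss": 0.5}, {"val_loss": 0.25}],  # dataloader 0, two steps
    [{"val_loss": 1.0}],                      # dataloader 1, one step
]
per_loader = mean_per_dataloader(outputs)
```

Inside a real `validation_epoch_end`, values like these could then be passed to `self.log`.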
- validation_step(x, batch_id)#
Operates on a single batch of data from the validation set. In this step you might generate examples or calculate anything of interest, such as accuracy.
# the pseudocode for these calls
val_outs = []
for val_batch in val_data:
    out = validation_step(val_batch)
    val_outs.append(out)
validation_epoch_end(val_outs)
- Parameters:
batch (Tensor | (Tensor, …) | [Tensor, …]) – The output of your DataLoader. A tensor, tuple or list.
batch_idx (int) – The index of this batch
dataloader_idx (int) – The index of the dataloader that produced this batch (only if multiple val dataloaders used)
- Returns:
Any of:
Any object or value
None - Validation will skip to the next batch
# pseudocode of order
val_outs = []
for val_batch in val_data:
    out = validation_step(val_batch)
    if defined('validation_step_end'):
        out = validation_step_end(out)
    val_outs.append(out)
val_outs = validation_epoch_end(val_outs)
# if you have one val dataloader:
def validation_step(self, batch, batch_idx): ...

# if you have multiple val dataloaders:
def validation_step(self, batch, batch_idx, dataloader_idx): ...
Examples:
import torch
import torchvision

# CASE 1: A single validation dataset
def validation_step(self, batch, batch_idx):
    x, y = batch

    # implement your own
    out = self(x)
    loss = self.loss(out, y)

    # log 6 example images
    # or generated text... or whatever
    sample_imgs = x[:6]
    grid = torchvision.utils.make_grid(sample_imgs)
    self.logger.experiment.add_image('example_images', grid, 0)

    # calculate acc
    labels_hat = torch.argmax(out, dim=1)
    val_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

    # log the outputs!
    self.log_dict({'val_loss': loss, 'val_acc': val_acc})
If you pass in multiple val dataloaders, validation_step() will have an additional argument.
# CASE 2: multiple validation dataloaders
def validation_step(self, batch, batch_idx, dataloader_idx):
    # dataloader_idx tells you which dataset this is.
    ...
Note
If you don’t need to validate you don’t need to implement this method.
Note
When validation_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of validation, the model goes back to training mode and gradients are enabled.
- class mass_automation.formula.model.WeightedMSELoss#
Bases:
Module
- forward(y, y_hat)#
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- training: bool#
- mass_automation.formula.model.log_elementwise(model, metric, prefix)#
mass_automation.formula.plot module#
- mass_automation.formula.plot.plot_compare(spectrum: Spectrum, formula: Formula, cal_error=0.006, dist_error=0.003, distance=50, max_peaks=5, path=None, return_masses=False, show=True)#
Spectra comparison visualization.
- Parameters:
spectrum (Spectrum) – The spectrum, where algorithm tries to detect the substance.
formula (Formula) – The Formula class of the substance.
cal_error (float) – The radius of the first peak’s vicinity (default is 0.006).
dist_error (float) – The possible error in distance between the peaks; sets the radius of the vicinity of the second and each subsequent peak (default is 0.003).
distance (int) – The parameter, used in peak finding algorithm. (default is 50).
max_peaks (int) – Limits the number of possible first peaks (default is 5).
path (str) – Path where the plotted figure is saved as a PNG with dpi = 300.
Module contents#
- class mass_automation.formula.Formula(formula: Union[str, dict], charge=None)#
Bases:
object
A class used to represent Formula.
- str_formula#
The molecular formula of the substance. Example: ‘C2H5OH’.
- Type:
str
- dict_formula#
The dictionary, which corresponds to the formula. Example:
{ 'C' : 2, 'H' : 6, 'O' : 1 }
- Type:
dict
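The correspondence between str_formula and dict_formula can be sketched in a few lines. This is an illustration for flat formulas only (no brackets, charges or isotope labels), not the parser the class actually uses. The helper name `parse_flat_formula` is hypothetical.

```python
import re
from collections import Counter
from typing import Dict


def parse_flat_formula(formula: str) -> Dict[str, int]:
    """Count atoms in a bracket-free formula such as 'C2H5OH'."""
    counts: Counter = Counter()
    # An element symbol is one uppercase letter plus an optional lowercase one,
    # followed by an optional count; a missing count means 1.
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return dict(counts)


ethanol = parse_flat_formula("C2H5OH")  # repeated symbols (H5 and H) are summed
```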
- formula#
Dictionary or string, depending on the argument in __init__.
- Type:
str or dict
- charge#
The charge of the molecule.
- Type:
int
- monoisotopic_mass#
The mass of the monoisotopic peak.
- Type:
float
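As a rough illustration of what the monoisotopic mass means, the sketch below sums the mass of the most abundant isotope of each atom. The small mass table holds rounded values and the function name is hypothetical. How the class accounts for charge is not stated here, so charge is ignored.

```python
from typing import Dict

# Rounded masses (Da) of the most abundant isotope of a few elements.
# Illustrative table only; a real implementation would cover every element.
MONOISOTOPIC_MASS = {"H": 1.007825, "C": 12.0, "N": 14.003074, "O": 15.994915}


def monoisotopic_mass(dict_formula: Dict[str, int]) -> float:
    """Sum the lightest-isotope masses of all atoms in a neutral formula."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in dict_formula.items())


ethanol_mass = monoisotopic_mass({"C": 2, "H": 6, "O": 1})
```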
- isodistribution(side_threshold=0.001) → Tuple[ndarray, ndarray]#
Creates isotopic distribution.
Gives back a tuple with masses and peaks of the theoretical isotopic distribution.
- Parameters:
side_threshold (float) – Minimum intensity, relative to the most intense peak, that a side isotopologue must have to be kept (default is 0.001).
- Return type:
Tuple[np.ndarray, np.ndarray]
- Raises:
ValueError – Raised when the isotopic distribution cannot be constructed because of specific formula names.
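The role of side_threshold can be shown with a simplified, self-contained sketch. It is not the library's implementation: it convolves isotope abundances atom by atom, merges isotopologues that share a nominal mass, and returns plain lists instead of NumPy arrays. The isotope table (rounded values) and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (exact mass in Da, natural abundance) per isotope; rounded illustrative values.
ISOTOPES = {
    "H": [(1.007825, 0.999885), (2.014102, 0.000115)],
    "C": [(12.0, 0.9893), (13.003355, 0.0107)],
    "O": [(15.994915, 0.99757), (16.999132, 0.00038), (17.999160, 0.00205)],
}


def isodistribution_sketch(
    dict_formula: Dict[str, int], side_threshold: float = 0.001
) -> Tuple[List[float], List[float]]:
    # nominal mass -> [probability, probability-weighted exact mass]
    dist = {0: [1.0, 0.0]}
    for element, count in dict_formula.items():
        for _ in range(count):  # add one atom at a time
            new = defaultdict(lambda: [0.0, 0.0])
            for nominal, (prob, weighted_mass) in dist.items():
                for iso_mass, abundance in ISOTOPES[element]:
                    entry = new[nominal + round(iso_mass)]
                    entry[0] += prob * abundance
                    entry[1] += (weighted_mass + prob * iso_mass) * abundance
            dist = dict(new)

    top = max(prob for prob, _ in dist.values())
    masses, intensities = [], []
    for nominal in sorted(dist):
        prob, weighted_mass = dist[nominal]
        # Keep only peaks whose intensity relative to the base peak passes the threshold.
        if prob / top >= side_threshold:
            masses.append(weighted_mass / prob)
            intensities.append(prob / top)
    return masses, intensities


masses, intensities = isodistribution_sketch({"C": 2, "H": 6, "O": 1})
```

Raising side_threshold drops the weaker side peaks, so the returned lists get shorter.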
- vector()#
Converts the formula into a vector of element counts whose length equals the number of elements in the periodic table.
Attention: may not work for elements like Fl, as pyteomics does not recognize them.
- Returns:
Resulting vector
- Return type:
np.ndarray
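A minimal sketch of such a count vector follows. It assumes that position i holds the count of the element with atomic number i + 1, which this page does not confirm. The truncated `ELEMENTS` list and the function name are hypothetical, and a plain list stands in for the NumPy array.

```python
from typing import Dict, List

# Symbols ordered by atomic number, truncated to the first ten elements
# for brevity; a full table would list every element.
ELEMENTS = ["H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne"]


def formula_to_vector(dict_formula: Dict[str, int]) -> List[int]:
    """Place each element count at the index given by its atomic number minus one."""
    vector = [0] * len(ELEMENTS)
    for symbol, count in dict_formula.items():
        # list.index raises ValueError for symbols missing from the table
        vector[ELEMENTS.index(symbol)] = count
    return vector


ethanol_vector = formula_to_vector({"C": 2, "H": 6, "O": 1})
```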
- class mass_automation.formula.RealIsotopicDistribution(spectrum: Spectrum, peak_indices: List[int])#
Bases:
object
- get_monoisotopic() → int#
Find monoisotopic peak
Currently “monoisotopic” is the first one. THAT IS NOT TRUE FOR SOME ELEMENTS. Rewrite the get_representation method accordingly if changing.
- Returns:
Index of the monoisotopic peak
- Return type:
int
- get_representation(delta: Optional[float] = 0.025, length: Optional[int] = 100, f: Optional[Callable] = <function amax>, mode='middle', vectorization_method='simple', sigma: Optional[float] = None) → List[Tuple[ndarray, float]]#
Calculates vector representation for the isotopic distribution
- Parameters:
delta (float) – Size of peak vicinity
length (int) – Number of items in the resulting feature vector
mode (str) –
Mode of vectorization. One of the following:
middle – the most intense peak is placed in the middle
monoisotopic – some part of the spectrum to the left of the monoisotopic peak, incremented by neutron mass
vectorization_method (str) –
Method of vectorization. One of the following:
simple
convolution
sigma (float) – A parameter for the vectorization method
- Returns:
Representations for each peak
- Return type:
List[Tuple[np.ndarray, float]]
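To make the roles of delta, length and f concrete, here is a hedged sketch of one plausible binning scheme. It assumes the vicinity is the window of ± delta around a peak, split into length equal cells that are reduced with f. This page does not confirm that the method works this way. The function name is hypothetical, and the builtin `max` stands in for NumPy's `amax`.

```python
from typing import Callable, List, Sequence


def vectorize_vicinity(
    masses: Sequence[float],
    intensities: Sequence[float],
    center: float,
    delta: float = 0.025,
    length: int = 100,
    f: Callable = max,
) -> List[float]:
    """Bin the intensities found within +/- delta of center into `length` cells."""
    bins: List[List[float]] = [[] for _ in range(length)]
    left, width = center - delta, 2 * delta
    for mass, intensity in zip(masses, intensities):
        if left <= mass < left + width:
            index = min(int((mass - left) / width * length), length - 1)
            bins[index].append(intensity)
    # Empty cells stay at zero; populated cells are reduced with f.
    return [f(cell) if cell else 0.0 for cell in bins]


vec = vectorize_vicinity(
    masses=[99.99, 100.0, 100.01, 100.2],
    intensities=[1.0, 5.0, 2.0, 9.0],
    center=100.0,
    length=5,
)
```

With the window centred on the peak at 100.0, the most intense value lands in the middle cell, which loosely mirrors the middle mode. The point at 100.2 lies outside the window and is ignored.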