mass_automation.formula package#

Submodules#

mass_automation.formula.augmentations module#

class mass_automation.formula.augmentations.Add(delta)#

Bases: object

Adds a uniformly sampled intensity shift

class mass_automation.formula.augmentations.AugmentationWrapper(*args)#

Bases: object

Wraps the set of augmentations

class mass_automation.formula.augmentations.DroppingAug(threshold, probability)#

Bases: object

Drops a subspectrum with a certain probability

class mass_automation.formula.augmentations.RandomNoise(sigma)#

Bases: object

Adds random noise to the subspectrum

class mass_automation.formula.augmentations.Scale(delta)#

Bases: object

Scales the subspectrum randomly

class mass_automation.formula.augmentations.Shift(delta)#

Bases: object

Shifts subspectra; shift sampled uniformly from delta
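
As a rough illustration of what these one-line descriptions suggest, the sketch below applies an additive offset, a multiplicative scale, and Gaussian noise to an intensity array with plain NumPy. The helper names (`add_shift`, `scale`, `random_noise`, `compose`) and the sampling ranges are assumptions made for this example; they are not the package's classes, whose call signatures are not documented here.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def add_shift(ints, delta):
    """Add one offset, sampled uniformly from [-delta, delta], to all intensities."""
    return ints + rng.uniform(-delta, delta)


def scale(ints, delta):
    """Multiply all intensities by a factor sampled uniformly from [1 - delta, 1 + delta]."""
    return ints * rng.uniform(1.0 - delta, 1.0 + delta)


def random_noise(ints, sigma):
    """Add independent zero-mean Gaussian noise with standard deviation sigma."""
    return ints + rng.normal(0.0, sigma, size=ints.shape)


def compose(*augs):
    """Chain augmentations left to right (hypothetical stand-in for a wrapper)."""
    def apply(ints):
        for aug in augs:
            ints = aug(ints)
        return ints
    return apply


intensities = np.array([1.0, 0.65, 0.11])
augment = compose(
    lambda v: add_shift(v, delta=0.05),
    lambda v: scale(v, delta=0.1),
    lambda v: random_noise(v, sigma=0.01),
)
augmented = augment(intensities)
```

Each helper returns a new array, so the input intensities are left unchanged.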

mass_automation.formula.check_formula module#

mass_automation.formula.check_formula.add_new_peak(spectrum, delta, vector, teor_vector, dist_error, distance=1)#

Adds best peak’s mass and intensity to the array of registered masses and intensities respectively.

Parameters:
  • spectrum (Spectrum) – The spectrum from which the function gets peak candidates and chooses the best one to add to the resulting vector.

  • delta (float) – The distance between the last mass in the vector of masses and the center of the slice in which get_peak_candidates tries to find peaks.

  • vector (np.ndarray) – The 2D array in which the first array is the vector of masses, to which the function adds the best mass, and the second array is the vector of intensities, to which the function adds the best intensity.

  • teor_vector (np.array) – The theoretical vector of intensities, which best_peak uses to compute the cosine distance.

  • dist_error (float) – The possible error in the distance between peaks; it sets the radius of the vicinity of the second and subsequent peaks.

  • distance (int) – The distance parameter used in the peak-finding algorithm (default is 1).

Returns:

  • np.ndarray

  • The 2D numpy array of masses and intensities with the best chosen last mass and intensity respectively, and an indicator of whether the peak was matched.

mass_automation.formula.check_formula.best_peak(pos_peak_masses, pos_peak_ints, teor_vector, ints_vector)#

Chooses the best peak to add by computing the cosine distance between the theoretical vector of intensities and the vector of intensities extended with each candidate peak.

Parameters:
  • pos_peak_masses (np.array) – The array of possible best peak masses.

  • pos_peak_ints (np.array) – The array of possible best peak intensities.

  • teor_vector (np.array) – The theoretical vector of intensities, which the function uses to compute the cosine distance.

  • ints_vector (np.array) – The vector of previous intensities. By appending a candidate peak intensity to ints_vector, the function gets a new candidate intensity vector and computes the cosine distance between it and the theoretical vector.

Returns:

  • Tuple[float, float]

  • The tuple contains the best peak’s mass and the best peak’s intensity.
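
The selection rule described above can be sketched with NumPy as follows. This is a minimal illustration, not the package's implementation: `cosine_distance` and `pick_best_peak` are hypothetical names, and truncating the theoretical vector to the length of the extended intensity vector is an assumption.

```python
import numpy as np


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


def pick_best_peak(pos_peak_masses, pos_peak_ints, teor_vector, ints_vector):
    """Pick the candidate whose intensity keeps the observed vector closest to theory."""
    n = len(ints_vector) + 1  # length after adding one candidate (assumed alignment)
    target = np.asarray(teor_vector[:n], dtype=float)
    distances = [
        cosine_distance(np.append(ints_vector, intensity), target)
        for intensity in pos_peak_ints
    ]
    best = int(np.argmin(distances))
    return float(pos_peak_masses[best]), float(pos_peak_ints[best])


theory = np.array([1.0, 0.5, 0.1])
mass, intensity = pick_best_peak(
    pos_peak_masses=np.array([101.001, 101.004]),
    pos_peak_ints=np.array([900.0, 480.0]),
    teor_vector=theory,
    ints_vector=np.array([1000.0]),
)
```

Here the second candidate wins, because an intensity ratio of 1000:480 is much closer to the theoretical 1:0.5 than 1000:900 is.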

mass_automation.formula.check_formula.check_presence(spectrum, formula, cal_error=0.006, dist_error=0.003, distance=50, max_peaks=5)#

Calculates the cosine distance between the peaks of the theoretical isotope distribution and the peaks in their confidence intervals.

Parameters:
  • spectrum (Spectrum) – The spectrum in which the algorithm tries to detect the substance.

  • formula (Formula) – The Formula class of the substance.

  • cal_error (float) – The radius of the first peak’s vicinity (default is 0.006).

  • dist_error (float) – The possible error in the distance between peaks; it sets the radius of the vicinity of the second and subsequent peaks (default is 0.003).

  • distance (int) – The distance parameter used in the peak-finding algorithm (default is 50).

  • max_peaks (int) – Limits the number of possible first peaks (default is 5).

Returns:

  • Tuple[float, np.ndarray, np.ndarray, float, float]

  • Cosine distance, the arrays of possible masses and intensities, the matched peaks percentage, and the mass error (in ppm).
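
For orientation, a mass error in ppm and a vicinity check like the one cal_error describes can be computed as below. This is only a sketch: the helper names are hypothetical, and the signed convention (measured minus theoretical) is an assumption, since the source does not state how check_presence defines the error.

```python
def mass_error_ppm(measured_mass, theoretical_mass):
    """Relative mass error in parts per million (signed; sign convention assumed)."""
    return (measured_mass - theoretical_mass) / theoretical_mass * 1e6


def within_vicinity(measured_mass, expected_mass, radius):
    """True if the measured mass lies within +/- radius of the expected mass."""
    return abs(measured_mass - expected_mass) <= radius


theoretical = 500.0000
measured = 500.0020
error = mass_error_ppm(measured, theoretical)  # about 4 ppm
first_peak_ok = within_vicinity(measured, theoretical, radius=0.006)
```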

mass_automation.formula.check_formula.del_isotopologues(masses: ndarray, peaks: ndarray, return_ids: Optional[bool] = False) → Union[Tuple[ndarray, ndarray], List[int]]#

Deletes low-intensity isotopologues.

Parameters:
  • masses (np.ndarray) – The array of masses from the isotopic distribution.

  • peaks (np.ndarray) – The array of intensities from the isotopic distribution.

  • return_ids (bool) – If True, return the ids of the most intense peaks instead of the filtered arrays.

Returns:

Two numpy arrays with the new masses and intensities without the low-intensity isotopologues, or just the ids, depending on the return_ids parameter.

Return type:

Union[Tuple[np.ndarray, np.ndarray], List[int]]
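
The two return modes can be illustrated with a small filter. The source does not say which criterion del_isotopologues uses, so the relative-intensity threshold below (`min_rel_intensity`) and the function name `drop_weak_peaks` are assumptions made purely for this sketch.

```python
import numpy as np


def drop_weak_peaks(masses, peaks, min_rel_intensity=0.01, return_ids=False):
    """Keep peaks at or above min_rel_intensity of the base peak (assumed criterion)."""
    masses = np.asarray(masses, dtype=float)
    peaks = np.asarray(peaks, dtype=float)
    keep = np.flatnonzero(peaks >= min_rel_intensity * peaks.max())
    if return_ids:
        return keep.tolist()       # only the indices of the retained peaks
    return masses[keep], peaks[keep]


masses = np.array([100.0, 101.0, 102.0, 103.0])
peaks = np.array([1.0, 0.2, 0.005, 0.03])
kept_masses, kept_peaks = drop_weak_peaks(masses, peaks)
kept_ids = drop_weak_peaks(masses, peaks, return_ids=True)
```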

mass_automation.formula.check_formula.get_peak_candidates(spectrum, peak_mass, error, distance=1)#

Peak finding around the given mass.

Parameters:
  • spectrum (Spectrum) – The spectrum in which the function tries to find peaks.

  • peak_mass (float) – The given mass, which is the center of the slice in which the function looks for peaks.

  • error (float) – The radius of the slice in which the function searches.

  • distance (int) – The distance parameter used in the peak-finding algorithm (default is 1).

Returns:

  • Tuple[np.ndarray, np.ndarray, np.ndarray, bool]

  • The first element in the tuple is the array of peak masses, the second is the array of peak intensities, and the third is the array of peak indices in the slice array. If the function finds no peaks, it returns the value of the given mass, the median intensity in the slice, and numpy.array([0]) as the array of indices. The fourth element indicates whether the function found at least one peak in the spectrum.
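
The return contract above, including the fallback when no peak is found, can be sketched on raw mass and intensity arrays. This is an illustration under assumptions: `peak_candidates` is a hypothetical name, peaks are detected as simple strict local maxima, the distance parameter is left out, and the 0.0 used for an empty slice is a choice made here.

```python
import numpy as np


def peak_candidates(masses, ints, peak_mass, error):
    """Find local maxima of ints within [peak_mass - error, peak_mass + error]."""
    in_slice = (masses >= peak_mass - error) & (masses <= peak_mass + error)
    slice_masses, slice_ints = masses[in_slice], ints[in_slice]

    # Interior points strictly higher than both neighbours count as peaks.
    if slice_ints.size >= 3:
        mid = slice_ints[1:-1]
        ids = np.flatnonzero((mid > slice_ints[:-2]) & (mid > slice_ints[2:])) + 1
    else:
        ids = np.array([], dtype=int)

    if ids.size == 0:
        # Fallback: the given mass, the median intensity in the slice, index 0.
        median = float(np.median(slice_ints)) if slice_ints.size else 0.0
        return np.array([peak_mass]), np.array([median]), np.array([0]), False
    return slice_masses[ids], slice_ints[ids], ids, True


masses = np.linspace(99.99, 100.01, 21)
ints = np.exp(-((masses - 100.002) / 0.002) ** 2)  # one synthetic peak
found = peak_candidates(masses, ints, peak_mass=100.0, error=0.006)
missed = peak_candidates(masses, np.ones_like(masses), peak_mass=100.0, error=0.006)
```

With a flat signal there is no local maximum, so `missed` carries the given mass, the median intensity, `numpy.array([0])`, and `False`.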

mass_automation.formula.data module#

class mass_automation.formula.data.OriginalFormulaDataset(formulas: List[Formula], formula_converter: Callable, representations: List[str], augmentations: Optional[Callable], is_classifier: bool, **kwargs)#

Bases: Dataset

Constructs the dataset for model training

class mass_automation.formula.data.PadSequence#

Bases: object

Prepares a sequence as an input for LSTM network

class mass_automation.formula.data.PadSequenceConstant(first_x)#

Bases: object

Prepares a sequence as an input for MLP
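
The difference between the two padding strategies can be illustrated as follows. The helpers are hypothetical and only assume what the names suggest: variable-length batches padded to the longest sequence for a recurrent network, and a fixed width of `first_x` values for an MLP. The actual classes may behave differently.

```python
import numpy as np


def pad_to_longest(sequences, pad_value=0.0):
    """Right-pad 1-D sequences to the longest length; also return the true lengths."""
    lengths = [len(seq) for seq in sequences]
    batch = np.full((len(sequences), max(lengths)), pad_value, dtype=float)
    for row, seq in enumerate(sequences):
        batch[row, :len(seq)] = seq
    return batch, lengths


def pad_to_constant(sequences, first_x, pad_value=0.0):
    """Truncate or right-pad every sequence to exactly first_x values."""
    batch = np.full((len(sequences), first_x), pad_value, dtype=float)
    for row, seq in enumerate(sequences):
        clipped = seq[:first_x]
        batch[row, :len(clipped)] = clipped
    return batch


spectra = [[1.0, 0.5, 0.1], [1.0, 0.3], [1.0, 0.9, 0.4, 0.2, 0.05]]
lstm_batch, lengths = pad_to_longest(spectra)
mlp_batch = pad_to_constant(spectra, first_x=4)
```

Keeping the true lengths alongside the padded batch lets a recurrent model ignore the padding, while the fixed-width variant gives an MLP a constant input size.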

mass_automation.formula.data.formula2element(formula: Formula, element_id: int) → bool#

mass_automation.formula.data.formula2element_amount(formula: Formula, element_id: int) → int#

mass_automation.formula.data.formula2vector(formula: Formula, normalizer: str)#

mass_automation.formula.data.generate_fake_representation(formula: Formula, **kwargs)#

Simulates isotopic distribution and constructs pseudo-spectrum
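
To make the idea concrete, the sketch below simulates the isotopic distribution of a chlorine-only cluster with a binomial model and renders it as a Gaussian profile on a mass grid. It does not use the package's Formula class; the function names, the peak width, and the choice of a Gaussian peak shape are assumptions for illustration only.

```python
import numpy as np
from math import comb

# Natural isotopes of chlorine: (mass in Da, abundance).
CL35 = (34.96885, 0.7576)
CL37 = (36.96590, 0.2424)


def chlorine_pattern(n_atoms):
    """Binomial isotopic distribution for n_atoms chlorine atoms."""
    masses, probs = [], []
    for k in range(n_atoms + 1):  # k = number of 37Cl atoms
        masses.append((n_atoms - k) * CL35[0] + k * CL37[0])
        probs.append(comb(n_atoms, k) * CL35[1] ** (n_atoms - k) * CL37[1] ** k)
    return np.array(masses), np.array(probs)


def pseudo_spectrum(masses, probs, grid, width=0.01):
    """Sum one Gaussian per isotopologue on a mass grid; base peak scaled to 1."""
    profile = sum(p * np.exp(-((grid - m) / width) ** 2) for m, p in zip(masses, probs))
    return profile / profile.max()


masses, probs = chlorine_pattern(2)
grid = np.linspace(69.0, 75.0, 6001)
spectrum = pseudo_spectrum(masses, probs, grid)
```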

mass_automation.formula.determination module#

mass_automation.formula.model module#

class mass_automation.formula.model.LSTM(lstm_in_size=16, lstm_hidden_size=128, lstm_num_layers=1, lstm_bidirectional=False, lstm_dropout=0.5, decoder_hidden_size=128, activation=True, loss='MSE', opt='Adam', lr=0.0002, **kwargs)#

Bases: LightningModule

configure_optimizers()#

Choose what optimizers and learning-rate schedulers to use in your optimization. Normally you’d need one. But in the case of GANs or similar you might have multiple.

Returns:

Any of these 6 options.

  • Single optimizer.

  • List or Tuple - List of optimizers.

  • Two lists - The first list has multiple optimizers, the second a list of LR schedulers (or lr_dict).

  • Dictionary, with an ‘optimizer’ key, and (optionally) a ‘lr_scheduler’ key whose value is a single LR scheduler or lr_dict.

  • Tuple of dictionaries as described, with an optional ‘frequency’ key.

  • None - Fit will run without any optimizer.

Note

The ‘frequency’ value is an int corresponding to the number of sequential batches optimized with the specific optimizer. It should be given to none or to all of the optimizers. There is a difference between passing multiple optimizers in a list, and passing multiple optimizers in dictionaries with a frequency of 1: In the former case, all optimizers will operate on the given batch in each optimization step. In the latter, only one optimizer will operate on the given batch at every step.

The lr_dict is a dictionary which contains the scheduler and its associated configuration. The default configuration is shown below.

{
    'scheduler': lr_scheduler, # The LR scheduler instance (required)
    'interval': 'epoch', # The unit of the scheduler's step size
    'frequency': 1, # The frequency of the scheduler
    'reduce_on_plateau': False, # For ReduceLROnPlateau scheduler
    'monitor': 'val_loss', # Metric for ReduceLROnPlateau to monitor
    'strict': True, # Whether to crash the training if `monitor` is not found
    'name': None, # Custom name for LearningRateMonitor to use
}

Only the scheduler key is required, the rest will be set to the defaults above.
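
The effect of these defaults can be pictured as a dictionary merge. The snippet below is an illustration of the documented behaviour, not Lightning's internal code; `complete_lr_dict` and `scheduler_placeholder` are hypothetical names.

```python
LR_DICT_DEFAULTS = {
    "interval": "epoch",
    "frequency": 1,
    "reduce_on_plateau": False,
    "monitor": "val_loss",
    "strict": True,
    "name": None,
}


def complete_lr_dict(lr_dict):
    """Return a copy of lr_dict with every omitted key set to its default."""
    if "scheduler" not in lr_dict:
        raise ValueError("lr_dict must contain a 'scheduler' entry")
    return {**LR_DICT_DEFAULTS, **lr_dict}


scheduler_placeholder = object()  # stands in for a real LR scheduler instance
full = complete_lr_dict({"scheduler": scheduler_placeholder, "interval": "step"})
```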

Examples:

# most cases
def configure_optimizers(self):
    opt = Adam(self.parameters(), lr=1e-3)
    return opt

# multiple optimizer case (e.g.: GAN)
def configure_optimizers(self):
    generator_opt = Adam(self.model_gen.parameters(), lr=0.01)
    discriminator_opt = Adam(self.model_disc.parameters(), lr=0.02)
    return generator_opt, discriminator_opt

# example with learning rate schedulers
def configure_optimizers(self):
    generator_opt = Adam(self.model_gen.parameters(), lr=0.01)
    discriminator_opt = Adam(self.model_disc.parameters(), lr=0.02)
    discriminator_sched = CosineAnnealingLR(discriminator_opt, T_max=10)
    return [generator_opt, discriminator_opt], [discriminator_sched]

# example with step-based learning rate schedulers
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_disc.parameters(), lr=0.02)
    gen_sched = {'scheduler': ExponentialLR(gen_opt, 0.99),
                 'interval': 'step'}  # called after each training step
    dis_sched = CosineAnnealingLR(dis_opt, T_max=10) # called every epoch
    return [gen_opt, dis_opt], [gen_sched, dis_sched]

# example with optimizer frequencies
# see training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1
# https://arxiv.org/abs/1704.00028
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_disc.parameters(), lr=0.02)
    n_critic = 5
    return (
        {'optimizer': dis_opt, 'frequency': n_critic},
        {'optimizer': gen_opt, 'frequency': 1}
    )
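
One way to read the frequency semantics described in the note above is as a repeating cycle over batches. The following simulation is a sketch of that reading, not Lightning's code; `optimizer_for_batch` is a hypothetical helper.

```python
from itertools import accumulate


def optimizer_for_batch(batch_idx, frequencies):
    """Index of the optimizer that handles batch_idx when frequencies are set."""
    bounds = list(accumulate(frequencies))  # e.g. [5, 6] for frequencies (5, 1)
    position = batch_idx % bounds[-1]       # position inside one full cycle
    return next(i for i, bound in enumerate(bounds) if position < bound)


# Frequencies from the example above: discriminator 5, generator 1.
schedule = [optimizer_for_batch(i, (5, 1)) for i in range(12)]
```

Under this reading, the first optimizer handles five consecutive batches, the second handles one, and the cycle then repeats.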

Note

Some things to know:

  • Lightning calls .backward() and .step() on each optimizer and learning rate scheduler as needed.

  • If you use 16-bit precision (precision=16), Lightning will automatically handle the optimizers for you.

  • If you use multiple optimizers, training_step() will have an additional optimizer_idx parameter.

  • If you use LBFGS Lightning handles the closure function automatically for you.

  • If you use multiple optimizers, gradients will be calculated only for the parameters of current optimizer at each training step.

  • If you need to control how often those optimizers step or override the default .step() schedule, override the optimizer_step() hook.

  • If you only want to call a learning rate scheduler every x step or epoch, or want to monitor a custom metric, you can specify these in a lr_dict:

    {
        'scheduler': lr_scheduler,
        'interval': 'step',  # or 'epoch'
        'monitor': 'val_f1',
        'frequency': x,
    }
    
forward(x)#

Same as torch.nn.Module.forward(), however in Lightning you want this to define the operations you want to use for prediction (i.e.: on a server or as a feature extractor).

Normally you’d call self() from your training_step() method. This makes it easy to write a complex system for training with the outputs you’d want in a prediction setting.

You may also find the auto_move_data() decorator useful when using the module outside Lightning in a production setting.

Parameters:
  • *args – Whatever you decide to pass into the forward method.

  • **kwargs – Keyword arguments are also possible.

Returns:

Predicted output

Examples:

# example if we were using this model as a feature extractor
def forward(self, x):
    feature_maps = self.convnet(x)
    return feature_maps

def training_step(self, batch, batch_idx):
    x, y = batch
    feature_maps = self(x)
    logits = self.classifier(feature_maps)

    # ...
    return loss

# splitting it this way allows model to be used a feature extractor
model = MyModelAbove()

inputs = server.get_request()
results = model(inputs)
server.write_results(results)

# -------------
# This is in stark contrast to torch.nn.Module where normally you would have this:
def forward(self, batch):
    x, y = batch
    feature_maps = self.convnet(x)
    logits = self.classifier(feature_maps)
    return logits

test_epoch_end(outputs)#

Called at the end of a test epoch with the output of all test steps.

# the pseudocode for these calls
test_outs = []
for test_batch in test_data:
    out = test_step(test_batch)
    test_outs.append(out)
test_epoch_end(test_outs)

Parameters:

outputs – List of outputs you defined in test_step_end(), or if there are multiple dataloaders, a list containing a list of outputs for each dataloader

Returns:

None

Note

If you didn’t define a test_step(), this won’t be called.

Examples

With a single dataloader:

def test_epoch_end(self, outputs):
    # do something with the outputs of all test batches
    all_test_preds = outputs  # one entry per test_step() call

    some_result = calc_all_results(all_test_preds)
    self.log('some_result', some_result)

With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each test step for that dataloader.

def test_epoch_end(self, outputs):
    final_value = 0
    for dataloader_outputs in outputs:
        for test_step_out in dataloader_outputs:
            # do something
            final_value += test_step_out

    self.log('final_metric', final_value)

test_step(x, batch_id)#

Operates on a single batch of data from the test set. In this step you’d normally generate examples or calculate anything of interest such as accuracy.

# the pseudocode for these calls
test_outs = []
for test_batch in test_data:
    out = test_step(test_batch)
    test_outs.append(out)
test_epoch_end(test_outs)

Parameters:
  • batch (Tensor | (Tensor, …) | [Tensor, …]) – The output of your DataLoader. A tensor, tuple or list.

  • batch_idx (int) – The index of this batch.

  • dataloader_idx (int) – The index of the dataloader that produced this batch (only if multiple test dataloaders used).

Returns:

Any of.

  • Any object or value

  • None - Testing will skip to the next batch

# if you have one test dataloader:
def test_step(self, batch, batch_idx): ...

# if you have multiple test dataloaders:
def test_step(self, batch, batch_idx, dataloader_idx): ...

Examples:

# CASE 1: A single test dataset
def test_step(self, batch, batch_idx):
    x, y = batch

    # implement your own
    out = self(x)
    loss = self.loss(out, y)

    # log 6 example images
    # or generated text... or whatever
    sample_imgs = x[:6]
    grid = torchvision.utils.make_grid(sample_imgs)
    self.logger.experiment.add_image('example_images', grid, 0)

    # calculate acc
    labels_hat = torch.argmax(out, dim=1)
    test_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

    # log the outputs!
    self.log_dict({'test_loss': loss, 'test_acc': test_acc})

If you pass in multiple test dataloaders, test_step() will have an additional argument.

# CASE 2: multiple test dataloaders
def test_step(self, batch, batch_idx, dataloader_idx):
    # dataloader_idx tells you which dataset this is.
    ...

Note

If you don’t need to test you don’t need to implement this method.

Note

When the test_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of the test epoch, the model goes back to training mode and gradients are enabled.

train_epoch_end(outputs)#

training: bool#

training_step(x, batch_id)#

Here you compute and return the training loss and some additional metrics for e.g. the progress bar or logger.

Parameters:
Returns:

Any of.

  • Tensor - The loss tensor

  • dict - A dictionary. Can include any keys, but must include the key 'loss'

  • None - Training will skip to the next batch

Note

Returning None is currently not supported for multi-GPU or TPU, or with 16-bit precision enabled.

In this step you’d normally do the forward pass and calculate the loss for a batch. You can also do fancier things like multiple forward passes or something model specific.

Example:

def training_step(self, batch, batch_idx):
    x, y, z = batch
    out = self.encoder(x)
    loss = self.loss(out, x)
    return loss

If you define multiple optimizers, this step will be called with an additional optimizer_idx parameter.

# Multiple optimizers (e.g.: GANs)
def training_step(self, batch, batch_idx, optimizer_idx):
    if optimizer_idx == 0:
        # do training_step with encoder
        ...
    if optimizer_idx == 1:
        # do training_step with decoder
        ...

If you add truncated back propagation through time you will also get an additional argument with the hidden states of the previous step.

# Truncated back-propagation through time
def training_step(self, batch, batch_idx, hiddens):
    # hiddens are the hidden states from the previous truncated backprop step
    ...
    out, hiddens = self.lstm(data, hiddens)
    ...
    return {'loss': loss, 'hiddens': hiddens}

Note

The loss value shown in the progress bar is smoothed (averaged) over the last values, so it differs from the actual loss returned in train/validation step.
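
The gap between the displayed and the returned loss can be pictured with a simple running mean. This is a generic sketch of smoothing over recent values; the window size and the `RunningMean` class are assumptions and do not reproduce Lightning's exact smoothing.

```python
from collections import deque


class RunningMean:
    """Mean of the most recent `window` values."""

    def __init__(self, window):
        self.values = deque(maxlen=window)

    def update(self, value):
        self.values.append(value)
        return sum(self.values) / len(self.values)


smoother = RunningMean(window=3)  # window size chosen only for this sketch
raw_losses = [4.0, 2.0, 0.0, 1.0]
displayed = [smoother.update(loss) for loss in raw_losses]
```

The displayed values lag behind the raw ones: the third raw loss is 0.0, while the smoothed value at that step is still 2.0.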

validation_epoch_end(outputs)#

Called at the end of the validation epoch with the outputs of all validation steps.

# the pseudocode for these calls
val_outs = []
for val_batch in val_data:
    out = validation_step(val_batch)
    val_outs.append(out)
validation_epoch_end(val_outs)

Parameters:

outputs – List of outputs you defined in validation_step(), or if there are multiple dataloaders, a list containing a list of outputs for each dataloader.

Returns:

None

Note

If you didn’t define a validation_step(), this won’t be called.

Examples

With a single dataloader:

def validation_epoch_end(self, val_step_outputs):
    for out in val_step_outputs:
        # do something
        ...

With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each validation step for that dataloader.

def validation_epoch_end(self, outputs):
    final_value = 0
    for dataloader_outputs in outputs:
        for val_step_out in dataloader_outputs:
            # do something
            final_value += val_step_out

    self.log('final_metric', final_value)

validation_step(x, batch_id)#

Operates on a single batch of data from the validation set. In this step you might generate examples or calculate anything of interest like accuracy.

# the pseudocode for these calls
val_outs = []
for val_batch in val_data:
    out = validation_step(val_batch)
    val_outs.append(out)
validation_epoch_end(val_outs)

Parameters:
  • batch (Tensor | (Tensor, …) | [Tensor, …]) – The output of your DataLoader. A tensor, tuple or list.

  • batch_idx (int) – The index of this batch

  • dataloader_idx (int) – The index of the dataloader that produced this batch (only if multiple val dataloaders used)

Returns:

Any of.

  • Any object or value

  • None - Validation will skip to the next batch

# pseudocode of order
val_outs = []
for val_batch in val_data:
    out = validation_step(val_batch)
    if defined('validation_step_end'):
        out = validation_step_end(out)
    val_outs.append(out)
val_outs = validation_epoch_end(val_outs)

# if you have one val dataloader:
def validation_step(self, batch, batch_idx): ...

# if you have multiple val dataloaders:
def validation_step(self, batch, batch_idx, dataloader_idx): ...

Examples:

# CASE 1: A single validation dataset
def validation_step(self, batch, batch_idx):
    x, y = batch

    # implement your own
    out = self(x)
    loss = self.loss(out, y)

    # log 6 example images
    # or generated text... or whatever
    sample_imgs = x[:6]
    grid = torchvision.utils.make_grid(sample_imgs)
    self.logger.experiment.add_image('example_images', grid, 0)

    # calculate acc
    labels_hat = torch.argmax(out, dim=1)
    val_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

    # log the outputs!
    self.log_dict({'val_loss': loss, 'val_acc': val_acc})

If you pass in multiple val dataloaders, validation_step() will have an additional argument.

# CASE 2: multiple validation dataloaders
def validation_step(self, batch, batch_idx, dataloader_idx):
    # dataloader_idx tells you which dataset this is.
    ...

Note

If you don’t need to validate you don’t need to implement this method.

Note

When the validation_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of validation, the model goes back to training mode and gradients are enabled.

class mass_automation.formula.model.LinearWithHidden(in_size, hidden_size, out_size, activation=True, dropout=True)#

Bases: Module

forward(spectra)#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool#

class mass_automation.formula.model.MLP(in_size=100, hidden_size=50, activation=True, loss='MSE', opt='Adam', lr=0.0002, **kwargs)#

Bases: LightningModule

configure_optimizers()#

Choose what optimizers and learning-rate schedulers to use in your optimization. Normally you’d need one. But in the case of GANs or similar you might have multiple.

Returns:

Any of these 6 options.

  • Single optimizer.

  • List or Tuple - List of optimizers.

  • Two lists - The first list has multiple optimizers, the second a list of LR schedulers (or lr_dict).

  • Dictionary, with an ‘optimizer’ key, and (optionally) a ‘lr_scheduler’ key whose value is a single LR scheduler or lr_dict.

  • Tuple of dictionaries as described, with an optional ‘frequency’ key.

  • None - Fit will run without any optimizer.

Note

The ‘frequency’ value is an int corresponding to the number of sequential batches optimized with the specific optimizer. It should be given to none or to all of the optimizers. There is a difference between passing multiple optimizers in a list, and passing multiple optimizers in dictionaries with a frequency of 1: In the former case, all optimizers will operate on the given batch in each optimization step. In the latter, only one optimizer will operate on the given batch at every step.

The lr_dict is a dictionary which contains the scheduler and its associated configuration. The default configuration is shown below.

{
    'scheduler': lr_scheduler, # The LR scheduler instance (required)
    'interval': 'epoch', # The unit of the scheduler's step size
    'frequency': 1, # The frequency of the scheduler
    'reduce_on_plateau': False, # For ReduceLROnPlateau scheduler
    'monitor': 'val_loss', # Metric for ReduceLROnPlateau to monitor
    'strict': True, # Whether to crash the training if `monitor` is not found
    'name': None, # Custom name for LearningRateMonitor to use
}

Only the scheduler key is required, the rest will be set to the defaults above.

Examples:

# most cases
def configure_optimizers(self):
    opt = Adam(self.parameters(), lr=1e-3)
    return opt

# multiple optimizer case (e.g.: GAN)
def configure_optimizers(self):
    generator_opt = Adam(self.model_gen.parameters(), lr=0.01)
    discriminator_opt = Adam(self.model_disc.parameters(), lr=0.02)
    return generator_opt, discriminator_opt

# example with learning rate schedulers
def configure_optimizers(self):
    generator_opt = Adam(self.model_gen.parameters(), lr=0.01)
    discriminator_opt = Adam(self.model_disc.parameters(), lr=0.02)
    discriminator_sched = CosineAnnealingLR(discriminator_opt, T_max=10)
    return [generator_opt, discriminator_opt], [discriminator_sched]

# example with step-based learning rate schedulers
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_disc.parameters(), lr=0.02)
    gen_sched = {'scheduler': ExponentialLR(gen_opt, 0.99),
                 'interval': 'step'}  # called after each training step
    dis_sched = CosineAnnealingLR(dis_opt, T_max=10) # called every epoch
    return [gen_opt, dis_opt], [gen_sched, dis_sched]

# example with optimizer frequencies
# see training procedure in `Improved Training of Wasserstein GANs`, Algorithm 1
# https://arxiv.org/abs/1704.00028
def configure_optimizers(self):
    gen_opt = Adam(self.model_gen.parameters(), lr=0.01)
    dis_opt = Adam(self.model_disc.parameters(), lr=0.02)
    n_critic = 5
    return (
        {'optimizer': dis_opt, 'frequency': n_critic},
        {'optimizer': gen_opt, 'frequency': 1}
    )

Note

Some things to know:

  • Lightning calls .backward() and .step() on each optimizer and learning rate scheduler as needed.

  • If you use 16-bit precision (precision=16), Lightning will automatically handle the optimizers for you.

  • If you use multiple optimizers, training_step() will have an additional optimizer_idx parameter.

  • If you use LBFGS Lightning handles the closure function automatically for you.

  • If you use multiple optimizers, gradients will be calculated only for the parameters of current optimizer at each training step.

  • If you need to control how often those optimizers step or override the default .step() schedule, override the optimizer_step() hook.

  • If you only want to call a learning rate scheduler every x step or epoch, or want to monitor a custom metric, you can specify these in a lr_dict:

    {
        'scheduler': lr_scheduler,
        'interval': 'step',  # or 'epoch'
        'monitor': 'val_f1',
        'frequency': x,
    }
    
forward(x)#

Same as torch.nn.Module.forward(), however in Lightning you want this to define the operations you want to use for prediction (i.e.: on a server or as a feature extractor).

Normally you’d call self() from your training_step() method. This makes it easy to write a complex system for training with the outputs you’d want in a prediction setting.

You may also find the auto_move_data() decorator useful when using the module outside Lightning in a production setting.

Parameters:
  • *args – Whatever you decide to pass into the forward method.

  • **kwargs – Keyword arguments are also possible.

Returns:

Predicted output

Examples:

# example if we were using this model as a feature extractor
def forward(self, x):
    feature_maps = self.convnet(x)
    return feature_maps

def training_step(self, batch, batch_idx):
    x, y = batch
    feature_maps = self(x)
    logits = self.classifier(feature_maps)

    # ...
    return loss

# splitting it this way allows model to be used a feature extractor
model = MyModelAbove()

inputs = server.get_request()
results = model(inputs)
server.write_results(results)

# -------------
# This is in stark contrast to torch.nn.Module where normally you would have this:
def forward(self, batch):
    x, y = batch
    feature_maps = self.convnet(x)
    logits = self.classifier(feature_maps)
    return logits

test_epoch_end(outputs)#

Called at the end of a test epoch with the output of all test steps.

# the pseudocode for these calls
test_outs = []
for test_batch in test_data:
    out = test_step(test_batch)
    test_outs.append(out)
test_epoch_end(test_outs)

Parameters:

outputs – List of outputs you defined in test_step_end(), or if there are multiple dataloaders, a list containing a list of outputs for each dataloader

Returns:

None

Note

If you didn’t define a test_step(), this won’t be called.

Examples

With a single dataloader:

def test_epoch_end(self, outputs):
    # do something with the outputs of all test batches
    all_test_preds = outputs  # one entry per test_step() call

    some_result = calc_all_results(all_test_preds)
    self.log('some_result', some_result)

With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each test step for that dataloader.

def test_epoch_end(self, outputs):
    final_value = 0
    for dataloader_outputs in outputs:
        for test_step_out in dataloader_outputs:
            # do something
            final_value += test_step_out

    self.log('final_metric', final_value)

test_step(x, batch_id)#

Operates on a single batch of data from the test set. In this step you’d normally generate examples or calculate anything of interest such as accuracy.

# the pseudocode for these calls
test_outs = []
for test_batch in test_data:
    out = test_step(test_batch)
    test_outs.append(out)
test_epoch_end(test_outs)

Parameters:
  • batch (Tensor | (Tensor, …) | [Tensor, …]) – The output of your DataLoader. A tensor, tuple or list.

  • batch_idx (int) – The index of this batch.

  • dataloader_idx (int) – The index of the dataloader that produced this batch (only if multiple test dataloaders used).

Returns:

Any of.

  • Any object or value

  • None - Testing will skip to the next batch

# if you have one test dataloader:
def test_step(self, batch, batch_idx): ...

# if you have multiple test dataloaders:
def test_step(self, batch, batch_idx, dataloader_idx): ...

Examples:

# CASE 1: A single test dataset
def test_step(self, batch, batch_idx):
    x, y = batch

    # implement your own
    out = self(x)
    loss = self.loss(out, y)

    # log 6 example images
    # or generated text... or whatever
    sample_imgs = x[:6]
    grid = torchvision.utils.make_grid(sample_imgs)
    self.logger.experiment.add_image('example_images', grid, 0)

    # calculate acc
    labels_hat = torch.argmax(out, dim=1)
    test_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

    # log the outputs!
    self.log_dict({'test_loss': loss, 'test_acc': test_acc})

If you pass in multiple test dataloaders, test_step() will have an additional argument.

# CASE 2: multiple test dataloaders
def test_step(self, batch, batch_idx, dataloader_idx):
    # dataloader_idx tells you which dataset this is.
    ...

Note

If you don’t need to test you don’t need to implement this method.

Note

When the test_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of the test epoch, the model goes back to training mode and gradients are enabled.

train_epoch_end(outputs)#

training: bool#

training_step(x, batch_id)#

Here you compute and return the training loss and some additional metrics for e.g. the progress bar or logger.

Parameters:
Returns:

Any of.

  • Tensor - The loss tensor

  • dict - A dictionary. Can include any keys, but must include the key 'loss'

  • None - Training will skip to the next batch

Note

Returning None is currently not supported for multi-GPU or TPU, or with 16-bit precision enabled.

In this step you’d normally do the forward pass and calculate the loss for a batch. You can also do fancier things like multiple forward passes or something model specific.

Example:

def training_step(self, batch, batch_idx):
    x, y, z = batch
    out = self.encoder(x)
    loss = self.loss(out, x)
    return loss

If you define multiple optimizers, this step will be called with an additional optimizer_idx parameter.

# Multiple optimizers (e.g.: GANs)
def training_step(self, batch, batch_idx, optimizer_idx):
    if optimizer_idx == 0:
        # do training_step with encoder
        ...
    if optimizer_idx == 1:
        # do training_step with decoder
        ...

If you add truncated back propagation through time you will also get an additional argument with the hidden states of the previous step.

# Truncated back-propagation through time
def training_step(self, batch, batch_idx, hiddens):
    # hiddens are the hidden states from the previous truncated backprop step
    ...
    out, hiddens = self.lstm(data, hiddens)
    ...
    return {'loss': loss, 'hiddens': hiddens}

Note

The loss value shown in the progress bar is smoothed (averaged) over the last values, so it differs from the actual loss returned in train/validation step.
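As a rough illustration of that smoothing, the sketch below keeps a running mean over a fixed window of recent loss values. The RunningMean class and the window size of 20 are assumptions made for this example, not the trainer's actual implementation.

```python
from collections import deque


class RunningMean:
    """Hypothetical helper: mean over the most recent `window` values."""

    def __init__(self, window=20):
        # window=20 is an assumed size, chosen only for illustration
        self.values = deque(maxlen=window)

    def update(self, value):
        # Append the newest loss; the deque drops the oldest one when full
        self.values.append(float(value))
        return sum(self.values) / len(self.values)


smoother = RunningMean(window=3)
for step_loss in [4.0, 2.0, 0.0, 6.0]:
    displayed = smoother.update(step_loss)
    print(f"returned loss={step_loss:.2f}  displayed loss={displayed:.2f}")
```

The displayed value lags behind the returned one, which is why the two numbers differ during training.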

validation_epoch_end(outputs)#

Called at the end of the validation epoch with the outputs of all validation steps.

# the pseudocode for these calls
val_outs = []
for val_batch in val_data:
    out = validation_step(val_batch)
    val_outs.append(out)
validation_epoch_end(val_outs)

Parameters:

outputs – List of outputs you defined in validation_step(), or if there are multiple dataloaders, a list containing a list of outputs for each dataloader.

Returns:

None

Note

If you didn’t define a validation_step(), this won’t be called.
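One common use of this hook is to aggregate the per-step outputs into a single epoch-level number. The sketch below is a plain-Python illustration that handles both the single-dataloader case (a flat list) and the multiple-dataloader case (a list of lists). The mean_metric helper and the 'val_loss' key are hypothetical; they assume each validation step returned a dict containing that key.

```python
def mean_metric(outputs, key="val_loss"):
    """Average one metric over all validation step outputs (illustrative)."""
    # With multiple dataloaders the outputs are a list of lists: flatten them
    if outputs and isinstance(outputs[0], list):
        outputs = [step for per_loader in outputs for step in per_loader]
    values = [float(step[key]) for step in outputs]
    # Return None when there is nothing to average
    return sum(values) / len(values) if values else None


single = [{"val_loss": 1.0}, {"val_loss": 3.0}]
multiple = [[{"val_loss": 1.0}], [{"val_loss": 2.0}, {"val_loss": 3.0}]]
print(mean_metric(single), mean_metric(multiple))
```

Inside a real module the result would typically be passed to self.log() rather than printed.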

Examples

With a single dataloader:

def validation_epoch_end(self, val_step_outputs):
    for out in val_step_outputs:
        # do something
        ...

With multiple dataloaders, outputs will be a list of lists. The outer list contains one entry per dataloader, while the inner list contains the individual outputs of each validation step for that dataloader.

def validation_epoch_end(self, outputs):
    final_value = 0  # placeholder for whatever metric you aggregate
    for dataloader_outputs in outputs:
        for step_output in dataloader_outputs:
            # do something with each step output and update final_value
            ...

    self.log('final_metric', final_value)

validation_step(x, batch_id)#

Operates on a single batch of data from the validation set. In this step you might generate examples or calculate anything of interest, such as accuracy.

# the pseudocode for these calls
val_outs = []
for val_batch in val_data:
    out = validation_step(val_batch)
    val_outs.append(out)
validation_epoch_end(val_outs)

Parameters:
  • batch (Tensor | (Tensor, …) | [Tensor, …]) – The output of your DataLoader. A tensor, tuple or list.

  • batch_idx (int) – The index of this batch

  • dataloader_idx (int) – The index of the dataloader that produced this batch (only if multiple val dataloaders used)

Returns:

Any of:

  • Any object or value

  • None - Validation will skip to the next batch

# pseudocode of order
val_outs = []
for val_batch in val_data:
    out = validation_step(val_batch)
    if defined('validation_step_end'):
        out = validation_step_end(out)
    val_outs.append(out)
validation_epoch_end(val_outs)

# if you have one val dataloader:
def validation_step(self, batch, batch_idx):
    ...

# if you have multiple val dataloaders:
def validation_step(self, batch, batch_idx, dataloader_idx):
    ...

Examples:

# CASE 1: A single validation dataset
def validation_step(self, batch, batch_idx):
    x, y = batch

    # implement your own
    out = self(x)
    loss = self.loss(out, y)

    # log 6 example images
    # or generated text... or whatever
    sample_imgs = x[:6]
    grid = torchvision.utils.make_grid(sample_imgs)
    self.logger.experiment.add_image('example_images', grid, 0)

    # calculate acc
    labels_hat = torch.argmax(out, dim=1)
    val_acc = torch.sum(y == labels_hat).item() / (len(y) * 1.0)

    # log the outputs!
    self.log_dict({'val_loss': loss, 'val_acc': val_acc})

If you pass in multiple val dataloaders, validation_step() will have an additional argument.

# CASE 2: multiple validation dataloaders
def validation_step(self, batch, batch_idx, dataloader_idx):
    # dataloader_idx tells you which dataset this is.
    ...

Note

If you don’t need to validate you don’t need to implement this method.

Note

When the validation_step() is called, the model has been put in eval mode and PyTorch gradients have been disabled. At the end of validation, the model goes back to training mode and gradients are enabled.
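The call order described for validation_step(), the optional validation_step_end(), and validation_epoch_end() can be sketched as an ordinary loop. ToyModule and run_validation below are hypothetical stand-ins written for this example; they only mimic the order of the calls and do not switch eval mode or disable gradients as the real trainer does.

```python
class ToyModule:
    """Stand-in for a module with validation hooks; not a library class."""

    def __init__(self):
        self.epoch_outputs = None

    def validation_step(self, batch, batch_idx):
        # Pretend the "loss" is simply the mean of the batch values
        return {"val_loss": sum(batch) / len(batch)}

    def validation_epoch_end(self, outputs):
        # Receives the outputs of every validation step of the epoch
        self.epoch_outputs = outputs


def run_validation(module, val_data):
    val_outs = []
    for batch_idx, batch in enumerate(val_data):
        out = module.validation_step(batch, batch_idx)
        # validation_step_end is optional, so call it only if it is defined
        if hasattr(module, "validation_step_end"):
            out = module.validation_step_end(out)
        val_outs.append(out)
    module.validation_epoch_end(val_outs)
    return val_outs


module = ToyModule()
print(run_validation(module, [[1.0, 3.0], [2.0]]))
```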

class mass_automation.formula.model.WeightedMSELoss#

Bases: Module

forward(y, y_hat)#

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool#
mass_automation.formula.model.log_elementwise(model, metric, prefix)#

mass_automation.formula.plot module#

mass_automation.formula.plot.plot_compare(spectrum: Spectrum, formula: Formula, cal_error=0.006, dist_error=0.003, distance=50, max_peaks=5, path=None, return_masses=False, show=True)#

Spectra comparison visualization.

Parameters:
  • spectrum (Spectrum) – The spectrum, where algorithm tries to detect the substance.

  • formula (Formula) – The Formula class of the substance.

  • cal_error (float) – The radius of the first peak’s vicinity (default is 0.006).

  • dist_error (float) – The possible error in distance between the peaks; it characterizes the radius of the vicinity of the second and subsequent peaks (default is 0.003).

  • distance (int) – The parameter used in the peak-finding algorithm (default is 50).

  • max_peaks (int) – Limits the number of possible first peaks (default is 5).

  • path (str) – Path where the figure is saved as a PNG at 300 dpi.

Module contents#

class mass_automation.formula.Formula(formula: Union[str, dict], charge=None)#

Bases: object

A class representing a molecular formula.

str_formula#

The molecular formula of the substance. Example: ‘C2H5OH’.

Type:

str

dict_formula#

The dictionary, which corresponds to the formula. Example:

{
    'C' : 2,
    'H' : 6,
    'O' : 1
}

Type:

dict
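To make the relationship between str_formula and dict_formula concrete, here is a minimal sketch that converts a flat formula string into an element-count dictionary. The formula_to_dict function is hypothetical and is not the parser used by the Formula class; it assumes a formula without parentheses, charges, or isotope labels.

```python
import re
from collections import Counter


def formula_to_dict(str_formula):
    """Count atoms in a flat formula string such as 'C2H5OH' (illustrative)."""
    counts = Counter()
    # An element symbol is an uppercase letter plus an optional lowercase one,
    # followed by an optional count (a missing count means 1)
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", str_formula):
        counts[symbol] += int(number) if number else 1
    return dict(counts)


print(formula_to_dict("C2H5OH"))
```

Repeated symbols are summed, which is why the five hydrogens of the ethyl group and the hydroxyl hydrogen end up as a single 'H' entry of 6.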

formula#

Dictionary or string, depending on the argument in __init__.

Type:

str or dict

charge#

The charge of the molecule.

Type:

int

monoisotopic_mass#

The mass of the monoisotopic peak.

Type:

float

isodistribution(side_threshold=0.001) → Tuple[ndarray, ndarray]#

Creates isotopic distribution.

Returns a tuple with the masses and peak intensities of the theoretical isotopic distribution.

Parameters:

side_threshold (float) – Minimum intensity, relative to the most intense peak, required for side isotopologues (default is 0.001).

Return type:

Tuple[np.ndarray, np.ndarray]

Raises:

ValueError – Raised when the isotopic distribution cannot be constructed because of specific formula names.
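The idea behind a theoretical isotopic distribution and the side_threshold cut-off can be illustrated with a small standalone sketch. It is not the method used by isodistribution(): the isotopic_distribution function, the tiny ISOTOPES table (approximate masses and natural abundances for C, H and O only), and the grouping of isotopologues by nominal mass are all assumptions made for this example.

```python
from collections import defaultdict

# Approximate (mass, natural abundance) pairs; illustrative subset only
ISOTOPES = {
    "C": [(12.0, 0.9893), (13.00335, 0.0107)],
    "H": [(1.007825, 0.999885), (2.014102, 0.000115)],
    "O": [(15.994915, 0.99757), (16.999132, 0.00038), (17.999160, 0.00205)],
}


def isotopic_distribution(counts, side_threshold=0.001):
    """Return (masses, relative intensities) grouped by nominal mass."""
    # nominal mass -> (sum of mass * probability, probability)
    peaks = {0: (0.0, 1.0)}
    for element, n_atoms in counts.items():
        if element not in ISOTOPES:
            raise ValueError(f"No isotope data for element {element!r}")
        for _ in range(n_atoms):
            updated = defaultdict(lambda: (0.0, 0.0))
            for nominal, (weighted, prob) in peaks.items():
                mean_mass = weighted / prob
                # Add one atom: combine every existing peak with every isotope
                for iso_mass, abundance in ISOTOPES[element]:
                    key = nominal + round(iso_mass)
                    w, p = updated[key]
                    joint = prob * abundance
                    updated[key] = (w + (mean_mass + iso_mass) * joint, p + joint)
            peaks = dict(updated)
    top = max(p for _, p in peaks.values())
    # Keep only peaks whose intensity relative to the top one passes the threshold
    kept = sorted((w / p, p / top) for w, p in peaks.values() if p / top >= side_threshold)
    return [m for m, _ in kept], [i for _, i in kept]


masses, intensities = isotopic_distribution({"C": 2, "H": 6, "O": 1})
for mass, intensity in zip(masses, intensities):
    print(f"{mass:.4f}  {intensity:.4f}")
```

Raising side_threshold drops the weaker side isotopologues first, which is the behaviour the parameter description refers to.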

vector()#

Converts the formula into a vector of element counts whose length equals the number of elements in the periodic table.

Attention: may not work for elements like Fl, as pyteomics does not recognize them.

Returns:

Resulting vector

Return type:

np.ndarray
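A minimal sketch of this kind of vectorization is shown below. It is an illustration under stated assumptions rather than the behaviour of vector(): the formula_vector function is hypothetical, the element list is truncated to the first 18 elements to keep the example short, positions are ordered by atomic number (the ordering used by the library is not stated here), and a plain list stands in for np.ndarray.

```python
# First 18 elements by atomic number; a truncated table for illustration only
ELEMENTS = [
    "H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne",
    "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar",
]


def formula_vector(dict_formula):
    """Place each element count at the index of its element (illustrative)."""
    vector = [0] * len(ELEMENTS)
    for symbol, count in dict_formula.items():
        if symbol not in ELEMENTS:
            # Mirrors the caveat above: unknown symbols cannot be placed
            raise ValueError(f"Unknown element symbol: {symbol!r}")
        vector[ELEMENTS.index(symbol)] = count
    return vector


print(formula_vector({"C": 2, "H": 6, "O": 1}))
```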

class mass_automation.formula.RealIsotopicDistribution(spectrum: Spectrum, peak_indices: List[int])#

Bases: object

get_monoisotopic() → int#

Finds the monoisotopic peak.

Currently the “monoisotopic” peak is taken to be the first one. THAT IS NOT TRUE FOR SOME ELEMENTS. Rewrite the get_representation method accordingly if this changes.

Returns:

Index of the monoisotopic peak

Return type:

int

get_representation(delta: Optional[float] = 0.025, length: Optional[int] = 100, f: Optional[Callable] = <function amax>, mode='middle', vectorization_method='simple', sigma: Optional[float] = None) → List[Tuple[ndarray, float]]#

Calculates vector representation for the isotopic distribution

Parameters:
  • delta (float) – Size of peak vicinity

  • length (int) – Number of items in the resulting feature vector

  • mode (str) –

    Mode of vectorization. One of the following:

    • middle – the most intense peak is placed in the middle

    • monoisotopic – some part of the spectrum to the left of the monoisotopic peak, incremented by the neutron mass

  • vectorization_method (str) –

    Method of vectorization. One of the following:

    • simple

    • convolution

  • sigma (float) – A parameter for the vectorization method

Returns:

Representations for each peak

Return type:

List[Tuple[np.ndarray, float]]
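One plausible reading of the 'middle' mode with the 'simple' method is sketched below; it is a guess at the general idea, not the library's implementation. The vectorize_middle function, the spacing of slots by an approximate neutron mass, and the use of zero for empty slots are all assumptions. Here delta is the half-width of the window around each expected position, length is the size of the output vector, and f aggregates the intensities found in a window (the documented default, amax, corresponds to taking the maximum).

```python
NEUTRON_MASS = 1.00866  # approximate, in Da; the slot spacing is an assumption


def vectorize_middle(masses, intensities, delta=0.025, length=100, f=max):
    """Fixed-length vector with the most intense peak in the middle slot."""
    # Index of the most intense peak; its mass anchors the middle slot
    top = max(range(len(masses)), key=lambda i: intensities[i])
    center = masses[top]
    middle = length // 2
    vector = []
    for slot in range(length):
        target = center + (slot - middle) * NEUTRON_MASS
        # Collect intensities of peaks lying within delta of the expected mass
        nearby = [inten for m, inten in zip(masses, intensities) if abs(m - target) <= delta]
        vector.append(f(nearby) if nearby else 0.0)
    return vector


vec = vectorize_middle([46.0419, 47.0452, 48.0461], [1.0, 0.0227, 0.0022])
print(vec[48:54])
```

A 'monoisotopic' variant would anchor the grid on the monoisotopic peak instead of the most intense one.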