mlmc.tool¶

Contains classes that provide an interface to other resources such as HDF5, Gmsh, PBS, …

Submodules¶

mlmc.tool.context_statprof module¶

mlmc.tool.distribution module¶

class mlmc.tool.distribution.Distribution(moments_obj, moment_data, domain=None, force_decay=(True, True), monitor=False)[source]¶

Bases: object

Calculation of the distribution

cdf(values)[source]¶

density(value, moments_fn=None)[source]¶

Parameters:	value – float or np.array moments_fn – counting moments function
Returns:	density for passed value

end_point_derivatives()[source]¶: Compute approximation of moment derivatives at endpoints of the domain. :return: array (2, n_moments)

estimate_density(tol=None)[source]¶: Run nonlinear iterative solver to estimate density, use previous solution as initial guess. Faster, but worse stability. :return: None

estimate_density_minimize(tol=1e-05, reg_param=0.01)[source]¶: Optimize density estimation :param tol: Tolerance for the nonlinear system residual, after division by std errors for individual moment means, i.e. res = || (F_i - mu_i) / sigma_i ||_2 :return: None
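The residual in the tolerance description can be read as the Euclidean norm of the moment errors, each scaled by its standard error. The following is a minimal sketch in plain Python; `scaled_residual` and its argument names are illustrative and are not part of mlmc.

```python
import math

def scaled_residual(computed, means, std_errors):
    """Return || (F_i - mu_i) / sigma_i ||_2 for three sequences of equal length."""
    return math.sqrt(sum(((f - mu) / s) ** 2
                         for f, mu, s in zip(computed, means, std_errors)))

# Two moments whose errors are 0.3 and 0.4 standard errors, respectively
# (made-up numbers for illustration).
res = scaled_residual([1.03, 0.48], [1.0, 0.5], [0.1, 0.05])
```

Under this reading, a tolerance of 1e-05 asks the fitted moments to match the estimated means far more tightly than one standard error.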

eval_moments(x)[source]¶

extend_size(new_size)[source]¶

mlmc.tool.distribution.KL_divergence(prior_density, posterior_density, a, b)[source]¶: Compute D_KL(P | Q) = int_R P(x) log( P(x)/Q(x)) dx :param prior_density: P :param posterior_density: Q :return: KL divergence value
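As an illustration of the integral above, here is a minimal sketch that approximates the KL divergence with a fixed-step midpoint rule. It assumes that `a` and `b` are the integration bounds; the names `kl_divergence` and `normal_pdf`, the parameter `n`, and the choice of quadrature are assumptions made for this example and may differ from what mlmc does.

```python
import math

def kl_divergence(p, q, a, b, n=10_000):
    """Approximate int_a^b p(x) * log(p(x) / q(x)) dx with the midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        px, qx = p(x), q(x)
        if px > 0.0:  # the integrand tends to 0 where p vanishes
            total += px * math.log(px / qx) * h
    return total

def normal_pdf(mean, std):
    """Return the density function of a normal distribution."""
    return lambda x: math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# For two unit-variance normals the exact value is (mean difference)^2 / 2.
kl = kl_divergence(normal_pdf(0.0, 1.0), normal_pdf(1.0, 1.0), -10.0, 10.0)
```

The sketch assumes Q is positive wherever P is positive; otherwise the divergence is infinite and the division fails.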

mlmc.tool.distribution.L2_distance(prior_density, posterior_density, a, b)[source]¶

mlmc.tool.distribution.compute_exact_moments(moments_fn, density, tol=0.0001)[source]¶: Compute approximation of moments using exact density. :param moments_fn: Moments function. :param n_moments: Number of moments to compute. :param density: Density function (must accept np vectors). :param a, b: Integral bounds, approximate integration over R. :param tol: Tolerance of integration. :return: np.array, moment values
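To make the idea concrete, the sketch below integrates each moment function against a known density on a fixed midpoint grid. This is an assumption-laden illustration: `exact_moments`, the explicit bounds `a` and `b`, and the grid size `n` are hypothetical, and a fixed grid stands in for whatever tolerance-driven integration the library actually uses.

```python
def exact_moments(moments_fn, density, a, b, n=20_000):
    """Approximate int_a^b phi_k(x) * rho(x) dx for every moment function phi_k.

    moments_fn(x) is assumed to return a sequence [phi_0(x), phi_1(x), ...].
    """
    h = (b - a) / n
    sums = None
    for i in range(n):
        x = a + (i + 0.5) * h
        phi = moments_fn(x)
        if sums is None:
            sums = [0.0] * len(phi)
        weight = density(x) * h
        for k, value in enumerate(phi):
            sums[k] += value * weight
    return sums

# Monomial moments 1, x, x^2 of the uniform density on [0, 1].
moments = exact_moments(lambda x: [1.0, x, x * x], lambda x: 1.0, 0.0, 1.0)
```

For this density the exact values are 1, 1/2 and 1/3, which gives a quick sanity check of any moments function.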

mlmc.tool.flow_mc module¶

class mlmc.tool.flow_mc.FlowSim(config=None, clean=None)[source]¶

Bases: mlmc.sim.simulation.Simulation

static calculate(config, seed)[source]¶: Method that actually runs the calculation; it is called from mlmc.tool.pbs_job.PbsJob.calculate_samples(). Calculates the fine and coarse samples and also extracts their results :param config: dictionary containing simulation configuration, LevelSimulation.config_dict (set in level_instance) :param seed: random seed, int :return: List[fine result, coarse result], both flattened arrays (see mlmc.sim.synth_simulation.calculate())

static extract_mesh(mesh_file)[source]¶: Extract mesh from file :param mesh_file: Mesh file path :return: Dict

static generate_random_sample(fields, coarse_step, n_fine_elements)[source]¶: Generate random field, both fine and coarse part. Store them separately. :return: Dict, Dict

level_instance(fine_level_params: List[float], coarse_level_params: List[float]) → mlmc.level_simulation.LevelSimulation[source]¶: Called from mlmc.Sampler, it creates a single instance of LevelSimulation (mlmc.) :param fine_level_params: in this version, it is just the fine simulation step :param coarse_level_params: in this version, it is just the coarse simulation step :return: mlmc.LevelSimulation object; this object is serialized in SamplingPoolPbs and deserialized in PbsJob, so it allows passing simulation data from the main process to the PBS process

static make_fields(fields, fine_mesh_data, coarse_mesh_data)[source]¶: Create random fields that are used by both coarse and fine simulation :param fields: correlated_field.Fields instance :param fine_mesh_data: Dict contains data extracted from fine mesh file (points, point_region_ids, region_map) :param coarse_mesh_data: Dict contains data extracted from coarse mesh file (points, point_region_ids, region_map) :return: correlated_field.Fields

static result_format() → List[mlmc.quantity.quantity_spec.QuantitySpec][source]¶: Define simulation result format :return: List[QuantitySpec, …]

FIELDS_FILE = 'fields_sample.msh'¶

Gather data for a single flow call (coarse/fine).

Usage: mlmc.sampler.Sampler uses an instance of FlowSim and calls level_instance() once for each level step (so level_instance() is called as many times as there are levels); this takes place in the main process.

mlmc.tool.pbs_job.PbsJob uses the static methods of FlowSim and calls calculate(). That is where the calculation actually runs; it takes place in the PBS process.

calculate() also extracts results and passes them back to PbsJob, which handles the rest.

GEO_FILE = 'mesh.geo'¶

MESH_FILE = 'mesh.msh'¶

MESH_FILE_VAR = 'mesh_file'¶

TIMESTEP_H1_VAR = 'timestep_h1'¶

TIMESTEP_H2_VAR = 'timestep_h2'¶

YAML_FILE = 'flow_input.yaml'¶

YAML_TEMPLATE = 'flow_input.yaml.tmpl'¶

total_sim_id = 0¶

mlmc.tool.flow_mc.create_corr_field(model='gauss', corr_length=0.125, dim=2, log=True, sigma=1, mode_no=1000)[source]¶: Create random fields :return:

mlmc.tool.flow_mc.force_mkdir(path, force=False)[source]¶: Make directory ‘path’ with all parents, remove the leaf dir recursively if it already exists. :param path: path to directory :param force: if dir already exists then remove it and create new one :return: None
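The described behaviour can be approximated with the standard library alone. The sketch below is not the mlmc implementation; `force_mkdir_sketch` is a hypothetical stand-in that follows the parameter description above.

```python
import os
import shutil

def force_mkdir_sketch(path, force=False):
    """Create 'path' with all parents; with force=True, wipe an existing leaf first."""
    if force and os.path.isdir(path):
        shutil.rmtree(path)           # remove the leaf directory and its contents
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
```

With `force=False` an existing directory and its contents are left untouched, so repeated calls are harmless.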

mlmc.tool.flow_mc.substitute_placeholders(file_in, file_out, params)[source]¶: Substitute for placeholders of format ‘<name>’ from the dict ‘params’. :param file_in: Template file. :param file_out: Values substituted. :param params: { ‘name’: value, …}
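A minimal sketch of this kind of template substitution, assuming plain string replacement of each '<name>' token, could look as follows. The function name `substitute_placeholders_sketch` is hypothetical, and the real function may differ in details such as its return value.

```python
def substitute_placeholders_sketch(file_in, file_out, params):
    """Copy file_in to file_out, replacing each '<name>' with str(params[name])."""
    with open(file_in, "r") as src:
        text = src.read()
    for name, value in params.items():
        text = text.replace("<" + name + ">", str(value))
    with open(file_out, "w") as dst:
        dst.write(text)
```

For example, a template line `mesh: <mesh_file>` with `params={'mesh_file': 'mesh.msh'}` would be written out as `mesh: mesh.msh`; placeholders without a matching key are left unchanged.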

mlmc.tool.gmsh_io module¶

Module containing an expanded python gmsh class

class mlmc.tool.gmsh_io.GmshIO(filename=None)[source]¶

Bases: object

This is a class for storing nodes and elements. Based on Gmsh.py

Members: nodes – A dict of the form { nodeID: [ xcoord, ycoord, zcoord] } elements – A dict of the form { elemID: (type, [tags], [nodeIDs]) } physical – A dict of the form { name: (id, dim) }

Methods: read([file]) – Parse a Gmsh version 1.0 or 2.0 mesh file write([file]) – Output a Gmsh version 2.0 mesh file
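To illustrate the three member dicts, the sketch below builds them by hand for a single triangle and computes its barycenter. The tag values, the region name "rock" and the helper `barycenter` are made up for this example; only the dict layouts follow the description above.

```python
# One triangle in the plane z = 0, laid out as the member dicts describe.
nodes = {1: [0.0, 0.0, 0.0], 2: [1.0, 0.0, 0.0], 3: [0.0, 1.0, 0.0]}
elements = {1: (2, [10, 10], [1, 2, 3])}  # Gmsh element type 2 is a 3-node triangle
physical = {"rock": (10, 2)}              # name -> (id, dim)

def barycenter(elem_id):
    """Average the coordinates of an element's nodes."""
    _type, _tags, node_ids = elements[elem_id]
    coords = [nodes[n] for n in node_ids]
    return [sum(axis) / len(coords) for axis in zip(*coords)]

center = barycenter(1)
```

Keying nodes and elements by their IDs keeps lookups direct even when the IDs in a mesh file are not contiguous.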

read(mshfile=None)[source]¶

Read a Gmsh .msh file.

Reads Gmsh format 1.0 and 2.0 mesh files, storing the nodes and elements in the appropriate dicts.

read_element_data()[source]¶

Write given element data to the MSH file. Write only a single ‘$ElementData’ section. :param f: Output file stream. :param ele_ids: Iterable giving element ids of N value rows given in ‘values’ :param name: Field name. :param values: np.array (N, L); N number of elements, L values per element (components) :return:

TODO: Generalize to time dependent fields.

read_element_data_head(mshfile)[source]¶

reset()[source]¶: Reinitialise Gmsh data structure

write_ascii(mshfile=None)[source]¶: Dump the mesh out to a Gmsh 2.0 msh file.

write_binary(filename=None)[source]¶: Dump the mesh out to a Gmsh 2.0 msh file.

write_element_data(f, ele_ids, name, values)[source]¶

Write given element data to the MSH file. Write only a single ‘$ElementData’ section. :param f: Output file stream. :param ele_ids: Iterable giving element ids of N value rows given in ‘values’ :param name: Field name. :param values: np.array (N, L); N number of elements, L values per element (components) :return:

TODO: Generalize to time dependent fields.

write_fields(msh_file, ele_ids, fields)[source]¶: Creates input data msh file for Flow model. :param msh_file: Target file (or None for current mesh file) :param ele_ids: Element IDs in computational mesh corresponding to order of field values in element’s barycenter. :param fields: {‘field_name’ : values_array, ..}

mlmc.tool.hdf5 module¶

class mlmc.tool.hdf5.HDF5(file_path, load_from_file=False)[source]¶

Bases: object

An HDF5 file is organized into groups (h5py.Group objects), which behave somewhat like Python dictionaries: the ‘keys’ are the names of group members and the ‘values’ are the members themselves, either groups (h5py.Group objects) or datasets (h5py.Dataset objects, similar to NumPy arrays). Each group and dataset (including the root group) can store metadata in ‘attributes’ (h5py.AttributeManager objects). HDF5 files (h5py.File) generally work like standard Python file objects.

Our HDF5 file structure:

Main Group:
    Keys:
        Levels: h5py.Group
            Attributes:
                level_parameters: [[a], [b], [], …]
            Keys:
                <N>: h5py.Group (N - level id, starting with 0)
                    Attributes:
                        id: str
                        n_ops_estimate: float
                    Keys:
                        scheduled: h5py.Dataset
                            dtype: S100
                            shape: (N,), N - number of scheduled values
                            maxshape: (None,)
                            chunks: True
                        collected_values: h5py.Dataset
                            dtype: numpy.float64
                            shape: (Nc, 2, M), dtype structure is defined in simulation class
                            maxshape: (None, 2, None)
                            chunks: True
                        collected_ids: h5py.Dataset
                            dtype: numpy.int16, index into scheduled
                            shape: (Nc, 1)
                            maxshape: (None, 1)
                            chunks: True
                        failed: h5py.Dataset
                            dtype: (‘S100’, ‘S1000’)
                            shape: (Nf, 1)
                            maxshape: (None, 1)
                            chunks: True

add_level_group(level_id)[source]¶: Create group for particular level, parent group is ‘Levels’ :param level_id: str, mlmc.Level identifier :return: LevelGroup instance, it is container for h5py.Group instance

clear_groups()[source]¶: Remove the HDF5 group Levels, which allows the same mlmc object to be run multiple times :return: None

create_file_structure(level_parameters)[source]¶: Create hdf structure :param level_parameters: List[float] :return: None

init_header(level_parameters)[source]¶: Add h5py.File metadata to .attrs (attrs objects are of class h5py.AttributeManager) :param level_parameters: MLMC level range of steps :return: None

load_from_file()[source]¶: Load root group attributes from existing HDF5 file :return: None

load_level_parameters()[source]¶

load_result_format()[source]¶: Load the result format; it just reads the dataset :return:

save_result_format(result_format, res_dtype)[source]¶: Save result format to dataset :param result_format: List[QuantitySpec] :param res_dtype: result numpy dtype :return: None

result_format_dset_name¶: Result format dataset name :return: str

class mlmc.tool.hdf5.LevelGroup(file_name, hdf_group_path, level_id, loaded_from_file=False)[source]¶

Bases: object

append_failed(failed_samples)[source]¶: Save level failed sample ids (not append samples) :param failed_samples: set; Level sample ids :return: None

append_scheduled(scheduled_samples)[source]¶: Save scheduled samples to dataset (h5py.Dataset) :param scheduled_samples: list of sample ids :return: None

append_successful(samples: numpy.array)[source]¶: Save level samples to datasets (h5py.Dataset), save ids of collected samples and their results :param samples: np.ndarray :return: None

chunks(n_samples=None)[source]¶

clear_failed_dataset()[source]¶: Clear failed_ids dataset :return: None

collected(chunk_slice)[source]¶: Read collected data by chunks, number of items in chunk is determined by LevelGroup.chunk_size (number of bytes) :param chunk_slice: slice() object :return: np.ndarray
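Chunked reading of this kind is typically driven by a sequence of slice() objects. The sketch below generates such slices from an item count; it is only an illustration, since `chunk_slices` and `items_per_chunk` are hypothetical names and the library derives its chunk size from a byte count rather than an item count.

```python
def chunk_slices(n_items, items_per_chunk):
    """Yield consecutive slice() objects that together cover range(n_items)."""
    for start in range(0, n_items, items_per_chunk):
        yield slice(start, min(start + items_per_chunk, n_items))

# Five collected samples read two at a time; the last chunk is shorter.
slices = list(chunk_slices(5, 2))
```

Each slice could then be passed as the `chunk_slice` argument, so that only one chunk of collected data is held in memory at a time.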

collected_n_items()[source]¶: Number of collected samples :return: int

get_failed_ids()[source]¶: Failed samples ids :return: list of failed sample ids

get_finished_ids()[source]¶: Get collected and failed samples ids :return: NumPy array

get_unfinished_ids()[source]¶: Get unfinished sample ids as difference between scheduled ids and finished ids :return: list
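The relation between scheduled, finished and unfinished ids amounts to a set difference. Here is a minimal sketch with plain Python lists; the function name `unfinished_ids` and the sample id strings are made up for illustration, and the real method reads the ids from the HDF5 datasets.

```python
def unfinished_ids(scheduled, collected, failed):
    """Return scheduled ids that are neither collected nor failed, in scheduling order."""
    finished = set(collected) | set(failed)
    return [sample_id for sample_id in scheduled if sample_id not in finished]

pending = unfinished_ids(
    scheduled=["L00_S0000000", "L00_S0000001", "L00_S0000002"],
    collected=["L00_S0000000"],
    failed=["L00_S0000002"],
)
```

Here only "L00_S0000001" remains pending, which is the kind of list a restarted run would need to resubmit or wait for.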

scheduled()[source]¶: Read level dataset with scheduled samples :return:

COLLECTED_ATTRS = {'sample_id': {'default_shape': (0,), 'dtype': {'formats': ['S100'], 'names': ['sample_id']}, 'maxshape': (None,), 'name': 'collected_ids'}}¶

FAILED_DTYPE = {'formats': ('S100', 'S1000'), 'names': ('sample_id', 'message')}¶

SCHEDULED_DTYPE = {'formats': ['S100'], 'names': ['sample_id']}¶

collected_ids_dset¶: Collected ids dataset :return: Dataset name

failed_dset¶: Dataset of ids of failed samples :return: Dataset name

n_ops_estimate¶: Get number of operations estimate :return: float

scheduled_dset¶: Dataset with scheduled samples :return: Dataset name

mlmc.tool.pbs_job module¶

class mlmc.tool.pbs_job.PbsJob(output_dir, jobs_dir, job_id, level_sim_file, debug)[source]¶

Bases: object

calculate_samples()[source]¶: Calculate scheduled samples :return:

static command_params()[source]¶: Read command parameters - job identifier and file with necessary files :return: None

classmethod create_job(output_dir, jobs_dir, job_id, level_sim_file, debug)[source]¶: Create PbsProcess instance from SamplingPoolPBS :param output_dir: str :param jobs_dir: str :param job_id: str :param level_sim_file: str, file name format of LevelSimulation serialization :param debug: bool, if True keep sample directories :return: PbsProcess instance

classmethod create_process()[source]¶: Create PbsProcess via PBS :return:

static get_job_n_running(job_id, jobs_dir)[source]¶: Get number of running (scheduled) samples for given unfinished jobs :param job_id: str :param jobs_dir: str, path to jobs directory :return: int

static get_scheduled_sample_ids(job_id, jobs_dir)[source]¶: Get scheduled samples :param job_id: str :param jobs_dir: str :return:

static job_id_from_sample_id(sample_id, jobs_dir)[source]¶: Get job ID for given sample ID :param sample_id: str :param jobs_dir: jobs directory with results :return: str, job id

static read_results(job_id, jobs_dir)[source]¶: Read result file for given job id :param job_id: str :param jobs_dir: path to jobs directory :return: successful: Dict[level_id, List[Tuple[sample_id: str, Tuple[ndarray, ndarray]]]]; failed: Dict[level_id, List[Tuple[sample_id: str, error message: str]]]; time: Dict[level_id: int, List[total time: float, number of success samples: int]]

save_sample_id_job_id(job_id, sample_ids)[source]¶: Store the sample ID associated with the job ID :param job_id: str :param sample_ids: list of str

save_scheduled(scheduled)[source]¶: Save scheduled samples to yaml file format: List[Tuple[level_id, sample_id]] :return: None

write_pbs_id(pbs_job_id)[source]¶: Create an empty file whose name contains the PBS job ID and our job ID :param pbs_job_id: str :return: None

CLASS_FILE = 'pbs_process_serialized.txt'¶

FAILED_RESULTS = '{}_failed_results.yaml'¶

PBS_ID = '{}_'¶

SAMPLE_ID_JOB_ID = 'sample_id_job_id.json'¶

SCHEDULED = '{}_scheduled.yaml'¶

SUCCESSFUL_RESULTS = '{}_successful_results.yaml'¶

TIME = '{}_times.yaml'¶
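The '{}' in these file name templates is a str.format placeholder. Assuming it is filled with the job identifier (an assumption based on the method signatures above, not confirmed here), the file names would be built as in this sketch; the job id value is hypothetical.

```python
# Templates copied from the class attributes listed above.
SCHEDULED = '{}_scheduled.yaml'
SUCCESSFUL_RESULTS = '{}_successful_results.yaml'

job_id = "0001"  # hypothetical job identifier
scheduled_file = SCHEDULED.format(job_id)
results_file = SUCCESSFUL_RESULTS.format(job_id)
```

Under that assumption, all files belonging to one job share a common prefix inside the jobs directory.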

mlmc.tool.process_base module¶

class mlmc.tool.process_base.ProcessBase[source]¶

Bases: object

Parent class for particular simulation processes

all_collect(sampler_list)[source]¶: Collect samples :param mlmc_list: List of mlmc.MLMC objects :return: None

analyze_error_of_level_variances(cl, mlmc_level)[source]¶: Analyze error of level variances :param cl: mlmc.estimate.CompareLevels instance :param mlmc_level: selected MC method :return: None

analyze_error_of_log_variance(cl, mlmc_level)[source]¶: Analyze error of level variances :param cl: mlmc.estimate.CompareLevels instance :param mlmc_level: selected MC method :return: None

analyze_error_of_regression_level_variances(cl, mlmc_level)[source]¶: Analyze error of level variances :param cl: mlmc.estimate.CompareLevels instance :param mlmc_level: selected MC method :return: None

analyze_error_of_regression_variance(cl, mlmc_level)[source]¶: Analyze error of regression variance :param cl: CompareLevels :param mlmc_level: selected MC method :return:

analyze_error_of_variance(cl, mlmc_level)[source]¶: Analyze error of variance for particular mlmc method or for all collected methods :param cl: mlmc.estimate.CompareLevels instance :param mlmc_level: selected MC method :return: None

analyze_pdf_approx(cl)[source]¶: Plot densities :param cl: mlmc.estimate.CompareLevels :return: None

analyze_regression_of_variance(cl, mlmc_level)[source]¶: Analyze regression of variance :param cl: mlmc.estimate.CompareLevels instance :param mlmc_level: selected MC method :return: None

create_pbs_object(output_dir, clean)[source]¶: Initialize object for PBS execution :param output_dir: Output directory :param clean: bool, if True remove existing files :return: None

generate_jobs(mlmc, n_samples=None)[source]¶: Generate level samples :param n_samples: None or list, number of samples for each level :return: None

static get_arguments(arguments)[source]¶: Getting arguments from console :param arguments: list of arguments :return: namespace

n_sample_estimate(mlmc, target_variance=0.001)[source]¶: Estimate number of level samples considering target variance :param mlmc: MLMC object :param target_variance: float, target variance of moments :return: None

process_analysis(cl)[source]¶: Main analysis function. Particular types of analysis called from here. :param cl: Instance of CompareLevels - list of Estimate objects :return:

rm_files(output_dir)[source]¶: Rm files and dirs :param output_dir: Output directory path :return:

run(renew=True)[source]¶: Run mlmc :return: None

set_environment_variables()[source]¶: Set pbs config, flow123d, gmsh :return: None

set_moments(n_moments, log=False)[source]¶: Create moments function instance :param n_moments: int, number of moments :param log: bool, If true then apply log transform :return:

setup_config(n_levels, clean)[source]¶: Set simulation configuration depending on the particular task :param n_levels: Number of levels :param clean: bool, if False use existing files :return: mlmc.MLMC

mlmc.tool.simple_distribution module¶

class mlmc.tool.simple_distribution.SimpleDistribution(moments_obj, moment_data, domain=None, force_decay=(True, True), verbose=False)[source]¶

Bases: object

Calculation of the distribution

cdf(values)[source]¶

density(value)[source]¶

Parameters:	value – float or np.array moments_fn – counting moments function
Returns:	density for passed value

end_point_derivatives()[source]¶: Compute approximation of moment derivatives at endpoints of the domain. :return: array (2, n_moments)

estimate_density_minimize(tol=1e-05, reg_param=0.01)[source]¶: Optimize density estimation :param tol: Tolerance for the nonlinear system residual, after division by std errors for individual moment means, i.e. res = || (F_i - mu_i) / sigma_i ||_2 :return: None

eval_moments(x)[source]¶

mlmc.tool.simple_distribution.KL_divergence(prior_density, posterior_density, a, b)[source]¶: Compute D_KL(P | Q) = int_R P(x) log( P(X)/Q(x)) dx :param prior_density: P :param posterior_density: Q :return: KL divergence value

mlmc.tool.simple_distribution.L2_distance(prior_density, posterior_density, a, b)[source]¶

mlmc.tool.simple_distribution.best_fit_all(values, range_a, range_b)[source]¶

mlmc.tool.simple_distribution.best_p1_fit(values)[source]¶: Find indices a < b such that the linear fit for values[a:b] has the smallest value of residual / (b - a)**alpha, where alpha is a fixed parameter. This should find the longest fit with a reasonably small residual. :return: (a, b)

mlmc.tool.simple_distribution.compute_exact_cov(moments_fn, density, tol=1e-10)[source]¶: Compute approximation of covariance matrix using exact density. :param moments_fn: Moments function. :param density: Density function (must accept np vectors). :param tol: Tolerance of integration. :return: np.array, moment values

mlmc.tool.simple_distribution.compute_exact_moments(moments_fn, density, tol=1e-10)[source]¶: Compute approximation of moments using exact density. :param moments_fn: Moments function. :param density: Density function (must accept np vectors). :param tol: Tolerance of integration. :return: np.array, moment values

mlmc.tool.simple_distribution.compute_semiexact_cov(moments_fn, density, tol=1e-10)[source]¶: Compute approximation of covariance matrix using exact density. :param moments_fn: Moments function. :param density: Density function (must accept np vectors). :param tol: Tolerance of integration. :return: np.array, moment values

mlmc.tool.simple_distribution.compute_semiexact_moments(moments_fn, density, tol=1e-10)[source]¶

mlmc.tool.simple_distribution.construct_ortogonal_moments(moments, cov, tol=None)[source]¶: For given moments find the basis orthogonal with respect to the covariance matrix, estimated from samples. :param moments: moments object :return: orthogonal moments object of the same size.

mlmc.tool.simple_distribution.detect_treshold_slope_change(values, log=True)[source]¶

Find the longest subsequence whose linear fit residual is X% higher than that of the best fit over at least 4 points. Extrapolate this fit to the left.

Parameters:	values – Increasing sequence. log – Use logarithm of the sequence.
Returns:	Index K such that values[K:] should have the same slope.

mlmc.tool.simple_distribution.lsq_reconstruct(cov, eval, evec, treshold)[source]¶

mlmc.tool.stats_tests module¶

mlmc.tool.stats_tests.anova(level_moments)[source]¶: Analysis of variance :param level_moments: moments values per level :return: bool

mlmc.tool.stats_tests.chi2_test(var_0, samples, max_p_val=0.01, tag='')[source]¶: Test that the variance of samples is var_0, with false failures occurring with probability max_p_val. :param var_0: Exact variance. :param samples: Samples to test. :param max_p_val: Probability of a failed chi2 test for correct samples.

mlmc.tool.stats_tests.t_test(mu_0, samples, max_p_val=0.01)[source]¶

Test that mean of samples is mu_0, false failures with probability max_p_val.

Perform the two-tailed t-test and assert that the p-value is smaller than the given value. :param mu_0: Exact mean. :param samples: Samples to test. :param max_p_val: Probability of failed t-test for correct samples.
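For orientation, the statistic behind a one-sample t-test can be computed with the standard library alone. The sketch below covers only the statistic; `t_statistic` is a hypothetical helper, and turning the value into a two-tailed p-value requires the Student t distribution, for which a routine such as scipy.stats.ttest_1samp could be used.

```python
import math
import statistics

def t_statistic(mu_0, samples):
    """One-sample t statistic: (mean - mu_0) / (s / sqrt(n)), with s the sample std."""
    n = len(samples)
    standard_error = statistics.stdev(samples) / math.sqrt(n)
    return (statistics.mean(samples) - mu_0) / standard_error

# Sample mean 3.0 tested against a hypothesized mean of 2.0.
t = t_statistic(2.0, [1.0, 2.0, 3.0, 4.0, 5.0])
```

A statistic near zero means the sample mean is close to mu_0 relative to its standard error; large absolute values correspond to small p-values.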

Module contents¶

Contains classes that provide an interface to other resources such as HDF5, Gmsh, PBS, …