mlmc.tool

Contains classes that provide an interface to other resources such as HDF5, Gmsh, PBS, …

Submodules

mlmc.tool.distribution module

class mlmc.tool.distribution.Distribution(moments_obj, moment_data, domain=None, force_decay=(True, True), monitor=False)[source]

Bases: object

Calculation of the distribution

estimate_density_minimize(tol=1e-05, reg_param=0.01)[source]

Optimize density estimation :type tol: :param tol: Tolerance for the nonlinear system residual, after division by std errors for individual moment means, i.e. res = || (F_i - mu_i) / sigma_i ||_2 :return: None

estimate_density(tol=None)[source]

Run nonlinear iterative solver to estimate density, use previous solution as initial guess. Faster, but worse stability. :return: None

density(value, moments_fn=None)[source]
Parameters:
  • value – float or np.array

  • moments_fn – counting moments function

Returns:

density for passed value

cdf(values)[source]
extend_size(new_size)[source]
eval_moments(x)[source]
end_point_derivatives()[source]

Compute approximation of moment derivatives at endpoints of the domain. :return: array (2, n_moments)

mlmc.tool.distribution.compute_exact_moments(moments_fn, density, tol=0.0001)[source]

Compute approximation of moments using exact density. :type moments_fn: :param moments_fn: Moments function. :param n_moments: Number of mements to compute. :type density: :param density: Density function (must accept np vectors). :param a, b: Integral bounds, approximate integration over R. :type tol: :param tol: Tolerance of integration. :return: np.array, moment values

mlmc.tool.distribution.KL_divergence(prior_density, posterior_density, a, b)[source]

Compute D_KL(P | Q) = int_R P(x) log( P(X)/Q(x)) dx :type prior_density: :param prior_density: P :type posterior_density: :param posterior_density: Q :return: KL divergence value

mlmc.tool.distribution.L2_distance(prior_density, posterior_density, a, b)[source]

mlmc.tool.flow_mc module

mlmc.tool.flow_mc.create_corr_field(model='gauss', corr_length=0.125, dim=2, log=True, sigma=1, mode_no=1000)[source]

Create correlated random-field provider (cf.Fields) according to selected backend.

Parameters:
  • model – One of ‘fourier’, ‘svd’, ‘exp’, ‘TPLgauss’, ‘TPLexp’, ‘TPLStable’, or others (defaults to ‘gauss’).

  • corr_length – Correlation length (used by GSTools or SVD implementations).

  • dim – Spatial dimension of the field (1, 2 or 3).

  • log – If True, generate log-normal field (exponentiate underlying Gaussian field).

  • sigma – Standard deviation for the generated field.

  • mode_no – Number of Fourier modes

Returns:

cf.Fields instance that can generate random field samples.

mlmc.tool.flow_mc.substitute_placeholders(file_in, file_out, params)[source]

Replace placeholders of form ‘<name>’ in a template file with corresponding values.

Parameters:
  • file_in – Path to the template file containing placeholders.

  • file_out – Path where the substituted output will be written.

  • params – Dictionary mapping placeholder names to replacement values, e.g. {‘mesh_file’: ‘mesh.msh’}.

Returns:

List of parameter names that were actually used (replaced) in the template.

mlmc.tool.flow_mc.force_mkdir(path, force=False)[source]

Create directory tree; optionally remove existing leaf directory first.

Parameters:
  • path – Directory path to create (parents created as needed).

  • force – If True and the directory already exists, remove it (recursively) before creating.

Returns:

None

class mlmc.tool.flow_mc.FlowSim(config=None, clean=None)[source]

Bases: Simulation

total_sim_id = 0
MESH_FILE_VAR = 'mesh_file'
TIMESTEP_H1_VAR = 'timestep_h1'
TIMESTEP_H2_VAR = 'timestep_h2'
GEO_FILE = 'mesh.geo'
MESH_FILE = 'mesh.msh'
YAML_TEMPLATE = 'flow_input.yaml.tmpl'
YAML_FILE = 'flow_input.yaml'
FIELDS_FILE = 'fields_sample.msh'
level_instance(fine_level_params, coarse_level_params)[source]

Create a LevelSimulation object for given fine/coarse steps.

This method is called in the main process (Sampler) and must prepare common files (mesh, YAML) for that level. The returned LevelSimulation is serialized and sent to PBS jobs (PbsJob) for actual execution.

Parameters:
  • fine_level_params (List[float]) – list with single element [fine_step] (mesh step)

  • coarse_level_params (List[float]) – list with single element [coarse_step] (mesh step) or [0] for one-level MC

Return type:

LevelSimulation

Returns:

LevelSimulation configured with task size and calculate method

static calculate(config, seed)[source]

Execute one MLMC sample calculation (fine and optional coarse) inside PBS job.

Parameters:
  • config – Configuration dict from LevelSimulation.config_dict (contains common_files dirs, steps, fields params)

  • seed – Random seed for the sample generation (derived from sample id)

Returns:

Tuple (fine_result_array, coarse_result_array), both numpy arrays (coarse may be zeros for one-level MC)

static make_fields(fields, fine_mesh_data, coarse_mesh_data)[source]

Assign evaluation points to fields and return the Fields object prepared for sampling.

Parameters:
  • fields – correlated_field.Fields instance (with local field definitions)

  • fine_mesh_data – Dict returned by extract_mesh() for the fine mesh

  • coarse_mesh_data – Dict returned by extract_mesh() for the coarse mesh (or None for one-level)

Returns:

the same cf.Fields object with points set for sampling

static generate_random_sample(fields, coarse_step, n_fine_elements)[source]

Generate random field samples for the fine and (optionally) coarse meshes.

Parameters:
  • fields – cf.Fields object (already configured with points)

  • coarse_step – coarse-level step (0 for no coarse sample)

  • n_fine_elements – Number of elements that belong to fine mesh (used to split combined sample)

Returns:

Tuple (fine_input_sample: dict, coarse_input_sample: dict) Each dict maps field name -> array shaped (n_elements, 1).

static extract_mesh(mesh_file)[source]

Parse a Gmsh mesh file and extract points (element centers), element ids and region mapping.

Parameters:

mesh_file – Path to .msh file to parse (Gmsh 2/4 depending on GmshIO implementation)

Returns:

Dict with keys: - ‘points’: np.ndarray of shape (n_elements, dim) with element center coordinates - ‘point_region_ids’: np.ndarray of region id per element - ‘ele_ids’: np.ndarray of original element ids - ‘region_map’: dict mapping region name -> region id

static result_format()[source]

Describe the simulation output format as a list of QuantitySpec objects.

Return type:

List[QuantitySpec]

Returns:

List[QuantitySpec] describing each output quantity (name, unit, shape, times, locations)

mlmc.tool.gmsh_io module

Module containing an expanded python gmsh class

class mlmc.tool.gmsh_io.GmshIO(filename=None)[source]

Bases: object

This is a class for storing nodes and elements. Based on Gmsh.py

Members: nodes – A dict of the form { nodeID: [ xcoord, ycoord, zcoord] } elements – A dict of the form { elemID: (type, [tags], [nodeIDs]) } physical – A dict of the form { name: (id, dim) }

Methods: read([file]) – Parse a Gmsh version 1.0 or 2.0 mesh file write_ascii([file]) – Output a Gmsh version 2.0 mesh file (ASCII) write_binary([file]) – Output a Gmsh version 2.0 mesh file (binary) write_element_data(f, ele_ids, name, values) – write $ElementData block write_fields(msh_file, ele_ids, fields) – convenience to write several ElementData blocks

reset()[source]

Reinitialise internal storage.

Clears nodes, elements, physical names and element_data dictionaries.

Returns:

None

read_element_data_head(mshfile)[source]

Read header of a $ElementData block from an open mshfile.

The method expects the lines after ‘$ElementData’ to match the conventional Gmsh textual ElementData header layout:

<nstringtags> “<field name>” <nrealstags> <time> <ninttags> <time_index> <ncomponents> <nentries>

Parameters:

mshfile – Open file-like object positioned after the ‘$ElementData’ line.

Returns:

tuple (field, time, t_idx, n_comp, n_elem) - field: string field name - time: float time tag - t_idx: integer time index - n_comp: number of components per element - n_elem: number of element entries following header

read(mshfile=None)[source]

Read a Gmsh .msh file.

Supports parsing textual (ASCII) Gmsh files with sections like:
  • $MeshFormat

  • $Nodes / $NOD

  • $Elements / $ELM

  • $PhysicalNames

  • $ElementData

Parsed data is stored in the instance attributes:
  • self.nodes: dict nodeID -> [x, y, z]

  • self.elements: dict elemID -> (type, tags_list, nodeIDs_list)

  • self.physical: dict name -> (id, dim)

  • self.element_data: dict field_name -> { time_idx: (time, {elemID: component_list}) }

Parameters:

mshfile – Optional open file-like object or path string; if None uses filename passed to __init__.

Returns:

None

write_ascii(mshfile=None)[source]

Dump the mesh out to a Gmsh 2.0 (textual) msh file.

Writes $MeshFormat, $PhysicalNames, $Nodes and $Elements sections according to the current contents of self.physical, self.nodes and self.elements.

Parameters:

mshfile – Optional open file or filename; if None uses self.filename opened for writing.

Returns:

None

write_binary(filename=None)[source]

Dump the mesh out to a Gmsh 2.0 msh file in binary format.

Note: this implementation mirrors the ASCII writer’s structure but writes binary packed integers/doubles. This method attempts to follow the Gmsh 2.2 binary formatting conventions.

Parameters:

filename – Path to write binary .msh file; if None, uses self.filename.

Returns:

None

write_element_data(f, ele_ids, name, values)[source]

Write a single $ElementData block for a field to an open file stream.

The function writes a minimal textual $ElementData header and then one row per element with element ID followed by component values.

Parameters:
  • f – Open file-like object opened for writing.

  • ele_ids – Iterable of element ids corresponding to the rows in ‘values’.

  • name – String name of the field (will be written as the ElementData field name).

  • values – numpy array of shape (N, L) where N == len(ele_ids) and L is components per element.

Returns:

None

write_fields(msh_file, ele_ids, fields)[source]

Create an MSH file that contains $ElementData blocks for the provided fields.

This is a convenience writer used to generate field input files for models (Flow123d). It writes a MeshFormat header and then for each field calls write_element_data.

Parameters:
  • msh_file – Path to output MSH file (string). If falsy, uses self.filename when available.

  • ele_ids – Iterable of element ids (order must match field value ordering).

  • fields – Dict mapping field name -> array-like values (one row per element).

Returns:

None

mlmc.tool.hdf5 module

class mlmc.tool.hdf5.HDF5(file_path, load_from_file=False)[source]

Bases: object

HDF5 file is organized into groups (h5py.Group objects) which is somewhat like dictionaries in python terminology - ‘keys’ are names of group members ‘values’ are members (groups (h5py.Group objects) and datasets (h5py.Dataset objects - similar to NumPy arrays)). Each group and dataset (including root group) can store metadata in ‘attributes’ (h5py.AttributeManager objects) HDF5 files (h5py.File) work generally like standard Python file objects

Our HDF5 file strucutre:

Main Group: Keys:

Levels: h5py.Group
Attributes:

level_parameters: [[a], [b], [], …]

Keys:
<N>: h5py.Group (N - level id, start with 0)
Attributes:

id: str n_ops_estimate: float

Keys:
scheduled: h5py.Dataset

dtype: S100 shape: (N,), N - number of scheduled values maxshape: (None,) chunks: True

collected_values: h5py.Dataset

dtype: numpy.float64 shape: (Nc, 2, M) dtype structure is defined in simulation class maxshape: (None, 2, None) chunks: True

collected_ids: h5py.Dataset

dtype: numpy.int16 index into scheduled shape: (Nc, 1) maxshape: (None, 1) chunks: True

failed: h5py.Dataset

dtype: (‘S100’, ‘S1000’) shape: (Nf, 1) mashape: (None, 1) chunks: True

create_file_structure(level_parameters)[source]

Create top-level HDF5 structure for MLMC results or load existing one.

Parameters:

level_parameters – List[float] of level parameters to store in root attributes when initializing new file.

Returns:

None

load_from_file()[source]

Load root group attributes from an existing HDF5 file and set them as instance attributes.

Raises an Exception if required attributes (like ‘level_parameters’) are not present.

Returns:

None

clear_groups()[source]

Remove all top-level groups/datasets from the HDF5 file.

Useful when reinitializing a new MLMC run into an existing file.

Returns:

None

init_header(level_parameters)[source]

Initialize root attributes and create the top-level ‘Levels’ group.

Parameters:

level_parameters – Iterable of level parameters to store in root attributes.

Returns:

None

add_level_group(level_id)[source]

Create (if necessary) and return a LevelGroup wrapper for a particular level.

Parameters:

level_id – str, mlmc.Level identifier (e.g. ‘0’, ‘1’, …)

Returns:

LevelGroup instance bound to the ‘/Levels/{level_id}’ HDF5 group

property result_format_dset_name

Dataset name used to store the simulation result format (QuantitySpec array).

Returns:

str dataset name

save_result_format(result_format, res_dtype)[source]

Save simulation result format into a structured dataset.

The result_format is a list of QuantitySpec objects; res_dtype is a NumPy structured dtype describing how to store the QuantitySpec attributes in the dataset.

Parameters:
  • result_format – List[QuantitySpec] (objects describing output fields)

  • res_dtype – numpy.dtype used for the dataset storage of a single QuantitySpec

Returns:

None

load_result_format()[source]

Load the saved result_format dataset and return it as a NumPy array.

Returns:

numpy.ndarray containing the stored result_format structured records

Raises:

AttributeError – if the dataset is not present

load_level_parameters()[source]

Read level_parameters from the HDF5 file root attributes.

Returns:

value of ‘level_parameters’ attribute or empty list if not present

class mlmc.tool.hdf5.LevelGroup(file_name, hdf_group_path, level_id, loaded_from_file=False)[source]

Bases: object

Helper class to manipulate per-level HDF5 group contents.

It provides convenience methods to append scheduled samples, collected results, failed entries and to iterate over collected data in chunks.

SCHEDULED_DTYPE = {'formats': ['S100'], 'names': ['sample_id']}
FAILED_DTYPE = {'formats': ('S100', 'S1000'), 'names': ('sample_id', 'message')}
COLLECTED_ATTRS = {'sample_id': {'default_shape': (0,), 'dtype': {'formats': ['S100'], 'names': ['sample_id']}, 'maxshape': (None,), 'name': 'collected_ids'}}
property collected_ids_dset

Name of dataset storing collected ids.

Returns:

str

property scheduled_dset

Name of dataset storing scheduled sample ids.

Returns:

str

property failed_dset

Name of dataset storing failed sample rows.

Returns:

str

append_scheduled(scheduled_samples)[source]

Append scheduled sample ids to the scheduled dataset.

Parameters:

scheduled_samples – iterable of sample-id strings (or bytes-like)

Returns:

None

append_successful(samples)[source]

Append successful (collected) samples.

The samples array is expected to have rows of the form [sample_id, result_value]. The method appends sample ids to ‘collected_ids’ and result values to ‘collected_values’.

Parameters:

samples (array) – numpy.ndarray where each row is [sample_id, value], value may be array-like itself.

Returns:

None

append_failed(failed_samples)[source]

Append failed sample rows to the failed dataset.

Parameters:

failed_samples – iterable of failed sample descriptors (e.g. tuples (sample_id, message))

Returns:

None

scheduled()[source]

Read and return the scheduled dataset contents.

Returns:

numpy.ndarray of scheduled entries (structured dtype)

chunks(n_samples=None)[source]

Iterate over collected_values dataset chunks and yield ChunkSpec descriptors.

Parameters:

n_samples – If provided, yield a single ChunkSpec from 0..n_samples instead of iterating actual chunks.

Yield:

ChunkSpec(chunk_id, chunk_slice, level_id)

collected(chunk_slice)[source]

Read a slice (chunk) from the collected_values dataset.

Parameters:

chunk_slice – slice object describing which rows to read

Returns:

numpy.ndarray with the chunk rows or None if dataset missing

collected_n_items()[source]

Return the number of collected items (rows) stored for this level.

Returns:

int number of collected rows

get_finished_ids()[source]

Return concatenated list of successful and failed sample ids for this level.

Returns:

numpy.ndarray of sample id strings (successful then failed)

get_unfinished_ids()[source]

Compute unfinished sample ids = scheduled_ids finished_ids.

Returns:

list of unfinished sample id strings

get_failed_ids()[source]

Get list of failed sample ids for this level.

Returns:

list of failed sample id strings

clear_failed_dataset()[source]

Remove and recreate the failed dataset (clears all failure records).

Returns:

None

property n_ops_estimate

Number of operations estimate stored as a group attribute.

Returns:

float or object stored under ‘n_ops_estimate’ attribute (if present)

mlmc.tool.pbs_job module

class mlmc.tool.pbs_job.PbsJob(output_dir, jobs_dir, job_id, level_sim_file, debug)[source]

Bases: object

SCHEDULED = '{}_scheduled.yaml'
SUCCESSFUL_RESULTS = '{}_successful_results.yaml'
FAILED_RESULTS = '{}_failed_results.yaml'
TIME = '{}_times.yaml'
PBS_ID = '{}_'
CLASS_FILE = 'pbs_process_serialized.txt'
SAMPLE_ID_JOB_ID = 'sample_id_job_id.json'
classmethod create_job(output_dir, jobs_dir, job_id, level_sim_file, debug)[source]

Create and serialize a PbsJob descriptor for a PBS process to later deserialize.

The created descriptor (CLASS_FILE) is written under output_dir for the PBS worker.

Parameters:
  • output_dir – str

  • jobs_dir – str

  • job_id – str

  • level_sim_file – str, format of LevelSimulation serialization filenames

  • debug – bool

Returns:

PbsJob instance

classmethod create_process()[source]

Create PbsJob instance inside PBS worker process.

The worker expects command-line arguments (see command_params) and a serialized CLASS_FILE in output_dir describing jobs_dir and level_sim_file format.

Returns:

PbsJob instance

static command_params()[source]

Parse PBS worker command-line parameters. Called inside worker process.

Expects sys.argv[1] = output_dir, sys.argv[2] = job_id

Returns:

tuple (job_id: str, output_dir: str)

calculate_samples()[source]

Main worker routine: calculate each scheduled sample, move produced files, and record success/failure.

This method:
  • reads the scheduled list,

  • deserializes LevelSimulation objects on demand,

  • calls SamplingPool.calculate_sample for each scheduled sample,

  • moves successful/failed artifacts,

  • writes partial results to YAML files (successful, failed, times).

Returns:

None

save_sample_id_job_id(job_id, sample_ids)[source]

Save mapping of sample ids to this job_id so other tools can query which job handled a sample.

Parameters:
  • job_id – str

  • sample_ids – iterable of sample-identifiers (each sample_id is usually a tuple or list, code expects sample_id[1])

Returns:

None

static job_id_from_sample_id(sample_id, jobs_dir)[source]

Lookup job id that processed a given sample id.

Parameters:
  • sample_id – str sample identifier

  • jobs_dir – path to jobs directory where SAMPLE_ID_JOB_ID file is stored

Returns:

str job id associated with sample_id

static read_results(job_id, jobs_dir)[source]

Read and aggregate results produced by a PBS job into dictionaries.

The function reads SUCCESSFUL_RESULTS, FAILED_RESULTS and TIME YAML files (if present) and returns aggregated dicts keyed by level_id.

Parameters:
  • job_id – str

  • jobs_dir – path to directory containing job result YAML files

Returns:

tuple (successful_dict, failed_dict, time_dict) where: - successful_dict[level_id] = [(sample_id, result), …] - failed_dict[level_id] = [(sample_id, error_message), …] - time_dict[level_id] = [(n_samples, cumulative_time), …]

static get_scheduled_sample_ids(job_id, jobs_dir)[source]

Read the scheduled YAML file and return the list of scheduled (level_id, sample_id, seed) tuples.

Parameters:
  • job_id – str

  • jobs_dir – str

Returns:

list of tuples (level_id, sample_id, seed)

write_pbs_id(pbs_job_id)[source]

Write an empty file whose filename encodes the mapping from our internal job id to the external PBS job id.

Parameters:

pbs_job_id – str (external PBS job identifier)

Returns:

None

save_scheduled(scheduled)[source]

Store scheduled samples list into the jobs folder.

Parameters:

scheduled – list of tuples (level_id, sample_id, seed) or similar structure

Returns:

None

static get_job_n_running(job_id, jobs_dir)[source]

Return number of scheduled samples for a job (length of scheduled list file).

Parameters:
  • job_id – str

  • jobs_dir – str path to jobs directory

Returns:

int count of scheduled entries

mlmc.tool.process_base module

class mlmc.tool.process_base.ProcessBase[source]

Bases: object

Parent class for particular simulation processes. Subclasses should implement setup_config.

static get_arguments(arguments)[source]

Parse command-line arguments.

Parameters:

arguments – list of arguments (typically sys.argv[1:])

Returns:

argparse.Namespace with parsed arguments: - command: one of [‘run’, ‘collect’, ‘renew’, ‘process’] - work_dir: str path - clean: bool - debug: bool

run(renew=True)[source]

High-level entry point to run the MLMC workflow.

Creates the working directory, sets up MLMC configurations for a set of level counts (currently hard-coded to [1]) and schedules/generates jobs. After job creation it triggers collection of results via all_collect.

Parameters:

renew – bool, if True indicates renewing failed samples (passed down to setup_config in some subclasses)

Returns:

None

set_environment_variables()[source]

Determine environment-dependent configuration values (PBS config, executables, timeouts).

The method inspects the work_dir path to decide whether it runs on a cluster or locally and sets attributes used later (pbs_config, sample_sleep, init_sample_timeout, sample_timeout, flow123d, gmsh).

Returns:

None

setup_config(n_levels, clean)[source]

Set simulation configuration depending on particular task.

Subclasses must override this method and return a configured mlmc.MLMC object.

Parameters:
  • n_levels – int, number of MLMC levels

  • clean – bool, whether to clean/create new files or use existing ones

Returns:

mlmc.MLMC instance (implementation dependent)

Raises:

NotImplementedError – always in base class

rm_files(output_dir)[source]

Remove (recursively) output_dir and create an empty directory in its place.

Parameters:

output_dir – str path to remove and recreate

Returns:

None

create_pbs_object(output_dir, clean)[source]

Initialize PBS helper object for submitting/executing jobs.

This creates self.pbs_obj and configures it with common PBS settings.

Parameters:
  • output_dir – str, directory where PBS scripts and job state will be created

  • clean – bool, if True remove existing scripts before creating new ones

Returns:

None

generate_jobs(mlmc, n_samples=None)[source]

Prepare and kick off sampling jobs for the provided MLMC object.

The method optionally sets the initial n_samples (if provided), refills the sampler queues and triggers the PBS object execution. It then waits for simulations to finish.

Parameters:
  • mlmc – mlmc.MLMC instance

  • n_samples – None or list specifying number of samples to request for each level

Returns:

None

set_moments(n_moments, log=False)[source]

Create and store a moments function instance (Legendre polynomial family).

Parameters:
  • n_moments – int, number of moments

  • log – bool, whether to apply log-transform to quantity prior to moment evaluation

Returns:

Legendre moments instance

n_sample_estimate(mlmc, target_variance=0.001)[source]

Heuristic routine to estimate a good number of initial samples for MLMC using target variance.

It triggers an initial sampling run, estimates the domain, constructs moments, and requests additional samples using mlmc.target_var_adding_samples.

Parameters:
  • mlmc – mlmc.MLMC instance

  • target_variance – float target variance for moment estimates

Returns:

None

all_collect(sampler_list)[source]

Poll samplers to collect running samples until none are left.

Repeatedly asks each sampler for the number of running jobs and keeps polling until all complete.

Parameters:

sampler_list – list of sampler-like objects providing ask_sampling_pool_for_samples(sleep, timeout)

Returns:

None

process_analysis(cl)[source]

Top-level analysis entry point. Calls specific analysis routines (many commented out).

Parameters:

cl – CompareLevels instance (or equivalent) holding estimation/collected data

Returns:

None

analyze_pdf_approx(cl)[source]

Perform PDF approximation experiments and plotting.

Parameters:

cl – CompareLevels instance

Returns:

None

analyze_regression_of_variance(cl, mlmc_level)[source]

Analyze regression of variance for a selected level.

Parameters:
  • cl – CompareLevels instance

  • mlmc_level – int index of method/level to analyze

Returns:

None

analyze_error_of_variance(cl, mlmc_level)[source]

Analyze error of variance estimators and plot related diagnostics.

Parameters:
  • cl – CompareLevels instance

  • mlmc_level – int index of method/level to analyze

Returns:

None

analyze_error_of_regression_variance(cl, mlmc_level)[source]

Bootstrap-based analysis of regression variance errors.

Parameters:
  • cl – CompareLevels instance

  • mlmc_level – int index of method/level to analyze

Returns:

None

analyze_error_of_level_variances(cl, mlmc_level)[source]

Analyze errors in per-level variance estimates and plot results.

Parameters:
  • cl – CompareLevels instance

  • mlmc_level – int index of method/level to analyze

Returns:

None

analyze_error_of_regression_level_variances(cl, mlmc_level)[source]

Analyze combined regression and level variance errors with bootstrap.

Parameters:
  • cl – CompareLevels instance

  • mlmc_level – int index of method/level to analyze

Returns:

None

analyze_error_of_log_variance(cl, mlmc_level)[source]

Analyze bootstrap error of log-variance estimates.

Parameters:
  • cl – CompareLevels instance

  • mlmc_level – int index of method/level to analyze

Returns:

None

mlmc.tool.simple_distribution module

class mlmc.tool.simple_distribution.SimpleDistribution(moments_obj, moment_data, domain=None, force_decay=(True, True), verbose=False)[source]

Bases: object

Approximate a probability density function (PDF) from given moments.

The class constructs a parametric PDF using Lagrange multipliers and fits those multipliers by minimizing a functional (or solving a root problem). Numerical integration and adaptive quadrature are used to compute moments, gradients and Jacobians required by the optimizer.

estimate_density_minimize(tol=1e-05, reg_param=0.01)[source]

Estimate multipliers by minimizing the dual functional.

Uses scipy.optimize.minimize (trust-ncg by default) to minimize the functional _calculate_functional(multipliers). The function sets up quadrature and internal tolerances before solving, and updates the multipliers attribute with the optimizer result.

Parameters:
  • tol – Optimization tolerance (used for jacobian/grad stopping).

  • reg_param – Regularization parameter (not used directly here but kept for API parity).

Returns:

scipy OptimizeResult with fields: - x: optimized multipliers - success: bool convergence flag (set to True if solver succeeded or residual < tol) - nit: number of iterations (at least 1) - fun_norm: norm of gradient at solution - eigvals: eigenvalues of computed Jacobian (added by this method) - solver_res: raw solver residual information (copy of result.jac)

Notes:

  • After optimization the code enforces normalization by subtracting log(moment_0) from multipliers[0] so that the integral of the density is consistent with moment_0.

density(value)[source]

Evaluate the approximated density at the given point(s).

Parameters:

value – scalar or numpy array of points

Returns:

numpy array of density values (same shape as flattened input)

Notes:

  • Uses self.multipliers, self._moment_errs and the basis moments returned by eval_moments() to build exponent power = sum_i multipliers_i * moment_i / err_i.

  • The result is clipped (power limited to [-200, 200]) to avoid overflow.

cdf(values)[source]

Evaluate the cumulative distribution function (CDF) at the given points.

Parameters:

values – scalar or array-like points

Returns:

numpy array of CDF values corresponding to input points

Notes:

  • The method integrates the density piecewise between successive query points using fixed_quad with n=10 on subintervals determined by the adaptive quadrature info.

  • Values outside domain are mapped to 0 (left) or 1 (right).

eval_moments(x)[source]

Evaluate all basis moment functions at x up to current approximation size.

Parameters:

x – scalar or array-like points

Returns:

numpy.ndarray of shape (n_points, approx_size) or similar depending on moments_fn.eval_all

end_point_derivatives()[source]

Approximate derivatives of all moment basis functions at domain endpoints.

Returns:

numpy array of shape (2, approx_size) where index 0 = left derivative, index 1 = right derivative. The derivatives are scaled by moment errors.

Notes:

  • Uses a tiny eps shift (1e-10) to compute forward/backward differences.

  • If corresponding decay_penalty is False for a side, that side’s derivative is left as zeros.

mlmc.tool.simple_distribution.compute_exact_moments(moments_fn, density, tol=1e-10)[source]

Compute moments by integrating moments_fn against provided density.

Parameters:
  • moments_fn – object with .domain and .size and .eval(i, x)

  • density – callable accepting numpy arrays (vectorized) returning density values

  • tol – absolute tolerance for numerical integration

Returns:

numpy array of length moments_fn.size containing integrated moments

mlmc.tool.simple_distribution.compute_semiexact_moments(moments_fn, density, tol=1e-10)[source]

Compute moments using a hybrid approach: use adaptive quad to identify subintervals then apply Gauss-Legendre nodes inside those subintervals for an accurate quadrature.

Parameters:
  • moments_fn – moments object with .domain and .size and .eval_all

  • density – callable density(x)

  • tol – quad tolerance

Returns:

vector of integrated moments (length = moments_fn.size)

mlmc.tool.simple_distribution.compute_exact_cov(moments_fn, density, tol=1e-10)[source]

Compute covariance matrix of moment basis under the provided density.

Parameters:
  • moments_fn – moments object

  • density – callable density(x)

  • tol – integration tolerance

Returns:

symmetric matrix (size x size) containing E[m_i * m_j]

mlmc.tool.simple_distribution.compute_semiexact_cov(moments_fn, density, tol=1e-10)[source]

Compute approximate covariance matrix using quadrature nodes determined by adaptive integration.

Parameters:
  • moments_fn – moments object

  • density – callable density(x)

  • tol – integration tolerance

Returns:

Jacobian-like matrix approximating covariance (moments weighted by density)

mlmc.tool.simple_distribution.KL_divergence(prior_density, posterior_density, a, b)[source]

Compute Kullback-Leibler divergence between two densities over [a,b].

Using numerically stable integrand:

integrand = p * log(p/q) - p + q

which equals D_KL(P||Q) when both integrate to 1 but remains finite even when Q is not perfectly normalized.

Parameters:
  • prior_density – callable P(x)

  • posterior_density – callable Q(x)

  • a – left integration bound

  • b – right integration bound

Returns:

scalar KL divergence (floored at 1e-10)

mlmc.tool.simple_distribution.L2_distance(prior_density, posterior_density, a, b)[source]

Compute L2 distance between two densities on [a, b].

Parameters:
  • prior_density – callable P(x)

  • posterior_density – callable Q(x)

  • a – left bound

  • b – right bound

Returns:

scalar L2 norm: sqrt( integral (Q-P)^2 )

mlmc.tool.simple_distribution.best_fit_all(values, range_a, range_b)[source]

Find the best linear fit across all given index ranges.

The function searches all combinations of indices a and b within range_a and range_b such that a < b and fits a first-degree polynomial to the values between these indices. It then selects the fit with the smallest residual normalized by the square of the interval length.

Parameters:
  • values – Array-like sequence of values to fit.

  • range_a – Iterable of possible starting indices.

  • range_b – Iterable of possible ending indices.

Returns:

Tuple (a, b, fit) corresponding to the best fit, where fit is the array of polynomial coefficients.

mlmc.tool.simple_distribution.best_p1_fit(values)[source]

Recursively find the best linear (P1) fit segment of the sequence.

The method finds indices a < b such that the segment values[a:b] has the smallest residual (least-squares error) normalized by (b - a)**2. If the array is large, it downsamples the data before recursively fitting.

Parameters:

values – Sequence of numeric values to fit.

Returns:

Tuple (a, b, fit) representing the best segment and corresponding linear coefficients.

mlmc.tool.simple_distribution.detect_treshold_slope_change(values, log=True)[source]

Detect the index where the slope of a sequence changes significantly.

This function fits linear segments to the data (optionally in log scale) and detects where the slope begins to deviate, returning both the threshold index and a modified version of the input values where the slope change is extrapolated.

Parameters:
  • values – Monotonically increasing numeric sequence.

  • log – If True, the logarithm of the sequence is used for fitting.

Returns:

Tuple (i_treshold, mod_vals) - i_treshold: Index where the slope change is detected. - mod_vals: Modified version of values with extrapolated segment.

mlmc.tool.simple_distribution.lsq_reconstruct(cov, eval, evec, treshold)[source]

Perform least-squares reconstruction of the eigenvectors of a covariance matrix to restore orthogonality.

This method adjusts the eigenvectors using nonlinear least-squares minimization so that the reconstructed eigenvectors are orthogonal and diagonalize the covariance matrix as closely as possible.

Parameters:
  • cov – Covariance matrix (2D array).

  • eval – Eigenvalues (1D array).

  • evec – Eigenvectors (2D array).

  • treshold – Number of eigenvectors to fix (use exact values up to this index).

Returns:

Reconstructed orthogonal eigenvector matrix Q.

mlmc.tool.simple_distribution.construct_ortogonal_moments(moments, cov, tol=None)[source]

Construct orthogonal statistical moments with respect to the covariance matrix.

This function computes a transformation that makes the given moments orthogonal under the provided covariance matrix. It determines the threshold for significant eigenvalues (either via slope detection or tolerance) and constructs a transformation matrix accordingly.

Parameters:
  • moments – Input moments object.

  • cov – Covariance matrix estimated from samples.

  • tol – Optional eigenvalue threshold. If None, an automatic slope-change detection is used.

Returns:

Tuple (ortogonal_moments, info) - ortogonal_moments: Transformed (orthogonalized) moments. - info: Tuple containing (eval, threshold, transformation_matrix).

mlmc.tool.stats_tests module

mlmc.tool.stats_tests.t_test(mu_0, samples, max_p_val=0.01)[source]

Perform a one-sample two-tailed t-test to check if the sample mean equals mu_0.

This test ensures that false positives (rejecting H0 when true) occur with probability <= max_p_val.

Parameters:
  • mu_0 – Expected mean value of the samples.

  • samples – Array-like of sample values to test.

  • max_p_val – Maximum allowed p-value for false rejection (significance threshold).

Raises:

AssertionError – if the p-value is larger than max_p_val, indicating the mean is statistically equal to mu_0.

mlmc.tool.stats_tests.chi2_test(var_0, samples, max_p_val=0.01, tag='')[source]

Perform a chi-squared test to check if the sample variance equals var_0.

False rejections should occur with probability <= max_p_val.

Parameters:
  • var_0 – Expected variance of the samples.

  • samples – Array-like of sample values.

  • max_p_val – Maximum allowed p-value for false rejection (significance threshold).

  • tag – Optional string tag for printing/debugging.

Raises:

AssertionError – if the p-value indicates variance significantly differs from var_0.

mlmc.tool.stats_tests.anova(level_moments)[source]

Perform one-way ANOVA (analysis of variance) across multiple levels.

Tests the null hypothesis H0: all levels have the same mean.

Parameters:

level_moments – List of arrays, each array containing moments/samples for a level.

Returns:

True if H0 cannot be rejected (means are statistically equal), False if H0 is rejected (at least one mean differs).

Module contents

Contains classes that provide an interface to other resources such as HDF5, Gmsh, PBS, …