statistics package
Submodules
statistics.analysesCombinations module
- class statistics.analysesCombinations.AnaCombLikelihoodComputer(theoryPredictions: list, deltas_rel=None)[source]
Bases:
object
constructor.
- Parameters
theoryPredictions – the List of theory predictions
deltas_rel – relative uncertainty in signal (float). Default value is 20%.
- CLs(mu: float = 1.0, expected: Union[str, bool] = False, return_type: str = 'CLs')[source]
Compute the exclusion confidence level of the model
- Parameters
mu – compute for the parameter of interest mu
expected – if False, compute observed; if True, compute a priori expected
return_type – (Text) can be “CLs-alpha” (returns CLs - 0.05), “1-CLs” (returns the 1 - CLs value), or “CLs” (returns the CLs value)
- getCLsRootFunc(expected: bool = False, allowNegativeSignals: bool = False) Tuple[float, float, Callable] [source]
Obtain the function “CLs-alpha[0.05]” whose root defines the upper limit, plus mu_hat and sigma_mu
- Parameters
expected – if True, compute expected likelihood, else observed
- getLlhds(muvals, expected=False, normalize=True)[source]
Compute the likelihoods for the individual analyses and the combined likelihood. Returns a dictionary with the analysis IDs as keys and the likelihood values as values.
- Parameters
muvals – List with values for the signal strength for which the likelihoods must be evaluated.
expected – If True returns the expected likelihood values.
normalize – If True normalizes the likelihood by its integral over muvals.
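If normalization is enabled, it presumably amounts to dividing the likelihood values by a numerical integral over muvals; a minimal sketch (the trapezoidal scheme is an assumption, the actual implementation may differ):

```python
def normalize_llhds(muvals, llhds):
    """Normalize likelihood values by their integral over muvals (trapezoidal rule)."""
    integral = sum(
        0.5 * (llhds[i] + llhds[i + 1]) * (muvals[i + 1] - muvals[i])
        for i in range(len(muvals) - 1)
    )
    return [l / integral for l in llhds]
```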
- getUpperLimitOnMu(expected=False, allowNegativeSignals=False)[source]
get the upper limit on the signal strength multiplier, i.e. the value of mu for which 1 - CLs = 0.95
- Parameters
expected – if True, compute expected likelihood, else observed
- Returns
upper limit on signal strength multiplier mu
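The defining relation — the upper limit is the root of CLs(mu) - alpha — can be illustrated with a simple bisection (a sketch only; the actual code uses the root function from getCLsRootFunc with Brent's method, and root_func below is a toy stand-in):

```python
def upper_limit_on_mu(root_func, lo=0.0, hi=100.0, tol=1e-9):
    """Bisect for the root of root_func(mu) = CLs(mu) - alpha.
    Assumes root_func is positive at lo and negative at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if root_func(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# toy stand-in for CLs(mu) - alpha, with its root at mu = 3
mu_ul = upper_limit_on_mu(lambda mu: 3.0 - mu)
```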
- getUpperLimitOnSigmaTimesEff(expected=False, allowNegativeSignals=False)[source]
- upper limit on the fiducial cross section sigma times efficiency,
summed over all signal regions, i.e. sum_i xsec^prod_i eff_i obtained from the defined Data (using the signal prediction for each signal region/dataset), by using the q_mu test statistic from the CCGV paper (arXiv:1007.1727).
- Params expected
if False, compute observed; if True, compute a priori expected; if “posteriori”, compute a posteriori expected
- Returns
upper limit on fiducial cross section
- likelihood(mu: float = 1.0, expected: Union[bool, str] = False, return_nll: bool = False, useCached: bool = True) float [source]
Compute the likelihood at a given mu
- Parameters
mu – signal strength
expected – if True, compute expected likelihood, else observed
return_nll – if True, return negative log likelihood, else likelihood
useCached – if True, will use the cached values from the theoryPrediction objects (if available)
- lmax(allowNegativeSignals: bool = False, expected: Union[bool, str] = False, return_nll: bool = False) Optional[Dict] [source]
find muhat and lmax.
- Parameters
allowNegativeSignals – if true, then also allow for negative values
expected – if True, compute a priori expected (=lsm); if “posteriori”, compute a posteriori expected
return_nll – if true, return negative log max likelihood instead of lmax
- Returns
mu_hat, the maximum likelihood estimate of mu. If extended output is requested, returns a dictionary with mu_hat, sigma_mu (the standard deviation around mu_hat), and lmax, the likelihood at mu_hat.
statistics.basicStats module
- statistics.basicStats.CLsfromNLL(nllA: float, nll0A: float, nll: float, nll0: float, return_type: str = 'CLs-alpha') float [source]
compute CLs - alpha from the NLLs. TODO: the following needs explanation
- Parameters
nllA – negative log likelihood for Asimov data
nll0A – negative log likelihood at muhat for Asimov data
nll – negative log likelihood
nll0 – negative log likelihood at muhat
return_type – (Text) can be “CLs-alpha” (returns CLs - 0.05), “1-CLs” (returns the 1 - CLs value), or “CLs” (returns the CLs value)
- Returns
Cls-type value, see above
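A minimal sketch of the asymptotic computation this function presumably performs, based on the q_mu construction of the CCGV paper (arXiv:1007.1727) referenced elsewhere in this module; the exact treatment in the real code may differ:

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def cls_from_nll(nllA, nll0A, nll, nll0, return_type="CLs-alpha"):
    """CLs from observed and Asimov NLL values, asymptotic approximation."""
    qmu = max(0.0, 2.0 * (nll - nll0))    # observed test statistic
    qA = max(0.0, 2.0 * (nllA - nll0A))   # Asimov (expected) test statistic
    CLsb = 1.0 - norm_cdf(sqrt(qmu))
    CLb = 1.0 - norm_cdf(sqrt(qmu) - sqrt(qA))
    CLs = CLsb / CLb if CLb > 0.0 else 0.0
    if return_type == "CLs-alpha":
        return CLs - 0.05
    if return_type == "1-CLs":
        return 1.0 - CLs
    return CLs
```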
- statistics.basicStats.determineBrentBracket(mu_hat, sigma_mu, rootfinder, allowNegative=True)[source]
find a, b for Brent bracketing
- Parameters
mu_hat – mu that maximizes likelihood
sigma_mu – error on mu_hat (not too reliable)
rootfinder – function that finds the root (usually root_func)
allowNegative – if False, then do not allow a or b to become negative
- Returns
the interval a,b
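The bracketing idea can be sketched as widening an interval around mu_hat until the root function changes sign across it (a simplified stand-in for the actual stepping logic):

```python
def determine_bracket(mu_hat, sigma_mu, root_func, allow_negative=True, max_iter=50):
    """Widen [a, b] around mu_hat until root_func changes sign across it."""
    a, b = mu_hat - sigma_mu, mu_hat + sigma_mu
    for _ in range(max_iter):
        if not allow_negative:
            a = max(a, 0.0)  # do not let the lower edge go negative
        if root_func(a) * root_func(b) < 0.0:
            return a, b
        a -= sigma_mu  # widen on both sides and retry
        b += sigma_mu
    raise ValueError("no sign change found: cannot bracket the root")
```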
statistics.exceptions module
statistics.pyhfInterface module
- class statistics.pyhfInterface.PyhfData(nsignals: Dict[str, Dict], inputJsons, jsonFiles=None, includeCRs=False, signalUncertainty=None)[source]
Bases:
object
Holds data for use in pyhf
- Variables
nsignals – signal predictions: dictionary of dictionaries, one for each json file, one entry per signal region
inputJsons – list of json instances
jsonFiles – optional list of json files
nWS – number of workspaces = number of json files
- checkConsistency()[source]
Check various inconsistencies of the PyhfData attributes
- Parameters
zeroSignalsFlag – boolean identifying if all SRs of a single json are empty
- getWSInfo()[source]
Get information from the json files
- Variables
channelsInfo – list of dictionaries (one dictionary for each json file) containing useful information about the json files: signalRegions – list of dictionaries with ‘json path’ and ‘size’ (number of bins) of the ‘signal regions’ channels in the json files; otherRegions – list of dictionaries giving the path and the name of the control and/or validation region channels
- class statistics.pyhfInterface.PyhfUpperLimitComputer(data, cl=0.95, lumi=None)[source]
Bases:
object
Class that computes the upper limit using the json files and signal information in the ‘data’ instance of ‘PyhfData’
- Parameters
data – instance of ‘PyhfData’ holding the signals information
cl – confidence level at which the upper limit is computed
- Variables
data – created from data
nsignals – signal predictions list divided into sublists, one for each json file
inputJsons – list of input json files as python json instances
channelsInfo – list of channels information for the json files
zeroSignalsFlag – list of boolean flags indicating whether all signals are zero for a specific json
nWS – number of workspaces = number of json files
patches – list of patches to be applied to the inputJsons as python dictionary instances
workspaces – list of workspaces resulting from the patched inputJsons
workspaces_expected – list of patched workspaces with observation yields replaced by the expected ones
cl – created from cl
scale – scale that is applied to the signal predictions, dynamically changes throughout the upper limit calculation
alreadyBeenThere – boolean flag that identifies when the rescaling of nsignals accidentally passes twice through the same value
- changeChannelName(srInfo)[source]
changes the channel names in the json to match the SModelS name. FIXME this is only a hack for now, should be replaced by a self-respecting dataIdMap in the database itself, that maps the smodels names to the pyhf names explicitly. This method will then go away!
- compute_invhess(x, data, model, index, epsilon=1e-05)[source]
If inv_hess is not given by the optimiser, calculate it numerically by evaluating second order partial derivatives using the 2-point central finite differences method.
- Parameters
x – parameter values given to pyhf.infer.mle.twice_nll, taken from pyhf.infer.mle.fit - optimizer.x (best-fit parameter values)
data – workspace.data(model) passed to pyhf.infer.mle.fit
model – model passed to pyhf.infer.mle.fit
index – index of the POI
Note: if len(x) <= 5, compute the entire hessian matrix and find its inverse. Else, compute the hessian at the index of the POI and return its inverse (diagonal approximation).
- Returns
the inverse hessian at the index of the POI
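The 2-point central finite-difference scheme for the diagonal case can be sketched with a hypothetical helper, detached from the pyhf machinery:

```python
def second_derivative(f, x, index, epsilon=1e-5):
    """d^2 f / d x[index]^2 via 2-point central finite differences."""
    xp, xm = list(x), list(x)
    xp[index] += epsilon
    xm[index] -= epsilon
    return (f(xp) - 2.0 * f(x) + f(xm)) / epsilon ** 2

def inv_hessian_diag(f, x, index, epsilon=1e-5):
    """Diagonal approximation of the inverse Hessian at the POI index."""
    return 1.0 / second_derivative(f, x, index, epsilon)
```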
- exponentiateNLL(twice_nll, doIt)[source]
if doIt, then compute likelihood from nll, else return nll
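Given the twice_nll convention used by pyhf, the conversion presumably amounts to the following (the factor of two is an assumption based on pyhf's twice_nll naming):

```python
from math import exp

def exponentiate_nll(twice_nll, do_it):
    """If do_it, convert twice the negative log likelihood to a likelihood;
    otherwise return the nll itself."""
    nll = twice_nll / 2.0
    return exp(-nll) if do_it else nll
```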
- getUpperLimitOnMu(expected=False, workspace_index=None)[source]
- Compute the upper limit on the signal strength modifier, using:
by default, the combination of the workspaces contained in self.workspaces
if workspace_index is specified, self.workspaces[workspace_index] (useful for computing the best upper limit)
- Parameters
expected –
if set to ‘True’: uses expected SM backgrounds as signals
else: uses ‘self.nsignals’
workspace_index –
if different from ‘None’: index of the workspace to use for upper limit
else: choose best combo
- Returns
the upper limit at ‘self.cl’ level (0.95 by default)
- getUpperLimitOnSigmaTimesEff(expected=False, workspace_index=None)[source]
- Compute the upper limit on the fiducial cross section sigma times efficiency, using:
by default, the combination of the workspaces contained in self.workspaces
if workspace_index is specified, self.workspaces[workspace_index] (useful for computing the best upper limit)
- Parameters
expected –
if set to ‘True’: uses expected SM backgrounds as signals
else: uses ‘self.nsignals’
workspace_index –
if different from ‘None’: index of the workspace to use for upper limit
else: choose best combo
- Returns
the upper limit on sigma times eff at ‘self.cl’ level (0.95 by default)
- get_position(name, model)[source]
- Parameters
name – name of the parameter one wants to increase
model – the pyhf model
- Returns
the position of the parameter that has to be modified in order to turn positive the negative total yield
- likelihood(mu=1.0, workspace_index=None, return_nll=False, expected=False)[source]
Returns the value of the likelihood. Inspired by the ‘pyhf.infer.mle’ module but for non-log likelihood
- Parameters
workspace_index – supply index of workspace to use. If None, choose index of best combo
return_nll – if true, return nll, not llhd
expected – if False, compute observed values; if True, compute a priori expected; if “posteriori”, compute a posteriori expected
- lmax(workspace_index=None, return_nll=False, expected=False, allowNegativeSignals=False)[source]
Returns the maximum likelihood (or the negative log of the maximum likelihood, if return_nll is True)
- Parameters
return_nll – if true, return nll, not llhd
workspace_index – supply index of workspace to use. If None, choose index of best combo
expected – if False, compute observed values; if True, compute a priori expected; if “posteriori”, compute a posteriori expected
allowNegativeSignals – if False, then negative nsigs are replaced with 0.
- patchMaker()[source]
Method that creates the list of patches to be applied to the self.inputJsons workspaces, one for each region, given self.nsignals, the information available in self.channelsInfo, and the content of self.inputJsons. NB: it seems we need to include the change of the “modifiers” in the patches as well
- Returns
the list of patches, one for each workspace
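The patches follow the jsonpatch format that pyhf workspaces accept; a single signal-sample patch might look like the sketch below (the sample name and the exact modifier layout are illustrative, not the actual patch built by patchMaker):

```python
def make_signal_patch(channel_index, signal_yields):
    """A jsonpatch-style operation adding a signal sample, with a
    signal-strength ('mu') normfactor modifier, to one channel of a
    pyhf workspace."""
    return {
        "op": "add",
        "path": "/channels/%d/samples/0" % channel_index,
        "value": {
            "name": "bsm",  # illustrative sample name
            "data": signal_yields,
            "modifiers": [{"name": "mu", "type": "normfactor", "data": None}],
        },
    }
```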
- rescale(factor)[source]
Rescales the signal predictions (self.nsignals) and recomputes the patches and workspaces
- Returns
updated list of patches and workspaces (self.patches, self.workspaces and self.workspaces_expected)
- rescaleBgYields(init_pars, workspace, model)[source]
Increase initial value of nuisance parameters until the starting value of the total yield (mu*signal + background) is positive
- Parameters
init_pars – list of initial parameters values one wants to increase in order to turn positive the negative total yields
workspace – the pyhf workspace
model – the pyhf model
- Returns
the list of initial parameters values that gives positive total yields
- updateWorkspace(workspace_index=None, expected=False)[source]
Small method used to return the appropriate workspace
- Parameters
workspace_index – the index of the workspace to retrieve from the corresponding list
expected – if False, returns the unmodified (but patched) workspace; used for computing observed or a posteriori expected limits. If True, returns the modified (and patched) workspace, where obs = sum(bkg); used for computing the a priori expected limit.
- wsMaker(apriori=False)[source]
Apply each region patch (self.patches) to its associated json (self.inputJsons) to obtain the complete workspaces
- Parameters
apriori –
If set to ‘True’: replace the observation data entries of each workspace by the corresponding sum of the expected yields. Else: the observed yields put in the workspace are the ones written in the corresponding json dictionary.
- Returns
the list of patched workspaces
statistics.simplifiedLikelihoods module
- class statistics.simplifiedLikelihoods.Data(observed, backgrounds, covariance, third_moment=None, nsignal=None, name='model', deltas_rel=0.2, lumi=None)[source]
Bases:
object
A very simple container to collect all the data needed to fully define a specific statistical model
- Parameters
observed – number of observed events per dataset
backgrounds – expected bg per dataset
covariance – uncertainty in background, as a covariance matrix
nsignal – number of signal events in each dataset
name – give the model a name, just for convenience
deltas_rel – the assumed relative error on the signal hypotheses. The default is 20%.
lumi – luminosity of dataset in 1/fb, or None
- convert(obj)[source]
Convert object to numpy arrays. If object is a float or int, it is converted to a one element array.
- nsignals(mu)[source]
Returns the number of expected signal events, for all datasets, given total signal strength mu.
- Parameters
mu – Total number of signal events summed over all datasets.
- rel_signals(mu)[source]
Returns the number of expected relative signal events, for all datasets, given total signal strength mu. For mu=1, the sum of the numbers = 1.
- Parameters
mu – Total number of signal events summed over all datasets.
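Consistent with the note that the entries sum to 1 for mu=1, the method presumably scales the per-dataset fractions by mu; a sketch (the exact normalization is an assumption):

```python
def rel_signals(nsignal, mu):
    """Relative signal yields per dataset, scaled so they sum to mu."""
    total = sum(nsignal)
    return [mu * s / total for s in nsignal]
```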
- totalCovariance(nsig)[source]
get the total covariance matrix, taking into account also signal uncertainty for the signal hypothesis <nsig>. If nsig is None, the predefined signal hypothesis is taken.
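A plausible sketch, assuming the signal uncertainty enters as an uncorrelated relative error added to the diagonal (deltas_rel as in the Data constructor; how the real code correlates signal errors is not specified here):

```python
def total_covariance(cov_b, nsig, deltas_rel=0.2):
    """Background covariance plus squared relative signal uncertainties
    on the diagonal."""
    cov = [row[:] for row in cov_b]  # copy, do not modify the input
    for i, s in enumerate(nsig):
        cov[i][i] += (deltas_rel * s) ** 2
    return cov
```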
- class statistics.simplifiedLikelihoods.LikelihoodComputer(data)[source]
Bases:
object
- Parameters
data – a Data object.
- chi2()[source]
Computes the chi2 for a given number of observed events nobs, given the predicted background nb, the error on this background deltab, the expected number of signal events nsig, and the relative error on the signal (deltas_rel).
- Parameters
nsig – number of signal events
- Returns
chi2 (float)
- d2NLLdMu2(mu, theta_hat, allowZeroHessian=True)[source]
the Hessian of the negative log likelihood at mu, which is the Fisher information and approximately the inverse of the covariance
- Parameters
allowZeroHessian – if false and sum(observed)==0, then replace observed with expected
- d2NLLdTheta2(theta)[source]
the Hessian of nll as a function of the thetas. Makes it easier to find the maximum likelihood.
- dNLLdMu(mu, theta_hat=None)[source]
d (- ln L)/d mu, if L is the likelihood. The function whose root gives us muhat, i.e. the mu that maximizes the likelihood.
- Parameters
mu – total number of signal events
theta_hat – array with nuisance parameters, if None then compute them
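For a product of Poissons with expected yields b_i + theta_i + mu*s_i (the model the surrounding methods suggest), the derivative can be sketched as:

```python
def dnll_dmu(mu, nsig, nobs, nb, theta):
    """d(-ln L)/dmu for Poisson counts n_i with expectation b_i + theta_i + mu*s_i."""
    return sum(
        s * (1.0 - n / (b + t + mu * s))
        for s, n, b, t in zip(nsig, nobs, nb, theta)
    )
```

At the maximum likelihood point of a single bin (nobs = nb + mu*nsig, theta = 0) the derivative vanishes, which is exactly the root condition that defines muhat.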
- dNLLdTheta(theta)[source]
the derivative of nll as a function of the thetas. Makes it easier to find the maximum likelihood.
- debug_mode = False
- findAvgr(theta_hat)[source]
from the difference observed - background, find good initial values for lower and upper
- findMuHat(allowNegativeSignals=False, extended_output=False, return_nll=False)[source]
Find the most likely signal strength mu via gradient descent given the relative signal strengths in each dataset (signal region).
- Parameters
allowNegativeSignals – if true, then also allow for negative values
extended_output – if true, return also sigma_mu, the estimate of the error of mu_hat, and lmax, the likelihood at mu_hat
return_nll – if true, return nll instead of lmax in the extended output
- Returns
mu_hat, the maximum likelihood estimate of mu. If extended output is requested, returns mu_hat, sigma_mu (the standard deviation around mu_hat), and llhd, the likelihood at mu_hat.
- findMuHatViaBracketing(allowNegativeSignals=False, extended_output=False, nll=False)[source]
Find the most likely signal strength mu via a Brent bracketing technique given the relative signal strengths in each dataset (signal region).
- Parameters
allowNegativeSignals – if true, then also allow for negative values
extended_output – if true, return also sigma_mu, the estimate of the error of mu_hat, and lmax, the likelihood at mu_hat
nll – if true, return nll instead of lmax in the extended output
- Returns
mu_hat, the maximum likelihood estimate of mu. If extended output is requested, returns a dictionary with mu_hat, sigma_mu (the standard deviation around mu_hat), and lmax, the likelihood at mu_hat.
- findThetaHat(mu: float)[source]
Compute nuisance parameters theta that maximize our likelihood (poisson*gauss).
- getSigmaMu(mu, theta_hat)[source]
Get an estimate for the standard deviation of mu at <mu>, from the inverse hessian
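With the Fisher-information interpretation of d2NLLdMu2, the estimate presumably reduces to the inverse square root of the curvature (a sketch, ignoring any safeguards the real code applies):

```python
from math import sqrt

def sigma_mu_from_fisher(d2nll_dmu2):
    """sigma_mu as the inverse square root of the Fisher information."""
    if d2nll_dmu2 <= 0.0:
        return float("inf")  # no positive curvature: error estimate undefined
    return 1.0 / sqrt(d2nll_dmu2)
```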
- getThetaHat(nobs, nb, mu, covb, max_iterations)[source]
Compute nuisance parameter theta that maximizes our likelihood (poisson*gauss) – by setting dNLL/dTheta to zero
- Parameters
mu – signal strength
- Returns
theta_hat
- likelihood(mu: float, return_nll: bool = False)[source]
compute the profiled likelihood for mu.
- Parameters
mu – float Parameter of interest, signal strength
return_nll – if true, return nll instead of likelihood
- Returns
profile likelihood and error code (0=no error)
- llhdOfTheta(theta, nll=True)[source]
likelihood for nuisance parameters theta, given signal strength self.mu. Note that by default it returns the nll.
- Parameters
theta – nuisance parameters
- Params nll
if True, compute negative log likelihood
- class statistics.simplifiedLikelihoods.UpperLimitComputer(cl: float = 0.95)[source]
Bases:
object
- Parameters
cl – desired quantile for limits
- computeCLs(model: Data, expected: Union[bool, str] = False, trylasttime: bool = False, return_type: str = '1-CLs') float [source]
Compute the exclusion confidence level of the model (1-CLs).
- Parameters
model – statistical model
expected – if False, compute observed; if True, compute a priori expected; if “posteriori”, compute a posteriori expected
trylasttime – if True, then don't try extra
return_type – (Text) can be “CLs-alpha” (returns CLs - 0.05, alpha being 0.05), “1-CLs” (returns the 1 - CLs value), or “CLs” (returns the CLs value)
- debug_mode = False
- getCLsRootFunc(model: Data, expected: Optional[Union[bool, str]] = False, trylasttime: Optional[bool] = False) Tuple [source]
Obtain the function “CLs-alpha[0.05]” whose root defines the upper limit, plus mu_hat and sigma_mu
- Parameters
model – statistical model
expected – if False, compute observed; if True, compute a priori expected; if “posteriori”, compute a posteriori expected
trylasttime – if True, then don't try extra
- Returns
mu_hat, sigma_mu, CLs-alpha
- getUpperLimitOnMu(model, expected=False, trylasttime=False)[source]
upper limit on the signal strength multiplier mu obtained from the defined Data (using the signal prediction for each signal region/dataset), by using the q_mu test statistic from the CCGV paper (arXiv:1007.1727).
- Params expected
if False, compute observed; if True, compute a priori expected; if “posteriori”, compute a posteriori expected
- Params trylasttime
if True, then don't try extra
- Returns
upper limit on the signal strength multiplier mu
- getUpperLimitOnSigmaTimesEff(model, expected=False, trylasttime=False)[source]
- upper limit on the fiducial cross section sigma times efficiency,
summed over all signal regions, i.e. sum_i xsec^prod_i eff_i obtained from the defined Data (using the signal prediction for each signal region/dataset), by using the q_mu test statistic from the CCGV paper (arXiv:1007.1727).
- Params expected
if False, compute observed; if True, compute a priori expected; if “posteriori”, compute a posteriori expected
- Params trylasttime
if True, then don't try extra
- Returns
upper limit on fiducial cross section
statistics.speyTools module
statistics.statsTools module
- class statistics.statsTools.StatsComputer(dataObject: Union[DataSet, CombinedDataSet, list], dataType: str, nsig: Union[None, float, List, Dict] = None, deltas_rel: Union[None, float] = None, allowNegativeSignals: bool = False)[source]
Bases:
object
Initialise.
- Parameters
dataObject – a smodels (combined)dataset or a list of theory predictions (for combination of analyses)
nsig – signal yield, either as float or as list
deltas_rel – relative error on the signal (currently unused)
- allowNegativeSignals
if True, negative values for the signal strength (mu) are allowed.
- CLs(poi_test: float = 1.0, expected: Union[bool, str] = False) Optional[float] [source]
compute the CLs value for a given value of the POI
- allowNegativeSignals
- data
- dataObject
- dataType
- deltas_sys
- classmethod forAnalysesComb(theoryPredictions, deltas_rel)[source]
get a statscomputer for combination of analyses.
- Parameters
theoryPredictions – list of TheoryPrediction objects
deltas_rel – relative error for the signal
- Returns
a StatsComputer
- classmethod forMultiBinSL(dataset, nsig, deltas_rel)[source]
get a statscomputer for simplified likelihood combination.
- Parameters
dataset – CombinedDataSet object
nsig – Number of signal events for each SR
deltas_rel – relative uncertainty for the signal
- Returns
a StatsComputer
- classmethod forPyhf(dataset, nsig, deltas_rel)[source]
get a statscomputer for pyhf combination.
- Parameters
dataset – CombinedDataSet object
nsig – Number of signal events for each SR
deltas_rel – relative uncertainty for the signal
- Returns
a StatsComputer
- classmethod forSingleBin(dataset, nsig, deltas_rel)[source]
get a statscomputer for an efficiency map (single bin).
- Parameters
dataset – DataSet object
nsig – Number of signal events for each SR
deltas_rel – relative uncertainty for the signal
- Returns
a StatsComputer
- classmethod forTruncatedGaussian(theorypred, corr: float = 0.6)[source]
get a statscomputer for truncated gaussians.
- Parameters
theorypred – TheoryPrediction object
corr – correction factor: ULexp_mod = ULexp / (1. - corr*((ULobs-ULexp)/(ULobs+ULexp))); a factor of corr = 0.6 is proposed.
- Returns
a StatsComputer
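The correction formula quoted above can be applied directly; a one-line sketch:

```python
def modified_expected_ul(ul_obs, ul_exp, corr=0.6):
    """ULexp_mod = ULexp / (1 - corr*((ULobs-ULexp)/(ULobs+ULexp)))."""
    return ul_exp / (1.0 - corr * (ul_obs - ul_exp) / (ul_obs + ul_exp))
```

When the observed and expected limits agree, the correction leaves the expected limit unchanged; an observed excess (ULobs > ULexp) pushes the modified expected limit upward.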
- get_five_values(expected: Union[bool, str], return_nll: bool = False, check_for_maxima: bool = False) Dict [source]
Return the Five Values: l(bsm), l(sm), muhat, l(muhat), sigma(mu_hat)
- Parameters
check_for_maxima – if True, then check lmax against l(sm) and l(bsm) and correct, if necessary
- likelihood(poi_test: float, expected: Union[bool, str], return_nll: bool) float [source]
simple frontend to individual computers
- likelihoodComputer
- maximize_likelihood(expected: Union[bool, str], return_nll: bool = False) dict [source]
simple frontend to the individual computers, later spey.
- Parameters
return_nll – if True, return negative log likelihood
- Returns
dictionary of llhd (likelihood at mu_hat), muhat, sigma_mu (standard deviation of mu_hat), optionally also theta_hat
- nsig
- poi_upper_limit(expected: Union[bool, str], limit_on_xsec: bool = False) float [source]
Simple frontend to the upperlimit computers, later to spey.poi_upper_limit
- Parameters
limit_on_xsec – if True, then return the limit on the cross section
- upperLimitComputer
statistics.truncatedGaussians module
- class statistics.truncatedGaussians.TruncatedGaussians(upperLimitOnMu: float, expectedUpperLimitOnMu: float, corr: Optional[float] = 0.6, cl=0.95)[source]
Bases:
object
likelihood computer based on the truncated Gaussian approximation, see arXiv:1202.3415
- Parameters
upperLimitOnMu – observed upper limit on signal strength mu
expectedUpperLimitOnMu – expected upper limit on signal strength mu
corr – correction factor: ULexp_mod = ULexp / (1. - corr*((ULobs-ULexp)/(ULobs+ULexp))) When comparing with likelihoods constructed from efficiency maps, a factor of corr = 0.6 has been found to result in the best approximations.
cl – confidence level
- cl
- corr
- denominator
- expectedUpperLimitOnMu
- likelihood(mu: Optional[float], return_nll: Optional[bool] = False, allowNegativeSignals: Optional[bool] = True, corr: Optional[float] = 0.6, expected: Union[str, bool] = False) Union[None, float] [source]
return the likelihood, as a function of mu
- Parameters
mu – number of signal events, if None then mu = muhat
return_nll – if True, return negative log likelihood
allowNegativeSignals – if True, then allow muhat to become negative, else demand that muhat >= 0. In the presence of underfluctuations in the data, setting this to True results in more realistic approximate likelihoods.
- Returns
likelihood (float)
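A minimal sketch of this approximation: a Gaussian in mu around mu_hat with width sigma_mu, where clamping mu_hat at zero when negative signals are disallowed is an assumption about how the truncation enters (the full construction of arXiv:1202.3415 derives mu_hat and sigma_mu from the upper limits):

```python
from math import exp, log, pi

def approx_likelihood(mu, mu_hat, sigma_mu, allow_negative_signals=True,
                      return_nll=False):
    """Gaussian approximation to the likelihood in mu."""
    if not allow_negative_signals and mu_hat < 0.0:
        mu_hat = 0.0  # truncate: demand mu_hat >= 0
    nll = 0.5 * ((mu - mu_hat) / sigma_mu) ** 2 \
        + 0.5 * log(2.0 * pi * sigma_mu ** 2)
    return nll if return_nll else exp(-nll)
```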
- lmax(return_nll: Optional[bool] = False, allowNegativeSignals: Optional[bool] = True, corr: Optional[float] = 0.6, expected: Union[bool, str] = False) Dict [source]
Return the maximum of the likelihood, together with muhat and sigma_mu.
- Parameters
return_nll – if True, return negative log likelihood
allowNegativeSignals – if True, then allow muhat to become negative, else demand that muhat >= 0. In the presence of underfluctuations in the data, setting this to True results in more realistic approximate likelihoods.
- Returns
dictionary with likelihood (float), muhat, and sigma_mu
- newCorrectionType = False
- sigma_mu
- upperLimitOnMu