Using SModelS

SModelS can take SLHA or LHE files as input (see Basic Input). It ships with a command-line tool runSModelS.py, which reports on the SMS decomposition and theory predictions in several output formats.

For users more familiar with Python and the SModelS basics, an example code Example.py is provided showing how to access the main SModelS functionalities: decomposition, the database and computation of theory predictions.

The command-line tool (runSModelS.py) and the example Python code (Example.py) are described below.

Note

For non-MSSM (incl. non-SUSY) input models the user needs to specify the BSM particles and their quantum numbers [1] (see adding new particles).

runSModelS.py

runSModelS.py covers several different applications of the SModelS functionality, with the option of turning various features on or off, as well as setting the basic parameters. These functionalities include detailed checks of input SLHA files, running the decomposition, evaluating the theory predictions and comparing them to the experimental limits available in the database, determining missing topologies and printing the output in several available formats.

Starting with v1.1, runSModelS.py is equipped with two additional functionalities: it can process a folder containing a set of SLHA or LHE files, and it can process the files in this folder in parallel.

The usage of runSModelS is:

runSModelS.py [-h] -f FILENAME [-p PARAMETERFILE] [-o OUTPUTDIR] [-d] [-t] [-C] [-V] [-c] [-v VERBOSE] [-T TIMEOUT]

options:

-h, --help: show this help message and exit
-f FILENAME, --filename FILENAME: name of SLHA or LHE input file or a directory path (required argument). If a directory is given, loop over all files in the directory
-p PARAMETERFILE, --parameterFile PARAMETERFILE: name of parameter file, where most options are defined (optional argument). If not set, use all parameters from smodels/etc/parameters_default.ini
-o OUTPUTDIR, --outputDir OUTPUTDIR: name of output directory (optional argument). The default folder is: ./results/
-d, --development: if set, SModelS will run in development mode and exit if any errors are found.
-t, --force_txt: force loading the text database
-C, --colors: colored output
-V, --version: show program’s version number and exit
-c, --run-crashreport: parse crash report file and use its contents for a SModelS run. Supply the crash file simply via ‘--filename myfile.crash’
-v VERBOSE, --verbose VERBOSE: sets the verbosity level (debug, info, warning, error). Default value is info.
-T TIMEOUT, --timeout TIMEOUT: define a limit on the running time (in secs). If not set, run without a time limit. If a directory is given as input, the timeout will be applied to each individual file.

A typical usage example is:

runSModelS.py -f inputFiles/slha/simplyGluino.slha -p parameters.ini -o ./ -v warning

The resulting output will be generated in the current folder, according to the printer options set in the parameters file.
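When runSModelS.py is driven from a script (for instance inside a parameter scan), the same call can be assembled programmatically. The following is a minimal sketch and not part of SModelS: the helper name build_command is hypothetical, only flags listed above are used, and the command is merely constructed, not executed.

```python
import shlex


def build_command(filename, parameter_file=None, output_dir=None,
                  verbose=None, timeout=None):
    """Assemble a runSModelS.py call from the documented flags.

    Hypothetical helper for illustration; it only builds the argument list.
    """
    cmd = ["runSModelS.py", "-f", filename]
    if parameter_file is not None:
        cmd += ["-p", parameter_file]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if verbose is not None:
        cmd += ["-v", verbose]
    if timeout is not None:
        cmd += ["-T", str(timeout)]
    return cmd


cmd = build_command("inputFiles/slha/simplyGluino.slha",
                    parameter_file="parameters.ini",
                    output_dir="./", verbose="warning")
# Print the command as it would be typed in a shell
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True), executed from the folder where runSModelS.py is located.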

The Parameters File

The basic options and parameters used by runSModelS.py are defined in the parameters file. An example parameter file, including all available parameters together with a short description, is stored in parameters.ini. If no parameter file is specified, the default parameters stored in smodels/etc/parameters_default.ini are used. Below we give more detailed information about each entry in the parameters file.

options: main options for turning SModelS features on or off

checkInput (True/False): if True, runSModelS.py will run the file check tool on the input file and verify if the input contains all the necessary information.

doInvisible (True/False): turns invisible compression on or off during the decomposition. (Note that the compression is only applied to prompt particles, with widths larger than the promptWidth parameter.)

doCompress (True/False): turns mass compression on or off during the decomposition. (Note that the compression is only applied to prompt particles, with widths larger than the promptWidth parameter.)

testCoverage (True/False): set to True to run the coverage tool.

computeStatistics (True/False): turns the likelihood computation on or off (see likelihood calculation). If True, the negative log likelihoods nll=nll_BSM, nll_SM and nll_min are computed for the EM-type results.

combineSRs (True/False): set to True to combine signal regions in EM-type results when a covariance matrix, a pyhf JSON likelihood, or an onnx surrogate model is available. Set to False to use only the most sensitive signal region (faster!). Available v1.1.3 onwards for covariance matrices and v1.2.4 onwards for full likelihoods (using pyhf).

pyhfbackend: set to name of pyhf backend. Possible values are: numpy (default), pytorch, tensorflow, jax

reportAllSRs (True/False): set to True to report all signal regions, instead of the best signal region only. From v3.0.0 onwards it will also include the combined SRs if combineSRs=True. Beware, the output can be long.

combineAnas (list of results): list of analysis IDs to be combined. All the analyses are assumed to be fully uncorrelated, so use with caution! Available from v2.2.0 onwards.

experimentalFeatures: enable specific features that are not (yet) considered part of SModelS. Use with care!

truncatedgaussians (True/False): set to True to enable truncated gaussian as approximate likelihoods for UL-based results. Use at your own risk!

spey (True/False): set to True to enable experimental spey interface. Use at your own risk!

particles: defines the particle content of the BSM model

model: pathname to the Python file that defines the particle content of the BSM model or to an SLHA file containing QNUMBERS blocks for the BSM particles (see Basic Input). The Python file can be given either in Unix file notation (“/path/to/model.py”) or as Python module path (“path.to.model”). Defaults to share.models.mssm, which is a standard MSSM. See the smodels/share/models folder for more examples. The directory name can be omitted; in that case, the current working directory as well as smodels/share/models are searched. If not defined, it will assume the input file is an SLHA file containing QNUMBERS for the particles in the model.

promptWidth: total decay width in GeV above which decays are considered prompt, default is 1e-11; available v2.0 onwards. (nb default was 1e-8 in v2.0 and 2.1, changed to 1e-11 in v2.2)

stableWidth: total decay width in GeV below which particles are considered as (quasi)stable, default is 1e-25; available v2.0 onwards.

ignorePromptQNumbers: list of quantum numbers to be ignored for promptly decaying particles (particles with width larger than promptWidth). Since many experimental searches are not sensitive to the properties of particles with prompt decays, SModelS has the option to erase the quantum numbers of these particles. For instance, if ignorePromptQNumbers=”spin,eCharge,colordim”, the spin, electric charge and color properties of promptly decaying particles will be ignored. This can greatly reduce the running time (must be used with caution). If this parameter is not defined, all quantum numbers will be kept. Available v3.0.0 onwards.
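The role of the two width thresholds can be illustrated with a short sketch. It is not SModelS code: the function name is hypothetical, the default values are the ones quoted above, and the label "displaced" for the intermediate range is a naming choice of this sketch.

```python
def classify_width(width_gev, prompt_width=1e-11, stable_width=1e-25):
    """Classify a particle by its total decay width (in GeV).

    Illustrative only: "displaced" is this sketch's label for widths
    between stableWidth and promptWidth, not a SModelS identifier.
    """
    if width_gev > prompt_width:
        return "prompt"
    if width_gev < stable_width:
        return "stable"
    return "displaced"


for width in (1e-3, 1e-15, 0.0):
    print(f"{width:g} GeV -> {classify_width(width)}")
```

With the v2.0/v2.1 default (promptWidth = 1e-8), a width of 1e-9 GeV would fall into the intermediate range instead of being treated as prompt.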

parameters: basic parameter values for running SModelS

sigmacut (float): minimum value for an SMS weight (in fb). SMS topologies with a weight below sigmacut are neglected during the decomposition of SLHA files (see Minimum Decomposition Weight). The default value is 0.005 fb. Note that, depending on the input model, the running time may increase considerably if sigmacut is too low, while too large values might eliminate relevant SMS topologies.

minmassgap (float): maximum value of the mass difference (in GeV) for performing mass compression. Only used if doCompress = True

minmassgapISR (float): maximum value of the mass difference (in GeV) for performing mass compression leading to ISR signatures. If not defined, a default value of 1 GeV is assumed. Only used if doCompress = True.

maxcond (float): maximum allowed value (in the [0,1] interval) for the violation of upper limit conditions. A zero value means the conditions are strictly enforced, while 1 means the conditions are never enforced. Only relevant for printing the output summary.

ncpus (int): number of CPUs. When processing multiple SLHA/LHE files, SModelS can run in a parallelized fashion, splitting up the input files in equal chunks. ncpus = 0 parallelizes to as many processes as number of CPU cores of the machine, negative values mean parallelization to number of CPU cores minus the absolute value of ncpus (but at least 1). Default value is 1. Warning: python already parallelizes many tasks internally.
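The ncpus convention described above can be written down as a small sketch. The function names are hypothetical, and the round-robin chunking is only one possible way of splitting the input files into (nearly) equal chunks; SModelS may distribute the files differently.

```python
import os


def effective_processes(ncpus, n_cores=None):
    """Translate the ncpus parameter into a number of processes."""
    if n_cores is None:
        n_cores = os.cpu_count() or 1
    if ncpus > 0:
        return ncpus
    # ncpus == 0 -> all cores; ncpus < 0 -> all cores minus |ncpus|, at least 1
    return max(1, n_cores - abs(ncpus))


def split_in_chunks(files, n_chunks):
    """Distribute files round-robin over n_chunks lists of near-equal size."""
    return [files[i::n_chunks] for i in range(n_chunks)]


# Hypothetical input files on a machine assumed to have 4 cores
files = [f"point_{i}.slha" for i in range(7)]
n_proc = effective_processes(-2, n_cores=4)
print(n_proc, split_in_chunks(files, n_proc))
```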

database: allows for selection of a subset of experimental results from the database

path: the absolute (or relative) path to the database. The user can supply either the directory name of the database, or the path to the pickle file. Also http addresses may be given, e.g. https://smodels.github.io/database/official230. See the github database release page for a list of public database versions. Shorthand notations are available: path=official refers to the official database of your SModelS version, while path=latest refers to the latest available database release. The ‘+’ operator allows for extending the “official” or “latest” database with add-ons:

+fastlim: adds fastlim results (from early 8 TeV ATLAS analyses); from v2.1.0 onward

+superseded: adds results which were previously available but were superseded by newer ones; from v2.1.0 onward

+nonaggregated: adds analyses with non-aggregated SRs in addition to the aggregated results in CMS analyses; from v2.2.0 onward

+full_llhds: replaces simplified HistFactory statistical models by full ones in ATLAS analyses; from v2.3.0 onward (careful, this increases the runtime considerably!)

Examples are path=official+fastlim, path=official+nonaggregated, path=official+nonaggregated+full_llhds. Note that order matters: results are replaced in the specified sequence, so path=nonaggregated+official will fall back onto the official database with aggregated results. In principle, the add-ons can also be used alone, e.g. path=nonaggregated, though this is of little practical use. Finally, debug refers to a version of the database with extra information that is however not intended for usage by a regular user and only mentioned here for completeness.

analyses (list of results): set to [‘all’] to use all available results. If a list of experimental analyses is given, only these will be used. For instance, setting analyses = CMS-PAS-SUS-13-008,ATLAS-CONF-2013-024 will only use the experimental results from CMS-PAS-SUS-13-008 and ATLAS-CONF-2013-024. Wildcards (*, ?, [<list-of-or’ed-letters>]) are expanded in the same way the shell does wildcard expansion for file names. For example, analyses = CMS* evaluates results from the CMS experiment only, while *SUS* selects everything containing SUS, whether from CMS or ATLAS. Furthermore, the selection of analyses can be restricted by centre-of-mass energy with a suffix beginning with a colon and an energy string in unum-style, like :13*TeV. Note that the asterisk behind the colon is not a wildcard. :13, :13TeV and :13 TeV are also understood but discouraged.

txnames (list of topologies): set to [‘all’] to use all available simplified model topologies. The SMS topologies are labeled according to the txname convention. If a list of txnames is given, only the corresponding topologies will be considered. For instance, setting txnames = T2 will only consider experimental results for \(pp \to \tilde{q} + \tilde{q} \to (jet+\tilde{\chi}_1^0) + (jet+\tilde{\chi}_1^0)\) and the output will only contain constraints for this topology. A list of all SMS topologies and their corresponding txnames can be found here. Wildcards (*, ?, [<list-of-or’ed-letters>]) are expanded in the same way the shell does wildcard expansion for file names. So, for example, txnames = T[12]*bb* picks all txnames beginning with T1 or T2 and containing bb; at the time of writing these were: T1bbbb, T1bbbt, T1bbqq, T1bbtt, T2bb, T2bbWW, T2bbWWoff.

dataselector (list of datasets): set to [‘all’] to use all available data sets. If dataselector = upperLimit (efficiencyMap), only UL-type results (EM-type results) will be used. Furthermore, if a list of signal regions (data sets) is given, only the experimental results containing these datasets will be used. For instance, if dataselector = SRA mCT150,SRA mCT200, only these signal regions will be used. Wildcards (*, ?, [<list-of-or’ed-letters>]) are expanded in the same way the shell does wildcard expansion for file names. Wildcard examples are given above.

dataTypes: dataType of the analysis (all, efficiencyMap or upperLimit). Can be wildcarded with the usual shell wildcards: * ? [<list-of-or’ed-letters>]. Wildcard examples are given above.
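The shell-style wildcard matching used for analyses, txnames and dataselector can be tried out with Python's fnmatch module. This is a sketch of the matching rules only and makes no claim about how SModelS implements the selection internally; the helper select is hypothetical, and the last three txnames in the list are added here as non-matching examples.

```python
from fnmatch import fnmatchcase

# Txnames quoted in the text above, plus three extra names added for
# illustration that should not match the pattern.
available = ["T1bbbb", "T1bbbt", "T1bbqq", "T1bbtt", "T2bb", "T2bbWW",
             "T2bbWWoff", "T1tttt", "T2tt", "T6bbWW"]


def select(names, patterns):
    """Keep the names matching at least one shell-style pattern."""
    return [n for n in names if any(fnmatchcase(n, p) for p in patterns)]


# Names starting with T1 or T2 and containing bb
print(select(available, ["T[12]*bb*"]))
# Results from the CMS experiment only
print(select(["CMS-PAS-SUS-13-008", "ATLAS-CONF-2013-024"], ["CMS*"]))
```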

printer: main options for the output format

outputType (list of outputs): use to list all the output formats to be generated. Available output formats are: summary, stdout, log, python, xml, slha.

outputFormat: use to select in which format the output should be written. Available formats are: current (latest format) or version2 (SModelS 2.x format using bracket notation)

stdout-printer: options for the stdout or log printer

printDatabase (True/False): set to True to print the list of selected experimental results to stdout.

addAnaInfo (True/False): set to True to include detailed information about the txnames tested by each experimental result. Only used if printDatabase=True.

printDecomp (True/False): set to True to print basic information from the decomposition (SMS topologies, total weights, …).

addSMSInfo (True/False): set to True to include detailed information about the SMS topologies generated by the decomposition. Only used if printDecomp=True.

printExtendedResults (True/False): set to True to print extended information about the theory predictions, including the PIDs of the particles contributing to the predicted cross section, their masses and the expected upper limit (if available).

addCoverageID (True/False): set to True to print the list of SMS IDs contributing to each missing topology (see coverage). Only used if testCoverage = True. This option should be used along with addSMSInfo = True so the user can precisely identify which SMS topologies were classified as missing.

summary-printer: options for the summary printer

expandedSummary (True/False): set to True to include in the summary output all applicable experimental results, False for only the strongest one.

slha-printer: options for the SLHA printer

expandedOutput (True/False): set to True to print the full list of results. If False only the most constraining result and excluding results are printed.

python-printer: options for the Python printer

addSMSList (True/False): set to True to include in the Python output all information about all SMS topologies generated in the decomposition. If set to True the output file can be quite large.

addTxWeights (True/False): set to True to print the contribution from individual topologies to each theory prediction. Available v1.1.3 onwards.

addNodesMap (True/False): set to True to include the mapping of the nodes indices to the BSM labels. Available v3.0.0 onwards.

addStatModel (True/False): set to True to add the file name of the statistical model (or SL for simplified likelihoods). Available v3.1.2 onwards.

xml-printer: options for the xml printer

addSMSList (True/False): set to True to include in the xml output all information about all SMS topologies generated in the decomposition. If set to True the output file can be quite large.

addTxWeights (True/False): set to True to print the contribution from individual topologies to each theory prediction. Available v1.1.3 onwards.
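To make the structure of the parameters file concrete, the sketch below parses a minimal INI-style example with Python's configparser. It assumes that the section names mirror the group headings described above (options, particles, parameters, database, printer); the values for minmassgap and maxcond are illustrative, not documented defaults. The parameters.ini shipped with SModelS remains the reference.

```python
import configparser

# Section names are assumed to mirror the group headings described above;
# compare with the parameters.ini shipped with SModelS before relying on them.
EXAMPLE_INI = """
[options]
checkInput = True
doInvisible = True
doCompress = True
testCoverage = True
combineSRs = False

[particles]
model = share.models.mssm
promptWidth = 1e-11
stableWidth = 1e-25

[parameters]
sigmacut = 0.005
minmassgap = 5.
maxcond = 0.2
ncpus = 1

[database]
path = official
analyses = all
txnames = all
dataselector = all

[printer]
outputType = summary,python
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep option names case-sensitive (e.g. promptWidth)
parser.read_string(EXAMPLE_INI)

sigmacut = parser.getfloat("parameters", "sigmacut")
do_compress = parser.getboolean("options", "doCompress")
output_types = parser.get("printer", "outputType").split(",")
print(sigmacut, do_compress, output_types)
```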

The Output

The results of runSModelS.py are printed to the format(s) specified by the outputType in the parameters file. The following formats are available:

a human-readable screen output (stdout) or log output. These are intended to provide detailed information about the database, the decomposition, the theory predictions and the missing topologies. The output complexity can be controlled through several options in the parameters file. Due to its size, this output is not suitable for storing the results from a large scan, being more appropriate for a single file input.

a human-readable text file output containing a summary of the output. This format contains the main SModelS results: the theory predictions and the missing topologies. It can be used for a large scan, since the output can be made quite compact, using the options in the parameters file.

a python dictionary printed to a file containing information about the decomposition, the theory predictions and the missing topologies. The output can be quite long if all options in the parameters file are set to True. However, this output can be easily imported into a Python environment, making it easy to access the desired information. For users familiar with the Python language this is the recommended format.

an xml file containing information about the decomposition, the theory predictions and the missing topologies. The output can be quite long if all options are set to True. Due to its broad usage, the xml output can be easily converted to the user’s preferred format.

an SLHA file containing information about the theory predictions and the missing topologies. The output follows an SLHA-type format and contains a summary of the most constraining results and the missed topologies.

In addition, when running over multiple files, a simple text output (summary.txt) is generated with basic information about the results for each input file. A detailed explanation of the information contained in each type of output is given in SModelS Output.
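As an illustration of why the python format is convenient, the sketch below reads such a file back into a dictionary. It assumes that the file defines a single dictionary variable, here called smodelsOutput, holding a list of results under the key ExptRes, each with an r entry. These names and the sample content are assumptions made for this example; inspect an actual output file for the real layout.

```python
import os
import runpy
import tempfile

# Stand-in for a file written by the python printer (hypothetical content).
SAMPLE = """smodelsOutput = {
    'ExptRes': [
        {'AnalysisID': 'ANALYSIS-A', 'r': 0.4},
        {'AnalysisID': 'ANALYSIS-B', 'r': 1.7},
    ]
}
"""

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example_output.py")
    with open(path, "w") as f:
        f.write(SAMPLE)
    # run_path executes the file and returns its global variables
    output = runpy.run_path(path)["smodelsOutput"]

# Pick the entry with the largest r-value
best = max(output["ExptRes"], key=lambda res: res["r"])
print(best["AnalysisID"], best["r"])
```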

Example.py

Although runSModelS.py provides the main SModelS features with a command line interface, users more familiar with Python and the SModelS language may prefer to write their own main program. A simple example code for this purpose is provided in examples/Example.py. Below we go step-by-step through this example code:

Import the SModelS modules and methods. If the example code file is not located in the smodels installation folder, simply add “sys.path.append(<smodels installation path>)” before importing smodels. Set SModelS verbosity level.

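# The first five imports are needed by the steps below; their module paths
# assume the SModelS v3 package layout.
from smodels.base import runtime
from smodels.decomposition import decomposer
from smodels.base.physicsUnits import GeV, TeV
from smodels.matching.theoryPrediction import theoryPredictionsFor, TheoryPredictionsCombiner
from smodels.experiment.databaseObj import Database
from smodels.tools import coverage
from smodels.base.smodelsLogging import setLogLevel
from smodels.tools.particlesLoader import load
from smodels.share.models.SMparticles import SMList
from smodels.statistics.basicStats import apriori
from smodels.base.model import Model
import os
import time

# Set the SModelS verbosity level
setLogLevel("info")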

Define the pyhf backend. Specify which pyhf backend to use; it must be one of: numpy, pytorch, tensorflow, jax.

from smodels.statistics.pyhfInterface import setBackend
# set pyhf backend to one of: numpy (default), pytorch, tensorflow, jax. 
# WARNING: if backend specified is not found, we fall back to numpy!
setBackend("pytorch")

Set the path to the database URL. Specify which database to use. It can be the path to the smodels-database folder, the path to a pickle file or (starting with v1.1.3) a URL path.

    # Set the path to the database
    database = Database(database)

Load the BSM particles. By default SModelS assumes the MSSM particle content. For using SModelS with a different particle content, the user must define the new particle content and set modelFile to the path of the model file (see particles:model in Parameter File).

    # Load the BSM model
    runtime.modelFile = "smodels.share.models.mssm"
    BSMList = load()

Load the model and set the path to the input file. Load BSM and SM particle content; specify the location of the input file (must be an SLHA or LHE file, see Basic Input) and update particles in the model.

    model = Model(BSMparticles=BSMList, SMparticles=SMList)
    slhafile = os.path.abspath(inputFile)
    model.updateParticles(inputFile=slhafile,
                          ignorePromptQNumbers = ['eCharge','colordim','spin'])

Set main options for decomposition. Specify the values of sigmacut and minmassgap:

    sigmacut = sigmacut
    mingap = 5.*GeV

Decompose model using the decomposer.decompose method. The massCompress and invisibleCompress arguments (the counterparts of the doCompress and doInvisible options in the parameters file) turn the mass compression and invisible compression on/off.

    topDict = decomposer.decompose(model, sigmacut,
                                   massCompress=True, invisibleCompress=True,
                                   minmassgap=mingap)

Access basic information from decomposition, using the dictionary of SMS (the dictionary has as keys the canonical name for the topology and as values the list of SMS topologies for the corresponding canonical name):

    print(f"\n Decomposition done in {(time.time() - t0) / 60.0:1.2f}m")
    print("\n Decomposition Results: ")
    print(f"\t  Total number of topologies: {len(topDict)} ")
    nSMS = len(topDict.getSMSList())
    print("\t  Total number of SMS = %i " % nSMS)

output:

 Decomposition Results: 
	  Total number of topologies: 44 
	  Total number of SMS = 7882 

Print information about the SMS topologies from the decomposition:

    smsList = sorted(topDict.getSMSList(), 
                     key = lambda sms: sms.weightList, reverse=True)
    
    # Print information about the first few SMS topologies:
    for sms in smsList[:3]:
        print(f"\t\t SMS  = {sms}")
        print(f"\t\t cross section*BR = {sms.weightList.getMaxXsec()}\n")

output:

		 SMS  = (PV > C1+/C1-(1),N2(2)), (C1+/C1-(1) > N1/N1~,q,q), (N2(2) > N1,q,q)
		 cross section*BR = 7.33E-01 [pb]

		 SMS  = (PV > C1+/C1-(1),N2(2)), (C1+/C1-(1) > N1/N1~,q,c), (N2(2) > N1,q,q)
		 cross section*BR = 7.33E-01 [pb]

		 SMS  = (PV > C1+/C1-(1),C1+/C1-(2)), (C1+/C1-(1) > N1/N1~,q,q), (C1+/C1-(2) > N1/N1~,q,c)
		 cross section*BR = 4.94E-01 [pb]

Load the experimental results to be used to constrain the input model. Here, all results are used:

    listOfExpRes = database.getExpResults()

Alternatively, the getExpResults method can take as arguments specific results to be loaded and used.

Print basic information about the results loaded. Below we show how to count the number of UL-type results and EM-type results loaded:

    nUL, nEM = 0, 0
    for exp in listOfExpRes:
        expType = exp.datasets[0].dataInfo.dataType
        if expType == 'upperLimit':
            nUL += 1
        elif expType == 'efficiencyMap':
            nEM += 1
    print("\n Loaded Database with %i UL results and %i EM results " % (nUL, nEM))

output:

 Loaded Database with 102 UL results and 56 EM results 

Compute the theory predictions for each of the selected results. The output is a list of theory prediction objects:

    allPredictions = theoryPredictionsFor(database, topDict, combinedResults=False)

Print the results. For each experimental result, loop over the corresponding theory predictions and print the relevant information:

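    # Header and bookkeeping for the most constraining result (used further below)
    print("\n Theory Predictions and Constraints:")
    rmax = 0.
    bestResult = None
    for theoryPrediction in allPredictions:
        print(f'\n {theoryPrediction.analysisId()} ')
        dataset = theoryPrediction.dataset
        datasetID = theoryPrediction.dataId()
        txnames = sorted([str(txname) for txname in theoryPrediction.txnames])
        print("------------------------")
        print("Dataset = ", datasetID)  # Dataset (signal region) ID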
        print("TxNames = ", txnames)
        print("Theory Prediction = ", theoryPrediction.xsection)  # Signal cross section
        print("Condition Violation = ", theoryPrediction.conditions)  # Condition violation values

output:

 Theory Predictions and Constraints:

 ATLAS-SUSY-2019-09 
------------------------
Dataset =  None
TxNames =  ['TChiWZoff']
Theory Prediction =  2.63E+00 [pb]
Condition Violation =  [0.0]

Get the corresponding upper limit. This value can be compared to the theory prediction to decide whether a model is excluded or not:

        print("UL for theory prediction = ", theoryPrediction.upperLimit)

output:

UL for theory prediction =  1.20E-01 [fb]

Print the r-value, i.e. the ratio theory prediction/upper limit. A value of \(r \geq 1\) means that an experimental result excludes the input model. For EM-type results also compute the negative log likelihood values. Determine the most constraining result:

        r = theoryPrediction.getRValue()
        print(f"r = {r:1.3E}")
        # Compute likelihoods for EM-type results:
        if dataset.getType() == 'efficiencyMap':
            theoryPrediction.computeStatistics()
            print('nll_BSM, nll_SM, nll_min = %1.3f, %1.3f, %1.3f' % (theoryPrediction.nll( ),
                    theoryPrediction.nllsm( ), theoryPrediction.nll_min( )) )

output:

r = 2.888E+00
nll_BSM, nll_SM, nll_min = 19.891, 4.827, 4.809
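The definition of the r-value can be spelled out in a few lines. This is a plain-Python sketch with hypothetical numbers (both in fb) and hypothetical function names; in Example.py the value is obtained from theoryPrediction.getRValue().

```python
def r_value(prediction_fb, upper_limit_fb):
    """Ratio of the theory prediction to the upper limit (same units)."""
    return prediction_fb / upper_limit_fb


def is_excluded(r):
    """A result excludes the input model when r >= 1."""
    return r >= 1.0


# Hypothetical numbers, both in fb
r = r_value(prediction_fb=2.4, upper_limit_fb=1.2)
print(f"r = {r:1.3E}, excluded: {is_excluded(r)}")
```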

Print the most constraining experimental result. Using the largest r-value, determine if the model has been excluded or not by the selected experimental results:

        if r > rmax:
            rmax = r
            bestResult = theoryPrediction.analysisId()

    # Print the most constraining experimental result
    print(f"\nThe largest r-value (theory/upper limit ratio) is {rmax:1.3E}")
    if rmax > 1.:
        print(f"(The input model is likely excluded by {bestResult})")
    else:
        print("(The input model is not excluded by the simplified model results)")

output:

The largest r-value (theory/upper limit ratio) is 4.783E+00
(The input model is likely excluded by ATLAS-SUSY-2019-09)

Select analyses. Using the theory predictions, select a (user-defined) subset of analyses to be combined:

    combineAnas = ['ATLAS-SUSY-2013-11', 'CMS-SUS-13-013']
    selectedTheoryPreds = []
    for tp in allPredictions:
        expID = tp.analysisId()
        if expID not in combineAnas:
            continue
        if tp.likelihood() is None:
            continue
        selectedTheoryPreds.append(tp)

Combine analyses. Using the selected analyses, combine them under the assumption they are fully uncorrelated:

        combiner = TheoryPredictionsCombiner(selectedTheoryPreds)

Print the combination. Print the r-values and likelihoods for the combination:

        print("\n\nCombined analyses:", combiner.analysisId())
        print(f"Combined r value: {combiner.getRValue():1.3E}")
        print(f"Combined r value (expected): {combiner.getRValue(evaluationType=apriori):1.3E}")
        print(f"Likelihoods: nll, nll_min, nll_SM = {nll:.3f}, {nllmin:.3f}, {nllsm:.3f}\n")

output:

Combined analyses: ATLAS-SUSY-2013-11,CMS-SUS-13-013
Combined r value: 2.053E-02
Combined r value (expected): 2.004E-02
Likelihoods: nll, nll_min, nll_SM = 6.181, 6.176, 6.176
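For fully uncorrelated analyses the joint likelihood is the product of the individual likelihoods, so the negative log likelihoods simply add. The sketch below shows this bookkeeping at a fixed signal hypothesis with hypothetical nll values; it does not reproduce what TheoryPredictionsCombiner does internally (for instance the minimization that yields nll_min).

```python
import math


def combined_nll(nlls):
    """Sum of negative log likelihoods = -log of the product of likelihoods."""
    return sum(nlls)


# Hypothetical per-analysis values for the same signal hypothesis
nll_values = [3.10, 3.08]
total = combined_nll(nll_values)

# Cross-check against the explicit product of likelihoods
product = math.prod(math.exp(-x) for x in nll_values)
print(f"{total:.3f} {-math.log(product):.3f}")
```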

Identify missing topologies. Using the output from decomposition, identify the missing topologies and print some basic information:

    uncovered = coverage.Uncovered(topDict, sqrts=13.*TeV)
    print(f"\n Coverage done in {(time.time() - t0) / 60.0:1.2f}m")
    # First sort coverage groups by label
    groups = sorted(uncovered.groups[:], key=lambda g: g.label)
    # Print uncovered cross-sections:
    for group in groups:
        print(f"\nTotal cross-section for {group.description} (fb): {group.getTotalXSec():10.3E}\n")

output:

Total cross-section for missing topologies (fb):  1.062E+04

Total cross-section for missing topologies with displaced decays (fb):  0.000E+00

Total cross-section for missing topologies with prompt decays (fb):  1.402E+04

Total cross-section for topologies outside the grid (fb):  3.823E+03

Note that SModelS does not apply any statistical treatment to the results, such as correction factors for the “look elsewhere effect”. For this reason, the output only claims that a model is “likely excluded”.

Notes:

For an SLHA input file, the decays of SM particles are always ignored during the decomposition. Furthermore, if there are two cross sections at different calculation order (say LO and NLO) for the same process, only the highest order is used.
The list of SMS topologies can be extremely long. Try setting addSMSInfo = False and/or printDecomp = False to obtain a smaller output.
A comment of caution is in order regarding naively using the highest \(r\)-value reported by SModelS, as this does not necessarily come from the most sensitive analysis. For a rigorous statistical interpretation, one should use the \(r\)-value of the result with the highest expected \(r\) (\(r_{exp}\)). Unfortunately, for UL-type results, the expected limits are often not available; \(r_{exp}\) is then reported as N/A in the SModelS output.
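The last note can be turned into a small selection rule: quote the observed r of the result with the highest expected r, and skip results for which the expected value is not available. The sketch below uses a hypothetical container class and made-up numbers; it is not part of the SModelS API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """Hypothetical container for one theory prediction."""
    analysis: str
    r: float
    r_expected: Optional[float]  # None stands for "N/A"


def most_sensitive(results):
    """Return the result with the highest expected r, ignoring N/A entries."""
    with_expected = [res for res in results if res.r_expected is not None]
    if not with_expected:
        return None
    return max(with_expected, key=lambda res: res.r_expected)


results = [
    Result("ANALYSIS-A", r=1.3, r_expected=0.6),
    Result("ANALYSIS-B", r=0.9, r_expected=1.1),
    Result("ANALYSIS-C", r=2.0, r_expected=None),
]
best = most_sensitive(results)
print(best.analysis, best.r)
```

In this made-up example the largest observed r belongs to ANALYSIS-C, but the most sensitive result with an available expected value is ANALYSIS-B, whose observed r is below 1.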

1: SLHA files including decay tables and cross sections, together with the corresponding model.py, can conveniently be generated via the SModelS-micrOMEGAS interface, see arXiv:1606.03834