Occasionally you might encounter a situation where you need to run the same simulation lots of times with slightly different configurations. In cases where this also involves computationally expensive features (many orders of higher-order modes, very high-resolution axes, lock commands, maps, …), that can get quite tedious, and you might wonder if you can parallelise your code to speed things up a bit.
With the caveat that you should first do a sanity check as to whether your code is already set up to run as fast as it could per-case*, the short answer is yes.
(*e.g. could your code be tuned closer to an operating point so the locks don’t have to work as hard, do you actually need to run with maxtem 20 or do your results converge by maxtem 4, etc.)
Here are three methods (out of probably several more) that can be used to run Finesse 2 simulations in parallel using Python: ParaKat, ipyparallel, and multiprocessing.
Below I’ll assume you are working in a Jupyter Notebook; the same methods can be used directly in a Python script. Downloadable examples of everything covered here are linked at the end, along with some thoughts on which method might suit your needs best.
Demo code used by all examples
Let’s use the same basic example in all cases. For ParaKat we need the kat code separate; for the other two we’ll need it inside a function.
In this example we parse the Finesse code for a Gaussian beam propagating to a mirror, and measure the power transmitted through the mirror as the laser’s power is varied linearly from 1W to 100W. Each parallel job then repeats this process for a different value of mirror transmission.
This isn’t something you would normally need to parallelise, but works for the purposes of testing the different methods quickly.
n_eng = 8 # max number of parallel tasks
xsteps = 10000 # xaxis resolution in each job
n_jobs = 100 # total number of different jobs we need to iterate through
katcode = f"""
l L0 1 0 n0
gauss mybeam L0 n0 100u 0 101u 1m
maxtem 4
s s0 0 n0 n1
m m 0.5 0.5 0 n1 nout
pd P nout
xaxis L0 P lin 1 100 {xsteps}"""
vals = range(1, n_jobs + 1) # 1..n_jobs inclusive, so we really get n_jobs jobs
testvals = [v/n_jobs for v in vals] # mirror transmissions 0.01 to 1.0
ParaKat
For most simple cases, ParaKat is the recommended method. ParaKat only does the explicit Finesse calculation in parallel, returning a list of out objects, so its functionality is a little limited. However, this also usually makes it the easiest to work with, and it includes some nice UI features like a progress bar 🙂
ParaKat is based on ipyparallel. Like the more explicit usage we’ll see below, this relies on first externally starting up a cluster of engines using ipcluster, which can then be assigned jobs by your code. ipcluster (installable e.g. via conda) is intended to be used through the terminal. If using Jupyter Notebook / JupyterLab, you can always just launch a terminal window there and run the command

ipcluster start -n [number of engines you want] --daemonize
However, we can also do this directly in-notebook, either using the ! shell escape, or slightly more pythonically (and controllably) using subprocess:
import subprocess
import time

subprocess.run(f"ipcluster start -n {n_eng} --daemonize", shell=True) # --daemonize makes this happen in the background
time.sleep(10) # give the cluster a moment to start up
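For the ! route, note that IPython interpolates Python variables into shell commands via curly braces, so the equivalent one-liner in a notebook cell is:

!ipcluster start -n {n_eng} --daemonize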
Now that the cluster is up and running, we set up the code as usual using PyKat:
from pykat import finesse

base = finesse.kat()
base.verbose = False
base.parse(katcode)
and now we work slightly differently to a serial PyKat run:
from pykat.parallel import parakat

pk = parakat()
for T in testvals:
    kat = base.deepcopy() # copy the base model so each job is independent
    kat.m.setRTL(1 - T, T, 0) # set this job's mirror reflectivity/transmission
    pk.run(kat) # queues the job; nothing is calculated yet
outs = pk.getResults() # all the queued jobs run here, in parallel
Note that we still have a for loop here, but now it is just used to create and collate the various kat objects. No calculations occur until the pk.getResults() call.
If subprocess was used to launch the engines, we can now also conveniently use it to stop them:

subprocess.run("ipcluster stop", shell=True)
time.sleep(10) # wait for the cluster to shut down
Plotting the results (or otherwise working with the outputs of the runs) is then just a case of accessing the relevant list item in outs:
import matplotlib.pyplot as plt

for o in outs:
    plt.plot(o.x, o['P'])
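If you want to keep track of which trace came from which job, one option is to zip the outputs back together with testvals. A minimal sketch, assuming outs preserves the order in which the kat objects were queued:

for i, (T, o) in enumerate(zip(testvals, outs)):
    plt.plot(o.x, o['P'], label=f"T={T:.2f}" if i % 10 == 0 else None) # label every 10th trace to keep the legend readable
plt.xlabel("Input power [W]")
plt.ylabel("Transmitted power [W]")
plt.legend()
plt.show()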
ipyparallel
ParaKat is built on ipyparallel, so the usage is quite similar between the two. Using ipyparallel directly gives you more flexibility, since we directly specify the Python/PyKat code each engine runs. It could therefore be extended to include further post-processing, multiple kat runs, etc.
As above, we need to use ipcluster to externally launch the engines we want:
import subprocess
import time

subprocess.run(f"ipcluster start -n {n_eng} --daemonize", shell=True) # --daemonize makes this happen in the background
time.sleep(10) # give the cluster a moment to start up
This time, we need to create a function which describes what we want to happen in each job. Unfortunately, this can’t rely on any external dependencies, so we have to import pykat inside it, and either explicitly write out the katcode or pass it in as another argument (shown here):
NB: If you are working in a notebook, it typically works best to define functions in a separate cell.
def myfunc(T, code):
    from pykat import finesse # imports must happen inside the function, since the engines don't share our namespace
    k = finesse.kat()
    k.verbose = False
    k.parse(code)
    k.m.setRTL(1 - T, T, 0)
    o = k.run()
    return o
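Since debugging code that runs on remote engines is awkward, it can be worth calling the function once serially to check it behaves before dispatching it to the cluster, e.g.:

test_out = myfunc(testvals[0], katcode) # one case, run locally
plt.plot(test_out.x, test_out['P'])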
To parallelise and run the code, we use Client:
from ipyparallel import Client

rc = Client() # connect to the running cluster
lview = rc.load_balanced_view() # a LoadBalancedView that hands jobs to whichever engine is free
lview.block = False # non-blocking: apply_async returns an AsyncResult instead of waiting for the actual result
results = [lview.apply_async(myfunc, yy, katcode) for yy in testvals] # extra positional args (here the 'code' string) are passed straight through
outs = [r.get() for r in results] # collect the actual outputs
rc.close() # good practice, if unessential on some local machines
NB rc.load_balanced_view() creates a LoadBalancedView that schedules jobs dynamically across all engines; if you don’t want that for e.g. memory usage reasons, use rc.direct_view() instead to skip the load balancing.
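One caveat if you go that route: apply on a DirectView runs the function on every engine in the view, so the idiomatic replacement is map_async, which scatters the inputs across the engines in fixed chunks. A minimal sketch, reusing rc, myfunc and katcode from above:

dview = rc.direct_view()
amr = dview.map_async(myfunc, testvals, [katcode] * len(testvals)) # the second sequence supplies the 'code' argument
outs = amr.get() # blocks until every engine has finished its chunk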
As before, we now stop those engines manually and then extract the results to use as we please:
subprocess.run(f"ipcluster stop",shell=True)
time.sleep(10) #wait a sec for the cluster to shut down
for o in outs:
plt.plot(o.x,o['P'])
multiprocessing
This is simpler and cleaner than the above, since worker start/stop is handled internally (i.e. we don’t need ipcluster this time). However, it might be a little less flexible in what can be iterated over, and the workers are restricted to running on the local machine, while ipyparallel enables you to send jobs to remote machines and more.
Like ipyparallel, we need to iterate over a function; unlike ipyparallel, this doesn’t seem to require pykat be imported inside the function every time, so this time we define:
def myfunc2(T, code):
    k = finesse.kat() # relies on the top-level 'from pykat import finesse' (the worker processes inherit it)
    k.verbose = False
    k.parse(code)
    k.m.setRTL(1 - T, T, 0)
    o = k.run()
    return o
Then everything is handled internally; we just need Pool:
from multiprocessing import Pool

pool = Pool(processes=n_eng)
results = [pool.apply_async(myfunc2, args=(x, katcode)) for x in testvals] # launch all jobs asynchronously
outs = [p.get() for p in results] # collect results (blocks until each job finishes)
pool.close() # good practice, necessary on some machines
NB: you can also use outs = [pool.apply(myfunc2, args=(x, katcode)) for x in testvals] in place of the two apply_async/get lines above. Since pool.apply blocks until each job returns, this runs the jobs strictly in order, one at a time, rather than launching them all asynchronously whenever a worker frees up. So it’s one less line of code but slower, for use only in cases where that ordering really matters.
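On the housekeeping side, Pool can also be used as a context manager (Python 3.3+), which guarantees the workers get cleaned up even if a job raises an error partway through. A minimal sketch of the same run:

from multiprocessing import Pool

with Pool(processes=n_eng) as pool:
    results = [pool.apply_async(myfunc2, args=(x, katcode)) for x in testvals]
    outs = [p.get() for p in results] # collect inside the with-block, before the pool is torn down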
As coded above, plotting the results is identical to the previous methods:
for o in outs:
    plt.plot(o.x, o['P'])
Which method should you use?
Functionality/Usability
parakat is the method built into PyKat. For most simple cases where nothing much changes except the contents of the kat object, this will do what you need without having to learn too much about what’s happening behind the scenes. You do have to externally launch engines using ipcluster, but there are many ways to do this (including widgets that let you do it via a GUI, if you prefer).
multiprocessing seems best in terms of ease of use for cases where you need more to happen in each run than just returning outputs from different kat objects. There are no external clusters to manually start or stop, and you can put whatever Python code you need inside the function.
When you want lots of control, and/or the ability to send your code to engines on remote machines, then ipyparallel is your best bet.
Speed
Below I’ve linked a script that runs the above examples for the same cases using all three methods.
Running this multiple times, I found that parakat tends to be the slowest, while ipyparallel and multiprocessing are fairly evenly matched (the winner seems to depend on your system). I reckon the parakat slowdown is due to the time taken to launch progress bars etc.
I suspect that if you need to run many parallel batches one after another, ipyparallel-based work will be faster overall, since you can keep the cluster running between batches, while multiprocessing has to spawn and close its pool of workers for every run, which could be less efficient longer term.
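If you want to check this on your own setup, a minimal timing pattern is to wrap whichever snippet you’re comparing with time.perf_counter:

import time

t0 = time.perf_counter()
# ... run one of the parallel snippets above ...
print(f"elapsed: {time.perf_counter() - t0:.1f} s")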
Results may vary depending on what you are simulating. In all cases I suggest doing a short ‘dummy’ run with fewer than 10 jobs to check your code does what you want, and to get an idea of whether you should go and make a cup of tea (or go to bed) while you wait.