Parallelising Finesse 2 + PyKat: three ways

Occasionally you might encounter a situation where you need to run the same simulation lots of times with slightly different configurations. In cases where this also involves computationally expensive things (many orders of higher order modes, very high resolution axes, lock commands, maps,…), that can get quite tedious and you might wonder if you can parallelise your code to speed things up a bit.

With the caveat that you should first do a sanity check as to whether your code is already set up to run as fast as it could per-case*, the short answer is yes.

(*e.g. could your code be tuned closer to an operating point so the locks don’t have to work as hard, do you actually need to run with maxtem 20 or do your results converge by maxtem 4, etc.)
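One way to make the maxtem check concrete is to step the mode order up until the result stops changing. The sketch below is plain Python and does not call Finesse: run_at_order is a hypothetical stand-in for whatever function runs your model at a given maxtem and returns a single number, and the tolerance is an arbitrary choice.

```python
import math

def smallest_converged_order(run_at_order, orders, rel_tol=1e-3):
    """Return the first order whose result agrees with the next order's
    result to within rel_tol, or None if no consecutive pair agrees."""
    previous_order, previous = None, None
    for n in orders:
        current = run_at_order(n)
        if previous is not None and math.isclose(previous, current, rel_tol=rel_tol):
            return previous_order
        previous_order, previous = n, current
    return None


# Stand-in for a real simulation: a value that settles as the order grows.
def fake_run(order):
    return 1.0 + 0.5 ** order

best = smallest_converged_order(fake_run, range(0, 21, 2))
print(best)
```

Because the loop stops at the first pair that agrees, it never runs the most expensive orders unless it has to.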

Here are 3 methods (out of probably several more) that can be used to run Finesse 2 simulations in parallel using Python:

1. Built-in method: parakat
2. ipyparallel (the package parakat is built on)
3. multiprocessing

Below I’ll assume you are working in a Jupyter Notebook; the same methods can be used directly in a Python script. Downloadable examples of everything covered here are linked at the end, along with some thoughts on which method might suit your needs best.

demo code used by all examples

Let’s use the same basic example in all cases. For parakat we need the kat code separate, for the other two we’ll need it inside a function.

In this example we parse the Finesse code for a Gaussian beam propagating to a mirror, and measure the power transmitted through the mirror as the laser’s power is varied linearly from 1W to 100W. Each parallel job then repeats this process for a different value of mirror transmission.

This isn’t something you would normally need to parallelise, but works for the purposes of testing the different methods quickly.

n_eng = 8 # max number of parallel tasks
xsteps = 10000  # xaxis resolution in each job
n_jobs = 100    # total number of different jobs we need to iterate through

katcode = f"""
l L0 1 0 n0
gauss mybeam L0 n0 100u 0 101u 1m
maxtem 4
s s0 0 n0 n1
m m 0.5 0.5 0 n1 nout
pd P nout
xaxis L0 P lin 1 100 {xsteps}"""

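vals = range(1,n_jobs)  # 1 ... n_jobs-1, so this actually gives n_jobs-1 = 99 jobs
testvals = [v/n_jobs for v in vals]  # mirror transmissions T from 0.01 to 0.99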

ParaKat

For most simple cases, ParaKat is the recommended method. ParaKat only does the explicit Finesse calculation in parallel, returning a list of out objects, so its functionality is a little limited. However, this also usually makes it the easiest to work with, and it includes some nice UI features like a progress bar 🙂

ParaKat is based on ipyparallel. Like the more explicit usage we’ll see below, this relies on first externally starting up a cluster of engines using ipcluster, which can then be assigned jobs by your code. ipcluster (part of ipyparallel, installable e.g. via conda) is intended to be used through the terminal. If using Jupyter notebooks / JupyterLab, you can always just launch a terminal window there and run the command

ipcluster start -n [number of engines you want] --daemonize

However, we can also do this directly in-notebook, either using the ! flag, or slightly more pythonically (and controllably) using subprocess:

import subprocess
import time
subprocess.run(f"ipcluster start -n {n_eng} --daemonize",shell=True) # --daemonize runs the cluster in the background, so this call returns straight away
time.sleep(10) # wait a few seconds for the cluster to start up
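A fixed sleep is either longer than needed or too short on a slow machine. A possible alternative is to poll until the engines have registered. The helper below is a generic sketch: the name wait_until and its defaults are my own, not part of PyKat or ipyparallel. With ipyparallel the check could be something like lambda: len(Client().ids) >= n_eng, which I have not tested against every setup.

```python
import time

def wait_until(check, timeout=30.0, interval=0.5):
    """Call check() repeatedly until it returns True or timeout seconds pass.

    Exceptions raised by check() are treated as "not ready yet".
    Returns True if the check succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # e.g. the cluster is not accepting connections yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in check that succeeds on the third call.
calls = {"n": 0}

def ready_after_three():
    calls["n"] += 1
    return calls["n"] >= 3

ok = wait_until(ready_after_three, timeout=5.0, interval=0.01)
print(ok, calls["n"])
```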

Now that the cluster is up and running, we set up the code as usual using Pykat:

from pykat import finesse

base=finesse.kat()
base.verbose=False
base.parse(katcode)

and now we work slightly differently to a serial Pykat run:

from pykat.parallel import parakat

pk = parakat()
for T in testvals:
    kat = base.deepcopy()
    kat.m.setRTL(1-T,T,0)
    pk.run(kat)
outs = pk.getResults()

Note that we still have a for loop here, but now it is only used to create the various kat objects and queue them with pk.run(kat). The loop itself does not wait for any results; the outputs are gathered when pk.getResults() is called.

If subprocess was used to launch the engines, we can now also conveniently use it to stop them:

subprocess.run("ipcluster stop",shell=True)
time.sleep(10) # wait a few seconds for the cluster to shut down
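If a run fails part-way through, it is easy to leave engines running in the background. One possible way to guard against that is to wrap the start and stop commands in a context manager. This is only a sketch: ipcluster_session and the injectable runner argument are hypothetical names of my own, and the commands are the same ones used above, passed as argument lists instead of through the shell.

```python
import subprocess
from contextlib import contextmanager

@contextmanager
def ipcluster_session(n_engines, runner=subprocess.run):
    """Start ipcluster on entry and always stop it on exit.

    runner is injectable so the logic can be tried without a real cluster.
    """
    runner(["ipcluster", "start", "-n", str(n_engines), "--daemonize"], check=True)
    try:
        yield
    finally:
        runner(["ipcluster", "stop"], check=True)


# Dry run with a fake runner that only records the commands it is given.
issued = []

def fake_runner(cmd, check=False):
    issued.append(cmd)

try:
    with ipcluster_session(8, runner=fake_runner):
        raise RuntimeError("simulated failure inside the parallel run")
except RuntimeError:
    pass

print(issued)
```

In real use you would still need to give the engines a moment to start before submitting jobs inside the with block.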

Plotting the results (or otherwise working with the outputs of the runs) is then just a case of accessing the relevant list item in outs:

import matplotlib.pyplot as plt
for o in outs:
    plt.plot(o.x,o['P'])

ipyparallel

ParaKat is built on ipyparallel, so the usage is quite similar between the two. Using ipyparallel directly gives you more flexibility, since we are directly parsing the python/pykat code of choice. It could therefore be extended to include further post-processing, multiple kat runs, etc.

As above, we need to use ipcluster to externally launch the engines we want:

import subprocess
import time
subprocess.run(f"ipcluster start -n {n_eng} --daemonize",shell=True) # --daemonize runs the cluster in the background, so this call returns straight away
time.sleep(10) # wait a few seconds for the cluster to start up

This time, we need to create a function which describes what we want to happen in each job. Unfortunately, this can’t rely on anything imported or defined in the notebook itself, because it runs on the engines. We therefore have to import pykat inside the function, and either explicitly write out the katcode or pass it in as another argument (shown here):

NB: If you are working in a notebook, it typically works best to define functions in a separate cell.

def myfunc(T,code):
    from pykat import finesse
    k=finesse.kat()
    k.verbose=False
    k.parse(code)
    k.m.setRTL(1-T,T,0)
    o=k.run()
    return o

To parallelise and run the code, we use Client:

from ipyparallel import Client

rc=Client() # connect to the running cluster
lview = rc.load_balanced_view() # creates a LoadBalancedView, which hands each job to whichever engine is free
lview.block = False # with block=False, apply returns an AsyncResult rather than waiting for the actual result
results = [lview.apply_async(myfunc,yy,katcode) for yy in testvals] # easy enough to add the second 'code' arg to apply_async here
outs = [d.get() for d in results] # wait for each job and collect its output
rc.close() # good practice, if unessential on some local machines

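NB: rc.load_balanced_view() creates a LoadBalancedView, which hands each job to whichever engine is free. If you don’t want that, e.g. for memory usage reasons, rc.direct_view() gives a DirectView instead, which skips the load balancing. Note that calling apply_async on a DirectView runs the function on every engine it targets, so the jobs then have to be divided between the engines explicitly (e.g. with its map method).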

As before, we now stop those engines manually and then extract the results to use as we please:

subprocess.run("ipcluster stop",shell=True)
time.sleep(10) # wait a few seconds for the cluster to shut down

for o in outs:
    plt.plot(o.x,o['P'])

multiprocessing

This is simpler and cleaner than the above, since engine start/stop is handled internally (i.e. we don’t need ipcluster this time). However, it might be a little less flexible in what can be iterated over, and engines are restricted to running on the local machine, while ipyparallel enables you to send jobs to remote machines and more.

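Like ipyparallel, we need to iterate over a function; unlike ipyparallel, this doesn’t seem to require pykat to be imported inside the function every time (finesse just needs to have been imported in the notebook already, via from pykat import finesse), so this time we define: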

def myfunc2(T,code):
    k=finesse.kat()
    k.verbose=False
    k.parse(code)
    k.m.setRTL(1-T,T,0)
    o=k.run()
    return o

then everything is handled internally, we just need Pool:

from multiprocessing import Pool

pool = Pool(processes=n_eng)
results = [pool.apply_async(myfunc2, args=(x,katcode)) for x in testvals]
outs = [p.get() for p in results]
pool.close()#good practice, necessary on some machines

NB: you can also use
outs = [pool.apply(myfunc2, args=(x,katcode)) for x in testvals]
in place of
results = [pool.apply_async(myfunc2, args=(x,katcode)) for x in testvals]
outs = [p.get() for p in results]
Be aware that pool.apply blocks until its job has finished, so this version runs the jobs strictly one after another (each in a worker process) rather than in parallel. It is one less line of code but much slower, so it is only worth using when each job must finish before the next one starts. The apply_async version already returns outs in the same order as testvals.
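A small timing sketch of the difference between the two calls. To keep it runnable anywhere, I have swapped in ThreadPool from multiprocessing.pool, which uses threads instead of processes but offers the same apply and apply_async methods as Pool, and a short sleep stands in for a Finesse run. Treat the timings as illustrative only.

```python
import time
from multiprocessing.pool import ThreadPool

def slow_job(x):
    time.sleep(0.05)  # stand-in for a Finesse run
    return x * x

inputs = list(range(8))

with ThreadPool(processes=4) as pool:
    # apply waits for each job before the next one is submitted
    start = time.perf_counter()
    blocking = [pool.apply(slow_job, args=(x,)) for x in inputs]
    t_blocking = time.perf_counter() - start

    # apply_async submits everything first, then collects in submission order
    start = time.perf_counter()
    handles = [pool.apply_async(slow_job, args=(x,)) for x in inputs]
    queued = [h.get() for h in handles]
    t_async = time.perf_counter() - start

print(f"apply: {t_blocking:.2f} s, apply_async: {t_async:.2f} s")
```

Both lists come back in the same order; only the elapsed time differs.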

As coded above, plotting the results is identical to the previous methods:

for o in outs:
    plt.plot(o.x,o['P'])

Which method should you use?

Functionality/Usability

parakat is the method built into PyKat. For most simple cases where nothing much changes except the contents of the kat object, this will do what you need without having to learn too much about what’s happening behind the scenes. You do have to externally launch engines using ipcluster, but there are many ways to do this (including widgets that let you do this via a GUI, if you prefer).

multiprocessing seems best in terms of ease of use for cases where you need more to happen in each run than just returning outputs from different kat objects. There are no external clusters to manually start or stop, and you can put whatever Python code you need inside the function.

When you want lots of control, and/or the ability to send your code to engines on remote machines, then ipyparallel is your best bet.

Speed

Below I’ve linked a script that runs the above examples for the same cases using all 3 methods.

Running this multiple times, I found that parakat tends to be the slowest, while ipyparallel and multiprocessing are fairly evenly matched (the winner seems to depend on your system). I reckon the parakat slowdown is due to the time taken to launch progress bars etc.

I suspect that if you need to run many parallel runs one after another, ipyparallel-based work will be faster overall, since you can keep the cluster running between jobs. As coded above, multiprocessing creates and closes its pool of worker processes for every run, which could be less efficient in the longer term.

Results may vary depending on what you are simulating. In all cases I suggest doing a short ‘dummy’ run with less than 10 jobs to check your code does what you want, and get an idea of whether you should go and make a cup of tea (or go to bed) while you wait.
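To turn a dummy run into a rough time estimate, one option is to time a handful of jobs serially and scale up. The helper below is a sketch under simple assumptions: every job costs about the same, and start-up and communication overheads are ignored. estimate_total_seconds and fake_job are hypothetical names; in practice you would pass in your own job function, such as myfunc2 above.

```python
import math
import time

def estimate_total_seconds(job, trial_args, n_jobs, n_workers):
    """Time a few jobs serially and extrapolate to the full parallel run.

    Assumes equal-cost jobs and no overhead, so treat the answer as a
    rough lower bound rather than a prediction.
    """
    start = time.perf_counter()
    for args in trial_args:
        job(*args)
    per_job = (time.perf_counter() - start) / len(trial_args)
    rounds = math.ceil(n_jobs / n_workers)  # batches needed to clear all jobs
    return per_job * rounds


def fake_job(T):
    time.sleep(0.01)  # stand-in for one simulation
    return T

estimate = estimate_total_seconds(
    fake_job, [(0.1,), (0.5,), (0.9,)], n_jobs=99, n_workers=8
)
print(f"roughly {estimate:.2f} s for the full run")
```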