PhD Project
My PhD project is titled "Towards real time Raman molecular imaging of living organisms" and
revolves around performing rapid inference on Raman 'videos', so that biological or
chemical processes such as fermentation or drug delivery can be monitored in real time in a
non-destructive, non-invasive manner. To achieve this, I am developing an end-to-end
statistical procedure that maps the input Raman maps to desired outputs, such as analyte
concentration. This requires a probabilistic model of the Raman maps that captures the
relevant spectral features (e.g., peak shapes and peak locations), the baseline (i.e., the
background signal) and the calibration (i.e., the properties of the spectrometer). We are
also investigating amortizing the inference procedure so that future inferences are faster.
Thank you to Mikkel N. Schmidt for providing the graphic to the left.
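To make the three model components (spectral features, baseline, calibration) concrete, here is a toy forward model for a single spectrum. It is a minimal sketch, not the project's actual model: the Lorentzian peak shape, the polynomial baseline, the polynomial pixel-to-wavenumber calibration, the Gaussian noise, and all function names and numbers are illustrative assumptions.

```python
import numpy as np

def lorentzian(x, loc, hwhm):
    """Unit-height Lorentzian peak centred at `loc` with half-width at half-maximum `hwhm`."""
    return 1.0 / (1.0 + ((x - loc) / hwhm) ** 2)

def calibrate(pixels, calib_coeffs):
    """Assumed calibration: polynomial map from detector pixel index to Raman shift (cm^-1)."""
    return np.polyval(calib_coeffs, pixels)

def simulate_spectrum(pixels, concentration, peaks, baseline_coeffs, calib_coeffs,
                      noise_std, rng):
    """Hypothetical forward model: concentration * peaks + smooth baseline + white noise.

    `peaks` is a list of (location, hwhm, height) tuples describing the analyte's spectrum.
    """
    x = calibrate(pixels, calib_coeffs)
    # Analyte contribution: pure-component spectrum scaled linearly by concentration.
    signal = np.zeros_like(x, dtype=float)
    for loc, hwhm, height in peaks:
        signal += height * lorentzian(x, loc, hwhm)
    # Background: low-order polynomial in a standardized version of x.
    baseline = np.polyval(baseline_coeffs, (x - x.mean()) / x.std())
    noise = noise_std * rng.standard_normal(x.shape)
    return x, concentration * signal + baseline + noise

rng = np.random.default_rng(0)
pixels = np.arange(1024)
# With calib_coeffs=[1.5, 400.0], pixel p maps to 400 + 1.5 * p cm^-1.
x, y = simulate_spectrum(pixels, concentration=0.5,
                         peaks=[(1000.0, 5.0, 1.0), (1600.0, 8.0, 0.6)],
                         baseline_coeffs=[0.1, 0.3, 2.0], calib_coeffs=[1.5, 400.0],
                         noise_std=0.02, rng=rng)
```

Inference then runs this model in reverse: given observed spectra `y`, estimate the concentration together with the peak, baseline and calibration parameters.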
A popular science summary of the project is available here.
Research interests
For a list of publications, see
this page.
Bayesian modeling of Raman spectroscopy
Raman spectroscopy is a powerful non-invasive method of identifying and characterizing
molecules of interest. It involves illuminating a sample with a laser and observing the
wavelengths of the scattered light. The difference between the wavelengths of the laser light
and the scattered light (the Raman shift) reveals important properties of the molecules, as
the energy exchanged with the molecules (and thus the wavelength shift) depends on the
molecular structure of the sample. We model these spectra probabilistically to extract
information about the sample while quantifying the uncertainty of the resulting estimates.
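Raman shifts are conventionally reported in wavenumbers (cm^-1), i.e. as a difference of inverse wavelengths, since inverse wavelength is proportional to photon energy. The small sketch below shows the conversion; the 785 nm and 850 nm values are example inputs only, and the function names are hypothetical.

```python
def raman_shift_cm1(laser_nm: float, scattered_nm: float) -> float:
    """Raman shift in cm^-1 from laser and scattered wavelengths given in nm.

    1e7 converts nm^-1 to cm^-1. Positive values correspond to Stokes scattering
    (the scattered photon has lost energy to the molecule).
    """
    return 1e7 / laser_nm - 1e7 / scattered_nm

def scattered_wavelength_nm(laser_nm: float, shift_cm1: float) -> float:
    """Inverse map: wavelength (nm) at which a given Raman shift (cm^-1) is observed."""
    return 1e7 / (1e7 / laser_nm - shift_cm1)

# Example values only: a 785 nm laser and light scattered at 850 nm.
shift = raman_shift_cm1(785.0, 850.0)  # approximately 974.1 cm^-1
```

Working in wavenumbers makes peak positions independent of the laser wavelength, which is why spectra from different excitation sources can be compared on the same axis.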
Bayesian inference
Bayesian statistics is a popular approach to modeling complex phenomena due to its high
flexibility and elegant uncertainty estimation. Inference in Bayesian models, however,
is computationally demanding and often analytically intractable. Approximate inference
procedures such as Markov chain Monte Carlo (MCMC) or variational Bayes (VB) are therefore
needed. I mainly study MCMC for big-data settings, in which a dynamical system is simulated
to generate proposals for new samples from the posterior distribution. These samples can
then be used to estimate global properties of the posterior, such as means, variances and
credible intervals. Specifically, I study stochastic gradient MCMC, in which the gradient of
the log-posterior is replaced by a stochastic estimate computed on a small random subset
(mini-batch) of the data; this speeds up computation considerably when large amounts of
data are available.
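As a concrete instance of stochastic gradient MCMC, the sketch below implements stochastic gradient Langevin dynamics (SGLD) with a constant step size on a toy conjugate model (Gaussian likelihood with unit variance, Gaussian prior), where the exact posterior is known for comparison. The model, step size and function names are illustrative assumptions, not the specific samplers studied in the project.

```python
import numpy as np

def sgld(grad_log_prior, grad_log_lik, data, theta0, step_size, batch_size, n_steps, rng):
    """Stochastic gradient Langevin dynamics with a constant step size.

    grad_log_lik(theta, batch) must return the gradient of the summed
    log-likelihood of the data points in `batch`.
    """
    n_data = len(data)
    theta = np.asarray(theta0, dtype=float)
    samples = np.empty((n_steps,) + theta.shape)
    for t in range(n_steps):
        batch = data[rng.integers(0, n_data, size=batch_size)]
        # Unbiased mini-batch estimate of the full-data log-posterior gradient.
        grad = grad_log_prior(theta) + (n_data / batch_size) * grad_log_lik(theta, batch)
        # Langevin update: half a gradient step plus Gaussian noise of variance step_size.
        theta = theta + 0.5 * step_size * grad + np.sqrt(step_size) * rng.standard_normal(theta.shape)
        samples[t] = theta
    return samples

# Toy model: x_i ~ N(theta, 1) with prior theta ~ N(0, 10^2).
rng = np.random.default_rng(42)
data = rng.normal(2.0, 1.0, size=10_000)
samples = sgld(grad_log_prior=lambda th: -th / 100.0,
               grad_log_lik=lambda th, xb: np.sum(xb - th),
               data=data, theta0=0.0, step_size=2e-6, batch_size=100,
               n_steps=20_000, rng=rng)
posterior = samples[2_000:]  # discard burn-in

# Exact conjugate posterior for comparison.
post_prec = 1.0 / 100.0 + len(data)
exact_mean, exact_std = data.sum() / post_prec, post_prec ** -0.5
```

Each step touches only 100 of the 10,000 data points. With a constant step size the chain targets the posterior only approximately (mini-batch gradient noise slightly inflates the spread); decreasing the step size over time, as in the original SGLD formulation, reduces this bias at the cost of slower mixing.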
Non-stationary Gaussian processes
Gaussian processes are an extremely flexible class of stochastic processes, used in
almost all areas of statistical science. I have studied GPs with non-stationary
covariance functions for source separation, where the goal is to separate a signal of
interest from measurement noise and background signals. Non-stationarity is particularly
relevant when the noise is correlated over time. I have also investigated learning a
suitable covariance function from data, with mixed results.
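A minimal sketch of GP source separation with a non-stationary covariance is shown below. It assumes the observation is a sum of a signal with a stationary squared-exponential covariance, a background with a non-stationary Gibbs covariance (whose length-scale varies with the input), and white measurement noise; the kernel choices, the length-scale function and all parameter values are illustrative assumptions.

```python
import numpy as np

def gibbs_kernel(x1, x2, lengthscale_fn, variance=1.0):
    """Non-stationary Gibbs covariance on 1D inputs with input-dependent length-scale l(x)."""
    l1 = lengthscale_fn(x1)[:, None]
    l2 = lengthscale_fn(x2)[None, :]
    sq_sum = l1 ** 2 + l2 ** 2
    prefactor = np.sqrt(2.0 * l1 * l2 / sq_sum)
    return variance * prefactor * np.exp(-(x1[:, None] - x2[None, :]) ** 2 / sq_sum)

def rbf_kernel(x1, x2, lengthscale, variance=1.0):
    """Stationary squared-exponential covariance."""
    return variance * np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / lengthscale ** 2)

def separate(y, K_signal, K_background, noise_var):
    """Posterior means of two additive GP components given y = signal + background + white noise."""
    K_y = K_signal + K_background + noise_var * np.eye(len(y))
    alpha = np.linalg.solve(K_y, y)
    return K_signal @ alpha, K_background @ alpha

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 200)
lengthscale_fn = lambda t: 0.5 + 0.3 * t  # hypothetical: background varies more slowly along x
K_b = gibbs_kernel(x, x, lengthscale_fn)
K_s = rbf_kernel(x, x, lengthscale=0.2, variance=0.25)
jitter = 1e-6 * np.eye(len(x))  # small diagonal term so the Cholesky factorization succeeds
background = np.linalg.cholesky(K_b + jitter) @ rng.standard_normal(len(x))
signal = np.linalg.cholesky(K_s + jitter) @ rng.standard_normal(len(x))
noise_var = 0.01
y = signal + background + np.sqrt(noise_var) * rng.standard_normal(len(x))
signal_hat, background_hat = separate(y, K_s, K_b, noise_var)
```

With a constant length-scale the Gibbs kernel reduces exactly to the squared-exponential kernel, so the non-stationarity enters only through `lengthscale_fn`. Learning that function from data is the harder problem mentioned above.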
Amortized inference
Amortized inference is the practice of training a model (often a neural network, such as
a variational autoencoder) to mimic a more computationally expensive inference procedure.
I study this in the context of spectral data (Raman spectroscopy), as parameter estimation
in Bayesian models of Raman spectroscopy is computationally intensive. The general idea is
to train a neural network at the same time as the parameters of a specific statistical
model of the phenomenon are estimated. The neural network then learns a fast-to-compute
forward map from the inputs directly to the output of interest (for Raman spectroscopy,
this could be analyte concentrations), eliminating the need to re-estimate the parameters
when new data are presented.
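The sketch below is a heavily simplified illustration of amortization, assuming a hypothetical simulator of labelled spectra (one Lorentzian analyte peak, a random linear baseline, Gaussian noise). It swaps the neural network for a linear ridge-regression map and fits it offline on simulated data rather than jointly with the statistical model, so it only demonstrates the key property: once fitted, each new spectrum is mapped to a concentration estimate by a single matrix-vector product, with no per-spectrum parameter estimation.

```python
import numpy as np

def simulate(n, wavenumbers, rng):
    """Hypothetical simulator: one Lorentzian analyte peak scaled by concentration,
    a random linear baseline and Gaussian noise."""
    conc = rng.uniform(0.0, 1.0, size=n)
    peak = 1.0 / (1.0 + ((wavenumbers - 1000.0) / 10.0) ** 2)
    t = (wavenumbers - wavenumbers.mean()) / (wavenumbers.max() - wavenumbers.min())
    offset = rng.normal(0.5, 0.2, size=(n, 1))
    slope = rng.normal(0.0, 0.2, size=(n, 1))
    noise = 0.02 * rng.standard_normal((n, len(wavenumbers)))
    return conc[:, None] * peak + offset + slope * t + noise, conc

def design(spectra):
    """Append a constant column so the linear map has an intercept."""
    return np.hstack([spectra, np.ones((len(spectra), 1))])

def fit_amortized_map(spectra, conc, reg=1e-3):
    """Ridge regression from the full spectrum to concentration, fitted once up front."""
    X = design(spectra)
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ conc)

def predict(weights, spectra):
    """Amortized inference: one matrix-vector product per new spectrum."""
    return design(spectra) @ weights

rng = np.random.default_rng(0)
wavenumbers = np.linspace(800.0, 1200.0, 200)
train_X, train_c = simulate(5_000, wavenumbers, rng)
weights = fit_amortized_map(train_X, train_c)
test_X, test_c = simulate(500, wavenumbers, rng)
rmse = np.sqrt(np.mean((predict(weights, test_X) - test_c) ** 2))
```

A linear map suffices here only because the toy simulator is linear in concentration and baseline; a neural network plays the same role when the relationship between spectrum and output is non-linear, for example with shifting peaks or varying calibration.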