Brian Frost

Using Anatomical Constraints to Facilitate Medical Image Segmentation




Hey all, sorry for the long wait! It's been a baffling few months since November, but the COVID-19 outbreak has made certain work easier to focus on (while making work, generally, impossible to focus on). Specifically, I've had more time than usual to put into the image processing side of my work, and I achieved some pretty nice results this past week that I hope can be illuminating for anyone working in the medical imaging field.


The problem started, as they all do, with thinking about gerbil cochleae. When we want to measure motion within the gerbil cochlea using OCT, we care about measuring at least the basilar membrane (the main vibrating structure in the cochlea) and the outer hair cells (the source of amplification in the cochlea). Getting a "good" image of both of these structures can be very hard, and the signal level from the outer hair cells is usually quite low compared to the noise. As a result, a lot of time at the beginning of an experiment is spent trying to find a good point at which to measure.


An example of a good image is shown below, and it also gives an idea of how specific a "good point" really is. The black-and-white image on the left is called a B-Scan, and the red arrow points in the direction of the OCT beam. That is to say, the image is formed of consecutive line scans called A-Scans, each of which profiles a line going into the tissue. The A-Scan corresponding to the midpoint of the B-Scan is shown to the right. Vibration measurements are taken along an A-Scan, so the B-Scan is just used to get our bearings within the cochlea.


A zoom-in of the region of interest, along with the corresponding known anatomy, is shown at the bottom of the B-Scan. It is with this that we are able to determine that we have found a good point for measuring basilar membrane and outer hair cell motion. However, moving just a few pixels to the left or right would already cost us good measurements of both features at once. The outer hair cell region in a cross-section is only about 30 microns wide, so we do not have much room for error.


Furthermore, different cross-sections of the cochlea may offer better imaging points, and we may be sacrificing signal-to-noise ratio for the sake of time! This is far from ideal, but necessary in in vivo experiments, as you are very much strapped for time. Many potentially good datasets have been lost to this, so I figured there must be some way to automate this preliminary step.


The idea was to start blind - take a volume scan of the entire viewable region of the cochlea, and let the computer spit out a point for you. In gerbil cochleae, images are taken through the round window membrane, which offers a constraint for the volume scan. If you scan through all that is visible through the round window, you can ensure that the best measurable point is found, so long as you have a fast and accurate algorithm.


So I grabbed an excised gerbil cochlea and took a volume scan through the round window. This was a fixed sample, but it had already begun to show some signs of anatomical decay. These occur within minutes of death and are unavoidable, so this is the best approximation to an in vivo cochlea I could get. I put the volume scan on my flash drive, and then a pandemic hit the country...


With this single data set, there is still a lot one can do. The volume scan consists of a few hundred B-Scans similar to the one shown below. The output quality of an OCT image before processing is horribly low, but the anatomical features are readily apparent. At about 100 px on the x-axis and 400 px on the y-axis, the characteristic gap between the basilar membrane and the cellular regions of the organ of Corti can be made out.

OCT does not have "labels", as it works off of reflected light data alone. That means this image only has one channel - gray - and it is very hard to make out the difference between any two structures without knowing the underlying anatomy. For example, the basilar membrane is mostly collagen, and is connected on either side in this image to bone, but all of these structures appear bright gray!


I want to isolate pixels in which the basilar membrane and outer hair cells can be seen, but I have only these grainy, one-channel, unlabeled images to work with! This is nightmarish, but not insurmountable. Qualitatively, we talk about the "gap", a fluid space between the basilar membrane and the cells, as being a good marker of where we want to be taking our measurements. If I can quantitatively define that gap, we can segment the image so that it only contains those A-Scan locations which feature the correct structures.


To quantify a "gap" in an image of this quality is nontrivial, and requires a lot of preprocessing. The first thing to do is to separate the basilar membrane, and the region connected to it, from the background of the image as well as from the superfluous round window at the top.


With OCT images, as you can likely see above, the background sometimes has pixels that are just as intense as the image itself. This sort of noise is called "salt and pepper", as there are spurious light and dark spots everywhere in the image. The canonical solution is to apply what is called a median filter, which replaces each pixel with the median value of all pixels near it. This removes outlying pixels of both high (salt) and low (pepper) intensities. It also can have a smoothing effect, however, which could potentially smooth out the gap we are looking to maintain. The gap is only about 30 pixels across, so we choose a median filter window of 5 px by 5 px. The result is below.
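For anyone following along, here is a minimal sketch of this step in MATLAB (which is what I use, via the Image Processing Toolbox). The file and variable names are placeholders, not the actual names from my scripts:

% Load a single B-Scan as a grayscale image (placeholder file name).
bscan = im2double(imread('bscan.png'));

% 5 px by 5 px median filter to remove salt-and-pepper noise without
% smoothing away the ~30 px gap we care about.
bscanMed = medfilt2(bscan, [5 5]);

% Side-by-side comparison of the raw and filtered B-Scans.
imshowpair(bscan, bscanMed, 'montage');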

The whole image looks a lot smoother, and most of the salt and pepper has been removed. One might say it is an uglier image, but I would pay them no mind, as my gap of interest is still visible!


To produce a nice segmentation, we will need to extract the edges from the image. Edges are usually ill-defined objects, so to make edge detection algorithms work well, you need to make sure the edges are as clear as possible. This is best done by smoothing out the edges and then artificially enhancing the contrast between background and foreground.


To smooth out the edges, we apply a Gaussian filter. Again, we want to be sure not to close the gap, so we use a 5 px by 5 px kernel. The result is shown below.
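Continuing the sketch from above (the sigma value here is my own assumption; the post only fixes the 5 px by 5 px kernel size):

% Gaussian smoothing with a small 5x5 kernel so the gap stays open.
% A sigma of 1 px is an illustrative choice, not a tuned value.
bscanSmooth = imgaussfilt(bscanMed, 1, 'FilterSize', 5);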



Now even I would agree that this is an uglier image, but I would say that the edges are no doubt as smooth as they will ever be! Better yet, my gap is still present. We now need to increase contrast. An image has the most contrast when it takes only two values - high and low. If we could make the background totally black and the foreground totally white, then the edges would be very well-defined! They would simply be the points which border both black and white pixels.


This can be done through simple thresholding - you pick a pixel intensity and set every pixel below that intensity to 0, and every pixel at or above it to 255 (or whatever the max is for your format). The outcome is shown below.
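In MATLAB this is essentially a one-liner. I use Otsu's method (graythresh) below just as one reasonable way to pick the threshold automatically; a hand-picked value works just as well:

% Threshold: pixels below T go to 0 (black), pixels at or above T go to 1 (white).
T  = graythresh(bscanSmooth);   % Otsu's method, one way to pick the threshold
bw = bscanSmooth >= T;          % logical (binary) mask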


Now this is a nice image! The gap is very clear - far clearer than in the original image - and the edges are very much well-defined. To perform edge detection, we can look just at those points in the image where the Laplacian will be nonzero. For those unfamiliar with the Laplacian filter, in this case it will just give us the points where the color changes. The edge detection output is shown below.
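Continuing the sketch, the discrete Laplacian is available in base MATLAB as del2, and on a binary image it is nonzero exactly where the value changes:

% Edge detection: the discrete Laplacian of the binary mask is nonzero
% only at pixels bordering both black and white neighbors.
L     = del2(double(bw));
edges = (L ~= 0);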


Perfect! Now I want to remove everything that is not showing the gap feature. Plenty of gaps appear in this image, some of which are of no anatomical concern to us. For example, there are gaps within the bone connecting to the round window membrane at the top of the image.


The insight here is that the anatomical region of interest belongs to one connected component, disconnected from some relatively large, useless areas. Using MATLAB's Image Processing Toolbox function "bwconncomp", a list of the connected components of the black-and-white image is formed. The largest four of these components are shown below.


Across all tested B-Scans, the largest connected component is consistently the one containing our region of interest. As such, we select this component and toss away everything else. We then are left with only our region of interest and some very useless bone! I call this step the "primary segmentation", and it is the first of three segmentation steps.
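A sketch of the primary segmentation, using bwconncomp as described above (bwareafilt(bw, 1) would do the same thing in one call, if you prefer):

% Primary segmentation: keep only the largest connected component,
% which across the tested B-Scans contains the region of interest.
CC        = bwconncomp(bw);                        % list of connected components
[~, iMax] = max(cellfun(@numel, CC.PixelIdxList)); % index of the largest one
primary   = false(size(bw));
primary(CC.PixelIdxList{iMax}) = true;             % mask of the largest component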


The secondary segmentation step involves looking for a sufficient number of gaps in the remaining region. A gap can be defined as a vertical run of black pixels bounded by white pixels, and we can count gaps line-by-line. If we look at the far left of our segmented region, there are no gaps present whatsoever - just solid bone. In our region of interest, there are at least two. In the space to the right of the region of interest, some variations in bone yield a few other gaps. Removing all A-Scans with fewer than two gaps from the image results in the following image.
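A sketch of the gap-counting step, assuming (as in my data) that A-Scans run down the columns of the mask:

% Secondary segmentation: count, for each column (A-Scan), the number of
% black runs bounded above and below by white pixels, and keep only
% columns with at least two such gaps.
nCols   = size(primary, 2);
numGaps = zeros(1, nCols);
for j = 1:nCols
    d      = diff(double(primary(:, j)));
    starts = find(d == -1);                         % white -> black transitions
    ends   = find(d ==  1);                         % black -> white transitions
    % a gap is a black run that is eventually closed by white again
    numGaps(j) = sum(arrayfun(@(s) any(ends > s), starts));
end
secondary = primary;
secondary(:, numGaps < 2) = false;                  % drop A-Scans with < 2 gaps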

Ok, now we are getting somewhere. We only need to remove these segments with very small gaps due to bone variations. To do so, we look at the size of the gaps and use our known anatomy - if there is not a gap in the ballpark of the size of the anatomical space, we remove the A-Scan. Then we should be left with one contiguous region containing the region of interest. Just to be sure, of all the surviving A-Scans, we take the largest contiguous stretch of them. This tertiary segmentation step results in the following image.
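The tertiary step, sketched below, needs a notion of "in the ballpark of the anatomical size"; the gapMin/gapMax bounds are illustrative placeholders, not the values from my actual script:

% Tertiary segmentation: require at least one gap of roughly anatomical size,
% then keep only the largest contiguous stretch of surviving A-Scans.
gapMin = 20;  gapMax = 60;                % placeholder bounds, in pixels
keep   = false(1, nCols);
for j = find(numGaps >= 2)
    d      = diff(double(secondary(:, j)));
    starts = find(d == -1);
    ends   = find(d ==  1);
    for s = starts(:)'
        e = ends(find(ends > s, 1));      % first white pixel closing this gap
        if ~isempty(e) && (e - s) >= gapMin && (e - s) <= gapMax
            keep(j) = true;               % found a gap of anatomical size
            break
        end
    end
end
runs       = bwconncomp(keep);            % contiguous runs of surviving A-Scans
[~, iBest] = max(cellfun(@numel, runs.PixelIdxList));
final      = false(1, nCols);
final(runs.PixelIdxList{iBest}) = true;
tertiary   = secondary;
tertiary(:, ~final) = false;              % only the largest contiguous stretch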

Our region of interest! This is a swimming success - for this B-Scan, at least. We now need to find the "best" A-Scan of these. To do so, we compute the A-Scan-wise average signal power in the original image pixels corresponding to the white pixels in the above mask, and choose the largest of these. Four example B-Scans and their located best A-Scan locations are shown below.
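The last step, sketched against the same placeholder variables; whether "power" means raw intensity or squared intensity here is my own guess, so adjust to taste:

% Best A-Scan: average the (squared) original pixel values over the masked
% pixels in each surviving column and take the column with the largest mean.
pwr         = (bscan .^ 2) .* tertiary;           % squared intensity as a power proxy
nMasked     = sum(tertiary, 1);                   % masked pixels per column
meanPower   = sum(pwr, 1) ./ max(nMasked, 1);     % avoid dividing by zero
[~, bestAScan] = max(meanPower);                  % column index of the best A-Scan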

Intuitively, these all look exactly right! They all fall in our region of interest, and at locations with seemingly good signal power.


This algorithm takes about 3 seconds to run on a large volume scan, and will theoretically speed up the beginning of experiments by saving up to 15 minutes of searching for a good A-Scan. I was very glad that it worked.


This shows that even in low-quality images without any labels, image segmentation is still entirely feasible so long as you can use known anatomical constraints. If anyone reading this (haha) is in the medical imaging field, I hope you can use tools like this one to automate selection in your modality of choice. Of course, your modality probably produces higher-quality images than OCT... and probably has labels... but this is the best we have to look at little rodent ears!
