Traffic sign segmentation with classical image processing methods (Canny edge detection + color-based segmentation)

Jacky Lim
10 min read · Oct 1, 2023

First of all, just as the title suggests, this post does not use deep learning models, which are the current state of the art for traffic sign segmentation (and virtually all computer vision tasks). Instead, we will discuss the application of a simple, traditional image processing pipeline for detecting traffic signs. The workflow is coded in Python, with the aid of several external dependencies, primarily OpenCV.

Before we dive into this image segmentation task, it is assumed that readers have some background knowledge in image processing and computer vision. For those who do not, online references are inserted (as hyperlinks) along the way to point to useful resources. As the technical details of each segmentation stage unfold, I will not go too deep into the underlying methods, but will cover only the details that I deem interesting and worth sharing.

Let’s get straight to the point by looking at the rough project workflow. Like other machine learning / data science projects, every computer vision project commences with data collection and understanding of the data once the problem is identified. The image then undergoes a series of preprocessing steps, such as denoising, contrast enhancement and resizing. Next, Canny edge detection and color-based segmentation are applied concurrently¹ to the preprocessed image, followed by shape detection on their outputs. One of the two resulting contours is then selected based on the areas of the contours found. Eventually, a bounding box surrounding the object of interest is constructed, together with the computation of intersection over union (IOU). The following figure shows a high-level overview of the segmentation pipeline.

High-level overview of the traffic sign segmentation pipeline.

Download and understand the data

The data originates from the Chinese Traffic Sign Recognition Database. There are 4,170 training images, and they form the dataset used in this project. Equally important is the annotation file, which specifies the filenames, image widths and heights, bounding box coordinates and categories. Note that each image comes with only one ground truth bounding box, so we only have to identify one foreground object.

It is always good practice to visualize a handful of images to get a feel for the data.

import os

import cv2 as cv
import matplotlib.pyplot as plt
import numpy as np

# "images/" is the local directory where all 4170 images are located
img_dir = "images/"
files = np.random.choice(os.listdir(img_dir), size=20, replace=False)

plt.figure(figsize=(15, 12))
for i, img_file in enumerate(files):
    img = cv.imread(os.path.join(img_dir, img_file))
    # OpenCV loads images in BGR order; convert to RGB for matplotlib
    img_rgb = cv.cvtColor(img, cv.COLOR_BGR2RGB)
    plt.subplot(5, 4, i + 1)
    plt.imshow(img_rgb)

plt.show()
Samples of images from the Chinese traffic sign database.

It is quite obvious that most objects of interest (traffic signs): 1) are located at the center and are the focus of the camera, 2) have colors distinct from their surroundings, and 3) come in distinct geometric shapes (e.g. circles and rectangles).

Given these observations, traffic sign segmentation based on edge detection, color-based segmentation and shape detection should be well suited for the task.

Image preprocessing

During this stage, the image will go through these sequential processes:

  • Image denoising / blurring. A median filter with a 3×3 kernel is applied here due to its edge-preserving properties and computational efficiency.
  • If the image is of low contrast, contrast enhancement is applied.
  • Image resizing to a constant width of 200 pixels. The interpolation method chosen is cv.INTER_AREA. It is similar to bilinear interpolation with some nuances. Please refer to this Medium article for more insight.

Let’s break down the second bullet point. Here, the skimage.exposure.is_low_contrast() function is utilized to detect low contrast images². Basically, to tell whether an image is low contrast, the function evaluates the following expression:

$$\frac{P_{99}\big(f(x, y)\big) - P_{1}\big(f(x, y)\big)}{f_{\max} - f_{\min}} < 0.05$$

where f(x, y) is the image pixel intensity parameterized by the coordinates x and y, and f_max − f_min is the full intensity range allowed by the image dtype (255 − 0 for 8-bit images). By default, P99 and P1 are the 99th and 1st percentiles of the pixel values, and 0.05 is the fraction threshold. P99, P1 and 0.05 are all changeable input arguments.
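As a minimal sketch of the same check for an 8-bit grayscale image (the helper name is_low_contrast_manual is mine, for illustration only; the library call below uses the same defaults):

import numpy as np
from skimage.exposure import is_low_contrast

def is_low_contrast_manual(gray, fraction_threshold=0.05):
    # spread between the 1st and 99th percentile intensities,
    # relative to the full 8-bit range (255)
    p1, p99 = np.percentile(gray, (1, 99))
    return (p99 - p1) / 255.0 < fraction_threshold

# equivalent library call:
# is_low_contrast(gray, fraction_threshold=0.05)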

As for contrast enhancement, the process is as follows:

  1. Change the image color space from BGR to the L*a*b color space. Be mindful that the L channel encodes lightness.
  2. Split the image into its component channels. Apply histogram equalization to the L channel.
  3. Merge the color channels to get back the 3-channel image array. Convert the color space back to BGR.
def contrast_enhance(img):
    """Histogram-equalize the lightness channel of a BGR image."""
    img_lab = cv.cvtColor(img, cv.COLOR_BGR2Lab)
    L, a, b = cv.split(img_lab)
    L = cv.equalizeHist(L)  # equalize only the lightness channel
    img_lab_merge = cv.merge((L, a, b))
    return cv.cvtColor(img_lab_merge, cv.COLOR_Lab2BGR)
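Putting the whole preprocessing stage together, a rough sketch could look like the following (the function name preprocess is mine; the 3×3 median kernel, the low contrast check and the 200-pixel width follow the bullet points above):

from skimage.exposure import is_low_contrast

def preprocess(img, width=200):
    """Denoise, fix low contrast, then resize to a constant width."""
    img = cv.medianBlur(img, 3)  # 3x3 median filter for edge-preserving denoising
    gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    if is_low_contrast(gray, fraction_threshold=0.05):
        img = contrast_enhance(img)
    # resize to a constant width, preserving the aspect ratio
    h, w = img.shape[:2]
    return cv.resize(img, (width, int(h * width / w)), interpolation=cv.INTER_AREA)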

Canny edge detection

I will not walk through what is happening under the hood, but you can find a quite comprehensive explanation and demo here. From a practical standpoint, the most critical parameters of the Canny edge detector are the hysteresis thresholds, which means these parameters have to be selected carefully to yield meaningful results. This begs the question: is there any way to automatically tune the hysteresis thresholds? The answer is yes, and further info can be found in article 1 and article 2.

# canny edge detection with automatically tuned hysteresis thresholds
def auto_canny(img, method, sigma=0.33):
    """
    Args:
        img: grayscale image
        method: one of "otsu", "triangle" or "median"
        sigma: spread around the base threshold (default 0.33)
    Returns:
        the edge detection output, and the high threshold
        (reused later for the Hough transform)
    """
    if method == "median":
        Th = np.median(img)
    elif method == "triangle":
        Th, _ = cv.threshold(img, 0, 255, cv.THRESH_TRIANGLE)
    elif method == "otsu":
        Th, _ = cv.threshold(img, 0, 255, cv.THRESH_OTSU)
    else:
        raise ValueError("method specified not available!")

    lowTh = (1 - sigma) * Th
    highTh = (1 + sigma) * Th

    return cv.Canny(img, lowTh, highTh), highTh
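A quick usage sketch (the filename is hypothetical; preprocess is the helper sketched in the previous section):

img = cv.imread("images/some_sign.png")  # hypothetical filename
img = preprocess(img)
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
edges, high_th = auto_canny(gray, method="otsu")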

Color based segmentation

Looking at the traffic sign images displayed in the first figure of this post, it seems the key colors are red, blue, yellow and black. As such, color-based segmentation can be performed with the following steps:

  1. Define the lower and upper HSV color space tuples for each color. These tuples are the input arguments of the cv.inRange() function. Geometrically, each tuple pair defines a box in the HSV color space: if an input image pixel lies inside the box, it is assigned “255” in the output array and “0” otherwise. The tuple pairs chosen for the red, blue, yellow and black colors are shown in the code snippet below.
  2. The outputs from cv.inRange(), which correspond to the individual color masks (red needs two ranges because its hue wraps around the hue axis), are merged with the bitwise OR operator.
  3. Apply morphological opening and closing to the output of (2). Opening is applied to remove noise and small white specks, while the purpose of closing is to join small breaks.
# Color based segmentation (red, blue, yellow, black)
# Red color (hue wraps around, so two ranges are needed)
lower_red1 = (0, 40, 50)
upper_red1 = (10, 255, 210)
lower_red2 = (165, 40, 50)
upper_red2 = (179, 255, 210)

# Blue color
lower_blue = (90, 40, 50)
upper_blue = (120, 255, 210)

# Yellow color
lower_yellow = (20, 40, 50)
upper_yellow = (35, 255, 210)

# Black color
lower_black = (0, 0, 0)
upper_black = (179, 255, 5)

def color_seg(img, kernel_size=None):
    """Args:
        img: image in BGR
        kernel_size: morphology kernel size (default (3, 3))"""
    hsv_img = cv.cvtColor(img, cv.COLOR_BGR2HSV)

    mask_red1 = cv.inRange(hsv_img, lower_red1, upper_red1)
    mask_red2 = cv.inRange(hsv_img, lower_red2, upper_red2)
    mask_blue = cv.inRange(hsv_img, lower_blue, upper_blue)
    mask_yellow = cv.inRange(hsv_img, lower_yellow, upper_yellow)
    mask_black = cv.inRange(hsv_img, lower_black, upper_black)

    # merge the individual color masks
    mask_combined = mask_red1 | mask_red2 | mask_blue | mask_yellow | mask_black

    if kernel_size is not None:
        kernel = np.ones(kernel_size, np.uint8)
    else:
        kernel = np.ones((3, 3), np.uint8)

    # opening removes small white specks; closing joins small breaks
    mask_combined = cv.morphologyEx(mask_combined, cv.MORPH_OPEN, kernel)
    mask_combined = cv.morphologyEx(mask_combined, cv.MORPH_CLOSE, kernel)

    return mask_combined
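For instance, assuming img is a preprocessed BGR image, the mask can be inspected on top of the original image like so:

mask = color_seg(img)  # binary mask: 255 for candidate sign pixels
segmented = cv.bitwise_and(img, img, mask=mask)  # keep only the masked regions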

Shapes detection

Shape detection focuses on two main shapes: circles and rectangles. For circle detection, the Hough Circle Transform is applied⁴, whereas the Douglas-Peucker contour approximation is applied to detect rectangular objects.

# rectangle detection (using the Douglas-Peucker algorithm)
def cnt_rect(cnts, coef=0.1):
    contour_list = []
    for cnt in cnts:
        peri = cv.arcLength(cnt, True)
        # approximate the contour; a 4-vertex result suggests a rectangle
        approx = cv.approxPolyDP(cnt, coef * peri, True)
        if len(approx) == 4:
            contour_list.append(cnt)

    if not contour_list:
        return None
    # keep the largest rectangular contour by area
    return max(contour_list, key=cv.contourArea)

# circle detection
hough_circle_parameters = {
    "dp": 1,
    "minDist": 150,
    "param1": 200,  # adaptively changed according to the image
    "param2": 15,
    "minRadius": 10,
    "maxRadius": 100
}

def cnt_circle(img, hough_dict):
    """Args:
        img: grayscale image after resizing
        hough_dict: Hough circle transform parameters"""
    mask = np.zeros_like(img)
    circles = cv.HoughCircles(img,
                              cv.HOUGH_GRADIENT,
                              hough_dict["dp"],
                              hough_dict["minDist"],
                              param1=hough_dict["param1"],
                              param2=hough_dict["param2"],
                              minRadius=hough_dict["minRadius"],
                              maxRadius=hough_dict["maxRadius"])
    if circles is None:
        return None

    # keep the largest detected circle (largest radius)
    list_circles = circles[0]
    largest_circle = max(list_circles, key=lambda x: x[2])
    center_x, center_y, r = largest_circle
    # draw the circle on an empty mask, then recover it as a contour
    cv.circle(mask, (int(center_x), int(center_y)), int(r), 255)
    cnts, _ = cv.findContours(mask, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
    if len(cnts) > 0:
        return max(cnts, key=cv.contourArea)
    return None

Contour detection and filtering

Contour detection is applied to both the edge detection and color-based segmentation outputs. Only the largest contour (ranked by area) is retained as the output³.
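As a sketch, pulling the largest contour out of either binary output (the Canny edge map or the combined color mask) can be done as follows (the helper name largest_contour is mine):

def largest_contour(binary_img):
    # OpenCV 4 returns (contours, hierarchy)
    cnts, _ = cv.findContours(binary_img, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
    if not cnts:
        return None
    return max(cnts, key=cv.contourArea)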

# combine the results of the 2 shape detectors
def integrate_circle_rect(rect_cnt, circle_cnt, cnt):
    if circle_cnt is not None and rect_cnt is not None:
        # both detectors fired: keep the larger contour
        if cv.contourArea(circle_cnt) >= cv.contourArea(rect_cnt):
            output = circle_cnt
        else:
            output = rect_cnt
    elif circle_cnt is not None and rect_cnt is None:
        output = circle_cnt
    elif circle_cnt is None and rect_cnt is not None:
        output = rect_cnt
    else:
        # neither shape detector fired: fall back to the largest raw contour
        if len(cnt) == 0:
            return np.array([])
        output = max(cnt, key=cv.contourArea)

    return output

# combine the edge detection branch and the color based segmentation branch
# (each already followed by shape detection)
def integrate_edge_color(output1, output2):
    if not isinstance(output1, np.ndarray):
        output1 = np.array(output1)
    if not isinstance(output2, np.ndarray):
        output2 = np.array(output2)

    if len(output1) == 0 and len(output2) == 0:
        return np.array([])
    elif len(output1) == 0 and output2.shape[-1] == 2:
        return output2
    elif len(output2) == 0 and output1.shape[-1] == 2:
        return output1
    else:
        # both branches produced a contour: keep the larger one
        if cv.contourArea(output1) > cv.contourArea(output2):
            return output1
        return output2
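To make the data flow concrete, here is a hedged end-to-end sketch chaining the pieces above (preprocess is the earlier preprocessing sketch; feeding the color mask into cnt_circle is my assumption about how the two branches mirror each other):

def segment_traffic_sign(img_bgr):
    img = preprocess(img_bgr)
    gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

    # branch 1: edge detection followed by shape detection
    edges, _ = auto_canny(gray, method="otsu")
    cnts_edge, _ = cv.findContours(edges, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
    out_edge = integrate_circle_rect(cnt_rect(cnts_edge),
                                     cnt_circle(gray, hough_circle_parameters),
                                     cnts_edge)

    # branch 2: color based segmentation followed by shape detection
    mask = color_seg(img)
    cnts_color, _ = cv.findContours(mask, cv.RETR_EXTERNAL, cv.CHAIN_APPROX_SIMPLE)
    out_color = integrate_circle_rect(cnt_rect(cnts_color),
                                      cnt_circle(mask, hough_circle_parameters),
                                      cnts_color)

    # keep the larger of the two candidate contours
    return integrate_edge_color(out_edge, out_color)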

Bounding box and computation of IOU

Simply speaking, a bounding box is a rectangular (sometimes square) region that localizes the object of interest, which in our case is the traffic sign. The IOU, which quantifies the segmentation quality of our pipeline, can be calculated as shown below:

$$\mathrm{IOU} = \frac{\mathrm{area}(A \cap B)}{\mathrm{area}(A \cup B)}$$

where A is the ground truth bounding box and B is the predicted bounding box⁵.
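A straightforward implementation, assuming both boxes are given as (x1, y1, x2, y2) corner coordinates (the helper name compute_iou is mine):

def compute_iou(boxA, boxB):
    # corners of the intersection rectangle
    xA, yA = max(boxA[0], boxB[0]), max(boxA[1], boxB[1])
    xB, yB = min(boxA[2], boxB[2]), min(boxA[3], boxB[3])

    inter = max(0, xB - xA) * max(0, yB - yA)
    areaA = (boxA[2] - boxA[0]) * (boxA[3] - boxA[1])
    areaB = (boxB[2] - boxB[0]) * (boxB[3] - boxB[1])
    # symmetric in A and B, so the order of arguments does not matter
    return inter / float(areaA + areaB - inter)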

Results

IOU statistics across all 4,170 images show a fairly good mean and median IOU, but the distribution is quite scattered and polarized, as demonstrated by the high standard deviation.
Distribution of IOUs.

By inspecting the histogram above, we see a considerable number of images with very low IOUs. In fact, there are 236 images with zero IOU, which means there is no overlap at all between the predicted and ground truth bounding boxes. Let’s have a look at some of these “problematic” images.

Images with zero IOUs.

It can be deduced that illumination changes, image blurriness and similarity between background and foreground play vital parts in the success of the segmentation pipeline. Empirically, it appears that we have missed quite a lot of triangular signs, which implies that a triangular shape detection technique should be incorporated into the pipeline.

Summary

To wrap up, a simple traffic sign segmentation pipeline was implemented. Median and mean IOU scores of approximately 0.85 and 0.7, respectively, were achieved. However, there is a catch. Even though the IOU scores of most images look good (more than 0.5), this may be deceptively optimistic, as the images are small (evidently cropped from larger camera captures) and have rather simple backgrounds, which does not reflect real scenarios on the road.

By the way, these are the “significant” parameters that can substantially impact the segmentation performance:

  • HSV color space values selection.
  • Methods to determine the thresholds for the automatic Canny edge detection.
  • Hough circle transform parameters, like minimum distance between circle centers and accumulator thresholds.
  • Epsilon of the Douglas-Peucker algorithm (rectangle detection).

These are other, less significant parameters: 1) the median filter kernel size, and 2) the resizing dimension and interpolation method.

All in all, what we can safely say is that this pipeline works well only on cropped images in which there is a single, in-focus object under good lighting conditions and the background is not cluttered or complex.

Last but not least, all the source code can be found in this GitHub repo⁶.

Hope you have enjoyed the ride. Thank you and see you again next time!

Acknowledgement

Credit to the group project done by See Moon (tanseemoon@1utar.my), Wai Kin (kennethkongwk@1utar.my), Hou Yan (celineboey@1utar.my) and Adele (adelelimhuihui@1utar.my), who are currently pursuing their bachelor’s degrees in Computer Science at UTAR. This post, largely inspired by their work, is published with their consent.

Footnotes

[1] Theoretically, it would be more efficient to run both algorithms in parallel, but the implementation is sequential due to my limited knowledge of parallel computing. Nonetheless, the execution time to produce a prediction is within 2 seconds, depending on the hardware specification.

[2] I am personally shocked at the scarcity of information available online regarding this function’s algorithm, so I checked the source code on GitHub and share the details in this post.

[3] This output will be foreground (i.e. traffic sign mask) in an image.

[4] The parameters of the Hough Circle Transform are set such that only one circle is detected in the image.

[5] The computation of IOU will still be valid if A is prediction while B is the ground truth.

[6] This project is written using Python 3.9.0 and OpenCV 4.5.5.


Jacky Lim

Currently a lecturer at a private university in Malaysia