Why would you want a computer to read Braille? I’m working on a Braille ebook reader and I need to ensure its reliability over many refreshes. To start with I only need to verify the position of 12 pins (2 characters of 6 pins), but as the project progresses I’ll need to read multiple lines of 40 characters.
One option is an electrical system, but that would be complicated to build and would need reworking as the system grows. I thought I’d start off with a camera based system: take a photo of the pins and then determine if they’re up or down.
Unfortunately, as the pins are quite small (1.5mm diameter) and densely packed (2.5mm pin spacing), the positioning of the camera is critical. And as we’re constantly making changes and taking the machine apart, I wanted a system resilient to small changes in the camera’s position and orientation.
So I decided to investigate using opencv (a free computer vision library) to process the images returned from the camera. I’ve written up my journey here as a reference for myself and others, and because I find writing a good way of checking my own comprehension.
If you’re interested in following along, you can download my scripts and images from github. You’ll almost certainly need to compile opencv from source because the feature detection stuff isn’t in the latest packages. See here for instructions.
I’m a beginner with computer vision, and found the process quite difficult. For one thing, the documentation is fairly arcane and full of definitions I had to keep looking up on wikipedia. An added complication is that cv2 (the modern opencv library for Python) uses numpy for all the array handling, which also took a bit of getting used to.
As an aside, I chose a second hand Canon PowerShot as it’s easily compatible with gphoto2, and set it up with fixed focus and a large depth of field to keep the pins in focus. From 20cm away, the pins’ shadows are roughly 10px in diameter. I soldered wires to the battery terminals to avoid long test sequences being interrupted by flat batteries. The settings are applied and a photo is taken with this script.
To check if the idea had any chance of working I mocked up some large (3mm) pins.
The photo shows 2 empty pin positions, one pin high and 2 low. The shadow from the high pin is easily visible, and I thought this would be the perfect indicator of a pin’s state.
A great early discovery was these fantastic tutorials for using the opencv Python library. Most of my programs are just copies of the example code with minor changes. Thanks Alexander Mordvintsev & Abid K!
The chessboard in the picture above is how we can calibrate our particular camera. Opencv assumes that cameras are perfect pinhole cameras, but in reality all cameras have some distortion. Luckily for me, there is a tutorial all about calibrating your camera and then undistorting images taken with it.
Unfortunately the code in the tutorials wasn’t working, and it took me a while with ipdb to figure out that a couple of functions had changed their return values. One of the great things about open source software is that it makes it easy to get involved and fix issues. So I was able to fix the documentation for future users!
The image on the left shows successful recognition by findChessboardCorners() and subsequent drawing by drawChessboardCorners(). We can then use calibrateCamera() to dump the camera data to a file. The image on the right shows a (different) picture that’s been undistorted using the camera data and undistort(). You can see how straight the edges of the board are compared to those on the left. Also check out the old school computer in the background! These processes are nothing new. Here’s my program to do the camera calibration, and then to demonstrate undistortion.
So now I can get a good undistorted image from my camera, what next? While looking at another of those great tutorials, I was excited to learn about pose estimation, which turns out to be a really important part of what I want to do.
Once we’ve worked out the pose, we can transform arbitrary points onto the image using projectPoints(). So I thought if I could draw a line from the chessboard to the pin, then I’d be able to read the part of the image containing its shadow. This turned out to work really well. I used findChessboardCorners(), then solvePnPRansac() to get the rotation and translation vectors. projectPoints() then translates an array of the 3D points where I expect my pins into image coordinates. Then I define a Region of Interest (ROI) around each point, convert the RGB values to HSV, and finally take the mean of the V channel. V varies from 0 to 255, and I found that a mean over 150 meant the pin was down (no shadow), while the shadow gave values under 100. Here’s the script.
Getting all the arguments and return values right and the arrays shaped correctly was a bit of a faff. Using ipdb for debugging and inspection was invaluable. I recommend it.
Hooray! But I wasn’t happy to stop there. My feeling was (and I admit I haven’t tested this) that as the number of characters grows, the positioning of the chessboard becomes ever more critical, since the pins furthest from the chessboard accumulate any positioning error. Wouldn’t a marker at each end work better? And now was the right time to move to correctly sized and spaced pins.
Markers designed to be easy for a computer to track are called Fiducials, a great term I think! Unfortunately there isn’t a whole lot on the web about using them with opencv. In fact if you search the web for ‘fiducial multiple opencv’, the top post is my recent question on the opencv website!
This is an example of a fiducial; it’s from the ArTag project. Unfortunately their download link is broken at the moment, so I couldn’t get hold of the high-definition images. It turned out not to matter for this application!
It seems that working with fiducials is a bit different to the chessboard process. Now we use a dedicated feature detection algorithm to find the keypoints of the fiducial, and then the keypoints of the photo of the fiducial. There are lots of feature detectors out there, and I started with SIFT. Here are the keypoints drawn onto the 2 fiducials that I’m trying to match.
We then use a matcher to try to match the fiducial’s keypoints to the photo’s keypoints. The results can be sorted to find the pairs of keypoints with the least ‘distance’. Distance in this case means how close the match is (smaller is better), and is calculated differently for different feature matching algorithms. We can discard any pairs above a maximum distance and then check we have enough left to get consistent results. The remaining closest pairs are then used to calculate a homography.
From what I understand, once we can define the homography of 2 images (of the same thing) we can ask the computer to transform points from one image to the other. So in our case, if my fiducial is 50mm away from the pin, I can transform a point at 50mm onto the photo and then I can look at that area of the photo to find what state the pin is in.
I ran into a couple of problems with the ArTag fiducials. Firstly, without a white border around the marker, the SIFT feature detection didn’t pick up the outer black corners, making detection a lot less reliable. Obvious in hindsight!
Secondly, I couldn’t work out how to use a pair of fiducials. I could find them both and get matches for each, but then had no idea how to merge the two resulting homographies into one.
After a bit of head scratching I realised I could ask the computer to match the pair of fiducials at once by merging the 2 original images into 1 larger one. Then I get a good match, with only one resulting homography.
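Merging the two reference images is simple array stacking. The stand-in images and gap width below are arbitrary; in practice the white gap needs to reflect the physical spacing between the two markers, at the scale of the reference image:

```python
import numpy as np

# Stand-in fiducial images; the real ones are the padded ArTag markers
fid_left = np.zeros((100, 100), np.uint8)
fid_right = np.full((100, 100), 128, np.uint8)

# White gap whose width mirrors the physical spacing between the markers
gap = np.full((100, 300), 255, np.uint8)

# One combined reference image: matching against this gives a single homography
combined = np.hstack([fid_left, gap, fid_right])
```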
The final improvement I made was switching from SIFT to ORB, which isn’t patented and is faster. I don’t really need speed for this application, but for offline processing and testing, faster feedback makes a real difference.
Here we have the undistorted image, with the single fiducial matched (blue border shows the match). The 10px by 10px regions down and right of the pins are then tested for a shadow and the software so far gets 100% accuracy, even if the camera or board is moved. Success! And here’s the final script.