OCR MATLAB Code Download

The following MATLAB project contains the source code and MATLAB examples used for optical character recognition. This example shows how to use the ocr function from the Computer Vision System Toolbox™ to perform optical character recognition.

Automatically Detect and Recognize Text in Natural Images

This example shows how to detect regions in an image that contain text. This is a common task performed on unstructured scenes. Unstructured scenes are images that contain undetermined or random scenarios. For example, you can detect and recognize text automatically from captured video to alert a driver about a road sign. This differs from structured scenes, which contain known scenarios where the position of text is known beforehand.

Segmenting text from an unstructured scene greatly helps with additional tasks such as optical character recognition (OCR). The automated text detection algorithm in this example detects a large number of text region candidates and progressively removes those less likely to contain text.

Step 1: Detect Candidate Text Regions Using MSER

The MSER feature detector works well for finding text regions [1] because the consistent color and high contrast of text lead to stable intensity profiles.

Use the detectMSERFeatures function to find all the regions within the image and plot these results. Notice that there are many non-text regions detected alongside the text.
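A minimal sketch of this step, assuming an input image file (handicapSign.jpg) and detector parameters that are illustrative starting points rather than prescribed values:

colorImage = imread('handicapSign.jpg');   % assumed input image
I = rgb2gray(colorImage);

% Detect MSER regions. The second output is a connected-component
% structure that regionprops can consume in the next step.
[mserRegions, mserConnComp] = detectMSERFeatures(I, ...
    'RegionAreaRange', [200 8000], 'ThresholdDelta', 4);

figure
imshow(I)
hold on
plot(mserRegions, 'showPixelList', true, 'showEllipses', false)
title('MSER regions')
hold off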

Step 2: Remove Non-Text Regions Based On Basic Geometric Properties

Although the MSER algorithm picks out most of the text, it also detects many other stable regions in the image that are not text. You can use a rule-based approach to remove non-text regions. For example, geometric properties of text can be used to filter out non-text regions using simple thresholds. Alternatively, you can use a machine learning approach to train a text vs. non-text classifier. Typically, a combination of the two approaches produces better results [4]. This example uses a simple rule-based approach to filter non-text regions based on geometric properties.

There are several geometric properties that are good for discriminating between text and non-text regions [2,3], including:

  • Aspect ratio

  • Eccentricity

  • Euler number

  • Extent

  • Solidity

Use regionprops to measure a few of these properties and then remove regions based on their property values.
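A sketch of that filtering, continuing from the mserConnComp output above; the threshold values are plausible starting points, not values fixed by this example:

% Measure the geometric properties over the MSER connected components.
mserStats = regionprops(mserConnComp, 'BoundingBox', 'Eccentricity', ...
    'Solidity', 'Extent', 'EulerNumber', 'Image');

% Compute the aspect ratio from the bounding boxes.
bbox = vertcat(mserStats.BoundingBox);
aspectRatio = bbox(:,3) ./ bbox(:,4);

% Flag regions whose properties fall outside typical text ranges.
filterIdx = aspectRatio' > 3;
filterIdx = filterIdx | [mserStats.Eccentricity] > 0.995;
filterIdx = filterIdx | [mserStats.Solidity] < 0.3;
filterIdx = filterIdx | [mserStats.Extent] < 0.2 | [mserStats.Extent] > 0.9;
filterIdx = filterIdx | [mserStats.EulerNumber] < -4;

% Remove the flagged non-text regions.
mserStats(filterIdx) = [];
mserRegions(filterIdx) = [];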

Step 3: Remove Non-Text Regions Based On Stroke Width Variation

Another common metric used to discriminate between text and non-text is stroke width. Stroke width is a measure of the width of the curves and lines that make up a character. Text regions tend to have little stroke width variation, whereas non-text regions tend to have larger variations.

To help understand how the stroke width can be used to remove non-text regions, estimate the stroke width of one of the detected MSER regions. You can do this by using a distance transform and binary thinning operation [3].
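One way to sketch this estimate for a single region (the index 6 is arbitrary, chosen only for illustration):

% Pad the binary region image so the distance transform is not
% distorted at the region boundary.
regionImage = mserStats(6).Image;
regionImage = padarray(regionImage, [1 1], 0);

% The distance transform gives each foreground pixel its distance to
% the nearest background pixel; sampled along the thinned skeleton,
% this approximates the local stroke width.
distanceImage = bwdist(~regionImage);
skeletonImage = bwmorph(regionImage, 'thin', inf);

strokeWidthValues = distanceImage(skeletonImage);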

In the images shown above, notice how the stroke width image has very little variation over most of the region. This indicates that the region is more likely to be a text region because the lines and curves that make up the region all have similar widths, which is a common characteristic of human readable text.

In order to use stroke width variation to remove non-text regions using a threshold value, the variation over the entire region must be quantified into a single metric as follows:
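A natural single-number summary, assumed here, is the coefficient of variation of the stroke widths sampled along the skeleton:

strokeWidthMetric = std(strokeWidthValues) / mean(strokeWidthValues);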

Then, a threshold can be applied to remove the non-text regions. Note that this threshold value may require tuning for images with different font styles.

The procedure shown above must be applied separately to each detected MSER region. The following for-loop processes all the regions, and then shows the results of removing the non-text regions using stroke width variation.
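A sketch of that loop, repeating the per-region steps above; the 0.4 threshold is only an assumed starting point:

strokeWidthThreshold = 0.4;
strokeWidthFilterIdx = false(1, numel(mserStats));

for j = 1:numel(mserStats)
    regionImage = mserStats(j).Image;
    regionImage = padarray(regionImage, [1 1], 0);

    distanceImage = bwdist(~regionImage);
    skeletonImage = bwmorph(regionImage, 'thin', inf);
    strokeWidthValues = distanceImage(skeletonImage);

    strokeWidthMetric = std(strokeWidthValues) / mean(strokeWidthValues);
    strokeWidthFilterIdx(j) = strokeWidthMetric > strokeWidthThreshold;
end

% Remove regions whose stroke width varies too much to be text.
mserRegions(strokeWidthFilterIdx) = [];
mserStats(strokeWidthFilterIdx) = [];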

Step 4: Merge Text Regions For Final Detection Result

At this point, all the detection results are composed of individual text characters. To use these results for recognition tasks, such as OCR, the individual text characters must be merged into words or text lines. This enables recognition of the actual words in an image, which carry more meaningful information than just the individual characters. For example, recognizing the string 'EXIT' conveys more than the unordered set of characters {'X','E','T','I'}, where the meaning of the word is lost.

One approach for merging individual text regions into words or text lines is to first find neighboring text regions and then form a bounding box around these regions. To find neighboring regions, expand the bounding boxes computed earlier with regionprops. This makes the bounding boxes of neighboring text regions overlap such that text regions that are part of the same word or text line form a chain of overlapping bounding boxes.
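A sketch of the expansion, where the 2% expansion amount is an assumed tuning value and I is the grayscale image from Step 1:

bboxes = vertcat(mserStats.BoundingBox);

% Convert from [x y width height] to corner coordinates.
xmin = bboxes(:,1);
ymin = bboxes(:,2);
xmax = xmin + bboxes(:,3) - 1;
ymax = ymin + bboxes(:,4) - 1;

% Expand each box by a small fraction, then clip to the image bounds.
expansionAmount = 0.02;
xmin = (1 - expansionAmount) * xmin;
ymin = (1 - expansionAmount) * ymin;
xmax = (1 + expansionAmount) * xmax;
ymax = (1 + expansionAmount) * ymax;

xmin = max(xmin, 1);
ymin = max(ymin, 1);
xmax = min(xmax, size(I, 2));
ymax = min(ymax, size(I, 1));

expandedBBoxes = [xmin ymin xmax - xmin + 1 ymax - ymin + 1];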

Now, the overlapping bounding boxes can be merged together to form a single bounding box around individual words or text lines. To do this, compute the overlap ratio between all bounding box pairs. This quantifies the distance between all pairs of text regions so that it is possible to find groups of neighboring text regions by looking for non-zero overlap ratios. Once the pair-wise overlap ratios are computed, use a graph to find all the text regions 'connected' by a non-zero overlap ratio.

Use the bboxOverlapRatio function to compute the pair-wise overlap ratios for all the expanded bounding boxes, then use graph to find all the connected regions.
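A sketch of those two calls, zeroing the diagonal of the overlap matrix so a box does not count as overlapping itself:

overlapRatio = bboxOverlapRatio(expandedBBoxes, expandedBBoxes);

% Remove self-overlap before building the graph.
n = size(overlapRatio, 1);
overlapRatio(1:n+1:n^2) = 0;

% Boxes joined by non-zero overlap become connected nodes in the graph.
g = graph(overlapRatio);
componentIndices = conncomp(g);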

The output of conncomp is a vector of indices indicating the connected text region to which each bounding box belongs. Use these indices to merge multiple neighboring bounding boxes into a single bounding box by computing the minimum and maximum of the individual bounding boxes that make up each connected component.

Finally, before showing the final detection results, suppress false text detections by removing bounding boxes made up of just one text region. This removes isolated regions that are unlikely to be actual text given that text is usually found in groups (words and sentences).
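A sketch of the merge and the single-box suppression, using accumarray to take the extremes within each connected component:

% Merge boxes by taking the extremes of each connected component.
xmin = accumarray(componentIndices', xmin, [], @min);
ymin = accumarray(componentIndices', ymin, [], @min);
xmax = accumarray(componentIndices', xmax, [], @max);
ymax = accumarray(componentIndices', ymax, [], @max);

textBBoxes = [xmin ymin xmax - xmin + 1 ymax - ymin + 1];

% Suppress boxes that contain only a single text region.
numRegionsInGroup = accumarray(componentIndices', 1);
textBBoxes(numRegionsInGroup == 1, :) = [];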

Step 5: Recognize Detected Text Using OCR

After detecting the text regions, use the ocr function to recognize the text within each bounding box. Note that without first finding the text regions, the output of the ocr function would be considerably more noisy.
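A sketch, assuming the merged textBBoxes from Step 4 and the grayscale image I:

% ocr returns one ocrText object per bounding box.
ocrtxt = ocr(I, textBBoxes);

% Concatenate the recognized strings for display.
recognizedText = [ocrtxt.Text]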

This example showed you how to detect text in an image using the MSER feature detector to first find candidate text regions, and then it described how to use geometric measurements to remove all the non-text regions. This example code is a good starting point for developing more robust text detection algorithms. Note that without further enhancements this example can produce reasonable results for a variety of other images, for example, posters.jpg or licensePlates.jpg.

References

[1] Chen, Huizhong, et al. 'Robust Text Detection in Natural Images with Edge-Enhanced Maximally Stable Extremal Regions.' Image Processing (ICIP), 2011 18th IEEE International Conference on. IEEE, 2011.

[2] Gonzalez, Alvaro, et al. 'Text location in complex images.' Pattern Recognition (ICPR), 2012 21st International Conference on. IEEE, 2012.

[3] Li, Yao, and Huchuan Lu. 'Scene text detection via stroke width.' Pattern Recognition (ICPR), 2012 21st International Conference on. IEEE, 2012.

[4] Neumann, Lukas, and Jiri Matas. 'Real-time scene text localization and recognition.' Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.

Input Arguments

I - Input image
M-by-N-by-3 truecolor image | M-by-N 2-D grayscale image | M-by-N binary image

Input image, specified in M-by-N-by-3 truecolor, M-by-N 2-D grayscale, or binary format. The input image must be real and nonsparse. The function converts truecolor or grayscale input images to a binary image before the recognition process, using Otsu's thresholding technique. For best OCR results, the height of a lowercase 'x', or comparable character, in the input image must be greater than 20 pixels. To improve recognition results, remove any text rotations greater than +/- 10 degrees from the horizontal or vertical axis.

Data Types: single | double | int16 | uint8 | uint16 | logical

One or more rectangular regions of interest, specified as an M-by-4 matrix. Each row, M, specifies a region of interest within the input image as a four-element vector, [x y width height]. The vector specifies the upper-left corner location, [x y], and the size of a rectangular region of interest, [width height], in pixels. Each rectangle must be fully contained within the input image, I. Before the recognition process, the function uses Otsu's thresholding to convert truecolor and grayscale input regions of interest to binary regions. The function returns text recognized in the rectangular regions as an array of objects.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: ocr(I,'TextLayout','Block')

'TextLayout' - Input text layout
'Auto' (default) | 'Block' | 'Line' | 'Word'

Input text layout, specified as the comma-separated pair consisting of 'TextLayout' and one of the following:

TextLayout    Text Treatment
'Auto'        Determines the layout and reading order of text blocks within the input image.
'Block'       Treats the text in the image as a single block of text.
'Line'        Treats the text in the image as a single line of text.
'Word'        Treats the text in the image as a single word of text.

Use the automatic layout analysis to recognize text from a scanned document that contains a specific format, such as a double column. This setting preserves the reading order in the returned text. You may get poor results if your input image contains only a few regions of text or if the text is located in a cluttered scene. If you get poor OCR results, try a different layout that matches the text in your image. If the text is located in a cluttered scene, try specifying an ROI around the text in your image in addition to trying a different layout.

'Language' - Language
'English' (default) | 'Japanese' | character vector | string scalar | cell array of character vectors | string array

Language to recognize, specified as the comma-separated pair consisting of 'Language' and the character vector 'English', 'Japanese', or a cell array of character vectors. You can also install the Install OCR Language Data Files package for additional languages or add a custom language. Specifying multiple languages enables simultaneous recognition of all the selected languages. However, selecting more than one language may reduce the accuracy and increase the time it takes to perform OCR.

To specify any of the additional languages contained in the Install OCR Language Data Files package, use the language character vector the same way as the built-in languages. You do not need to specify the path.

To use your own custom languages, specify the path to the trained data file as the language character vector. You must name the file in the format, <language>.traineddata. The file must be located in a folder named 'tessdata'. For example:
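A sketch, where img and the customlang file name are placeholders for your own image and trained data:

% 'customlang.traineddata' is a hypothetical custom language file
% located in a folder named 'tessdata'.
txt = ocr(img, 'Language', 'path/to/tessdata/customlang.traineddata');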

You can load multiple custom languages as a cell array of character vectors, as in the sketch below. The containing folder must always be the same for all the files specified in the cell array, for example 'path/to/tessdata'; code that points to two different containing folders does not work.

Some language files have a dependency on another language. For example, Hindi training depends on English, so to use Hindi, the English traineddata file must also exist in the same folder as the Hindi traineddata file. The ocr function only supports traineddata files created using tesseract-ocr 3.02 or the OCR Trainer.
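A sketch of loading two languages from the same tessdata folder (all paths are placeholders):

% Both files share the containing folder 'path/to/tessdata'.
txt = ocr(img, 'Language', ...
    {'path/to/tessdata/eng.traineddata', ...
     'path/to/tessdata/jpn.traineddata'});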

For deployment targets generated by MATLAB® Coder™, the generated ocr executable and the language data file folder must be colocated, and the language data folder must be named tessdata. For example:

  • For English: C:/path/tessdata/eng.traineddata

  • For Japanese: C:/path/tessdata/jpn.traineddata

  • For custom data files: C:/path/tessdata/customlang.traineddata

  • C:/path/ocr_app.exe

You can copy the English and Japanese trained data files from:

'CharacterSet' - Character subset
'' all characters (default) | character vector | string scalar

Character subset, specified as the comma-separated pair consisting of 'CharacterSet' and a character vector. By default, CharacterSet is set to the empty character vector, ''. The empty vector sets the function to search for all characters in the language specified by the Language property. You can set this property to a smaller set of known characters to constrain the classification process.

The ocr function selects the best match from the CharacterSet. Using deducible knowledge about the characters in the input image helps to improve text recognition accuracy. For example, if you set CharacterSet to all numeric digits, '0123456789', the function attempts to match each character to only digits. In this case, a non-digit character can incorrectly get recognized as a digit.
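For example, a sketch that constrains recognition to digits:

% Restrict the match candidates to numeric digits only.
txt = ocr(I, 'CharacterSet', '0123456789');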