Image Segmentation
{ Home | Research | People | Awards | Publications }
Handwritten Text Segmentation using Average Longest Path Algorithm
D. Salvi, J. Zhou, J. Waggoner, S. Wang
IEEE Workshop on the Applications of Computer Vision (WACV), 2013
PDF
Abstract
Offline handwritten text recognition is a very challenging problem. Aside from the large variation of different handwriting styles, neighboring characters within a word are usually connected, and we may need to segment a word into individual characters for accurate character recognition. Many existing methods achieve text segmentation by evaluating the local stroke geometry and imposing constraints on the size of each resulting character, such as the character width, height and aspect ratio. These constraints are well suited for printed texts, but may not hold for handwritten texts. Other methods apply holistic approach by using a set of lexicons to guide and correct the segmentation and recognition. This approach may fail when the lexicon domain is insufficient. In this paper, we present a new global nonholistic method for handwritten text segmentation, which does not make any limiting assumptions on the character size and the number of characters in a word. Specifically,the proposed method finds the text segmentation with the maximum average likeliness for the resulting characters. For this purpose, we use a graph model that describes the possible locations for segmenting neighboring characters, and we then develop an average longest path algorithm to identify the globally optimal segmentation. We conduct experiments on real images of handwritten texts taken from the IAM handwriting database and compare the performance of the proposed method against an existing text segmentation algorithm that uses dynamic programming.
A New Inference Framework for Dependency Networks
F. Chen, Q. Cheng, H. Liu, W. Xu, S. Wang
Communications in Statistics – Theory and Methods, 2013
PDF
Abstract
Dependency networks (DNs) have been receiving more attention recently because their structures and parameters can be easily learned from data. The full conditional distributions (FCDs) are known conditions of DNs. Gibbs sampling is currently the most popular inference method on DNs. However, sampling methods converge slowly and it can be hard to diagnose their convergence. In this article, we introduce a set of linear equations to describe the relations between joint probability distributions (JPDs) and FCDs. These equations provide a novel perspective to understand reasoning on DNs. Based on these linear equations, we develop both exact and approximate algorithms for inference on DNs. Experiments show that the proposed approximate algorithms can produce effective results by maintaining low computational complexity.
Automatic Localization of Solid Organs on 3D CT Images by a Collaborative Majority Voting Decision based on Ensemble Learning
X. Zhou, S. Wang, H. Chen, T. Hara, R. Yokoyama, M. Kanematsu, H. Fujita
Computerized Medical Imaging and Graphics, 2012
PDF
Abstract
PURPOSE: Organ segmentation is an essential step in the development of
computer-aided diagnosis/surgery systems based on computed tomography (CT)
images. A universal segmentation approach/scheme that can adapt to different
organ segmentations can substantially increase the efficiency and robustness of
such computer-aided systems. However, this is a very challenging problem. An
initial determination of the approximate position and range of a target organ in
CT images is prerequisite for precise organ segmentations. In this study, we have
proposed a universal approach that enables automatic localization of the
approximate position and range of different solid organs in the torso region on
three-dimensional (3D) CT scans.
METHODS: The location of a target organ in a 3D CT scan is presented as a 3D
rectangle that bounds the organ region tightly and accurately. Our goal was to
automatically and effectively detect such a target organ-specific 3D rectangle.
In our proposed approach, multiple 2D detectors are trained using ensemble
learning and their outputs are combined using a collaborative majority voting in
3D to accomplish the robust organ localizations.
RESULTS: We applied this approach to localize the heart, liver, spleen,
left-kidney, and right-kidney regions independently using a CT image database
that includes 660 torso CT scans. In the experiment, we manually labeled the
abovementioned target organs from 101 3D CT scans as training samples and used
our proposed approach to localize the 5 kinds of target organs separately on the
remaining 559 torso CT scans. The localization results of each organ were
evaluated quantitatively by comparing with the corresponding ground truths
obtained from the target organs that were manually labeled by human operators.
Experimental results showed that success rates of such organ localizations were
distributed from 99% to 75% of the 559 test CT scans. We compared the performance
of our approach with an atlas-based approach. The errors of the detected
organ-center-positions in the successful CT scans by our approach had a mean
value of 5.14 voxels, and those errors were much smaller than the results (mean
value about 25 voxels) from the atlas-based approach. The potential usefulness of
the proposed organ localization was also shown in a preliminary investigation of
left kidney segmentation in non-contrast CT images.
CONCLUSIONS: We proposed an approach to accomplish automatic localizations of
major solid organs on torso CT scans. The accuracy of localizations, flexibility
of localizations of different organs, robustness to contrast and non-contrast CT
images, and normal and abnormal patient cases, and computing efficiency were
validated on the basis of a large number of torso CT scans.
Recursive Sum–Product Algorithm for Generalized Outer-Planar Graphs
Q. Cheng, F. Chen, W. Xu, S. Wang
Information Processing Letters, 2012
PDF
Abstract
Inference on cyclic graphs is one of the most important problems in the applications of graphical models. While exact inference is NP-hard on general cyclic graphs, it has been found that exact inference can be achieved with a computational complexity as low as O(Nm3) on the outer-planar graph, which is a special kind of cyclic graph. In this paper, we introduce a new kind of cyclic graph, the generalized outer-planar (GOP) graph, which is more general than the outer-planar graph and show that the exact inference on the GOP graphs can be achieved in O(Nm3) by a recursive sum-product (RSP) algorithm. RSP exploits the property of GOP graphs that the faces are reducible, and brings a ''face elimination'' procedure to compute the marginals exactly. Furthermore, RSP can be implemented on general cyclic graphs to obtain approximate marginals. Experimental results show the effectiveness of approximate RSP on various graphical models.
Grain Segmentation of 3D Superalloy Images Using MultiChannel EWCVT under Human Annotation Constraints
Y. Cao, L. Ju, S. Wang
European Conference on Computer Vision (ECCV), 2012
PDF
Abstract
Grain segmentation on 3D superalloy images provides superalloy's micro-structures, based on which many physical and mechanical properties can be evaluated. This is a challenging problem in senses of (1) the number of grains in a superalloy sample could be thousands or even more; (2) the intensity within a grain may not be homogeneous; and (3) superalloy images usually contains carbides and noises. Recently, the Multichannel Edge-Weighted Centroid Voronoi Tessellation (MCEWCVT) algorithm was developed for grain segmentation and showed better performance than many widely used image segmentation algorithms. However, as a general-purpose clustering algorithm, MCEWCVT does not consider possible prior knowledge from material scientists in the process of grain segmentation. In this paper, we address this issue by defining an energy minimization problem which subject to certain constraints. Then we develop a Constrained Multichannel Edge-Weighted Centroid Voronoi Tessellation (CMEWCVT) algorithm to effectively solve this constrained minimization problem. In particular, manually annotated segmentation on a very small set of 2D slices are taken as constraints and incorporated into the whole clustering process. Experimental results demonstrate that the proposed CMEWCVT algorithm significantly improve the previous grain-segmentation performance.
Graph Cut Approaches for Materials Segmentation Preserving Shape, Appearance, and Topology
J. Waggoner, J. Simmons, M. De Graef, S. Wang
International Conference on 3D Materials Science, 2012
PDF
Abstract
Segmenting material images into underlying objects is an important but challenging problem given object complexity and image noise. Consistency of shape, appearance, and topology among the underlying objects are critical properties of materials images and can be considered as criteria to improve segmentation. For example, some materials may have objects with a specific shape or appearance in each serial section slice, which only changes minimally from slice to slice; and some materials may exhibit specific inter-object topology which constrains their neighboring relations. In this paper, we develop new graph-cut based approaches for materials science image segmentation. Specifically, these approaches segment image volumes by repeatedly propagating a 2D segmentation from one slice to another. We introduce different terms into the graph-cut cost function to enforce desirable shape, appearance, and topology consistency. We justify the effectiveness of the proposed approaches by using them to segment sequences of serial-section images of different materials.
Combining Global Labeling and Local Relabeling for Metallic Image Segmentation
J. Waggoner, J. Simmons, S. Wang
Proceedings of SPIE Volume 8296 (Computational Imaging X), 2012
PDF
Abstract
Accurately segmenting a series of 2D serial-sectioned images for multiple, contiguous 3D structures has important applications in medical image processing, video sequence analysis, and materials science image segmentation. While 2D structure topology is largely consistent across consecutive serial sections, it may vary locally because a 3D structure of interest may not span the entire 2D sequence. In this paper, we develop a new approach to address this challenging problem by considering both the global consistency and possible local inconsistency of the 2D structural topology. In this approach, we repeatedly propagate a 2D segmentation from one slice to another, and we formulate each step of this propagation as an optimal labeling problem that can be efficiently solved using the graph-cut algorithm. Specifically, we divide the optimal labeling into two steps: a global labeling that enforces topology consistency, and a local labeling that identifies possible topology inconsistency. We justify the effectiveness of the proposed approach by using it to segment a sequence of serial-section microscopic images of an alloy widely used in material sciences and compare its performance against several existing image segmentation methods.
Graph-cut Methods for Grain Boundary Segmentation
S. Wang, J. Waggoner, J. Simmons
JOM Journal of the Minerals, Metals and Materials Society, 2011
PDF
Abstract
This paper reviews the recent progress on using graph-cut methods for image segmentation, and discusses their applications to segmenting grain boundaries from materials science images.
A Multichannel Edge-Weighted Centroidal Voronoi Tessellation Algorithm for 3D Superalloy Image Segmentation
Y. Cao, L. Ju, Q. Zou, C. Qu, S. Wang
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011
PDF
Abstract
In material science and engineering, the grain structure inside a super-alloy sample determines its mechanical and physical properties. In this paper, we develop a new Multichannel Edge-Weighted Centroidal Voronoi Tessellation (MCEWCVT) algorithm to automatically segment all the 3D grains from microscopic images of a super-alloy sample. Built upon the classical k-means/CVT algorithm, the proposed algorithm considers both the voxel-intensity similarity within each cluster and the compactness of each cluster. In addition, the same slice of a super-alloy sample can produce multiple images with different grain appearances using different settings of the microscope. We call this multichannel imaging and in this paper, we further adapt the proposed segmentation algorithm to handle such multichannel images to achieve higher grain-segmentation accuracy. We test the proposed MCEWCVT algorithm on a 4-channel Ni-based 3D super-alloy image consisting of 170 slices. The segmentation performance is evaluated against the manually annotated ground-truth segmentation and quantitatively compared with other six image segmentation/edge-detection methods. The experimental results demonstrate the higher accuracy of the proposed algorithm than the comparison methods.
Segmentation of Laryngeal High Speed Videoendoscopy in Temporal Domain using Paired Active Contours
H. Moukalled, D. Deliyski, R. Schwarz, S. Wang
International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA), 2009
PDF
Abstract
This paper introduces a method for segmention of the vocal-fold edges in temporal domain from laryngeal high-speed videoendoscopy (HSV). The method employs a pair of active contours (snakes), which deform within a series of kymographic images derived from the HSV data. By following a set of deformation rules, this pair of active contours converges to the desired boundaries of the glottis. The proposed method was tested on a dataset of 98 HSV samples, of which 96 were successfully segmented. The new method substantially outperforms existing methods. However, more precise analysis revealed that of the 96 successfully segmented HSV samples, 18 exhibited a fine error up to ±1 pixel, and 78 samples exhibited errors exceeding a pixel. The large majority of the gross errors (76%) were due to problems near the posterior and anterior commissures, which warrants further investigation for improving the accuracy and reliability of the method.
New Benchmark for Image Segmentation Evaluation
F. Ge, S. Wang, T. Liu
Journal of Electronic Imaging, 2007
PDF
Abstract
Image segmentation and its performance evaluation are very difficult but important problems in computer vision. A major challenge in segmentation evaluation comes from the fundamental conflict between generality and objectivity: For general-purpose segmentation, the ground truth and segmentation accuracy may not be well defined, while embedding the evaluation in a specific application, the evaluation results may not be extensible to other applications. We present a new benchmark to evaluate five different image segmentation methods according to their capability to separate a perceptually salient structure from the background with a relatively small number of segments. This way, we not only find a large variety of images that satisfy the requirement of good generality, but also construct ground-truth segmentations to achieve good objectivity. We also present a special strategy to address two important issues underlying this benchmark: (1) most image-segmentation methods are not developed to directly extract a single salient structure; (2) many real images have multiple salient structures. We apply this benchmark to evaluate and compare the performance of several state-of-the-art image segmentation methods, including the normalized-cut method, the watershed method, the efficient graphbased method, the mean-shift method, and the ratio-cut method.
Image-Segmentation Evaluation From the Perspective of Salient Object Extraction
F. Ge, S. Wang, T. Liu
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2006
PDF
Abstract
Image segmentation and its performance evaluation are very difficult but important problems in computer vision. A major challenge in segmentation evaluation comes from the fundamental conflict between generality and objectivity: For general-purpose segmentation, the ground truth and segmentation accuracy may not be well defined, while embedding the evaluation in a specific application, the evaluation results may not be extended to other applications. We present in this paper a new benchmark for evaluating image segmentation. Specifically, we formulate image segmentation as identifying the single most perceptually salient structure from an image. We collect a large variety of test images that conforms to this specific formulation, construct unambiguous ground truth for each image, and define a reliable way to measure the segmentation accuracy. We then present two special strategies to further address two important issues: (a) the most salient structures in some real images may not be unique or unambiguously defined, and (b) many available image-segmentation methods are not developed to directly extract a single salient structure. Finally, we apply this benchmark to evaluate and compare the performance of several state-of-the-art image-segmentation methods, including the normalized-cut method, the level-set method, the efficient graph-based method, the mean-shift method, and the ratio-contour method.
Image Segmentation with Ratio Cut
S. Wang, J. M. Siskind
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2003
PDF
Abstract
This paper proposes a new cost function, cut ratio, for segmenting images using graph-based methods. The cut ratio is defined as the ratio of the corresponding sums of two different weights of edges along the cut boundary and models the mean affinity between the segments separated by the boundary per unit boundary length. This new cost function allows the image perimeter to be segmented, guarantees that the segments produced by bipartitioning are connected, and does not introduce a size, shape, smoothness, or boundarylength bias. The latter allows it to produce segmentations where boundaries are aligned with image edges. Furthermore, the cut-ratio cost function allows efficient iterated region-based segmentation as well as pixel-based segmentation. These properties may be useful for some image-segmentation applications. While the problem of finding a minimum ratio cut in an arbitrary graph is NP-hard, one can find a minimum ratio cut in the connected planar graphs that arise during image segmentation in polynomial time. While the cut ratio, alone, is not sufficient as a baseline method for image segmentation, it forms a good basis for an extended method of image segmentation when combined with a small number of standard techniques. We present an implemented algorithm for finding a minimum ratio cut, prove its correctness,discuss its application to image segmentation, and present the results of segmenting a number of medical and natural images using our techniques.
Image Segmentation with Minimum Mean Cut
S. Wang, J. M. Siskind
IEEE International Conference on Computer Vision (ICCV), 2001
PDF
Abstract
We introduce a new graph-theoretic approach to image segmentation based on minimizing a novel class of 'mean cut' cost functions. Minimizing these cost functions corre- sponds to finding a cut with minimum mean edge weight in a connected planar graph. This approach has several advan- tages over prior approaches to image segmentation. First, it allows cuts with both open and closed boundaries. Sec- ond, it guarantees that the partitions are connected. Third, the cost function does not introduce an explicit bias, such as a preference for large-area foregrounds, smooth or short boundaries, or similar-weight partitions. This lack of bias allows it to produce segmentations that are better aligned with image edges, even in the presence of long thin re- gions. Finally, the global minimum of this cost function is largely insensitive to the precise choice of edge-weight function. In particular, we show that the global minimum is invariant under a linear transformation of the edge weights and thus insensitive to image contrast. Building on algo- rithms by Ahuja, Magnanti, and Orlin (1993), we present a polynomial-time algorithm for finding a global minimum of the mean-cut cost function and illustrate the results of ap- plying that algorithm to several synthetic and real images.