A Spatial Pyramid Approach for Texture Classification

Andreea Lavinia Popescu∗, Radu Tudor Ionescu†, Dan Popescu∗

∗Faculty of Automatic Control and Computer Science, Politehnica University of Bucharest
313 Splaiul Independentei Street, Bucharest, Romania
E-mails: [email protected], dan popescu [email protected]

†Department of Computer Science, University of Bucharest
14 Academiei Street, Bucharest, Romania
E-mail: [email protected]

Abstract—Texture classification, texture synthesis, and similar tasks are an active topic in computer vision and pattern recognition. This paper presents two spatial pyramid representations for texture classification. Most techniques designed for texture classification are based on machine learning. Images are usually represented as feature vectors, which are then used to train a classifier. In the spatial pyramid representation, images are divided into increasingly fine sub-regions (bins) and features are extracted from each bin. This representation is able to capture details about the fractal structure of the texture images. Two experiments are conducted on popular texture classification data sets, namely Brodatz and UIUCTex. In the experiments, several kernel representations and kernel classifiers are combined and evaluated. The spatial pyramid in combination with the intersection kernel and Kernel Discriminant Analysis gives the best results. The proposed pyramid representations improve the accuracy by as much as 5% over the standard feature representation, showing that the pyramid structure is indeed useful for texture classification.

Keywords—spatial pyramid; pyramid representation; kernel methods; texture classification; box counting; fractal structure.

I. INTRODUCTION

Texture classification, texture synthesis, and similar tasks are an active topic in computer vision and pattern recognition, having many practical applications. This paper describes two spatial pyramid representations for texture classification. Most techniques designed for texture classification are based on machine learning. Images are usually represented as feature vectors, which are then used to train a classifier. The trained classifier can be used to predict the class label of new texture samples.

This paper proposes two spatial pyramid representations for texture classification that are designed to capture the fractal structure of texture images. Researchers have long used fractal techniques to describe image textures and classify various types of texture-based images. Indeed, a correlation between fractal dimension and texture coarseness was found in the work of [1]. Research on this subject typically aims to describe the fractal structure by a single numerical value, which can subsequently be used as a feature in a learning context. However, a single feature carries little weight in a vector with many features. The approach presented in this paper describes the fractal structure of the image in a completely different way. First, texture images are divided into increasingly fine sub-regions (also known as spatial bins) and features are extracted from each bin. Note that spatial bins are non-overlapping image regions that cover the entire texture image. The final feature vector is obtained by concatenating the features extracted from bins computed at multiple levels. This gives a pyramid representation of the image, similar to the approach of [2].

The spatial pyramid representations are compared with the standard feature representation in two experiments conducted on two popular texture classification data sets, namely Brodatz and UIUCTex. In the experiments, several kernel representations and kernel classifiers are combined and evaluated. More precisely, the linear and the intersection kernels are combined with Support Vector Machines, Kernel Ridge Regression, or Kernel Discriminant Analysis. The spatial pyramid in combination with the intersection kernel and Kernel Discriminant Analysis gives the best results. The proposed pyramid representations improve the accuracy by as much as 5% over the standard feature representation, showing that the pyramid structure is indeed useful for texture classification.

The paper is organized as follows. The two spatial pyramid representations are described in Section II. The texture features used in the spatial pyramid representation are presented in Section III. The kernel methods are described in Section IV. The texture classification experiments are presented in Section V. Finally, the conclusion is drawn in Section VI.

II. SPATIAL PYRAMID APPROACH

The two spatial pyramid representations described in this paper are designed to capture the fractal structure of texture images. Researchers have long used fractal techniques to describe image textures and classify various types of texture-based images.

The work of [1] showed the correlation between fractal dimension and texture coarseness. In a similar fashion, the work of [3] used fractal dimension to effectively classify natural textures. In [4], a comparative study of the fractal approach versus the statistical approach in texture analysis of remotely sensed images is presented. Due to its simplicity and low computational cost, the box counting algorithm is widely used to estimate the fractal dimensions of textured images with or without self-similarity.

The entire research in the area of fractal dimension estimation aims to describe the fractal structure of texture images through a single numerical value. This value can subsequently be used as a feature in a learning context. However, a single feature carries little weight in a vector with many features. The spatial pyramid approach presented in this paper describes the fractal structure of the image using many features that are extracted using an algorithm similar to box counting. First, texture images are recursively divided into spatial bins or boxes. Instead of counting the number of filled boxes at each level, the spatial pyramid approach extracts texture-specific features from each bin. The final feature vector is obtained by concatenating the features extracted from bins computed at multiple levels. At the first level, features are extracted from the entire image. At the second pyramid level, the image is divided into 2 × 2 spatial bins. In general, at each pyramid level, the bins obtained at the previous level are recursively divided into 2 × 2 bins. Thus, at the third level, 4 × 4 bins are obtained. The recursive division should be stopped before the bins become too small, since features extracted from very small sub-regions are not relevant. For example, in the experiments, spatial pyramids are computed only on the first three levels. This is enough to improve the classification accuracy by up to 5%. The spatial pyramid representation for texture images is illustrated in Figure 1.
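
As a concrete illustration, the following minimal Python sketch builds the first pyramid representation. The names spatial_pyramid and extract_features (a callback that would return the 99-dimensional vector of Section III) are hypothetical, not part of the paper:

import numpy as np

def spatial_pyramid(image, extract_features, levels=3):
    # First representation: at level l, split the image into 2^l x 2^l
    # non-overlapping bins and concatenate the features of every bin.
    # With 3 levels and 99 features per bin: (1 + 4 + 16) * 99 = 2079.
    h, w = image.shape[:2]
    features = []
    for level in range(levels):
        n = 2 ** level
        ys = np.linspace(0, h, n + 1, dtype=int)
        xs = np.linspace(0, w, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                bin_region = image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                features.append(extract_features(bin_region))
    return np.concatenate(features)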

The pyramid representation can capture information that encodes the fractal structure of texture, but this information only becomes useful in a learning context. The pyramid representation contains the same features extracted at different levels and from different image regions. A machine learning method can identify which features are relevant for each spatial bin and what the spatial relationships are between features extracted at different scales and in different regions. The spatial pyramid thus provides extra information to the machine learning method, which can be used to improve the accuracy of texture classification methods. In this work, state-of-the-art kernel methods are used for classification. Kernel methods are based on similarity: pyramid feature vectors are transformed into pairwise similarity matrices using different kernel representations, such as the linear kernel or the intersection kernel.

It is interesting to mention that the spatial pyramid representation was also used for the bag of visual words model in [2]. In computer vision, the bag of words model can be applied to image classification and related tasks by treating image descriptors as words. A bag of visual words is a sparse vector of occurrence counts over a vocabulary of local image features. This representation can also be described as a histogram of visual words. The vocabulary is usually obtained by vector quantizing image features into visual words [5]. The standard bag of words model ignores spatial relationships between image features. Despite ignoring spatial information, visual words have shown a high discriminatory power and have been used for region- or image-level classification [6, 7]. The performance improves when spatial information is included. This can be achieved by dividing the image into spatial bins. The frequency of each visual word is recorded in a histogram for each bin. The final feature vector for the image is a concatenation of these histograms. A more robust approach is to use a spatial pyramid, as the work of [2] suggests.

Fig. 1. The spatial pyramid representation is obtained by dividing the image into increasingly fine sub-regions (bins) and by extracting texture-specific features from each bin.

In this work, two spatial pyramid representations are proposed. In the first representation, texture-specific features are extracted from each bin and recorded into feature vectors. In the second representation, the features are extracted only at the first pyramid level, that is, from the entire image. At the second pyramid level, the absolute difference between the features extracted from the entire image and the features extracted from each bin is recorded in the feature vectors representing the second-level bins. For the third level, the process is analogous to the second level. In other words, the features are recorded only once in the second pyramid representation, but this representation also records the changes of these features in each sub-region of the image. Both representations show significant improvements in the experiments.
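
A sketch of the second representation, under the same assumptions as before (the extract_features callback is hypothetical): only the first level stores raw features, while deeper bins store absolute differences from the whole-image features.

import numpy as np

def difference_pyramid(image, extract_features, levels=3):
    # Second representation: level 1 keeps the whole-image features;
    # every deeper bin stores |global features - bin features|.
    h, w = image.shape[:2]
    global_feat = extract_features(image)
    features = [global_feat]
    for level in range(1, levels):
        n = 2 ** level
        ys = np.linspace(0, h, n + 1, dtype=int)
        xs = np.linspace(0, w, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                bin_region = image[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                features.append(np.abs(global_feat - extract_features(bin_region)))
    return np.concatenate(features)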

III. TEXTURE FEATURES

A set of several image features is extracted from each spatial bin to obtain the pyramid representation. There are 99 features extracted from each bin, which are described next. An interesting remark is that the more features are added to the pyramid representation, the better the accuracy of the learning method gets. However, a lighter representation, such as the one based on 99 features, results in a faster and more efficient algorithm. One may choose to add or remove features in order to obtain the desired trade-off between accuracy and speed.

The first two statistical features extracted are the mean and the standard deviation. These two basic features can be computed indirectly, in terms of the image histogram. The shape of an image histogram provides many clues to characterize the image, but the features obtained from an image histogram are not always adequate to discriminate textures, since they are unable to indicate local intensity differences.
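
A minimal sketch of computing these two features from a 256-bin gray-level histogram, assuming an 8-bit grayscale image stored as a NumPy array:

import numpy as np

def histogram_mean_std(image):
    # Mean and standard deviation derived from the gray-level histogram.
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    p = hist / hist.sum()          # normalized histogram (probabilities)
    levels = np.arange(256)
    mean = (levels * p).sum()
    std = np.sqrt((((levels - mean) ** 2) * p).sum())
    return mean, std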

One of the most powerful statistical methods for textured image analysis is based on features extracted from the Gray-Level Co-Occurrence Matrix (GLCM), proposed in [8]. The GLCM is a second-order statistical measure of image variation that gives the joint probability of occurrence of the gray levels of two pixels separated spatially by a fixed vector distance. A smooth texture gives a co-occurrence matrix with high values along the diagonal for small distances. The range of gray-level values within a given image determines the dimensions of the co-occurrence matrix. Thus, 4-bit gray-level images give 16 × 16 co-occurrence matrices. Relevant statistical features for texture classification can be computed from a GLCM. The features proposed by [8], which show a good discriminatory power, are the contrast, the energy, the entropy, the homogeneity, the variance, and the correlation. Of these, only four are used in the pyramid representation, namely the contrast, the energy, the homogeneity, and the correlation. They are computed on 4 directions using offsets of 1 and 2 pixels, so in total there are 4 × 4 × 2 = 32 features computed from the GLCM.
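
These GLCM features could be computed, for instance, with scikit-image; a sketch assuming an 8-bit grayscale NumPy image (the functions are spelled greycomatrix/greycoprops in scikit-image versions before 0.19):

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(image):
    # Quantize to 4 bits (16 gray levels), then compute contrast, energy,
    # homogeneity and correlation on 4 directions with offsets of 1 and 2:
    # 4 features x 4 angles x 2 distances = 32 values.
    img16 = (image // 16).astype(np.uint8)
    glcm = graycomatrix(img16, distances=[1, 2],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=16, symmetric=True, normed=True)
    props = ['contrast', 'energy', 'homogeneity', 'correlation']
    return np.concatenate([graycoprops(glcm, p).ravel() for p in props])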

Another feature that is relevant for texture analysis is the fractal dimension. It provides a statistical index of complexity, comparing how detail in a fractal pattern changes with the scale at which it is measured. The fractal dimension is usually approximated, and the most popular method of approximation is box counting [9]. The idea behind the box counting dimension is to consider grids at different scale factors over the fractal image and count how many boxes are filled in each grid. The box counting dimension is then estimated from how this count changes as the grid gets finer. The box counting dimension of each spatial bin is also included in the pyramid representation.
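
A standard box counting sketch (not necessarily the exact variant used by the authors), assuming the bin has already been binarized, e.g. by thresholding edges, and is non-empty:

import numpy as np

def box_counting_dimension(binary_image):
    # Count filled boxes at successively halved box sizes and fit the
    # slope of log(count) against log(1/size).
    sizes, counts = [], []
    size = min(binary_image.shape) // 2
    while size >= 2:
        h, w = binary_image.shape
        filled = sum(binary_image[i:i + size, j:j + size].any()
                     for i in range(0, h, size)
                     for j in range(0, w, size))
        sizes.append(size)
        counts.append(filled)
        size //= 2
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope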

The work of [10] found that cells in the visual cortex of mammalian brains can be modeled by Gabor functions. Thus, image analysis by Gabor functions is similar to perception in the human visual system. A set of Gabor filters with different frequencies and orientations may be helpful for extracting useful features from an image.

The local isotropic phase symmetry measure (LIPSyM) presented in [11] takes the discrete-time Fourier transform of the input image and filters this frequency information through a bank of Gabor filters. The work of [11] also notes that the local responses of each Gabor filter can be represented in terms of energy and amplitude. Thus, Gabor features, such as the mean-squared energy and the mean amplitude, can be computed through the phase symmetry measure for a bank of Gabor filters with various scales and rotations. These features are relevant because Gabor filters have been found to be particularly appropriate for texture representation and discrimination. Using 4 scales and 8 different rotations, there are 64 Gabor features extracted from each spatial bin.

An important remark is that the final feature vectors must be normalized before the learning stage. After the feature extraction step, features are normalized to the [0, 1] interval, so that they have an equal contribution to the learning process.
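
A minimal min-max normalization sketch over a feature matrix X with one row per image (constant columns are left at zero):

import numpy as np

def minmax_normalize(X):
    # Scale every feature column to [0, 1] across the whole data set.
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / np.where(mx > mn, mx - mn, 1.0)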

IV. KERNEL METHODS

Kernel-based learning algorithms work by embedding the data into a Hilbert space and searching for linear relations in that space using a learning algorithm. The embedding is performed implicitly, that is, by specifying the inner product between each pair of points rather than by giving their coordinates explicitly. The power of kernel methods lies in the implicit use of a Reproducing Kernel Hilbert Space (RKHS) induced by a positive semi-definite kernel function. Although the mathematical meaning of a kernel is an inner product in a Hilbert space, a kernel can also be interpreted as a pairwise similarity between samples.

The kernel function gives kernel methods the power to naturally handle input data that is not in the form of numerical vectors, such as strings, images, or even video and audio files. The kernel function captures the intuitive notion of similarity between objects in a specific domain and can be any function defined on the respective domain that is symmetric and positive definite. For images, many such kernel functions are used in various applications including object recognition, image retrieval, and similar tasks. Popular choices are the linear kernel, the intersection kernel, the Hellinger's kernel, the χ² kernel, and the Jensen-Shannon kernel. Another state-of-the-art approach is the pyramid match kernel [2]. In this paper, the spatial pyramid representation is only combined with the linear and the intersection kernels. For two feature vectors x, z ∈ Rⁿ, the linear kernel is defined by k(x, z) = ⟨x, z⟩, while the intersection kernel is given by k(x, z) = ∑_i min{x_i, z_i}. Note that ⟨·, ·⟩ denotes the inner product.

In the case of binary classification problems, kernel-based learning algorithms look for a discriminant function, a function that assigns +1 to examples belonging to one class and −1 to examples belonging to the other class. This function is linear in the feature space F, which means it has the form:

f(x) = sign(⟨w, φ(x)⟩ + b),

for some weight vector w and an embedding map φ. The kernel can be exploited whenever the weight vector can be expressed as a linear combination of the training points, w = ∑_{i=1}^{n} α_i φ(x_i), implying that f can be expressed as follows:

f(x) = sign(∑_{i=1}^{n} α_i k(x_i, x) + b).

Fig. 2. Sample images from three classes of the Brodatz data set.

Various kernel methods differ by the way in which they find the vector w (or, equivalently, the vector α). Support Vector Machines (SVM) try to find the vector w that defines the hyperplane that maximally separates the images in F of the training examples belonging to the two classes. Mathematically, the SVM classifier chooses the w and b that satisfy the following optimization criterion:

min_{w,b} (1/n) ∑_{i=1}^{n} [1 − y_i(⟨w, φ(x_i)⟩ + b)]_+ + ν‖w‖²,

where y_i is the label (+1/−1) of the training example x_i, ν is a regularization parameter, and [x]_+ = max{x, 0}.

Kernel Ridge Regression (KRR) selects the vector w that simultaneously has small empirical error and small norm in the RKHS generated by the kernel k. The resulting minimization problem is:

min_w (1/n) ∑_{i=1}^{n} (y_i − ⟨w, φ(x_i)⟩)² + λ‖w‖²,

where again y_i is the label (+1/−1) of the training example x_i, and λ is a regularization parameter.
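
For this objective the dual solution has a closed form; a sketch, where the n·λ factor comes from the 1/n in the empirical term:

import numpy as np

def krr_fit(K, y, lam):
    # Solve (K + n*lambda*I) alpha = y; then w = sum_i alpha_i phi(x_i).
    n = K.shape[0]
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def krr_predict(K_test, alpha):
    # Regression outputs; take the sign for +1/-1 classification.
    return K_test @ alpha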

The Linear Discriminant Analysis (LDA) method, also known as Fisher Discriminant Analysis, maximizes the ratio of between-class variance to within-class variance in order to guarantee maximal separability for a particular set of samples. The work of [12] derived the LDA approach for a two-class problem, under the assumptions that the classes have normal distributions and identical covariance matrices. The assumption of identical covariance matrices implies that the Bayes classifier is linear. Therefore, LDA provides a projection of the data points onto a one-dimensional subspace where the Bayes classification error is smallest. The Kernel Discriminant Analysis (KDA) method [13] is the kernel version of the LDA algorithm, and it is somewhat similar to the KRR algorithm.

V. EXPERIMENTS AND RESULTS

A. Data Sets Description

The first data set used for testing the pyramid representations presented in this paper is the Brodatz data set [14]. This data set is probably the best-known benchmark for texture classification, but also one of the most difficult, since it contains 111 classes with only 9 samples per class. Samples of 213 × 213 pixels are cut using a 3 × 3 grid from larger images of 640 × 640 pixels. Figure 2 presents three sample images per class for three classes randomly selected from the Brodatz data set.

Fig. 3. Sample images from four classes of the UIUCTex data set. Each image shows a textured surface viewed under a different pose.

The second experiment is conducted on the UIUCTex data set of [15]. It contains 1000 texture images of 640 × 480 pixels representing different types of textures such as bark, wood, floor, water, and more. There are 25 classes with 40 texture images per class. Textures are viewed under significant scale, viewpoint, and illumination changes. Images also include non-rigid deformations. This data set is available for download at http://www-cvr.ai.uiuc.edu/ponce_grp. Figure 3 presents four sample images per class for four classes representing bark, brick, pebbles, and plaid.

B. Learning Methods

Several state-of-the-art kernel methods are used in the experiments, namely KRR, SVM, and KDA. For a particular classification problem, some kernel methods may be more suitable than others. The accuracy level depends on many aspects such as the class distribution, the number of classes, data noise, the size of the training data, and so on. For example, the KRR classifier can be used with success for problems with well-balanced classes, while the KDA classifier may be more suitable for problems with many classes. When the number of classes is greater than 2, there is a serious problem with regression methods: some classes can be masked by others. The KDA classifier is able to improve accuracy by avoiding the masking problem [16]. More details about SVM, KRR, and KDA can be found in [13]. The important fact is that these classifiers can be trained in such a way that the coordinates of the embedded points are not needed, only their pairwise inner products, which in turn are given by the kernel function k. The dual representation of the spatial pyramid is obtained by the use of the linear kernel or the intersection kernel, respectively.

In both experiments, the linear and the intersection kernels are combined with the three kernel classifiers, thus obtaining 6 different classification methods. For each classification method, various feature representations are used. The baseline feature representation is given by texture features extracted from the entire image. Another two representations are given by the two spatial pyramids described in Section II. The goal of the experiments is to compare the spatial pyramid representations with the baseline (standard) representation and to assess the performance level that can be obtained with the pyramid structure.

TABLE I. Accuracy rates on the entire Brodatz data set using 3 random samples per class for training. The two spatial pyramid representations are compared with the standard representation using a variety of kernel functions and classifiers.

Features    Kernel        SVM     KRR     KDA
Standard    linear        90.67%  84.55%  90.45%
Standard    intersection  90.58%  89.18%  89.99%
Pyramid 1   linear        86.49%  80.94%  89.13%
Pyramid 1   intersection  88.66%  86.77%  88.97%
Pyramid 2   linear        90.95%  81.14%  92.23%
Pyramid 2   intersection  90.64%  88.94%  89.68%

C. Brodatz Experiment

In [15], the accuracy rate reported on the Brodatz data set using 3 training samples per class is 88.15%. Table I compares the accuracy rates of several kernel classifiers based on different feature representations, using the same setup with 3 random samples per class for training. In this experiment, the spatial pyramids were computed using only 3 levels, thus containing 2079 features. The accuracy rates presented in Table I are averages of the accuracy rates obtained over 50 runs for each method.

The empirical results indicate that the first pyramid representation gives a lower accuracy rate than the baseline representation. The worst result is obtained by the KRR method (80.94%), which gives the lowest accuracy rates regardless of the representation. Among the classifiers, the best accuracy rates are obtained by KDA. Indeed, the best accuracy on this data set (92.23%) is obtained with KDA based on the second pyramid representation and the linear kernel. This represents an improvement of 1.78% over the baseline representation. However, the results of the pyramid representations are not so impressive on this data set. This can be explained by the fact that there are not enough training samples per class (only 3) for the kernel classifiers to choose the most discriminant features. Since there are 2079 features in the pyramid representation, it is very difficult to obtain the right separating hyperplanes with only 333 samples. The task is much easier for the standard representation based on 99 features. Although the second pyramid representation shows some improvements, it seems that, in general, the pyramid representation must also be supported by a larger number of training samples.

D. UIUCTex Experiment

In this experiment, the same classifiers evaluated on the Brodatz data set are also evaluated on the UIUCTex data set. The best accuracy level of the state-of-the-art classifier on the UIUCTex data set, reported in [15] using 20 training samples per class, is 97.41%.

TABLE II. Accuracy rates on the UIUCTex data set using 20 random samples per class for training. The two spatial pyramid representations are compared with the standard representation using a variety of kernel functions and classifiers.

Features    Kernel        SVM     KRR     KDA
Standard    linear        82.19%  75.66%  80.27%
Standard    intersection  83.62%  80.02%  82.97%
Pyramid 1   linear        84.79%  79.52%  83.73%
Pyramid 1   intersection  86.80%  84.97%  87.00%
Pyramid 2   linear        78.99%  74.33%  82.61%
Pyramid 2   intersection  85.24%  83.07%  86.06%

Table II shows the accuracy rates of the kernel classifiers based on the baseline representation versus the accuracy rates of the same kernel classifiers based on the two spatial pyramid representations. The accuracy rates are averaged over 50 runs for each method. Again, the spatial pyramids were computed using only 3 levels. On this data set, both pyramid representations are roughly 2–5% better than the standard representation for all three classifiers. For the KRR method, the greatest improvement (4.95%) is achieved by the first pyramid representation in combination with the intersection kernel. The intersection kernel is consistently better than the linear kernel on this data set. The best performance for all three classifiers is obtained with the first pyramid representation in combination with the intersection kernel. The best accuracy of 87.00% is obtained with KDA, although the performance levels of the SVM and the KDA are similar. As in the previous experiment, KRR achieves the worst accuracy rates. Overall, the empirical results show that both pyramid representations improve the classification performance on the UIUCTex data set, especially when they are combined with the intersection kernel. Although an accuracy rate comparable to the state of the art is not achieved, the accuracy of the proposed methods can always be improved by adding more features to the representation.

VI. CONCLUSION

This work proposed two spatial pyramid representations for texture classification. Experiments were conducted on popular texture classification data sets. In the experiments, several kernel representations and kernel classifiers were combined and evaluated. The proposed pyramid representations improved the accuracy by as much as 5% over the standard feature representation, showing that the pyramid structure is indeed useful for texture classification. In future work, the pyramid representation can be further evaluated using different kernel representations and classifiers. More features can also be added to achieve better accuracy rates.

REFERENCES

[1] A. P. Pentland, "Fractal-Based Description of Natural Scenes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 661–674, Nov. 1984. [Online]. Available: http://dx.doi.org/10.1109/TPAMI.1984.4767591

[2] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories," Proceedings of CVPR, vol. 2, pp. 2169–2178, 2006.

[3] J. M. Keller, S. Chen, and R. M. Crownover, "Texture description and segmentation through fractal geometry," Computer Vision, Graphics and Image Processing, vol. 45, no. 2, pp. 150–166, Feb. 1989.

[4] S. W. Myint, "Fractal approaches in texture analysis and classification of remotely sensed data: Comparisons with spatial autocorrelation techniques and simple descriptive statistics," International Journal of Remote Sensing, vol. 24, no. 9, pp. 1925–1947, 2003.

[5] T. Leung and J. Malik, "Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons," IJCV, vol. 43, no. 1, pp. 29–44, Jun. 2001.

[6] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray, "Visual categorization with bags of keypoints," Workshop on Statistical Learning in Computer Vision, ECCV, pp. 1–22, 2004.

[7] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, "Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study," IJCV, vol. 73, no. 2, pp. 213–238, Jun. 2007.

[8] R. M. Haralick, K. Shanmugam, and I. Dinstein, "Textural Features for Image Classification," IEEE Transactions on Systems, Man and Cybernetics, vol. 3, no. 6, pp. 610–621, Nov. 1973.

[9] K. Falconer, Fractal Geometry: Mathematical Foundations and Applications, 2nd ed. Wiley, Nov. 2003.

[10] J. G. Daugman, "Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters," Journal of the Optical Society of America A, vol. 2, no. 7, pp. 1160–1169, 1985.

[11] M. Kuse, Y.-F. Wang, V. Kalasannavar, M. Khan, and N. Rajpoot, "Local isotropic phase symmetry measure for detection of beta cells and lymphocytes," Journal of Pathology Informatics, vol. 2, no. 2, p. 2, 2011.

[12] R. A. Fisher, "The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics, vol. 7, no. 7, pp. 179–188, 1936.

[13] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.

[14] P. Brodatz, Textures: A Photographic Album for Artists and Designers, ser. Dover Pictorial Archives. New York, USA: Dover Publications, 1966.

[15] S. Lazebnik, C. Schmid, and J. Ponce, "A Sparse Texture Representation Using Local Affine Regions," PAMI, vol. 27, no. 8, pp. 1265–1278, Aug. 2005.

[16] T. Hastie and R. Tibshirani, The Elements of Statistical Learning, corrected ed. Springer, Jul. 2003.

