
MRI Mammogram Image Classification Using ID3 and ANN



International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 3, Issue 1, January-June (2012), pp. 241-249, © IAEME: www.iaeme.com/ijcet.html. Journal Impact Factor (2011): 1.0425 (Calculated by GISI), www.jifactor.com

MRI MAMMOGRAM IMAGE CLASSIFICATION USING ID3 AND ANN

(1) S. Pitchumani Angayarkanni, (2) Dr. Nadira Banu Kamal

Department of Computer Science, Lady Doak College, Madurai, Tamil Nadu, India, [email protected]
Department of M.C.A, TBAK College, Kilakarai, Ramnad, Tamil Nadu, India

ABSTRACT

Breast cancer is one of the most common forms of cancer in women. To reduce the death rate, early detection of cancerous regions in mammogram images is needed. The existing system is not very accurate and is time consuming. The proposed system automatically segments mammogram images and classifies them as benign, malignant or normal using the ID3 decision tree algorithm. A hybrid data mining technique is used to predict the texture features, which play a vital role in classification. The sensitivity, specificity, positive prediction value and negative prediction value of the proposed algorithm are 93.45%, 99.95%, 94% and 98.5% respectively, which is very high compared to the existing algorithms. The size and stage of the tumor are determined using the ellipsoid volume formula, calculated over the segmented region.

Keywords: GLCM, Texture, SOM, ID3 algorithm and ANN.

1. INTRODUCTION

Breast cancer has been determined to be the second leading cause of cancer death in women, and the most common type of cancer in women. Mammography is the best image-based diagnostic method currently available for detecting minimal breast lesions, chiefly small carcinomas that appear as microcalcifications or as tumors smaller than 1 cm in diameter and that are not palpable during medical examination [Antonie et al, 2001].

Currently, joint efforts are being made to detect tissue anomalies in a timely fashion, given that there are no methods for breast cancer prevention. Early detection has proved an essential weapon against cancer, since it helps to prolong patients' lives. Physicians providing test results must have diagnostic training based on mammography, and must issue a certain number of reports annually. Double reading of reports increases sensitivity for detection of minimal lesions by about 7%, though at a high cost. The physician shall then interpret these reports and determine the steps to be taken for the proper diagnosis and treatment of the patient. For this reason, physicists, engineers and physicians are in search of new tools to fight cancer, which would also allow physicians to obtain a second opinion [Gokhale et al, 2003; Simoff et al, 2002].

Different methods have been used to classify and/or detect anomalies in medical images, such as wavelets, fractal theory and statistical methods, and most of them used features extracted using image processing techniques [1]. In addition, some other methods presented in the literature are based on fuzzy set theory, Markov models and neural networks. Most of the computer-aided methods proved to be powerful tools that could assist medical staff in hospitals and lead to better results in diagnosing a patient [Antonie et al, 2001].
Different studies on using data mining in the processing of medical images have rendered very good results using neural networks for classification and grouping. In recent years different computerized systems have been developed to support the diagnostic work of radiologists in mammography. The proposed method includes the following phases: i) image pre-processing and enhancement, ii) segmentation, iii) classification using the ID3 algorithm, iv) predicting size and stages, and v) accuracy of algorithm prediction.

2. IMAGE PRE-PROCESSING AND ENHANCEMENT

The main objective of pre-processing is to enhance the image and remove unwanted data. This is done using a Gabor filter and histogram equalization. Gabor wavelet filters smooth the image by blocking detailed information. Mass detection aims to extract the edge of the tumor from the surrounding normal tissue and background. The PSNR, RMS, MSE, NSD and ENL values calculated for each of the 121 pairs of mammogram images clearly show that the Gabor wavelet filter, when applied to a mammogram image, gives the best image quality [4]. The orientation and scale can be changed in this program to extract texture information. Here 3 scales and 4 orientations were used [9].

Figure 1: Results after preprocessing and enhancement (panels: unprocessed input image; output image enhanced using Gabor filter; histogram equalized image)

Table 1: Signal to Noise ratio calculation

The results clearly show that the Gabor filter with histogram equalization produces a high PSNR value, indicating that the image is highly enhanced.

3. TEXTURE BASED APPROACH AND SOM BASED VISUALIZATION

Texture based segmentation is implemented because when a person is affected by cancer the texture of the skin becomes smooth. This segmentation method segments the calcification pattern and the other suspicious regions in the mammograms.
The GLCM (Gray Level Co-occurrence Matrix) technique shows how often different combinations of brightness values occur in an image. The GLCM image is divided into 3x3 matrices and the texture features are calculated [2,3]. The texture features are: Correlation, Cluster Prominence, Energy, Entropy, Homogeneity, Difference Variance, Difference Entropy, Information Measure related to Correlation, and Normalized.

The technique is applied to the texture segmented image. Usually the GLCM matrix is computed for small windows, but in this project it is computed for the whole image. The GLCM matrix is then divided into small windows of size 3x3. Since the mammogram is large, the image is resized to 17x17 and hence the GLCM matrix gets segmented into 289 images. The GLCM features listed above are calculated and stored in an Excel file. The texture values for 121 pairs of mammogram MRI images are calculated, stored in an Excel sheet and analysed using the SOM based visualization technique.

Pseudo code for performing texture segmentation:
Step 1: Read Image
Step 2: Create Texture Image
Step 3: Create Rough Mask for the Background Texture
Step 4: Use Rough Mask to Segment the Foreground Texture
Step 5: Display Segmentation Results

Table 2: The Texture Parameter value for the left Mammogram Image - Benign Case

PSNR    RMS    NSD    ENL     MSE    Nature of Filter
87.65   2.97   4.55   89.89   8.83   Gabor

The unified distance matrix or U-matrix is a representation of the Self-Organizing Map that visualizes the distance between the network neurons or units.
It contains the distance from each unit center to all of its neighbors. The neurons of the SOM network are represented here by hexagonal cells. The distance between adjacent neurons is calculated and presented with different colorings. A dark coloring between neurons corresponds to a large distance and thus represents a gap between the values in the input space. A light coloring between neurons signifies that the vectors are close to each other in the input space. Light areas represent clusters and dark areas represent cluster separators. This representation can be used to visualize the structure of the input space and to get an impression of otherwise invisible structures in a multidimensional data space. The U-matrix representation (Figure 2) reveals the clustering structure of the dataset explored in this experiment (the texture parameters). Texture parameters with similar characteristics are arranged close to each other, and the distance between them represents the degree of similarity or dissimilarity.

Figure 2: SOM based visualization for Benign case using SOM Toolbox (A: Visualization Output; B: SOM Toolbox)

This produces secondary, strengthened features which can then be used to segment or classify the image according to the texture energy. The SOM Toolbox has helped in this research to visualize the relationship between the features and how each feature varies across the different case types (benign, malignant and normal). From this project it is found that 'Information Measure related to Correlation' varies for the above cases during mapping, and that Energy and Entropy are inversely correlated.

4. DECISION TREE INDUCTION METHOD, ID3 ALGORITHM

• A mathematical algorithm for building the decision tree.
• Invented by J. Ross Quinlan in 1979.
• Uses Information Theory, invented by Shannon in 1948.
• Builds the tree from the top down, with no backtracking.
• Information Gain is used to select the most useful attribute for classification.

Entropy
• A formula to calculate the homogeneity of a sample.
• A completely homogeneous sample has entropy of 0.
• An equally divided sample has entropy of 1.
• The formula for entropy, for a sample S of positive and negative elements, is:

$$Entropy(S) = -p_{+}\log_2 p_{+} - p_{-}\log_2 p_{-}$$

Information Gain (IG):
• The information gain is based on the decrease in entropy after a dataset is split on an attribute [7].
• Which attribute creates the most homogeneous branches?
• First the entropy of the total dataset is calculated.
• The dataset is then split according to the different attributes.
• The entropy for each branch is calculated. Then it is added proportionally, to get the total entropy for the split.
• The resulting entropy is subtracted from the entropy before the split.
• The result is the Information Gain, or decrease in entropy.
• The attribute that yields the largest IG is chosen for the decision node.

The attributes used were the nine texture parameters, with the class being benign or malignant. The rules derived by testing 121 pairs of mammogram images are applied to classify new cases without prior knowledge of whether they are benign, malignant or normal [8]. As an example, ID3 decision tree classification is applied to a left benign case from the MIAS mammogram dataset.
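Before that example, the entropy and information gain steps listed above can be sketched in Python. This is a minimal illustration and not the authors' implementation: the function names, the toy rows and the "low"/"high" discretization of the texture features are all assumptions made for the sketch (ID3 works on categorical attributes, so continuous texture values would need some form of binning first).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    result = 0.0
    for count in Counter(labels).values():
        p = count / total
        result -= p * math.log2(p)
    return result


def information_gain(rows, labels, attribute):
    """Entropy before the split minus the weighted entropy of the branches."""
    branches = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[attribute], []).append(label)
    # Branch entropies are added proportionally to branch size.
    after_split = sum(
        len(branch) / len(labels) * entropy(branch)
        for branch in branches.values()
    )
    return entropy(labels) - after_split


def best_attribute(rows, labels, attributes):
    """Attribute with the largest information gain (the next decision node)."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))


# Hypothetical toy data: two discretized texture features per sample.
rows = [
    {"energy": "low", "entropy": "high"},
    {"energy": "low", "entropy": "low"},
    {"energy": "high", "entropy": "high"},
    {"energy": "high", "entropy": "low"},
]
labels = ["malignant", "malignant", "benign", "benign"]

print(best_attribute(rows, labels, ["energy", "entropy"]))
```

In this toy set the "energy" attribute separates the two classes completely, so it yields the larger gain and would be chosen for the root node; ID3 would then repeat the same selection on each branch.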
Node 1    Cluster Prominence
Node 2    Energy
Node 3    Entropy
Node 4    Homogeneity
Node 5    Difference Variance
Node 6    Difference Entropy
Node 7    Information Measure
Node 8    Normalized
Node 9    Correlation
Class     Benign/Malignant

Table 3: Nodes representing the 10 attributes

Cross validation

=== Confusion Matrix ===
   a    b   <-- classified as
 204   85 |  a = Benign
 144  145 |  b = Malignant
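The measures named in the abstract (sensitivity, specificity, positive and negative prediction value) can be derived from a confusion matrix such as the one above. The sketch below assumes that rows are actual classes and columns are predicted classes, and that malignant is treated as the positive class; both are assumptions, as is the helper name `binary_metrics`, which does not come from the paper.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard measures derived from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly detected
        "specificity": tn / (tn + fp),  # negatives correctly rejected
        "ppv": tp / (tp + fp),          # positive prediction value
        "npv": tn / (tn + fn),          # negative prediction value
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Counts read from the cross-validation matrix, malignant assumed positive:
# actual malignant -> 145 classified malignant (TP), 144 classified benign (FN)
# actual benign    -> 85 classified malignant (FP), 204 classified benign (TN)
metrics = binary_metrics(tp=145, fn=144, fp=85, tn=204)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Swapping the positive class to benign exchanges sensitivity with specificity and the positive with the negative prediction value, so the choice of positive class should be stated alongside any reported figures.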