Handling Missing Data in Software Effort Prediction with Naive Bayes and EM Algorithm

Title	Handling Missing Data in Software Effort Prediction with Naive Bayes and EM Algorithm
Authors	Zhang Wen ; Yang Ye ; Wang Qing
Year	2011
Conference name	7th International Conference on Predictive Models in Software Engineering, PROMISE 2011, Co-located with ESEM 2011
Conference date	September
Conference location	Banff, AB, Canada
Keywords	Algorithms; Data handling; Embedded software; Experiments; Forecasting; Mathematical models; Models; Predictive control systems; Software engineering
Pages	-
Abstract	Background: Missing data, which commonly appears in software effort datasets, is becoming an important problem in software effort prediction. Aims: In this paper, we adapt naive Bayes and EM (Expectation Maximization) for software effort prediction, and develop two embedded strategies, missing data toleration and missing data imputation, to handle the missing data in software effort datasets. Method: The missing data toleration strategy ignores missing values in software effort datasets, while the missing data imputation strategy uses observed values to impute missing values. Results: Experiments on the ISBSG and CSBSG datasets demonstrate that: 1) both proposed strategies outperform BPNN with classic imputation techniques such as MI and MINI. Meanwhile, the imputation strategy outperforms the toleration strategy in most cases and produced the highest accuracy, 75.15%; 2) using unlabeled projects to train the prediction model significantly improves the effort prediction performance of naive Bayes and EM with both strategies, especially when the ratio of the size of the training data to the size of the unlabeled data is at a relatively optimal level; 3) each class of software effort data corresponds exactly to a Gaussian component for both the ISBSG and CSBSG datasets. Conclusion: Although initial experiments on the ISBSG dataset demonstrate some promising aspects of the proposed strategies, we cannot conclude that they generalize to all other software effort datasets. Copyright © 2011 ACM.
Indexed by	EI
Proceedings	ACM International Conference Proceeding Series
Language	English
ISBN	9781450307093
Content type	Conference paper
Source URL	[http://ir.iscas.ac.cn/handle/311060/16211]
Collection	Institute of Software_Institute of Software Library_Conference Papers
Recommended citation (GB/T 7714)	Zhang Wen, Yang Ye, Wang Qing. Handling missing data in software effort prediction with naive Bayes and EM algorithm[C]. In: 7th International Conference on Predictive Models in Software Engineering, PROMISE 2011, Co-located with ESEM 2011. Banff, AB, Canada. September.
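
Illustrative sketch (not part of the catalogued paper): the abstract distinguishes a toleration strategy, which ignores missing values, from an imputation strategy, which fills them in from observed values. The Python below is a minimal illustration of that distinction with a Gaussian naive Bayes classifier. It is not the authors' implementation: the EM step with unlabeled projects is omitted, class-mean imputation is a simplified stand-in for the paper's imputation strategy, and the data, class names, and function names are all hypothetical.

```python
import math

def fit_gaussian_nb(rows, labels):
    """Estimate class priors and per-feature (mean, variance), skipping None values.

    Assumes every class has at least one observed value for every feature.
    """
    model = {}
    n_features = len(rows[0])
    for c in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == c]
        stats = []
        for j in range(n_features):
            observed = [r[j] for r in members if r[j] is not None]
            mean = sum(observed) / len(observed)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in observed) / len(observed) + 1e-6
            stats.append((mean, var))
        model[c] = (len(members) / len(rows), stats)
    return model

def predict(model, row):
    """Return the class with the highest log posterior for one project."""
    scores = {}
    for c, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            if value is None:
                continue  # toleration: a missing feature contributes nothing
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        scores[c] = score
    return max(scores, key=scores.get)

def impute_with_class_means(rows, labels, model):
    """Imputation (simplified): replace None with the class mean of observed values."""
    return [
        [model[y][1][j][0] if v is None else v for j, v in enumerate(r)]
        for r, y in zip(rows, labels)
    ]

# Hypothetical projects: two numeric features, None marks a missing value.
rows = [[1.0, 10.0], [1.2, None], [0.8, 9.0],
        [5.0, 30.0], [None, 32.0], [5.5, 28.0]]
labels = ["low", "low", "low", "high", "high", "high"]

tolerant = fit_gaussian_nb(rows, labels)           # toleration strategy
imputed = impute_with_class_means(rows, labels, tolerant)
imputed_model = fit_gaussian_nb(imputed, labels)   # imputation strategy

print(predict(tolerant, [1.1, None]))
print(predict(imputed_model, [None, 31.0]))
```

In this sketch the toleration path never invents values, so its estimates rest on fewer observations per feature, while the imputation path trains on a complete table at the cost of trusting the filled-in values.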