BuildFast: History-Aware Build Outcome Prediction for Fast Feedback and Reduced Cost in Continuous Integration

Long build times in continuous integration (CI) can greatly increase the cost in human and computing resources, and thus become a common barrier faced by software organizations adopting CI. Build outcome prediction has been proposed as one of the remedies to reduce such cost. However, the state-of-the-art approaches have poor prediction performance for failed builds and are not designed for practical usage scenarios. To address these problems, we first conduct an empirical study on 2,590,917 builds to characterize build times in real-world projects, and a survey with 75 developers to understand their perceptions of build outcome prediction. Then, motivated by our study and survey results, we propose a new history-aware approach, named BuildFast, to predict CI build outcomes cost-efficiently and practically. It helps developers obtain fast integration feedback and reduces integration cost. In particular, we introduce multiple failure-specific features from closely related historical builds by analyzing build logs and changed files, and propose an adaptive prediction model that switches between two models based on the outcome of the previous build. We also investigate a practical online usage scenario of BuildFast, where builds are predicted in chronological order, and measure the benefit from correct predictions and the cost from incorrect predictions. Our experiments on 20 projects demonstrate that BuildFast improves on the state-of-the-art approach by 47.5% in F1-score for failed builds.
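The adaptive switching between two models and the chronological (online) usage scenario described above can be sketched as follows. This is a minimal illustration, not BuildFast's implementation: the names `AdaptivePredictor` and `predict_online`, the toy threshold "models", and the assumption that each build's real outcome is known before the next build is predicted are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
# A "model" here is any callable mapping a feature dict to "passed" or "failed".
Model = Callable[[Features], str]


@dataclass
class AdaptivePredictor:
    """Hypothetical sketch: choose one of two models from the previous build's outcome."""
    after_pass: Model  # used when the previous build passed
    after_fail: Model  # used when the previous build failed or errored

    def predict(self, features: Features, previous_outcome: str) -> str:
        model = self.after_pass if previous_outcome == "passed" else self.after_fail
        return model(features)


def predict_online(predictor: AdaptivePredictor,
                   builds: List[Tuple[Features, str]],
                   first_previous: str = "passed") -> List[str]:
    """Predict builds in chronological order.

    Assumption: the actual outcome of each build is known before the next
    build is predicted, so it can drive the model switch.
    """
    predictions = []
    previous = first_previous  # assumed outcome preceding the first build
    for features, actual in builds:
        predictions.append(predictor.predict(features, previous))
        previous = actual
    return predictions


if __name__ == "__main__":
    # Toy threshold "models", for illustration only.
    predictor = AdaptivePredictor(
        after_pass=lambda f: "failed" if f["src_churn"] > 100 else "passed",
        after_fail=lambda f: "failed" if f["pr_tests_fail"] > 0 else "passed",
    )
    history = [
        ({"src_churn": 10, "pr_tests_fail": 0}, "passed"),
        ({"src_churn": 500, "pr_tests_fail": 0}, "failed"),
        ({"src_churn": 5, "pr_tests_fail": 3}, "failed"),
        ({"src_churn": 5, "pr_tests_fail": 0}, "passed"),
    ]
    print(predict_online(predictor, history))  # prints ['passed', 'failed', 'failed', 'passed']
```

In practice the two callables would be trained classifiers; keeping them as plain callables makes the switching logic itself easy to see.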

Survey

The full survey is available here.

Features

Features about the Current Build

In this table, fine-grained features such as class-, method-, field- and import-level changes are extracted with the ClDiff tool.

ID	Feature	Description	Implementation
C1	src_churn	# of lines of production code changed	use the Ruby library Rugged to get the diff between the two build commits, then filter production code by string matching
C2	test_churn	# of lines of test code changed	get the test file changes from the diff between the two build commits, then filter test code by string matching
C3	src_ast_diff	whether production code is changed in AST	use ClDiff tool
C4	test_ast_diff	whether test code is changed in AST	use ClDiff tool
C5	line_added	# of added lines in all files	get file changes from the diff data
C6	line_deleted	# of deleted lines in all files	get file changes from the diff data
C7	files_added	# of files added	get file changes from the diff data
C8	files_deleted	# of files deleted	get file changes from the diff data
C16	met_body_modified	# of method bodies modified	use ClDiff tool
C17	met_changed	# of methods added or deleted	use ClDiff tool
C18	field_changed	# of fields modified, added or deleted	use ClDiff tool
C19	import_changed	# of import statements added or deleted	use ClDiff tool
C20	class_modified	# of classes modified	use ClDiff tool
C21	class_added	# of classes added	use ClDiff tool
C22	class_deleted	# of classes deleted	use ClDiff tool
C23	met_added	# of methods added	use ClDiff tool
C24	met_deleted	# of methods deleted	use ClDiff tool
C25	field_modified	# of fields modified	use ClDiff tool
C26	field_added	# of fields added	use ClDiff tool
C27	field_deleted	# of fields deleted	use ClDiff tool
C28	import_added	# of import statements added	use ClDiff tool
C29	import_deleted	# of import statements deleted	use ClDiff tool
C30	commits	# of commits included	follow the parents of the current build commit until reaching a commit that was itself built
C31	fix_commits	# of bug-fixing commits included	whether the pull request title or commit message includes the pattern [Ff]ix
C32	merge_commits	# of merge commits included	whether the pull request title or commit message includes the pattern [Mm]erge
C33	committers	# of unique committers	the unique committers of the commits above
C34	by_core_member	whether a core member triggers the build	the committer committed code at least once within the 3 months before this commit
C35	is_master	whether the build occurs on master branch	get from the build information
C36	time_interval	time interval since the previous build	time interval between the start times of the two builds
C37	day_of_week	day of week when the build starts	Ruby Time class, method time.day
C38	time_of_day	time of day when the build starts	Ruby Time class, method time.hour
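To make the string-matching rules in this table more concrete, the Python sketch below computes a subset of the current-build features (C1, C2, C5, C6, C30 to C33, C37, C38) from already-extracted diff and commit data. It is illustrative only: the table describes a Ruby/Rugged extractor, and the types `FileChange` and `Commit`, the function names, and the path rules in `is_src_path`/`is_test_path` are hypothetical stand-ins for project-specific rules.

```python
import re
from datetime import datetime
from typing import Dict, List, NamedTuple


class FileChange(NamedTuple):
    path: str
    added: int    # lines added in this file
    deleted: int  # lines deleted in this file


class Commit(NamedTuple):
    author: str
    message: str  # commit message (or pull request title)


FIX_RE = re.compile(r"[Ff]ix")      # pattern for C31
MERGE_RE = re.compile(r"[Mm]erge")  # pattern for C32


def is_test_path(path: str) -> bool:
    # Hypothetical string-matching rule; real projects may need other rules.
    return "test" in path.lower()


def is_src_path(path: str) -> bool:
    # Hypothetical rule: production code = Java sources that are not tests.
    return path.endswith(".java") and not is_test_path(path)


def current_build_features(changes: List[FileChange],
                           commits: List[Commit],
                           started_at: datetime) -> Dict[str, int]:
    src = [c for c in changes if is_src_path(c.path)]
    tests = [c for c in changes if is_test_path(c.path)]
    return {
        "src_churn": sum(c.added + c.deleted for c in src),
        "test_churn": sum(c.added + c.deleted for c in tests),
        "line_added": sum(c.added for c in changes),
        "line_deleted": sum(c.deleted for c in changes),
        "commits": len(commits),
        "fix_commits": sum(1 for c in commits if FIX_RE.search(c.message)),
        "merge_commits": sum(1 for c in commits if MERGE_RE.search(c.message)),
        "committers": len({c.author for c in commits}),
        # Monday == 0 in Python; Ruby's Time#wday counts from Sunday == 0.
        "day_of_week": started_at.weekday(),
        "time_of_day": started_at.hour,
    }
```

For example, a build changing `src/main/java/App.java` (+10/-2) and `src/test/java/AppTest.java` (+5/-0) gets `src_churn == 12` and `test_churn == 5`. Note that a plain substring pattern such as `[Ff]ix` also matches words like "prefix", which is a known limitation of this kind of matching.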

Features about the Previous Build

ID	Feature	Description	Implementation
P1	pr_state	build state (i.e., passed, errored or failed)	get from the build information
P2	pr_compile_error	whether a compilation error occurs	build log includes the string “COMPILATION ERROR”
P3	pr_test_exception	whether tests throw exceptions	build log includes the string “Tests in error”
P4	pr_tests_ok	# of tests passed	extract from build log
P5	pr_tests_fail	# of tests failed	extract from build log
P6	pr_duration	overall time duration of the build	get from build information
P7	pr_src_churn	# of lines of production code changed	diff information of previous build
P8	pr_test_churn	# of lines of test code changed	diff information of previous build
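Features P2 to P5 are read from the previous build's log. The sketch below assumes Maven logs with Surefire-style summary lines (`Tests run: N, Failures: F, Errors: E, Skipped: S`); the regular expression, treating the last summary line as the aggregate, and counting errors as failures in `pr_tests_fail` are assumptions, not the exact extraction rules.

```python
import re
from typing import Dict

# Surefire-style summary line, e.g. "Tests run: 12, Failures: 1, Errors: 0, Skipped: 2"
SUREFIRE_RE = re.compile(
    r"Tests run: (\d+), Failures: (\d+), Errors: (\d+), Skipped: (\d+)")


def previous_build_log_features(log: str) -> Dict[str, int]:
    # Surefire prints one line per test class plus a final aggregate line;
    # assumption: the LAST match is the aggregate summary.
    matches = SUREFIRE_RE.findall(log)
    run = failures = errors = skipped = 0
    if matches:
        run, failures, errors, skipped = map(int, matches[-1])
    return {
        "pr_compile_error": int("COMPILATION ERROR" in log),
        "pr_test_exception": int("Tests in error" in log),
        "pr_tests_ok": run - failures - errors - skipped,
        "pr_tests_fail": failures + errors,  # assumption: errors count as failures
    }
```

Logs from other build tools (e.g., Gradle) print different summaries, so the pattern would need to be adapted per project.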

Features about Historical Builds

ID	Feature	Description	Implementation
H1	fail_ratio_pr	% of broken builds in all the previous builds	# of previous failed builds / # of previous builds
H2	fail_ratio_pr_inc	increment of fail_ratio_pr at the last broken build relative to fail_ratio_pr at the penultimate broken build	increment of fail_ratio_pr
H3	fail_ratio_re	% of broken builds in the recent 5 builds	# of failed builds in the recent 5 builds / 5
H4	fail_ratio_com_pr	% of broken builds in all the previous builds that were triggered by the current committer	# of failed builds triggered by the current committer / # of all previous builds
H5	fail_ratio_com_re	% of broken builds in the recent 5 builds that were triggered by the current committer	# of failed builds triggered by the current committer in the recent 5 builds / 5
H6	last_fail_gap	# of builds since the last broken build	walk back from the current build until a broken build is found
H7	consec_fail_max	maximum # of consecutive broken builds	find all passed builds before the current build (one or more), then take the maximum interval between them
H8	consec_fail_avg	average # of consecutive broken builds	find all passed builds before the current build (one or more), then take the average interval between them
H9	consec_fail_sum	sum of # of consecutive broken builds	find all passed builds before the current build (one or more), then take the sum of the intervals between them
H10	commits_on_files	# of commits on the changed files in the last 3 months	get the files changed in the current build and the commits in the last 3 months, then count the commits whose changed files include the current build's changed files
H11	file_fail_prob_max	maximum of the probabilities of each changed file being involved in previous broken builds	count how often each currently changed file appears in previous broken builds, divide by the total # of previous broken builds, then take the maximum ratio
H12	file_fail_prob_avg	average of the probabilities of each changed file being involved in previous broken builds	count how often each currently changed file appears in previous broken builds, divide by the total # of previous broken builds, then take the average ratio
H13	file_fail_prob_sum	sum of the probabilities of each changed file being involved in previous broken builds	count how often each currently changed file appears in previous broken builds, divide by the total # of previous broken builds, then take the sum of the ratios
H14	pr_src_files	# of production files changed between the latest passed build and the previous build	find the latest passed build before the current build, get the diff between it and the previous build, then filter production files by string matching
H15	pr_src_files_in	size of the intersection of src_files and pr_src_files	size of the intersection of src_files and pr_src_files
H16	pr_test_files	# of test files changed between the latest passed build and the previous build	the same as pr_src_files, but filter the test files
H17	pr_test_files_in	size of the intersection of test_files and pr_test_files	size of the intersection of test_files and pr_test_files
H18	pr_config_files	# of build script files changed between the latest passed build and the previous build	the same as pr_src_files, but filter the config files
H19	pr_config_files_in	size of the intersection of config_files and pr_config_files	size of the intersection of config_files and pr_config_files
H20	pr_doc_files	# of documentation files changed between the latest passed build and the previous build	the same as pr_src_files, but filter the documentation files
H21	pr_doc_files_in	size of the intersection of doc_files and pr_doc_files	size of the intersection of doc_files and pr_doc_files
H22	log_src_files	# of production files reported in the build log of the previous build	use string matching to filter the src files in the build log
H23	log_src_files_in	size of the intersection of log_src_files and src_files	size of the intersection of log_src_files and src_files
H24	log_test_files	# of test files reported in the build log of the previous build	use string matching to filter the test files in the build log
H25	log_test_files_in	size of the intersection of log_test_files and test_files	size of the intersection of log_test_files and test_files
H26	team_size	size of the team contributing in the last 3 months	# of unique committers who made commits in the last 3 months
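Several historical features (H1, H3, H6 to H9, H11 to H13) reduce to simple computations over the chronological list of previous build outcomes. The sketch below is a hypothetical Python rendering of the table's descriptions; the function names and the edge-case handling (empty histories, no earlier broken build) are assumptions.

```python
from typing import Dict, Iterable, List, Sequence, Set


def consecutive_fail_runs(history: Sequence[bool]) -> List[int]:
    """Lengths of runs of consecutive broken builds (True = broken)."""
    runs, current = [], 0
    for broken in history:
        if broken:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def last_fail_gap(history: Sequence[bool]) -> int:
    """Walk back from the current build until a broken build is found.

    Assumption: returns len(history) when no earlier build was broken.
    """
    for steps, broken in enumerate(reversed(history), start=1):
        if broken:
            return steps
    return len(history)


def historical_features(history: Sequence[bool],
                        broken_build_files: Sequence[Set[str]],
                        changed_files: Iterable[str],
                        recent: int = 5) -> Dict[str, float]:
    """history: outcomes of all previous builds, oldest first (True = broken).
    broken_build_files: changed-file sets of the previous broken builds.
    changed_files: files changed by the current build."""
    n = len(history)
    runs = consecutive_fail_runs(history)
    n_broken = len(broken_build_files)
    # Per changed file: share of previous broken builds that touched it.
    probs = [sum(path in files for files in broken_build_files) / n_broken
             if n_broken else 0.0
             for path in changed_files]
    return {
        "fail_ratio_pr": sum(history) / n if n else 0.0,
        # Divides by the window size, as in the table ("/ 5").
        "fail_ratio_re": sum(history[-recent:]) / recent,
        "last_fail_gap": last_fail_gap(history),
        "consec_fail_max": max(runs, default=0),
        "consec_fail_avg": sum(runs) / len(runs) if runs else 0.0,
        "consec_fail_sum": sum(runs),
        "file_fail_prob_max": max(probs, default=0.0),
        "file_fail_prob_avg": sum(probs) / len(probs) if probs else 0.0,
        "file_fail_prob_sum": sum(probs),
    }
```

For the history passed, broken, broken, passed, broken, passed, passed, the broken-build runs are [2, 1], so consec_fail_max is 2 and consec_fail_avg is 1.5.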

Code and Dataset

The code for extracting features and training the model can be downloaded from GitHub.

Feature selection methods and classifiers

Using different feature selection methods

Evaluation	BuildFast_IG&Chi2	Select From model	BuildFast_IG	BuildFast_Chi2	BuildFast_Mutual	BuildFast_Fpr	BuildFast_Fdr
f1-fail	0.472	0.436	0.446	0.464	0.430	0.437	0.434
f1-pass	0.913	0.877	0.912	0.900	0.862	0.906	0.891
f1-macro	0.692	0.657	0.679	0.682	0.646	0.671	0.663
f1-micro	0.883	0.843	0.881	0.875	0.819	0.874	0.856
f1-weighted	0.874	0.841	0.870	0.866	0.821	0.862	0.850
recall-fail	0.439	0.454	0.414	0.444	0.451	0.413	0.435
recall-pass	0.926	0.869	0.926	0.905	0.856	0.917	0.892
recall-macro	0.682	0.661	0.670	0.674	0.654	0.665	0.664
recall-micro	0.883	0.843	0.881	0.875	0.819	0.874	0.856
recall-weighted	0.883	0.843	0.881	0.875	0.819	0.874	0.856
pre-fail	0.572	0.506	0.541	0.546	0.498	0.566	0.528
pre-pass	0.902	0.901	0.900	0.902	0.890	0.900	0.899
pre-macro	0.737	0.703	0.720	0.724	0.694	0.733	0.714
pre-micro	0.883	0.843	0.881	0.875	0.819	0.874	0.856
precision-weighted	0.874	0.860	0.868	0.868	0.854	0.866	0.863
auc	0.784	0.783	0.784	0.787	0.755	0.788	0.785
benefit (hours)	2723.000	2624.008	2700.224	2654.035	2703.035	2663.223	2669.579
cost (hours)	592.000	410.181	548.578	510.746	643.953	408.812	440.902
gain (hours)	2131.000	2213.827	2151.646	2143.288	2059.082	2254.411	2228.677
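The rows of this table follow the usual per-class and averaged definitions of precision, recall and F1. The pure-Python sketch below uses toy labels (not experiment data), with dictionary keys loosely following the table's row names, to show how the rows relate. For single-label classification, every misclassified build is one false positive for one class and one false negative for the other, so micro-averaged precision, recall and F1 all equal accuracy; weighted recall also equals accuracy. This is why f1-micro, recall-micro, pre-micro and the weighted recall row coincide in every column.

```python
from typing import Dict, List, Sequence, Tuple


def per_class(y_true: Sequence[str], y_pred: Sequence[str],
              label: str) -> Tuple[float, float, float]:
    """Precision, recall and F1 for one class, treating `label` as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def evaluation_rows(y_true: List[str], y_pred: List[str],
                    labels: Sequence[str] = ("fail", "pass")) -> Dict[str, float]:
    n = len(y_true)
    support = {label: y_true.count(label) for label in labels}
    rows: Dict[str, float] = {}
    for label in labels:
        p, r, f = per_class(y_true, y_pred, label)
        rows[f"pre-{label}"], rows[f"recall-{label}"], rows[f"f1-{label}"] = p, r, f
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    for metric in ("pre", "recall", "f1"):
        per_label = [rows[f"{metric}-{label}"] for label in labels]
        rows[f"{metric}-macro"] = sum(per_label) / len(labels)
        # Weighted average: each class weighted by its number of true instances.
        rows[f"{metric}-weighted"] = sum(
            v * support[label] for v, label in zip(per_label, labels)) / n
        # Micro average over a single-label task equals accuracy.
        rows[f"{metric}-micro"] = accuracy
    return rows
```

With 4 failed and 6 passed builds, of which 2 failed and 5 passed builds are predicted correctly, f1-fail is 4/7 while every micro row and the weighted recall are 0.7.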

These feature selection methods are available in scikit-learn.

BuildFast_IG&Chi2 is our feature selection approach: we adopted Chi-Squared testing to select the top 30 features for our first model, and Information Gain to select the top 25 features for our second model.

Select From model: when training the model, we keep the features whose importance is larger than the mean feature importance.

BuildFast_IG: we adopted Information Gain to select 30 and 25 features for the two models, respectively.

BuildFast_Chi2: we adopted Chi-Squared testing to select 30 and 25 features for the two models, respectively.

BuildFast_Mutual: we adopted mutual information to select 30 and 25 features for the two models, respectively.

BuildFast_Fpr: we select the features whose p-values are below alpha (0.01) in a false positive rate (FPR) test.

BuildFast_Fdr: we select features based on p-values for an estimated false discovery rate (FDR).
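The selectors above can be set up with scikit-learn roughly as follows. This is a sketch under stated assumptions: scikit-learn has no scorer named "information gain", so `mutual_info_classif` with `discrete_features=True` stands in for it here (the exact IG implementation is not described on this page). The chi-squared score function for the FPR/FDR tests, the random forest inside Select From model, the FDR alpha, and which model counts as "first" or "second" are also assumptions.

```python
from functools import partial

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import (SelectFdr, SelectFpr, SelectFromModel,
                                       SelectKBest, chi2, mutual_info_classif)

# Stand-in for Information Gain (assumption): mutual information on discrete features.
info_gain = partial(mutual_info_classif, discrete_features=True, random_state=0)


def buildfast_ig_chi2(X_first, y_first, X_second, y_second):
    """Chi-squared top 30 for the first model, information-gain top 25 for the second."""
    first = SelectKBest(chi2, k=min(30, X_first.shape[1])).fit(X_first, y_first)
    second = SelectKBest(info_gain, k=min(25, X_second.shape[1])).fit(X_second, y_second)
    return first, second


def baseline_selectors():
    """Other selectors from the table; estimator and score functions are assumptions."""
    return {
        "Select From model": SelectFromModel(
            RandomForestClassifier(n_estimators=100, random_state=0), threshold="mean"),
        "BuildFast_Fpr": SelectFpr(chi2, alpha=0.01),
        "BuildFast_Fdr": SelectFdr(chi2),  # alpha left at scikit-learn's default
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 5, size=(200, 40))  # chi2 requires non-negative features
    y = (X[:, 0] + X[:, 1] > 4).astype(int)
    first, second = buildfast_ig_chi2(X, y, X, y)
    print(first.get_support().sum(), second.get_support().sum())
```

In the adaptive setting, each selector would be fitted only on the training builds routed to its model, and `transform` would then be applied to that model's test builds.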

Compared with the other approaches, BuildFast_IG&Chi2 improved the precision, recall and F1-score for failed builds by 4%, 2% and 3%, respectively, in most cases, and slightly improved the other metrics by 1%-4%. BuildFast_IG&Chi2 also gives the most stable results across all metrics compared with methods such as Select From model and BuildFast_Mutual. For example, Select From model slightly improved recall-fail by 1.5% (0.454 vs. 0.439), but its pre-fail was much lower, reduced by 6.6% (0.506 vs. 0.572). For benefit, cost and gain, there was no statistically significant difference, due to the small proportion of failed builds and the variance of build times. Still, BuildFast_IG&Chi2 achieved a total gain of 2,131 hours across all projects on one-fourth of the builds (i.e., the testing data), with its benefit exceeding its cost. Thus, BuildFast is cost-efficient and can save CI cost.
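The figures quoted in the paragraph above can be re-derived from the feature-selection table. The sketch assumes that gain = benefit - cost (every column of the table is consistent with this, up to rounding) and reads the quoted percentages as absolute differences in percentage points; the names `table`, `gain` and `diff_points` are illustrative.

```python
# benefit and cost in hours, copied from the feature-selection table
table = {
    "BuildFast_IG&Chi2": (2723.000, 592.000),
    "Select From model": (2624.008, 410.181),
    "BuildFast_IG": (2700.224, 548.578),
    "BuildFast_Chi2": (2654.035, 510.746),
    "BuildFast_Mutual": (2703.035, 643.953),
    "BuildFast_Fpr": (2663.223, 408.812),
    "BuildFast_Fdr": (2669.579, 440.902),
}

# failed-build metrics of the two methods compared in the text
fail_metrics = {
    "BuildFast_IG&Chi2": {"recall-fail": 0.439, "pre-fail": 0.572},
    "Select From model": {"recall-fail": 0.454, "pre-fail": 0.506},
}


def gain(method: str) -> float:
    """Assumed definition: hours saved by correct predictions minus hours lost."""
    benefit, cost = table[method]
    return benefit - cost


def diff_points(metric: str, a: str, b: str) -> float:
    """Difference a - b in percentage points, rounded to one decimal."""
    return round(100 * (fail_metrics[a][metric] - fail_metrics[b][metric]), 1)


if __name__ == "__main__":
    for method in table:
        print(f"{method}: gain = {gain(method):.3f} h")
    print(diff_points("recall-fail", "Select From model", "BuildFast_IG&Chi2"))  # 1.5
    print(diff_points("pre-fail", "BuildFast_IG&Chi2", "Select From model"))     # 6.6
```

Every method has a positive gain on the testing data, which is the sense in which benefit exceeds cost.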

Using different classifiers

Evaluation	XGBoost	Random Forest
f1-fail	0.472	0.432
f1-pass	0.913	0.912
f1-macro	0.692	0.672
f1-micro	0.883	0.881
f1-weighted	0.874	0.867
recall-fail	0.439	0.385
recall-pass	0.926	0.935
recall-macro	0.682	0.660
recall-micro	0.883	0.881
recall-weighted	0.883	0.881
pre-fail	0.572	0.592
pre-pass	0.902	0.894
pre-macro	0.737	0.743
pre-micro	0.883	0.881
precision-weighted	0.874	0.867
auc	0.784	0.779
benefit (hours)	2723.000	2825.768
cost (hours)	592.000	613.467
gain (hours)	2131.000	2212.302

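Compared with Random Forest, XGBoost improved the recall and F1-score for failed builds by 5.4% and 4%, respectively, although Random Forest has slightly higher precision for failed builds (0.592 vs. 0.572). Thus, the XGBoost model contributes to the improved recall and F1-score for failed builds.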
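A comparison like the one in this table can be sketched as follows. The sketch assumes that builds are already in chronological order and that the most recent quarter serves as testing data, matching the "one-fourth of the builds" described earlier (the chronological split itself is an assumption). Hyperparameters are illustrative, and if the `xgboost` package is not installed the sketch falls back to scikit-learn's `GradientBoostingClassifier`, which is a different boosting implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score

try:
    from xgboost import XGBClassifier

    def make_boosting():
        return XGBClassifier(n_estimators=100, random_state=0)
except ImportError:
    # Fallback so the sketch still runs without xgboost (a different algorithm).
    def make_boosting():
        return GradientBoostingClassifier(random_state=0)


def chronological_split(X, y, test_fraction=0.25):
    """Train on the earliest builds, test on the most recent fraction."""
    cut = int(len(y) * (1 - test_fraction))
    return X[:cut], y[:cut], X[cut:], y[cut:]


def compare_classifiers(X, y):
    """Failed-build metrics (label 1 = failed) for boosting vs. random forest."""
    X_train, y_train, X_test, y_test = chronological_split(X, y)
    models = {
        "boosting": make_boosting(),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    }
    scores = {}
    for name, model in models.items():
        pred = model.fit(X_train, y_train).predict(X_test)
        scores[name] = {
            "f1-fail": f1_score(y_test, pred, pos_label=1, zero_division=0),
            "recall-fail": recall_score(y_test, pred, pos_label=1, zero_division=0),
            "pre-fail": precision_score(y_test, pred, pos_label=1, zero_division=0),
        }
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 8))  # synthetic features, not real build data
    y = (X[:, 0] + 0.5 * rng.normal(size=400) > 1).astype(int)
    print(compare_classifiers(X, y))
```

Splitting chronologically rather than randomly avoids training on builds that happened after the builds being predicted, which mirrors the online usage scenario.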