BuildFast: History-Aware Build Outcome Prediction for Fast Feedback and Reduced Cost in Continuous Integration
Long build times in continuous integration (CI) can greatly increase the cost in human and computing resources, and thus become a common barrier faced by software organizations adopting CI. Build outcome prediction has been proposed as one of the remedies to reduce such cost. However, the state-of-the-art approaches have poor prediction performance for failed builds, and are not designed for practical usage scenarios. To address these problems, we first conduct an empirical study on 2,590,917 builds to characterize build times in real-world projects, and a survey with 75 developers to understand their perceptions of build outcome prediction. Then, motivated by our study and survey results, we propose a new history-aware approach, named BuildFast, to predict CI build outcomes cost-efficiently and practically. It can help to obtain fast integration feedback and reduce integration cost. In particular, we introduce multiple failure-specific features from closely related historical builds by analyzing build logs and changed files, and propose an adaptive prediction model that switches between two models based on the build outcome of the previous build. We also investigate a practical online usage scenario of BuildFast, where builds are predicted in chronological order, and measure the benefit from correct predictions and the cost from incorrect predictions. Our experiments on 20 projects have demonstrated that BuildFast can improve the state-of-the-art approach by 47.5% in F1-score for failed builds.
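
The adaptive switching idea can be illustrated with a minimal sketch. It only shows the dispatch step; the two stand-in models, their rules and the names `make_adaptive_predictor`, `after_pass` and `after_fail` are hypothetical and are not taken from the BuildFast code.

```python
from typing import Callable, Mapping

Features = Mapping[str, float]
# A model maps a feature vector to True when it predicts a failed build.
Model = Callable[[Features], bool]


def make_adaptive_predictor(after_pass: Model, after_fail: Model):
    """Return a predictor that picks a model from the previous build outcome."""

    def predict(features: Features, previous_failed: bool) -> bool:
        model = after_fail if previous_failed else after_pass
        return model(features)

    return predict


# Hypothetical stand-in models; a real setup would use trained classifiers.
predictor = make_adaptive_predictor(
    after_pass=lambda f: f["fail_ratio_re"] > 0.5,
    after_fail=lambda f: f["pr_src_files_in"] > 0,
)

sample = {"fail_ratio_re": 0.2, "pr_src_files_in": 3}
print(predictor(sample, previous_failed=False))
print(predictor(sample, previous_failed=True))
```

In the online scenario, `previous_failed` would come from the actual outcome of the preceding build in chronological order.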
Survey
The survey is available in detail here.
Features
Features about the Current Build
In this table, fine-grained features such as class-, method-, field- and import-level changes are extracted with the ClDiff tool.
ID | Feature | Description | Implementation |
---|---|---|---|
C1 | src_churn | # of lines of production code changed | use the Ruby library Rugged to get the diff data of two build commits, then use string matching to filter the src code |
C2 | test_churn | # of lines of test code changed | get test file changes from the diff data of two build commits, then use string matching to filter the test code |
C3 | src_ast_diff | whether production code is changed in AST | use ClDiff tool |
C4 | test_ast_diff | whether test code is changed in AST | use ClDiff tool |
C5 | line_added | # of added lines in all files | get file changes from the diff data |
C6 | line_deleted | # of deleted lines in all files | get file changes from the diff data |
C7 | files_added | # of files added | get file changes from the diff data |
C8 | files_deleted | # of files deleted | get file changes from the diff data |
C16 | met_body_modified | # of method bodies modified | use ClDiff tool |
C17 | met_changed | # of methods added or deleted | use ClDiff tool |
C18 | field_changed | # of fields modified, added or deleted | use ClDiff tool |
C19 | import_changed | # of import statements added or deleted | use ClDiff tool |
C20 | class_modified | # of classes modified | use ClDiff tool |
C21 | class_added | # of classes added | use ClDiff tool |
C22 | class_deleted | # of classes deleted | use ClDiff tool |
C23 | met_added | # of methods added | use ClDiff tool |
C24 | met_deleted | # of methods deleted | use ClDiff tool |
C25 | field_modified | # of fields modified | use ClDiff tool |
C26 | field_added | # of fields added | use ClDiff tool |
C27 | field_deleted | # of fields deleted | use ClDiff tool |
C28 | import_added | # of import statements added | use ClDiff tool |
C29 | import_deleted | # of import statements deleted | use ClDiff tool |
C30 | commits | # of commits included | follow the current build commit’s parents until a build commit is reached |
C31 | fix_commits | # of bug-fixing commits included | whether the pull request title or commit message matches the pattern [Ff]ix |
C32 | merge_commits | # of merge commits included | whether the pull request title or commit message matches the pattern [Mm]erge |
C33 | committers | # of unique committers | the unique committers of the commits mentioned above |
C34 | by_core_member | whether a core member triggers the build | the committer committed code at least once within the 3 months before this commit |
C35 | is_master | whether the build occurs on master branch | get from the build information |
C36 | time_interval | time interval since the previous build | time interval between the two build times |
C37 | day_of_week | day of week when the build starts | ruby library time, method time.day |
C38 | time_of_day | time of day when the build starts | ruby library time, method time.hour |
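
As an illustration of the simpler rows above (C30 to C32 and C36 to C38), here is a minimal Python sketch. It is not the project's Ruby implementation; the function name `commit_features`, the reading of the patterns as the regexes `[Ff]ix` and `[Mm]erge`, and the choice of seconds for `time_interval` are assumptions.

```python
import re
from datetime import datetime

# Assumption: the fix and merge patterns are plain substring regexes.
FIX_PATTERN = re.compile(r"[Ff]ix")
MERGE_PATTERN = re.compile(r"[Mm]erge")


def commit_features(messages, started_at, previous_started_at):
    """Compute C30-C32 and C36-C38 for one build.

    messages: commit messages (or PR titles) of the commits in this build.
    started_at, previous_started_at: datetime objects of the two builds.
    """
    return {
        "commits": len(messages),
        "fix_commits": sum(1 for m in messages if FIX_PATTERN.search(m)),
        "merge_commits": sum(1 for m in messages if MERGE_PATTERN.search(m)),
        # Assumption: the interval is measured in seconds.
        "time_interval": (started_at - previous_started_at).total_seconds(),
        # weekday(): Monday is 0 and Sunday is 6.
        "day_of_week": started_at.weekday(),
        "time_of_day": started_at.hour,
    }


features = commit_features(
    ["Fix NPE in parser", "Merge branch 'dev'", "add docs"],
    datetime(2020, 3, 4, 15, 30),
    datetime(2020, 3, 4, 13, 0),
)
print(features)
```

Note that a substring pattern such as `[Ff]ix` also matches words like "prefix", so the counts are a heuristic rather than an exact classification.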
Features about the Previous Build
ID | Feature | Description | Implementation |
---|---|---|---|
P1 | pr_state | build state (i.e., passed, errored or failed) | get from the build information |
P2 | pr_compile_error | whether compilation error occurs | build log contains the string “COMPILATION ERROR ” |
P3 | pr_test_exception | whether tests throw exceptions | build log contains the string “Tests in error” |
P4 | pr_tests_ok | # of tests passed | extract from build log |
P5 | pr_tests_fail | # of tests failed | extract from build log |
P6 | pr_duration | overall time duration of the build | get from build information |
P7 | pr_src_churn | # of lines of production code changed | diff information of previous build |
P8 | pr_test_churn | # of lines of test code changed | diff information of previous build |
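
The log-based rows (P2 to P5) can be sketched as follows. The two marker strings come from the table; the summary-line format (`Tests run: ..., Failures: ..., Errors: ..., Skipped: ...`, as printed by Maven Surefire), the use of the last summary line, and the definition of passed tests as run minus failures, errors and skipped are assumptions for illustration only.

```python
import re

# Assumption: test counts are read from a Maven Surefire style summary line.
SUMMARY = re.compile(
    r"Tests run: (\d+), Failures: (\d+), Errors: (\d+), Skipped: (\d+)"
)


def previous_build_log_features(log_text):
    """Extract P2-P5 from the raw log text of the previous build."""
    features = {
        "pr_compile_error": "COMPILATION ERROR" in log_text,
        "pr_test_exception": "Tests in error" in log_text,
        "pr_tests_ok": 0,
        "pr_tests_fail": 0,
    }
    summaries = SUMMARY.findall(log_text)
    if summaries:
        # Assumption: the last summary line holds the overall totals.
        run, failures, errors, skipped = map(int, summaries[-1])
        features["pr_tests_fail"] = failures
        features["pr_tests_ok"] = run - failures - errors - skipped
    return features


sample_log = """\
[INFO] Running com.example.ParserTest
Tests in error:
  ParserTest.testEmptyInput: NullPointerException
Tests run: 12, Failures: 2, Errors: 1, Skipped: 1
[INFO] BUILD FAILURE
"""
log_features = previous_build_log_features(sample_log)
print(log_features)
```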
Features about Historical Builds
ID | Feature | Description | Implementation |
---|---|---|---|
H1 | fail_ratio_pr | % of broken builds in all the previous builds | # of previous failed builds / # of previous builds |
H2 | fail_ratio_pr_inc | increment of fail_ratio_pr at last broken build to fail_ratio_pr at penultimate broken build | increment of fail_ratio_pr |
H3 | fail_ratio_re | % of broken builds in recent 5 builds | # of failed builds in recent 5 builds / 5 |
H4 | fail_ratio_com_pr | % of broken builds in all the previous builds that were triggered by the current committer | # of failed builds triggered by current committer / # of all previous builds |
H5 | fail_ratio_com_re | % of broken builds in recent 5 builds that were triggered by the current committer | # of failed builds triggered by current committer in recent 5 builds / 5 |
H6 | last_fail_gap | # of builds since the last broken build | walk back from the current build’s previous build until a broken build is found |
H7 | consec_fail_max | maximum of # of consecutive broken builds | find the passed builds before the current build (one or more), and take the maximum length of the broken-build intervals between them |
H8 | consec_fail_avg | average of # of consecutive broken builds | find the passed builds before the current build (one or more), and take the average length of the broken-build intervals between them |
H9 | consec_fail_sum | sum of # of consecutive broken builds | find the passed builds before the current build (one or more), and take the sum of the lengths of the broken-build intervals between them |
H10 | commits_on_files | # of commits on the files in last 3 months | get the changed files of the current build and the commits in the last 3 months, then count the commits whose files include any of the currently changed files |
H11 | file_fail_prob_max | maximum of the probability of each changed file involved in previous broken builds | count how often each currently changed file appeared in the previous broken builds, divide by the total number of previous broken builds, and take the maximum ratio |
H12 | file_fail_prob_avg | average of the probability of each changed file involved in previous broken builds | count how often each currently changed file appeared in the previous broken builds, divide by the total number of previous broken builds, and take the average ratio |
H13 | file_fail_prob_sum | sum of the probability of each changed file involved in previous broken builds | count how often each currently changed file appeared in the previous broken builds, divide by the total number of previous broken builds, and take the sum of the ratios |
H14 | pr_src_files | # of production files changed between the latest passed build and the previous build | find the latest passed build before the current build, get the diff data between that build and the previous build, and use string matching to filter the src files |
H15 | pr_src_files_in | size of the intersection of src_files and pr_src_files | size of the intersection of src_files and pr_src_files |
H16 | pr_test_files | # of test files changed between the latest passed build and the previous build | the same as pr_src_files, but filtering for test files |
H17 | pr_test_files_in | size of the intersection of test_files and pr_test_files | size of the intersection of test_files and pr_test_files |
H18 | pr_config_files | # of build script files changed between the latest passed build and the previous build | the same as pr_src_files, but filtering for config files |
H19 | pr_config_files_in | size of the intersection of config_files and pr_config_files | size of the intersection of config_files and pr_config_files |
H20 | pr_doc_files | # of documentation files changed between the latest passed build and the previous build | the same as pr_src_files, but filtering for doc files |
H21 | pr_doc_files_in | size of the intersection of doc_files and pr_doc_files | size of the intersection of doc_files and pr_doc_files |
H22 | log_src_files | # of production files reported in the build log of the previous build | use string matching to filter the src files in the build log |
H23 | log_src_files_in | size of the intersection of log_src_files and src_files | size of the intersection of log_src_files and src_files |
H24 | log_test_files | # of test files reported in the build log of the previous build | use string matching to filter the test files in the build log |
H25 | log_test_files_in | size of the intersection of log_test_files and test_files | size of the intersection of log_test_files and test_files |
H26 | team_size | size of team contributing in last 3 months | # of distinct committers who made commits in the last 3 months |
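
Several of the historical features reduce to simple counting over past build outcomes. The sketch below covers H1, H3, H6, H7 to H9 and H11 to H13 under stated assumptions: outcomes are given as a chronological list of booleans (True for a broken build), `last_fail_gap` counts the passed builds after the last broken one, and the function names are hypothetical rather than taken from the project code.

```python
from statistics import mean


def outcome_history_features(outcomes, window=5):
    """Compute H1, H3, H6 and H7-H9 from previous build outcomes.

    outcomes: chronological list of booleans, True meaning a broken build.
    """
    total = len(outcomes)
    # Lengths of every maximal run of consecutive broken builds.
    runs, current = [], 0
    for broken in outcomes:
        if broken:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)

    # Walk backwards from the most recent build until a broken one is found.
    last_fail_gap = 0
    for broken in reversed(outcomes):
        if broken:
            break
        last_fail_gap += 1

    return {
        "fail_ratio_pr": sum(outcomes) / total if total else 0.0,
        # The table divides by the window size (5), not by the number
        # of builds actually available.
        "fail_ratio_re": sum(outcomes[-window:]) / window,
        "last_fail_gap": last_fail_gap,
        "consec_fail_max": max(runs, default=0),
        "consec_fail_avg": mean(runs) if runs else 0.0,
        "consec_fail_sum": sum(runs),
    }


def file_fail_probabilities(changed_files, broken_build_files):
    """Compute H11-H13.

    broken_build_files: one set of changed files per previous broken build.
    """
    total = len(broken_build_files)
    if total == 0 or not changed_files:
        return {"file_fail_prob_max": 0.0, "file_fail_prob_avg": 0.0,
                "file_fail_prob_sum": 0.0}
    probs = [
        sum(1 for files in broken_build_files if name in files) / total
        for name in changed_files
    ]
    return {"file_fail_prob_max": max(probs), "file_fail_prob_avg": mean(probs),
            "file_fail_prob_sum": sum(probs)}


history = outcome_history_features(
    [False, True, True, False, True, False, False]
)
file_probs = file_fail_probabilities(
    ["A.java", "B.java"],
    [{"A.java"}, {"A.java", "C.java"}, {"B.java"}, {"C.java"}],
)
print(history)
print(file_probs)
```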
Code and Dataset
The code for extracting the features and training the model can be downloaded from GitHub.
Feature selections and different classifiers
Use different feature selections
Evaluation | BuildFast_IG&Chi2 | Select From model | BuildFast_IG | BuildFast_Chi2 | BuildFast_Mutual | BuildFast_Fpr | BuildFast_Fdr |
---|---|---|---|---|---|---|---|
f1-fail | 0.472 | 0.436 | 0.446 | 0.464 | 0.430 | 0.437 | 0.434 |
f1-pass | 0.913 | 0.877 | 0.912 | 0.900 | 0.862 | 0.906 | 0.891 |
f1-macro | 0.692 | 0.657 | 0.679 | 0.682 | 0.646 | 0.671 | 0.663 |
f1-micro | 0.883 | 0.843 | 0.881 | 0.875 | 0.819 | 0.874 | 0.856 |
f1-weighted | 0.874 | 0.841 | 0.870 | 0.866 | 0.821 | 0.862 | 0.850 |
recall-fail | 0.439 | 0.454 | 0.414 | 0.444 | 0.451 | 0.413 | 0.435 |
recall-pass | 0.926 | 0.869 | 0.926 | 0.905 | 0.856 | 0.917 | 0.892 |
recall-macro | 0.682 | 0.661 | 0.670 | 0.674 | 0.654 | 0.665 | 0.664 |
recall-micro | 0.883 | 0.843 | 0.881 | 0.875 | 0.819 | 0.874 | 0.856 |
recall_weighted | 0.883 | 0.843 | 0.881 | 0.875 | 0.819 | 0.874 | 0.856 |
pre-fail | 0.572 | 0.506 | 0.541 | 0.546 | 0.498 | 0.566 | 0.528 |
pre-pass | 0.902 | 0.901 | 0.900 | 0.902 | 0.890 | 0.900 | 0.899 |
pre-macro | 0.737 | 0.703 | 0.720 | 0.724 | 0.694 | 0.733 | 0.714 |
pre-micro | 0.883 | 0.843 | 0.881 | 0.875 | 0.819 | 0.874 | 0.856 |
precision-weighted | 0.874 | 0.860 | 0.868 | 0.868 | 0.854 | 0.866 | 0.863 |
auc | 0.784 | 0.783 | 0.784 | 0.787 | 0.755 | 0.788 | 0.785 |
benefit | 2723.000 | 2624.008 | 2700.224 | 2654.035 | 2703.035 | 2663.223 | 2669.579 |
cost | 592.000 | 410.181 | 548.578 | 510.746 | 643.953 | 408.812 | 440.902 |
gain | 2131.000 | 2213.827 | 2151.646 | 2143.288 | 2059.082 | 2254.411 | 2228.677 |
These feature selection methods are available in scikit-learn.
BuildFast_IG&Chi2 is our feature selection approach: we adopted Chi-Squared Testing to select the top 30 features for our first model, and Information Gain to select the top 25 features for our second model.
Select From model : when training the model, we select the features whose feature importance is larger than the mean feature importance.
BuildFast_IG : we adopted Information Gain to select 30 features and 25 features for the two models respectively.
BuildFast_Chi2 : we adopted Chi-Squared Testing to select 30 features and 25 features for the two models respectively.
BuildFast_Mutual : we adopted Mutual information to select 30 features and 25 features for the two models respectively.
BuildFast_Fpr : we select the features whose p-values are below alpha (0.01) based on an FPR test.
BuildFast_Fdr : we select the features based on p-values for an estimated false discovery rate.
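
Two of the selection rules above, top-k ranking and the mean-importance threshold, can be sketched without any library. The scores below are made-up illustrative values, not results from the experiments, and the function names `top_k` and `above_mean` are hypothetical; in scikit-learn the corresponding utilities live in `sklearn.feature_selection` (for example `SelectKBest` and `SelectFromModel`).

```python
from statistics import mean


def top_k(scores, k):
    """Return the k feature names with the highest scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def above_mean(importances):
    """Return the features whose importance is larger than the mean."""
    threshold = mean(importances.values())
    return [name for name, value in importances.items() if value > threshold]


# Illustrative scores only; real scores would come from Chi-Squared Testing,
# Information Gain, or a trained model's feature importances.
scores = {
    "pr_state": 0.40,
    "fail_ratio_re": 0.25,
    "src_churn": 0.15,
    "time_of_day": 0.12,
    "is_master": 0.08,
}
print(top_k(scores, 3))
print(above_mean(scores))
```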
Compared with the other approaches, BuildFast_IG&Chi2 improved the precision, recall and F1-score for failed builds by 4%, 2% and 3%, respectively, in most cases, and slightly improved the other metrics by 1%-4%. BuildFast_IG&Chi2 also gives the most stable results across all the metrics compared with other methods such as Select From model and BuildFast_Mutual. For example, Select From model slightly improved recall-fail by 1.5% but had a much lower pre-fail, reduced by 6.6%. For benefit, cost and gain, there was no statistically significant difference, due to the minority of failed builds and the variance of build times. Still, BuildFast_IG&Chi2 had a total gain of 2,131 hours for all projects from one-fourth of the builds (i.e., the testing data), with its benefit exceeding its cost. Thus, BuildFast is cost-efficient and can save CI cost.
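
The gain row in the table appears to be the benefit minus the cost, in hours. The short check below recomputes it from the reported benefit and cost values; the relation is inferred from the numbers themselves, and the 0.002 tolerance is an assumption to absorb rounding in the third decimal.

```python
# (benefit, cost, reported gain) in hours, copied from the table above.
reported = {
    "BuildFast_IG&Chi2": (2723.000, 592.000, 2131.000),
    "Select From model": (2624.008, 410.181, 2213.827),
    "BuildFast_IG": (2700.224, 548.578, 2151.646),
    "BuildFast_Chi2": (2654.035, 510.746, 2143.288),
    "BuildFast_Mutual": (2703.035, 643.953, 2059.082),
    "BuildFast_Fpr": (2663.223, 408.812, 2254.411),
    "BuildFast_Fdr": (2669.579, 440.902, 2228.677),
}


def gain(benefit, cost):
    """Net saving: time saved by correct predictions minus time lost."""
    return benefit - cost


gains = {}
for name, (benefit, cost, reported_gain) in reported.items():
    gains[name] = gain(benefit, cost)
    # Allow for rounding in the third decimal of the reported values.
    assert abs(gains[name] - reported_gain) < 0.002, name
    print(f"{name}: gain = {gains[name]:.3f} h")
```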
Use different classifiers
Evaluation | Xgboost | Randomforest |
---|---|---|
f1-fail | 0.472 | 0.432 |
f1-pass | 0.913 | 0.912 |
f1-macro | 0.692 | 0.672 |
f1-micro | 0.883 | 0.881 |
f1-weighted | 0.874 | 0.867 |
recall-fail | 0.439 | 0.385 |
recall-pass | 0.926 | 0.935 |
recall-macro | 0.682 | 0.660 |
recall-micro | 0.883 | 0.881 |
recall_weighted | 0.883 | 0.881 |
pre-fail | 0.572 | 0.592 |
pre-pass | 0.902 | 0.894 |
pre-macro | 0.737 | 0.743 |
pre-micro | 0.883 | 0.881 |
precision-weighted | 0.874 | 0.867 |
auc | 0.784 | 0.779 |
benefit | 2723.000 | 2825.768 |
cost | 592.000 | 613.467 |
gain | 2131.000 | 2212.302 |
Compared with Randomforest, Xgboost improved the recall and F1-score for failed builds by 5.1% and 4%, respectively; the Xgboost model thus contributes to the improved recall and F1-score for failed builds.