Measuring model variability using robust non-parametric testing
CoRR(2024)
Abstract
Training a deep neural network often involves stochastic optimization,
meaning each run will produce a different model. The seed used to initialize
random elements of the optimization procedure heavily influences the quality of
a trained model, an effect that may be obscured by many commonly reported summary
statistics, such as accuracy. However, the random seed is often not included in
hyper-parameter optimization, perhaps because the relationship between seed and
model quality is hard to describe. This work attempts to describe the
relationship between deep net models trained with different random seeds and
the behavior of the expected model. We adopt robust hypothesis testing to
propose a novel summary statistic for network similarity, referred to as the
α-trimming level. We use the α-trimming level to show that the
empirical cumulative distribution function of an ensemble model created from a
collection of trained models with different random seeds approximates the
average of these functions as the number of models in the collection grows
large. This insight provides guidance for how many random seeds should be
sampled to ensure that an ensemble of these trained models is a reliable
representative. We also show that the α-trimming level is more
expressive than individual performance metrics such as validation accuracy, churn,
or expected calibration error taken alone, and that it may support random seed
selection in a more principled fashion. We demonstrate the value of the
proposed statistic in real experiments and illustrate the advantage of
fine-tuning over random seed with an experiment in transfer learning.
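The quantities the abstract refers to, per-seed empirical CDFs, their average, the ECDF of an ensemble model, and a trimming-based comparison between curves, can be sketched concretely. The Python snippet below is a minimal illustration on synthetic scores only: the trimmed_sup_gap function is a simplified stand-in and not the paper's α-trimming level (which is defined through robust hypothesis testing), and the score model, function names, and parameter choices are assumptions made purely for illustration.

import numpy as np

def ecdf(samples, grid):
    # Empirical CDF of `samples` evaluated at the points in `grid`.
    return np.searchsorted(np.sort(samples), grid, side="right") / len(samples)

def trimmed_sup_gap(f, g, alpha=0.05):
    # Largest pointwise gap between two CDF curves after discarding the
    # alpha fraction of grid points with the worst disagreement.
    # (Illustrative stand-in, NOT the paper's alpha-trimming level.)
    gaps = np.sort(np.abs(np.asarray(f) - np.asarray(g)))
    keep = int(np.floor((1.0 - alpha) * len(gaps)))
    return float(gaps[keep - 1]) if keep > 0 else 0.0

# Synthetic stand-in for per-seed model outputs on a shared validation set:
# a per-example "difficulty" plus a small per-seed perturbation.
rng = np.random.default_rng(0)
n_seeds, n_examples = 10, 2000
base = rng.beta(5, 2, size=n_examples)                 # shared across seeds
seed_noise = rng.normal(0, 0.03, size=(n_seeds, n_examples))
scores = np.clip(base + seed_noise, 0.0, 1.0)          # shape (n_seeds, n_examples)

grid = np.linspace(0.0, 1.0, 501)
per_seed_cdfs = np.array([ecdf(s, grid) for s in scores])

# Average of the individual ECDFs vs. the ECDF of the ensemble
# (here taken as the per-example mean of the model outputs).
avg_cdf = per_seed_cdfs.mean(axis=0)
ensemble_cdf = ecdf(scores.mean(axis=0), grid)

print("seed 0 vs seed 1:", trimmed_sup_gap(per_seed_cdfs[0], per_seed_cdfs[1]))
print("ensemble vs average ECDF:", trimmed_sup_gap(ensemble_cdf, avg_cdf))

On real models, the synthetic `scores` array would be replaced by each seed's outputs (for example, predicted probabilities) on a fixed validation set; the comparison itself proceeds the same way.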