Developing machine-learning-based amyloid predictors with Cross-Beta DB
bioRxiv (2024)
### Abstract
Due to shifts in environmental conditions, mutations, or interactions with other biomolecules, some proteins that are normally soluble can aggregate into clumps of amyloid fibrils. Understanding this phenomenon is of paramount importance, not only because of its association with various diseases (including Alzheimer’s disease) but also because of increasingly abundant evidence for its functional roles. Numerous studies have demonstrated that the propensity to form amyloids is encoded in the amino acid sequence, and this finding has paved the way for the development of several computational predictors of amyloidogenicity. The ultimate objective of computational methods is to accurately predict the formation of disease-related and functionally relevant amyloids that occur in vivo. These amyloid fibrils are known to form highly specific “cross-β” structures from protein regions longer than about 15 residues. Remarkably, despite the significance of naturally occurring amyloids, there has been a lack of datasets dedicated specifically to them. Hence, we built Cross-Beta DB, a database of cross-β amyloids formed under natural conditions. This database is expected to be indispensable for benchmarking amyloid predictors. We used Cross-Beta DB to train and benchmark several machine-learning-based amyloid predictors. The best-performing of these, the random-forest-based Cross-Beta RF Predictor, outperformed the other existing methods, raising high expectations for improved prediction of naturally occurring amyloids.
### Competing Interest Statement
The authors have declared no competing interest.