High accuracy methylation identification tools on single molecular level for PacBio HiFi data

Ying Chen, Bo Wu, Yuying Ding, Longjian Niu, Xin Bai, Zhuobin Lin, Chuan-Le Xiao

bioRxiv (2024)

Abstract
PacBio Circular Consensus Sequencing (CCS) yields highly accurate bases and simultaneously determines the methylation states of individual molecules. However, existing CCS-based methods for 5mC detection have low single-molecule accuracy (<90% on most datasets) and can produce inaccurate methylation patterns. These methods rely on information from the 21 bp context surrounding each target CpG, and over 29% of their calls are low-confidence (<75% accuracy), at CpGs with less distinguishable signals. We hypothesize that incorporating single-molecule CpG methylation correlation information could improve calls at these low-confidence CpGs. Here, we present hifimeth, a novel deep graph convolutional network that uses a 400 bp context for CCS-based 5mC calling, and show that its improved performance is mainly due to the inclusion of more neighboring CpGs in the context. Hifimeth achieves an average single-molecule accuracy of 94.7% and an average F1 score of 94.2%, which are 5.5% and 5.9% higher, respectively, than those of the previous state-of-the-art method. Hifimeth-based methylation frequency quantification by read counting outperforms previous methods on all human and zebrafish datasets tested. The results also show that hifimeth's high-accuracy calls can reveal complex single-molecule methylation patterns, related to either haplotypes or repeat regions, at up to single-motif resolution.

Competing Interest Statement: The authors have declared no competing interest.
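The abstract rests on two concrete ideas: a 400 bp window around a target CpG contains more neighboring CpGs than a 21 bp window, and per-site methylation frequency can be obtained by counting reads. The Python sketch below illustrates only those two ideas under stated assumptions; it does not reproduce hifimeth's graph convolutional network. The function names, the centered-window convention (window // 2 bp on each side), the per-read `(site, probability)` input format, and the 0.5 probability cutoff are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def cpg_positions(seq):
    """Return 0-based positions of the C in every CpG dinucleotide of seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i] == "C" and seq[i + 1] == "G"]


def neighbors_in_window(seq, target, window):
    """Count other CpGs whose C lies in a window of `window` bp centred on `target`.

    Assumption (hypothetical convention): the window spans
    target - window // 2 .. target + window // 2, so window=21 gives 10 bp per side.
    """
    half = window // 2
    return sum(
        1
        for p in cpg_positions(seq)
        if p != target and target - half <= p <= target + half
    )


def methylation_frequency(calls, threshold=0.5):
    """Aggregate per-read CpG calls into per-site frequencies by read counting.

    calls: iterable of (site, probability) pairs, one per read covering the site.
    A call counts as methylated when probability >= threshold (assumed cutoff).
    Returns {site: (methylated_reads, total_reads, frequency)}.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated, total]
    for site, prob in calls:
        counts[site][1] += 1
        if prob >= threshold:
            counts[site][0] += 1
    return {site: (m, n, m / n) for site, (m, n) in counts.items()}


if __name__ == "__main__":
    # Synthetic read with CpGs at positions 100, 107 and 259.
    read = "A" * 100 + "CG" + "A" * 5 + "CG" + "A" * 150 + "CG" + "A" * 100
    print(neighbors_in_window(read, 107, 21))   # prints 1: only the CpG at 100
    print(neighbors_in_window(read, 107, 400))  # prints 2: CpGs at 100 and 259

    # Hypothetical per-read calls for two sites.
    calls = [("chr1:107", 0.9), ("chr1:107", 0.2), ("chr1:107", 0.7), ("chr1:259", 0.4)]
    print(methylation_frequency(calls))
```

In this toy read, widening the window from 21 bp to 400 bp doubles the number of neighboring CpGs visible to the caller, which is the kind of extra context the authors credit for hifimeth's improvement. The read-counting step simply divides methylated reads by covering reads at each site, so its quality depends directly on the accuracy of the single-molecule calls.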