High accuracy methylation identification tools on single molecular level for PacBio HiFi data
bioRxiv (2024)
Abstract
PacBio Circular Consensus Sequencing (CCS) yields highly accurate base calls and simultaneously reveals the methylation states of individual molecules. However, existing CCS-based methods for 5mC detection have low accuracy (<90% on most datasets) at the single-molecule level and can produce inaccurate methylation patterns. These methods rely on information from the 21 bp context surrounding each target CpG and yield more than 29% low-confidence calls (<75% accuracy) at CpGs with less distinguishable signals. We hypothesize that incorporating CpG methylation correlation information at the single-molecule level could improve methylation calls at low-confidence CpGs. Here, we present hifimeth, a novel deep graph convolutional network that uses a 400 bp context for CCS-based 5mC calling, and show that its improved performance is mainly due to the inclusion of more neighboring CpGs in the context. Hifimeth achieves an average single-molecule accuracy of 94.7% and an average F1 score of 94.2%, which are 5.5% and 5.9% higher, respectively, than the previous state-of-the-art method. Hifimeth-based methylation frequency quantification by read counting outperforms previous methods on all human and zebrafish datasets tested. The results also show that the high-accuracy calls of hifimeth can reveal complex single-molecule methylation patterns, related either to haplotypes or to repeat regions, at up to single-motif resolution.
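As an illustration of what "methylation frequency quantification by read counting" generally means, the sketch below computes, for each CpG site, the fraction of covering reads called methylated. It is not hifimeth's implementation: the input layout (tuples of read ID, CpG position, methylation probability), the function name `methylation_frequency`, and the 0.5 decision threshold are all assumptions made for this example.

```python
from collections import defaultdict


def methylation_frequency(calls, threshold=0.5):
    """Per-site methylation frequency by read counting.

    calls: iterable of (read_id, cpg_position, prob_methylated) tuples
           (hypothetical layout, not a hifimeth output format).
    threshold: probability at or above which a call counts as methylated
               (assumed value for illustration).
    Returns a dict mapping cpg_position -> methylated reads / total reads.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for _read_id, position, prob in calls:
        total[position] += 1
        if prob >= threshold:
            methylated[position] += 1
    return {pos: methylated[pos] / total[pos] for pos in total}


# Made-up single-molecule calls covering two CpG sites.
calls = [
    ("read1", 100, 0.97),
    ("read2", 100, 0.10),
    ("read3", 100, 0.88),
    ("read1", 250, 0.05),
    ("read2", 250, 0.20),
]
freq = methylation_frequency(calls)
```

Because each read contributes one binary vote per site, errors in single-molecule calls carry over directly into the site-level frequency, which is why per-read accuracy matters for this style of quantification.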
### Competing Interest Statement
The authors have declared no competing interest.