Publication year: 2024
Paper Information

Title
[Vol.19, No.2] Classification of the Evidence Types of Genetic Variant Literature Using BERT and Domain-Specific Words

Authors
Lae-Jeong Park, Taegyun Kim

Abstract
In the diagnosis of rare genetic disorders, genetic testing often reveals thousands to tens of thousands of genetic variants, many of which are assessed for pathogenicity according to the ACMG-AMP guidelines. For variants that are absent from public variant annotation databases such as ClinVar, reviewers turn to literature search systems such as PubMed or LitVar to find related studies. The information in these studies, including patient symptoms, pathogenicity, evidence, and evidence types, is then used to determine whether the variants are disease-causing. This paper presents a method for classifying the evidence types of genetic variant literature using BERT, a deep learning model for natural language processing, in order to make literature-based variant interpretation more efficient. The proposed method uses BioBERT, a BERT model pre-trained for the biomedical domain, to extract features that reflect the context of biomedical text. It also applies a BERT input preprocessing technique that selectively emphasizes domain-specific words closely associated with variant evidence types; these words were selected using domain knowledge specific to variant interpretation. The method shows significant improvements over previous studies, confirming the advantage of BioBERT's biomedical domain-specific features over those of general BERT. Emphasizing the domain-specific terms during BERT input preprocessing yields a consistent, albeit slight, performance improvement over using BioBERT alone.
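The abstract does not say exactly how the domain-specific words are emphasized in the BERT input. As a minimal sketch, assuming one plausible approach (surrounding each domain-specific term with marker characters before tokenization, in the spirit of entity-marker techniques), the preprocessing step might look like the following. The word list, the `@` marker, and the function name `emphasize_domain_words` are hypothetical illustrations, not the authors' actual vocabulary or implementation.

```python
import re

# HYPOTHETICAL example terms; the paper's curated list of domain-specific
# words is not reproduced in the abstract.
EXAMPLE_DOMAIN_WORDS = {"de novo", "segregation", "functional", "case-control"}


def emphasize_domain_words(text, domain_words, marker="@"):
    """Surround each domain-specific word or phrase in `text` with `marker`.

    This is an illustrative emphasis scheme (marker insertion), assumed for
    the example; it is not the authors' documented preprocessing method.
    """
    if not domain_words:
        return text  # nothing to emphasize; avoids an empty regex alternation
    # Try longer phrases first so a multi-word term is matched as a whole.
    terms = sorted(domain_words, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        flags=re.IGNORECASE,
    )
    # Keep the original casing of the matched text inside the markers.
    return pattern.sub(lambda m: f"{marker} {m.group(0)} {marker}", text)


if __name__ == "__main__":
    sentence = "We observed de novo segregation of the variant in the family."
    print(emphasize_domain_words(sentence, EXAMPLE_DOMAIN_WORDS))
    # prints: We observed @ de novo @ @ segregation @ of the variant in the family.
```

Under this assumption, the marked text would then be tokenized and passed to a BioBERT model (for example, through the Hugging Face `transformers` library), with a classification layer over the `[CLS]` representation predicting the evidence type; that fine-tuning setup is likewise an assumption rather than a detail stated in the abstract.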