Journal

Publication year: 2024

 
Paper information
Paper title [Vol.19, No.2] Classification of the Evidence Types of Genetic Variant Literature Using BERT and Domain-Specific Words
Authors Lae-Jeong Park, Taegyun Kim
Abstract In the diagnostic process for rare genetic disorders, genetic testing often reveals thousands to tens of thousands of genetic variants. Many of these variants are assessed for pathogenicity according to the ACMG-AMP guidelines. For variants absent from public genetic variant annotation databases such as ClinVar, reviewers resort to literature search systems like PubMed or LitVar to find related studies. The information from these studies, including patient symptoms, pathogenicity, evidence, and evidence types, is then used to determine whether the variants are disease-causing. This paper presents a method for classifying the evidence types of genetic variant literature using BERT, a deep learning model for natural language processing, to make the variant interpretation process, which relies on variant literature search systems, more efficient. The proposed method employs BioBERT, a variant of BERT pre-trained for the biomedical domain, to extract features that reflect the context of biomedical text. It also applies a BERT input pre-processing technique that selectively emphasizes domain-specific words closely associated with the variant evidence types. This group of domain-specific words is carefully chosen using domain knowledge specific to variant interpretation. The method shows significant improvements over previous studies, confirming the advantage of BioBERT's biomedical domain-specific features over general BERT. Adding the technique that emphasizes domain-specific terms in the BERT input pre-processing phase yields a consistent, albeit slight, improvement in performance over BioBERT alone.
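The abstract does not say how the input pre-processing emphasizes domain-specific words, so the following is only a minimal sketch of one plausible approach: wrapping each matched term in marker tokens before the text is tokenized. The marker strings `[E]` and `[/E]`, the function name `emphasize_domain_words`, and the example word list are all assumptions made for illustration; they are not taken from the paper.

```python
import re

# Hypothetical examples only; the paper's actual word list is not given in the abstract.
DOMAIN_WORDS = ["de novo", "segregation", "functional assay"]


def emphasize_domain_words(text, domain_words, open_marker="[E]", close_marker="[/E]"):
    """Wrap each domain-specific term in marker tokens (an assumed emphasis scheme)."""
    if not domain_words:
        return text
    # Try longer terms first so a multi-word term wins over any shorter term it contains.
    ordered = sorted(domain_words, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    # Keep the original casing of the matched text inside the markers.
    return pattern.sub(lambda m: f"{open_marker} {m.group(0)} {close_marker}", text)


sentence = "The variant arose de novo and a functional assay showed loss of activity."
marked = emphasize_domain_words(sentence, DOMAIN_WORDS)
print(marked)
```

Under this assumed scheme, the marked text would then be passed to the BioBERT tokenizer, with the marker strings registered as additional special tokens so that they are not split into word pieces. Whether the authors use markers, term repetition, or some other form of emphasis cannot be determined from the abstract.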
Attached paper
   19-2-02.pdf (580.7K) [1] DATE : 2024-05-01 10:39:11