Publication year : [2023]
Paper information |
|
Paper title (Korean) |
[Vol.18, No.2] Natural Language Processing-based Korean Summary System Considering Linguistic Features |
|
Authors |
Jongwon Lee, Sungjun Park, Hanjung Kim, Hoekyung Jung |
|
Abstract |
As large volumes of data are distributed over the Internet, it has become difficult for users to find the data they need. When that data is text, it must also be compressed and summarized. In this paper, we build an efficient system for compressing and summarizing Korean text using KoBART (Korean Bidirectional and Auto-Regressive Transformers), a Transformer encoder-decoder model. The system consists of a preprocessor that performs extractive summarization and a KoBART model that performs abstractive summarization. Taking the linguistic characteristics of Korean into account, the preprocessor extracts a summary from the sentence in which a specific phrase appears, and the KoBART model generates summaries for texts that the preprocessor does not handle. By combining the preprocessor with KoBART, a pre-trained language model, the proposed system outperformed general extractive and abstractive summarization models. This suggests that analyzing and exploiting the characteristics of a specific language in more detail can yield better results than relying on a high-performing pre-trained language model alone. We expect this paper to serve as a leading study on the synergy between linguistic features and high-performing pre-trained language models. |
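The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cue phrases, the naive sentence splitter, and the function names are all hypothetical, and the KoBART model is stood in for by an arbitrary callable so the example runs without any model weights.

```python
import re
from typing import Callable, Iterable, List, Optional

# Hypothetical cue phrases; the abstract does not say which phrases are used.
CUE_PHRASES = ("결론적으로", "요약하면")


def split_sentences(text: str) -> List[str]:
    """Naive splitter (an assumption): break on whitespace after ., ! or ?"""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def extractive_summary(text: str, cues: Iterable[str] = CUE_PHRASES) -> Optional[str]:
    """Return the first sentence containing a cue phrase, or None if none match."""
    cues = tuple(cues)
    for sentence in split_sentences(text):
        if any(cue in sentence for cue in cues):
            return sentence
    return None


def summarize(
    text: str,
    abstractive_model: Callable[[str], str],
    cues: Iterable[str] = CUE_PHRASES,
) -> str:
    """Try the rule-based preprocessor first; fall back to the abstractive model."""
    extracted = extractive_summary(text, cues)
    if extracted is not None:
        return extracted
    # Only texts the preprocessor did not handle reach the generative model.
    return abstractive_model(text)
```

Here `abstractive_model` is a placeholder for whatever function wraps a KoBART checkpoint; passing it in as a callable keeps the routing logic testable independently of the model.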
|