General Information

ISSN: 1793-8201 (Print), 2972-4511 (Online)
Abbreviated Title: Int. J. Comput. Theory Eng.
Frequency: Quarterly
DOI: 10.7763/IJCTE
Editor-in-Chief: Prof. Mehmet Sahinoglu
Associate Editor-in-Chief: Assoc. Prof. Alberto Arteta, Assoc. Prof. Engin Maşazade
Managing Editor: Ms. Mia Hu
Abstracting/Indexing: Scopus (Since 2022), INSPEC (IET), CNKI, Google Scholar, EBSCO, etc.
Average Days from Submission to Acceptance: 192 days
E-mail: ijcte@iacsitp.com
Journal Metrics: 2022 CiteScore 0.8 (11th percentile)

Editor-in-chief

Prof. Mehmet Sahinoglu

Computer Science Department, Troy University, USA

I'm happy to take on the position of Editor-in-Chief of IJCTE. We encourage authors to submit papers on any branch of computer theory and engineering.

IJCTE Vol. 8, No. 1 (Feb. 2016): 32-35, ISSN: 1793-8201
DOI: 10.7763/IJCTE.2016.V8.1015

Automatic Keyword Extraction for Wikification of East Asian Language Documents

Kensuke Horita, Fuminori Kimura, and Akira Maeda

Abstract—In recent years, research on Wikification has attracted much attention. Wikification aims to promote the effective reuse of Wikipedia resources and a better understanding of document contents. It is a method that automatically extracts keywords from a document and links each of them to an appropriate Wikipedia article. Wikification consists of two processes. First, keywords are extracted from a document. Second, the appropriate Wikipedia article is identified for each keyword. In this paper, we focus on keyword extraction for Wikification. Research on Wikification has been conducted for documents in a variety of languages; we focus on East Asian language documents and experiment with Japanese documents. In addition, we plan to perform Wikification not only within a single language but also across languages (e.g., linking keywords in Japanese documents to appropriate English Wikipedia articles).
Our proposed method consists of two steps. First, we extract nouns from a document using a morphological analysis tool. We then obtain candidate keywords with a method called Top Consecutive Nouns Cohesion (TCNC), which connects consecutive nouns and treats them as a single compound word. Second, we rank the extracted candidate keywords using one of two keyword-importance measures: the Dice coefficient or Keyphraseness.
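The candidate-extraction step can be sketched as follows. This is a minimal sketch of one reading of the one-sentence description of TCNC; it is not the authors' implementation. It rests on several assumptions:

- The morphological analyzer's output has already been converted into (surface form, part-of-speech) pairs.
- Nouns carry the IPAdic-style top-level tag `名詞`.
- The function name `tcnc_candidates` is hypothetical.
- Any further filtering implied by the word "Top" in the method's name is not modeled.

```python
from typing import List, Tuple

# Assumed token format: (surface form, top-level part-of-speech tag)
Token = Tuple[str, str]


def tcnc_candidates(tokens: List[Token], noun_tag: str = "名詞", joiner: str = "") -> List[str]:
    """Merge every maximal run of consecutive noun tokens into one candidate keyword.

    Hypothetical helper; POS tags are assumed to be reduced to a top-level
    category such as IPAdic's "名詞" (noun). Japanese is written without
    spaces, so nouns are joined with an empty string by default.
    """
    candidates: List[str] = []
    run: List[str] = []
    for surface, pos in tokens:
        if pos == noun_tag:
            run.append(surface)                   # extend the current noun run
        elif run:
            candidates.append(joiner.join(run))   # a non-noun closes the run
            run = []
    if run:                                       # flush a run at the end of the input
        candidates.append(joiner.join(run))
    return candidates


if __name__ == "__main__":
    # Hypothetical analyzer output for 「自然言語処理の研究」
    tokens = [("自然", "名詞"), ("言語", "名詞"), ("処理", "名詞"), ("の", "助詞"), ("研究", "名詞")]
    print(tcnc_candidates(tokens))  # ['自然言語処理', '研究']
```

The particle の breaks the noun run, so the three nouns before it form one compound candidate and 研究 forms another. A `joiner` parameter is exposed only so the same sketch could be tried on space-delimited text.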
In our experiments on extracting appropriate keywords for Wikification from Japanese documents, the proposed method achieved the best results, especially the combination of TCNC and Keyphraseness.
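The ranking step can be sketched in the same way. The abstract does not define either importance measure, so the formulas below are assumptions:

- Keyphraseness takes its common form: the number of Wikipedia articles in which a phrase appears as a link anchor, divided by the number of articles in which it appears at all.
- The Dice coefficient takes its usual collocation form, 2 * f(ab) / (f(a) + f(b)), for a compound of two nouns a and b.
- The function names and count dictionaries are hypothetical. Real counts would have to be precomputed from a Wikipedia dump.

```python
from typing import Dict, Iterable, List, Tuple


def keyphraseness(link_count: int, doc_count: int) -> float:
    """Assumed definition: share of articles containing the phrase where it is a link anchor."""
    return link_count / doc_count if doc_count else 0.0


def dice(pair_freq: int, freq_a: int, freq_b: int) -> float:
    """Assumed collocation form of the Dice coefficient: 2 * f(ab) / (f(a) + f(b))."""
    total = freq_a + freq_b
    return 2 * pair_freq / total if total else 0.0


def rank_by_keyphraseness(
    candidates: Iterable[str],
    link_counts: Dict[str, int],
    doc_counts: Dict[str, int],
) -> List[Tuple[str, float]]:
    """Score each distinct candidate and sort by descending Keyphraseness."""
    scores = {
        c: keyphraseness(link_counts.get(c, 0), doc_counts.get(c, 0))
        for c in set(candidates)
    }
    # Ties are broken alphabetically so the output order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    ranked = rank_by_keyphraseness(
        ["自然言語処理", "研究", "研究"],
        link_counts={"自然言語処理": 900, "研究": 50},      # hypothetical counts
        doc_counts={"自然言語処理": 1000, "研究": 100000},  # hypothetical counts
    )
    print(ranked)  # [('自然言語処理', 0.9), ('研究', 0.0005)]
```

With these made-up counts, the specific compound is usually linked wherever it appears, while the generic noun rarely is. This is the intuition for why Keyphraseness can favor Wikification-worthy terms. Swapping in `dice` as the scoring function would instead favor compounds whose parts co-occur strongly.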

Index Terms—Wikipedia, wikification, keyword extraction, compound word.

K. Horita is with the Graduate School of Information Science and Engineering, Ritsumeikan University, Shiga, Japan (e-mail: is0038ep@ed.ritsumei.ac.jp).
F. Kimura is with Kinugasa Research Organization, Ritsumeikan University, Kyoto, Japan (e-mail: fkimura@is.ritsumei.ac.jp).
A. Maeda is with the College of Information Science and Engineering, Ritsumeikan University, Shiga, Japan (e-mail: amaeda@is.ritsumei.ac.jp).


Cite: Kensuke Horita, Fuminori Kimura, and Akira Maeda, "Automatic Keyword Extraction for Wikification of East Asian Language Documents," International Journal of Computer Theory and Engineering, vol. 8, no. 1, pp. 32-35, 2016.
