General Information

ISSN: 1793-8201 (Print), 2972-4511 (Online)
Abbreviated Title: Int. J. Comput. Theory Eng.
Frequency: Quarterly
DOI: 10.7763/IJCTE
Editor-in-Chief: Prof. Mehmet Sahinoglu
Associate Editor-in-Chief: Assoc. Prof. Alberto Arteta, Assoc. Prof. Engin Maşazade
Managing Editor: Ms. Mia Hu
Abstracting/Indexing: Scopus (Since 2022), INSPEC (IET), CNKI, Google Scholar, EBSCO, etc.
Average Days from Submission to Acceptance: 192 days
E-mail: ijcte@iacsitp.com
IJCTE 2011 Vol.3(2): 261-269 ISSN: 1793-8201
DOI: 10.7763/IJCTE.2011.V3.314

Pattern discovery for semi-structured web pages using bar-tree representation

Z. Akbar, L. T. Handoko

Abstract—Many websites with an underlying database of structured data provide the richest and densest source of information relevant to topical data integration. Practical data integration requires sustainable and reliable pattern discovery, both to retrieve content accurately and to recognize pattern changes over time; yet extracting structured data from web documents still lacks accuracy. This paper proposes the bar-tree representation, which describes the whole pattern of a web page efficiently and is based on the reverse algorithm. While previous algorithms always trace the pattern and extract the region of interest starting from the top root, the reverse algorithm recognizes the pattern from the region of interest toward both the top and bottom roots simultaneously. The attributes are then extracted and labeled in reverse, starting from the region of interest of the targeted contents. Because conventional representations would require more computational power for this algorithm, the bar-tree method is developed to represent the generated patterns as bar graphs characterized by their depths and widths from the document roots. We show that this representation is suitable for extracting data from semi-structured web sources and for detecting template changes in targeted pages. The experimental results show a perfect recognition rate for template changes in several web targets.

Index Terms—data extraction, data mining, web-based information system

Z. Akbar is also with the Group for Bioinformatics and Information Mining, Department of Computer and Information Science, University of Konstanz, Box D188, D-78457 Konstanz, Germany.
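
The abstract describes patterns as bar graphs characterized by depths and widths measured from the document roots, and uses them to detect template changes. The sketch below is not the authors' bar-tree method or their reverse algorithm, whose details are not given on this page. It is a simplified illustration, under stated assumptions, of the general idea: summarize a page as one bar per nesting depth whose width is the number of elements at that depth, then flag a template change when two summaries differ. The names `DepthProfiler`, `depth_profile`, and `template_changed` are hypothetical, and the input is assumed to be well-formed HTML in which every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not increase depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DepthProfiler(HTMLParser):
    """Counts how many elements open at each nesting depth (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.widths = []  # widths[d] = number of elements opened at depth d

    def _count(self):
        # Depth grows by at most one per start tag, so appending once suffices.
        if self.depth == len(self.widths):
            self.widths.append(0)
        self.widths[self.depth] += 1

    def handle_starttag(self, tag, attrs):
        self._count()
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: count it, depth is unchanged.
        self._count()

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def depth_profile(html):
    """Return the list of per-depth element counts for an HTML string."""
    profiler = DepthProfiler()
    profiler.feed(html)
    profiler.close()
    return profiler.widths


def template_changed(old_html, new_html):
    """Assumption: a template change is any difference between the two profiles."""
    return depth_profile(old_html) != depth_profile(new_html)


if __name__ == "__main__":
    before = "<html><body><div><p>a</p><p>b</p></div></body></html>"
    same_template = "<html><body><div><p>x</p><p>y</p></div></body></html>"
    new_template = "<html><body><div><p>a</p></div><div><p>b</p></div></body></html>"
    print(depth_profile(before))
    print(template_changed(before, same_template))
    print(template_changed(before, new_template))
```

Comparing whole profiles for equality is the crudest possible test: it ignores text content, which is what allows pages that share a template to match, but it would also miss a change that happens to preserve the per-depth counts. A practical detector would likely need a richer summary, for example one that also records tag names or a region of interest.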

Cite: Z. Akbar, L. T. Handoko, "Pattern discovery for semi-structured web pages using bar-tree representation," International Journal of Computer Theory and Engineering, vol. 3, no. 2, pp. 261-269, 2011.
