Abstract

A Study of Information Extraction Tools for Online English Newspapers (PDF): Comparative Analysis

M. Hanumanthappa, Deepa T. Nagalavi, Manish Kumar

Information retrieval, in this context, is the task of retrieving relevant and useful information from e-newspapers. Electronic newspapers are electronic replicas of traditional newspapers, and they are becoming increasingly popular because they are easy and convenient to access. Newspapers are a source of timely information: they are documents comprising news items and several independent informative articles. Many newspapers also present news on the same subject from different perspectives, yet in this fast-moving era it is impractical to read multiple newspapers. It is therefore essential to quickly summarize articles collected from different newspapers and present them to the reader in a compact and concise manner without compromising the structure and format of the news. A system that achieves this task should first parse the e-newspapers available in PDF format and convert them to text format. Second, it should apply data mining techniques to identify and summarize the articles from the various newspapers. This survey focuses on article identification methods and on popular extraction tools used to convert the contents of e-newspapers from PDF to text format. A comparative study of the extraction tools, based on source type, programming language, and working characteristics, is also presented.
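The second stage of the pipeline described above (summarizing text already extracted from a PDF) can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not name a summarization technique, so a simple word-frequency extractive summarizer is used here purely as an assumption. The function names `split_sentences` and `summarize` and the tiny `STOPWORDS` set are hypothetical, and the input is assumed to be plain text produced by some PDF extraction tool.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are", "on", "for", "it", "was"}


def split_sentences(text):
    """Split text into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

In this sketch, text from several newspapers covering the same story could be concatenated and passed to `summarize`; a real system would also need the article identification step that the survey discusses.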


Indexed in

Index Copernicus
Academic Keys
CiteFactor
Cosmos IF
RefSeek
Hamdard University
World Catalogue of Scientific Journals
International Innovative Journal Impact Factor (IIJIF)
International Institute of Organized Research (I2OR)
Cosmos
