Extraction of Information from the Text of Chemical Patents. 1. Identification of Specific Chemical Names

Nick Kemp; Michael E Lynch

doi:10.1021/ci980324v



Article, 1998 (in English)

Authors

Nick Kemp
Michael E Lynch
Cornell University

Abstract


Much attention has been paid to translating isolated chemical names into forms such as connection tables, but less effort has gone into identifying substance names in running text so that they can be made available for processing. Automatic name identification is an increasingly urgent priority today, not least because of the inherent importance of patents, the increasing complexity of newly synthesized substances, and the consequent need for error-free processing of information from patents and other documents. This paper describes the development of a methodology for isolating substance names in the text of English-language patents, using, in part, the SGML (Standard Generalized Markup Language) markup of the patent text as an aid. Evaluation of the procedures, which are still at an early stage of development, demonstrates that even simple methods can achieve very high levels of success.
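The abstract does not spell out the identification procedure itself. Purely as an illustration, and not as the authors' method, the Python sketch below shows one simple way the two ingredients mentioned above could be combined: SGML markup is used to restrict attention to selected elements of a patent, and a fragment-and-locant heuristic flags candidate substance names. The element names (`EXAMPLE`, `CLAIMS`, `B`), the fragment lexicon, and the functions `element_text`, `looks_chemical` and `candidate_names` are all hypothetical.

```python
import re

# Hypothetical, deliberately small lexicon of substituent/parent-name fragments.
# A practical system would need a far larger, curated list.
FRAGMENTS = (
    "methyl", "ethyl", "propyl", "butyl", "phenyl", "benz", "pyrid",
    "chloro", "bromo", "fluoro", "iodo", "hydroxy", "amino", "nitro",
    "methoxy", "ethoxy", "carbox",
)

# Locant prefixes such as "2-", "2,4-" or "N,N-" followed by a lowercase
# letter or bracket, which often occur in systematic names.
LOCANT = re.compile(r"(?:\d+|[NOS])(?:,(?:\d+|[NOS]))*'?-[a-z(\[]")

# Any SGML/XML-style tag.
TAG = re.compile(r"<[^>]+>")


def element_text(sgml: str, element: str) -> str:
    """Return the text of every <element>...</element> span, inner tags removed.

    The element name is an assumption; real patent DTDs define their own names.
    """
    name = re.escape(element)
    pattern = re.compile(rf"<{name}\b[^>]*>(.*?)</{name}>", re.DOTALL | re.IGNORECASE)
    return " ".join(TAG.sub(" ", m.group(1)) for m in pattern.finditer(sgml))


def looks_chemical(token: str) -> bool:
    """Heuristic: flag a token containing a known fragment or a locant prefix."""
    lower = token.lower()
    return any(f in lower for f in FRAGMENTS) or bool(LOCANT.search(token))


def candidate_names(text: str) -> list[str]:
    """Split on whitespace, trim sentence punctuation, keep tokens that look chemical."""
    names = []
    for raw in text.split():
        token = raw.rstrip(".,;:")
        if token and looks_chemical(token):
            names.append(token)
    return names


if __name__ == "__main__":
    doc = (
        "<EXAMPLE>A solution of 2,4-dichlorophenol in ethanol was added to "
        "<B>N,N-dimethylformamide</B>.</EXAMPLE>"
        "<CLAIMS>1. A method using benzene.</CLAIMS>"
    )
    # Only the hypothetical EXAMPLE element is scanned here.
    print(candidate_names(element_text(doc, "EXAMPLE")))
    # prints ['2,4-dichlorophenol', 'N,N-dimethylformamide']
```

Note that "ethanol" is missed because no fragment in the toy lexicon matches it, which shows how strongly a heuristic of this kind depends on lexicon coverage; selecting elements through the markup is what keeps "benzene" in the claims out of this particular pass.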
