Publications

526 publications from this institution

A figure search engine architecture for a chemistry digital library

Academic papers contain multiple figures representing important findings and experimental results; we present a search engine specifically focused on figures in academic documents. This search engine allows users to search on figures in approximately 150,000 chemistry journal articles, though the method is easily extendable to other domains. Our system indexes figure captions and mentions extracted from the PDFs of documents using a custom-built extractor. Recall and precision performance of extracted figures is in the 80 to 90% range. We give the framework for the extraction algorithm, architecture and ranking function.

Sagnik Ray Choudhury, Suppawong Tuarob, Prasenjit Mitra et al. 2013 Article

Choosing the right word: Using bidirectional LSTM tagger for writing support systems

Scientific writing is difficult. It is even harder for those for whom English is a second language (ESL learners). Scholars around the world spend a significant amount of time and resources proofreading their work before submitting it for review or publication. In this paper we present a novel machine learning based application for the proper word choice task. Proper word choice is a generalization of the lexical substitution (LS) and grammatical error correction (GEC) tasks. We demonstrate and evaluate the usefulness of applying a bidirectional Long Short-Term Memory (LSTM) tagger to this task. While state-of-the-art grammatical error correction uses error-specific classifiers and machine translation methods, we demonstrate an unsupervised method that is based solely on a high quality text corpus and does not require manually annotated data. We use a bidirectional Recurrent Neural Network (RNN) with LSTM for learning the proper word choice based on a word’s sentential context. We demonstrate and evaluate our application in various settings, including both a domain-specific (scientific) writing task and a general-purpose writing task. We perform both strict machine and human evaluation. We show that our domain-specific and general-purpose models outperform state-of-the-art general context learning. As an additional contribution of this research, we also share our code, pre-trained models, and a new ESL learner test set with the research community.

Publications

Rebound of COVID-19 infection in patients with chronic lymphocytic leukemia treated for SARS-CoV-2 with Nirmatrelvir/Ritonavir or Molnupiravir

TTTS: Tree Test Time Simulation for Enhancing Decision Tree Robustness against Adversarial Examples

Fast-CBUS: A fast clustering-based undersampling method for addressing the class imbalance problem

Data Mining for Improving the Quality of Manufacturing: A Feature Set Decomposition Approach

Securing Your Transactions: Detecting Anomalous Patterns In XML Documents

Predicting Refractive Surgery Outcome: Machine Learning Approach With Big Data

A Direct Learning Approach for Neural Network Based Pre-Distortion for Coherent Nonlinear Optical Transmitter

Data Mining and Knowledge Discovery Handbook

Recommender Systems Handbook

Soft Computing for Knowledge Discovery and Data Mining

Genetic algorithm-based feature set partitioning for classification problems

Proactive Data Mining: A General Approach and Algorithmic Framework

CaSTLe – Classification of single cells by transfer learning: Harnessing the power of publicly available single cell RNA sequencing experiments to annotate new experiments

Adversarial Vulnerability of Deep Learning Models in Analyzing Next Generation Sequencing Data

Using Bandits for Effective Database Activity Monitoring

Data mining and knowledge discovery handbook

User Authentication Based on Representative Users

Theory and applications of attribute decomposition