Lessons Learned from Applying off-the-shelf BERT: There is no Silver Bullet

Victor Makarenkov; Lior Rokach

doi:10.48550/arxiv.2009.07238

Abstract

One of the challenges in the NLP field is training large classification models, a task that is both difficult and tedious, and harder still when GPU hardware is unavailable. The increased availability of pre-trained, off-the-shelf word embeddings, models, and modules aims to ease the training of large models and to make competitive performance achievable. We explore the use of off-the-shelf BERT models, report the results of our experiments, and compare them with those of LSTM networks and simpler baselines. We show that the complexity and computational cost of BERT do not guarantee better predictive performance on the classification tasks at hand.
