An experiment with named entity recognition

1 minute read

Published:

Named entity recognition (NER) is the practice of recognizing names in a given body of text and identifying the entity category associated with them: persons, locations, etc.

Overview

Model applied to the comment section of New York Times articles

There are many pretrained NER models in the literature. We make use of the spaCy en_core_web_sm model. This model is a CNN model trained on the OntoNotes data. This CNN model carries out multiple tasks such as Part of the Speech Tag and recognition of named entities.

We use the New York Times API to fetch the comments of a given nytimes article whose link is provided by the user. Once the comments are feteched, we use the en_core_web_sm model to identify the named entities in all the comments.

Sentiment

Then we use a rule-based sentiment analayzer to find the sentiment polarity associated with the most common named entities mentioned in the comments.

Serving on a browser

We use streamlit to serve the model in an app that can be hosted on the browser. Let’s have a look at how the results look like:

In the example shown above, we see a comment in which three named entities are recognized: Action (a person), Ohio (GPE, a geographical location) and New York (GPE).

In the figure shown above, we see the distribution of sentiment scores associated with the named entity Acton in the comments. The scores are bounded between -1 and 1. The higher (lower) score values correspond to more positive (negative) sentiments.