Background
Named Entity Recognition (NER) is the task of identifying named entities, such as people, places and organizations, in a text. For instance, “Ola Nordmann” is a named entity, and so is “Nasjonalbiblioteket” (the National Library of Norway).
In the sentence “Per i dag er status at Per og Dag har startet firmaet Per og Dag.” (As of today, Per and Dag have established the company Per and Dag), humans are able to work out that “Per” and “Dag” are people and that “Per og Dag” is a company, even without the help of capital letters. Writing rules that accomplish this, however, is very hard. The solution is therefore to train a language model on a large number of labeled examples that show where named entities occur in a text and which type each entity is.
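To make the difficulty concrete, here is a minimal sketch that applies a deliberately naive rule ("every capitalized word is an entity") to the sentence above and compares the result with hand-written labels. The BIO-style tags (`B-PER`, `B-ORG`, `I-ORG`, `O`) and the helper name `naive_rule` are illustrative assumptions, not the exact format of any particular dataset.

```python
import re

SENTENCE = "Per i dag er status at Per og Dag har startet firmaet Per og Dag."
TOKENS = re.findall(r"\w+", SENTENCE)  # words only, punctuation dropped

# Hand-written labels for illustration (assumed BIO-style scheme):
# B- marks the first token of an entity, I- a continuation, O no entity.
GOLD = [
    "O", "O", "O",              # Per i dag  ("as of today", not a person)
    "O", "O", "O",              # er status at
    "B-PER", "O", "B-PER",      # Per og Dag (two people)
    "O", "O", "O",              # har startet firmaet
    "B-ORG", "I-ORG", "I-ORG",  # Per og Dag (one company)
]


def naive_rule(tokens):
    """Return the positions of all capitalized tokens."""
    return {i for i, token in enumerate(tokens) if token[0].isupper()}


predicted = naive_rule(TOKENS)
gold_positions = {i for i, tag in enumerate(GOLD) if tag != "O"}

# Tokens the rule flags although they are not entities.
false_positives = sorted(predicted - gold_positions)
# Entity tokens the rule fails to flag.
missed = sorted(gold_positions - predicted)

print("false positives:", [(i, TOKENS[i]) for i in false_positives])
print("missed:", [(i, TOKENS[i]) for i in missed])
```

The rule wrongly flags the sentence-initial “Per” and misses the “og” inside the company name. It also has no way of telling the two people apart from the company, which is exactly the information the labeled examples supply.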
Models
For the NER task, we fine-tuned nb-bert-base on the NorNE dataset to create the nb-bert-base-ner model. It can predict 9 different entity types, all displayed in this single constructed example:
Our service
For NER, we have two API endpoints:
- https://ai.nb.no/api/ner/v1/nb-bert-base-ner uses our own BERT-base model trained on the NorNE dataset.
- https://ai.nb.no/api/ner/v1/nbailab-base-ner-scandi uses a Scandinavian language model trained by Dan Saattrup Nielsen using our model as the base.
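As a rough sketch of how one of these endpoints might be called from Python: the request format is not described here, so the code below assumes a POST request with a JSON body of the form `{"text": "..."}` and a JSON response. Both are assumptions, as are the helper names `build_ner_request` and `fetch_entities`; adjust them to whatever the service actually expects.

```python
import json
import urllib.request

# Endpoint taken from the list above.
NER_ENDPOINT = "https://ai.nb.no/api/ner/v1/nb-bert-base-ner"


def build_ner_request(text, endpoint=NER_ENDPOINT):
    """Build (but do not send) an HTTP request for the NER service.

    ASSUMPTION: the service accepts POST with a JSON body {"text": ...}.
    The real payload shape may differ.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_entities(text, endpoint=NER_ENDPOINT, timeout=10):
    """Send the request and decode the response.

    ASSUMPTION: the service answers with a JSON document.
    """
    request = build_ner_request(text, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_entities("Ola Nordmann jobber ved Nasjonalbiblioteket.")` would then return whatever structure the service produces. To use the Scandinavian model instead, pass the second URL from the list as `endpoint`.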