FiNER, GliNER, and Smolagents in Financial Named Entity Recognition
Tinier, Shinier, and Readier to Spot the Cash: Ongoing Adventures in Financial NER
With the recent announcement of ModernBERT, revisiting my previous work on Financial Named Entity Recognition has been on my task list for a long time. Data labeling is a manual, tedious, and expensive task, so financial companies have been investing heavily in automated data labeling solutions for a while now. However, none of these are perfect, i.e., none achieve a 100% human-like detection rate. Even a 90% hit rate is not good enough, as the missing 10% might contain exactly the information the research team was looking for.
In my professional life, I have used Named Entity Recognition (NER) in several tasks:
Efficient Information Extraction – SEC filing documents contain a vast amount of unstructured text → NER helps to identify key entities such as company names, monetary values, dates, percentages, and regulatory terms.
Risk Management and Compliance – As the CRO, I was leading the team that ensured we complied with AML (Anti-Money Laundering) and KYC (Know Your Customer) regulations → NER assists in detecting relevant entities in reports to flag potential risks or fraudulent activities.
Improved Search and Retrieval – As I have shown before, even BM25-based semantic search relies on full-text search and context distance around recognized entities rather than just keywords.
Algorithmic Trading, Reasoning, and Predictive Analytics – Identifying named entities in real-time financial news can support my cognitive financial agents and their trading algorithms by providing signals for rule-based decision-making.
The point is that these data labels set the stage for what this article is all about. They set the context. Context is the gray matter the agent operates on. It simply matters whether an article is about Cristiano Ronaldo or about Ubisoft, and mislabeling can have consequences. Some matter more than others.
But what is context, really? When I ran the Portfolio Management team in Japan, I always used to say information is data in context. I have used the Data → Information → Knowledge → Insight → Wisdom paradigm already several times throughout this publication. Therefore, this time I will bore you only with a short example.
Let’s start with the number 5.
You might have found that number in the text body of a given dataset when trying to extract all numeric values. So what does 5 mean?
Nothing.
The number 5 by itself has a context dimension of 0. Then what happens when we notice that in the body of text it is not the integer “5” alone, but that there is a “%” sign next to it? Now, as humans, we realize that it is a percentage. Having a percentage might indicate that there is a cake of 100 and we got 1/20th of it. Or … does it? We expand our scope further and notice the word “annual”. Annual means that this percentage is measured over a yearly period. So now we understand that "5%" is not just a random number: it represents a value normalized to base 100, changing or measured over the time horizon of a year. Then we add the penultimate context dimension and realize that we are looking at “5% annual revenue growth”, which in isolation is probably a good number. But add the final dimension, the industry average, and if the overall industry has grown by 10%, the same figure actually reveals an underperformer.
This is why context is not just about adding information—it’s about making information meaningful by providing the right context. Especially when you provide the context to a cognitive financial agent.
So how do I get data?
FiNER
If you just scrape financial news or other websites for information, you will mainly get (and want to get) plain text. With XBRL-based filings, identifying interesting data points is much easier to automate.
Here is a select group of data sources where you can get XBRL-based data; a minimal retrieval sketch follows the list below. As of now, individual data access for EDGAR is about 55 USD/yr, nothing out of the ordinary if you want to be serious about this. Data quality and service reliability are key items.
U.S. Securities and Exchange Commission (SEC) - EDGAR
European Securities and Markets Authority (ESMA) - ESEF Filings
Financial Reports from Japan (EDINET) - FSA Japan
Companies House (UK) - UK Companies
International Financial Reporting Standards (IFRS) - XBRL Taxonomy
Open Data Portals - NASDAQ QDL
Bonus item: Google Sheets (plugin)
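To make the EDGAR entry above a bit more concrete, here is a minimal sketch of pulling XBRL “company facts” from the SEC’s public API. The endpoint, the example CIK, and the User-Agent requirement are assumptions based on the SEC’s published developer guidelines, not something taken from my own pipeline.
import requests

# Minimal sketch (assumption): the SEC's public XBRL "company facts" endpoint.
# The SEC asks for a descriptive User-Agent with contact details.
HEADERS = {"User-Agent": "your-name your-email@example.com"}
CIK = "0000320193"  # Apple Inc., zero-padded to 10 digits

url = f"https://data.sec.gov/api/xbrl/companyfacts/CIK{CIK}.json"
resp = requests.get(url, headers=HEADERS, timeout=30)
resp.raise_for_status()
facts = resp.json()

# Each XBRL concept (e.g. us-gaap Revenues) comes back as structured, tagged data
print(list(facts["facts"]["us-gaap"].keys())[:10])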
The problem you will face, though, is that your NER models might not be fine-tuned on this kind of data. FiNER (Financial Numeric Entity Recognition) is a dataset aimed at improving the tagging of financial reports, built from filings in the SEC’s eXtensible Business Reporting Language (XBRL), and you can retrieve it for free on Hugging Face.
The dataset focuses on entity extraction in financial text, particularly numeric expressions that require context-based tagging. FiNER-139 consists of about 1.1M sentences annotated with 139 entity types (hence the 139), mostly numeric, and presents challenges due to the fragmentation of numeric expressions. The paper also introduces solutions, such as pseudo-tokens, to improve tagging performance on financial data.
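To give a feel for the pseudo-token idea, here is a toy sketch that replaces numeric tokens with their “shape” so that fragmented numerals share vocabulary entries. The actual scheme in the FiNER paper differs in detail; this is purely illustrative.
import re

def to_pseudo_token(token: str) -> str:
    # Toy illustration only: numeric tokens are replaced by their shape,
    # so "1,234.5" and "9,876.1" both map to the pseudo-token [X,XXX.X].
    if re.fullmatch(r"[\d.,]+", token):
        return "[" + re.sub(r"\d", "X", token) + "]"
    return token

sentence = "Revenue grew 5 % to $ 1,234.5 million in 2022".split()
print([to_pseudo_token(tok) for tok in sentence])
# ['Revenue', 'grew', '[X]', '%', 'to', '$', '[X,XXX.X]', 'million', 'in', '[XXXX]']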
You can access this dataset like this:
import datasets

# Load the training split of FiNER-139 and read out the list of entity tag names
finer_train = datasets.load_dataset("nlpaueb/finer-139", split="train")
finer_tag_names = finer_train.features["ner_tags"].feature.names
In Jupyter, finer_tag_names should look like this:
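As a quick sanity check, you can decode the integer tags of a single sentence back into readable labels. This is a minimal sketch assuming the column names from the nlpaueb/finer-139 dataset card (tokens and ner_tags) and that tag id 0 is the “O” (outside) tag.
# Sketch: print only the tagged tokens of the first training sentence
sample = finer_train[0]
for token, tag_id in zip(sample["tokens"], sample["ner_tags"]):
    if tag_id != 0:  # skip untagged ("O") tokens
        print(token, "=>", finer_tag_names[tag_id])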
A real example
So much for the setup. Let’s do a case study. How about we try NER on the TechCrunch article “Google combines Maps and Waze teams as pressures mount to cut costs” from 2022?
What should we expect?
The article is quite short and basically covers a corporate efficiency measure. From a brief human eyeball check, we might expect to see something like this:
For us humans, it’s clear that Rebecca Bellan is the author who writes about two people (Neha and Sundar) who, both as Google employees, execute a management and product shakeup at Google.
How can we make this automatic, at a similar quality, where “same” would mean a 100% match?
GLiNER
Modern-GLiNER-Bi-Large is a general-purpose Named Entity Recognition (NER) model designed to handle diverse entity types across multiple domains. Unlike vanilla NER models, which often require domain-specific fine-tuning, Modern-GLiNER is built for adaptability. Its architecture consists of a bi-directional transformer that extracts entities using contextual awareness. Contextual awareness means that entity recognition is not just about identifying and tagging keywords but about understanding them in relation to their surroundings, as I hoped to show with my “5” example.
The key strength of Modern-GLiNER, in my opinion, is its ability to generalize. Many existing NER models struggle when applied outside their training datasets, often failing to recognize entities in new contexts; that is why training evals might look superb while inference quality disappoints. Modern-GLiNER, however, uses its bi-directional transformer to recognize contextual patterns.
This generalizability makes it a powerful tool for financial organizations that deal with heterogeneous data sources (filings, financial statements, earnings calls, analyst reports, etc.), where traditional models would require extensive customization and training. By capturing broader contextual signals, the ModernBERT backbone minimizes the risk of misclassification and improves the quality of extracted information.
However, just to be transparent: I still think that fine-tuning for domain adaptation remains crucial. While Modern-GLiNER provides a strong base model, adapting it to domain-specific datasets like FiNER-139 can significantly enhance its accuracy for specialized tasks.
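As a first step in that direction, here is a hedged sketch of turning FiNER-139’s BIO-style tags into span-level annotations of the kind span-based models typically train on. The exact training format Modern-GLiNER expects is an assumption on my part, so check its documentation before reusing this. The snippet builds on the finer_train and finer_tag_names objects from above.
def bio_to_spans(tag_ids, tag_names):
    # Collapse BIO-tagged token indices into (start, end, label) spans
    spans, start, label = [], None, None
    for i, tag_id in enumerate(tag_ids):
        tag = tag_names[tag_id]
        if tag.startswith("B-"):
            if label is not None:
                spans.append((start, i - 1, label))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            continue  # span continues
        else:
            if label is not None:
                spans.append((start, i - 1, label))
            start, label = None, None
    if label is not None:
        spans.append((start, len(tag_ids) - 1, label))
    return spans

example = finer_train[0]
print(bio_to_spans(example["ner_tags"], finer_tag_names))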
Here is an example of how you would run GLiNER.
from gliner import GLiNER
# Load the pre-trained GLiNER model with ModernBERT as the backbone
model = GLiNER.from_pretrained("knowledgator/modern-gliner-bi-large-v1.0")
# Define the text in which you want to recognize entities
text = data[index]['_expContent']
# Define the labels (entity types) you want to recognize
labels = ["Source","Financial Metric", "Date","Organization","Person", "Product", "Percentage", "Monetary Value","Duration"]
# Predict entities in the text
entities = model.predict_entities(text, labels, threshold=0.3)
# Display the recognized entities
for entity in entities:
    print(f"{entity['text']} => {entity['label']}")
Please note that my “text” sample comes out of my database and not the website. But in the end, how you get the text doesn’t matter.
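If you want to pull the article yourself instead, a minimal sketch could look like the following. The assumption that the article body lives in plain <p> tags is a simplification, and requests plus BeautifulSoup are my own choice of tooling here.
import requests
from bs4 import BeautifulSoup

def fetch_article_text(url: str) -> str:
    # Sketch: download a page and keep only its paragraph text
    html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    return "\n".join(p.get_text(strip=True) for p in soup.find_all("p"))

# text = fetch_article_text("https://techcrunch.com/...")  # plug in the article's URL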
If it ran correctly, then you should get something like this:
I think overall, the result was quite robust and also fast.
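A small follow-up that I find handy: grouping and de-duplicating the predictions by label makes it easier to eyeball the output against the human expectation above. This only relies on the text and label keys shown in the loop earlier.
from collections import defaultdict

# Group the predicted entities by label and drop duplicates
by_label = defaultdict(set)
for entity in entities:
    by_label[entity["label"]].add(entity["text"])

for label, values in by_label.items():
    print(f"{label}: {sorted(values)}")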
NER with Smolagents
Intelligence is the ability to generalize knowledge. Therefore, my strategy has always been to use vanilla models without fine-tuning for my posts here. Qwen is in itself a fantastic model that, with the new Smolagents library, can perform a similar task. Of course, it is interesting to see how well it does so.
The imports are standard, so there is no need to replicate them all here. The Qwen2.5 Coder model is free to call, and you can use it just by having an account.
from smolagents import CodeAgent, HfApiModel  # the only imports the snippets below need

repo_id = "Qwen/Qwen2.5-Coder-32B-Instruct"
llm_engine = HfApiModel(model_id=repo_id, provider="together", timeout=3000)
Interestingly, in the journey from Agents 2.0 to Smolagents, only “CodeAgent” seems to have survived. As per their GAIA exercise, my understanding is that the team realized that, for the agent’s thought process, programming code is an efficient and robust mechanism. The setup itself is minimal:
agent = CodeAgent(tools=[], model=llm_engine)
agent.run(f"List all of these labels {labels} in this test {text}")
Then we run the agent with a simple instruction where we provide the body of text and also the target labels. We use the same list of labels from above and then we get something like this.
Well, acceptable enough I suppose. But it takes much longer for a quite similar result. So, what if we could ask the agent to qualify the relationship?
agent.run(f"Make a list of entities and their actions for all of these entites {labels} in this test {text}. ")
Then after about 16 seconds of waiting, we get this:
And that, for me, is the biggest benefit of using agents. Underneath, they use technology quite similar to the bi-directional transformer, but they are much more versatile.
Next Steps
Labeling of text is getting increasingly accurate. In my opinion, it’s not only the subjects and objects in a body of text that matter, but also their relationships and actions. Each of these actions will have an effect.
agent.run(f"Make a list of entities and their actions for all of these entites {labels} in this test {text}. Reason step by step what the result of this action could be and add it do the output dataset.")
Secondly, with this setup I have effectively designed a node-edge layout indicative of a knowledge graph. We could use these nodes and edges as further input for a visualization, a reasoning agent, or a management agent.
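To illustrate what that could look like, here is a minimal sketch that turns (entity, action, entity) triples into a small directed graph with networkx. The triples are illustrative placeholders, not the agent’s actual output.
import networkx as nx

# Illustrative placeholder triples of the (entity, action, entity) kind the prompt above asks for
triples = [
    ("Google", "merges", "Maps team"),
    ("Google", "merges", "Waze team"),
    ("Sundar Pichai", "announces", "cost cuts"),
]

graph = nx.DiGraph()
for subject, action, obj in triples:
    graph.add_edge(subject, obj, action=action)

for u, v, data in graph.edges(data=True):
    print(f"{u} --{data['action']}--> {v}")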
Hope you enjoyed this exercise. Please like, share, subscribe.