Why Entity Resolution Is Harder Than Named Entity Recognition
Introduction
Most Named Entity Recognition (NER) tutorials end with a prediction. The model successfully extracts: COMPANY, INVOICE, CONTRACT, PURCHASE_ORDER. The article ends. The notebook prints a beautiful JSON response. Mission accomplished. Or so it seems.
In real enterprise systems, extracting entities is only the beginning. Consider the following prediction:
{
"COMPANY": "ALPHABRIDGE",
"INVOICE": "MFG-INV-000157"
}
At first glance, everything looks correct. But from a business perspective, the system still knows almost nothing. Questions remain unanswered. Which ALPHABRIDGE? Which customer record? Which contract? Which invoice? Which business relationship?
These questions belong to a completely different problem known as Entity Resolution. Entity Resolution transforms extracted text into business knowledge. Without it, AI understands words but not businesses.
NER Finds Text
Named Entity Recognition answers one question: "What pieces of text represent meaningful entities?" For example:
PAYMENT FROM ALPHABRIDGE SOLUTIONS MFG-INV-000157 becomes:
{
"COMPANY": "ALPHABRIDGE SOLUTIONS",
"INVOICE": "MFG-INV-000157"
}
This is extraction. Nothing more. The model has no idea whether:
- the company exists
- the invoice exists
- the invoice belongs to the company
- the invoice has already been paid
- the contract is still active
Extraction is syntax. Enterprise automation requires semantics.
The Hidden Problem
Imagine the following customer master:
CUS-00002 ALPHABRIDGE SOLUTIONS
Now imagine receiving these transaction narratives:
- PAYMENT FROM ALPHABRIDGE
- PAYMENT FROM ALPHABRIDGE LTD
- PAYMENT FROM ABS
- PAYMENT FROM ALPHA BRIDGE
Humans immediately recognize these as the same customer. Machines do not. To a computer, every string is different. Without resolution, automation immediately breaks.
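To make the failure concrete, here is a minimal sketch of naive string equality against the customer master. The set `MASTER_NAMES` and the function `naive_lookup` are hypothetical names used only for illustration.

```python
# Hypothetical customer master, reduced to its legal names (illustration only).
MASTER_NAMES = {"ALPHABRIDGE SOLUTIONS"}

# The company strings from the four narratives above.
narratives = ["ALPHABRIDGE", "ALPHABRIDGE LTD", "ABS", "ALPHA BRIDGE"]


def naive_lookup(name: str) -> bool:
    """Return True only when the string equals a master name exactly."""
    return name in MASTER_NAMES


# Every variant misses, because none equals the master name character for character.
results = {name: naive_lookup(name) for name in narratives}
print(results)
```

All four lookups fail, even though a human would map each one to the same customer.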
What Entity Resolution Actually Does
Entity Resolution answers a different question. Instead of asking "Which pieces of text are entities?" it asks "Which business object does this entity represent?" For example:
NER Output:
{
"COMPANY": "ALPHABRIDGE"
}
Entity Resolution:
{
"customer_id": "CUS-00002",
"legal_name": "ALPHABRIDGE SOLUTIONS",
"country": "United States"
}
Notice the difference. The output is no longer text. It is business knowledge.
Why Enterprise Data Is Difficult
Enterprise systems evolve over decades. Customer names change. Companies merge. Subsidiaries appear. Legal entities are renamed. Regional offices use abbreviations. As a result:
- Microsoft
- Microsoft Ltd
- Microsoft Corporation
- MSFT
- Microsoft APAC
may all refer to different legal entities. Or exactly the same one. Only business context can answer that question.
Resolution Strategies
Modern Entity Resolution engines rarely rely on a single algorithm. Instead, they combine multiple strategies.
1. Exact Matching
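The simplest approach: the extracted string must equal a master record exactly. ALPHABRIDGE SOLUTIONS → ALPHABRIDGE SOLUTIONS. Fast. Reliable. But extremely limited, because any difference in spelling, spacing, or legal suffix is a miss.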
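A minimal sketch of exact matching, assuming the master data fits in a dictionary keyed by legal name. `MASTER_BY_NAME` and `exact_match` are hypothetical names, not part of any real system.

```python
from typing import Optional

# Hypothetical master data: legal name -> customer ID.
MASTER_BY_NAME = {"ALPHABRIDGE SOLUTIONS": "CUS-00002"}


def exact_match(name: str) -> Optional[str]:
    """Return the customer ID when the name equals a master record exactly."""
    return MASTER_BY_NAME.get(name)


print(exact_match("ALPHABRIDGE SOLUTIONS"))  # a hit
print(exact_match("ALPHABRIDGE"))            # a miss: the strings differ
```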
2. Alias Matching
Many businesses maintain alias dictionaries. Example: ABS → ALPHABRIDGE SOLUTIONS or IBM → International Business Machines. Alias lookup improves recall for abbreviations and trade names that no string-similarity measure would catch.
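An alias dictionary can be as simple as a lookup table. The sketch below assumes aliases are stored in upper case; `ALIASES` and `resolve_alias` are illustrative names.

```python
# Hypothetical alias dictionary: alias -> canonical name.
ALIASES = {
    "ABS": "ALPHABRIDGE SOLUTIONS",
    "IBM": "INTERNATIONAL BUSINESS MACHINES",
}


def resolve_alias(name: str) -> str:
    """Map a known alias to its canonical name; return the input unchanged otherwise."""
    return ALIASES.get(name.strip().upper(), name)


print(resolve_alias("abs"))
print(resolve_alias("ACME"))
```

In practice the dictionary is usually curated by the people who own the master data, which is why it captures knowledge that no algorithm could infer.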
3. Normalization
Formatting differences should disappear before matching. Example: MFG INV 000157 → MFG-INV-000157. Similarly: INV001 → INV-001. Normalization often solves more problems than machine learning.
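The two invoice examples above can be handled with a pair of regular expressions. This is a sketch under the assumption that the canonical format is upper-case tokens joined by hyphens; `normalize_invoice` is a hypothetical helper.

```python
import re


def normalize_invoice(raw: str) -> str:
    """Normalize an invoice reference to hyphen-separated upper-case tokens."""
    text = raw.strip().upper()
    # Insert a break where a letter is directly followed by a digit: INV001 -> INV 001.
    text = re.sub(r"(?<=[A-Z])(?=\d)", " ", text)
    # Collapse any run of spaces, underscores, slashes, or hyphens into one hyphen.
    return re.sub(r"[\s_/\-]+", "-", text)


print(normalize_invoice("MFG INV 000157"))
print(normalize_invoice("INV001"))
```

Applying the same function to the master data and to the extracted text means both sides are compared in one format.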
4. Fuzzy Matching
Some differences cannot be normalized. Example: ALPHA BRIDGE → ALPHABRIDGE. String-similarity measures such as Levenshtein (edit) distance can identify likely matches. However, fuzzy matching should be used carefully. A low similarity threshold increases false positives, while a high one misses legitimate variants.
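Here is a minimal sketch of Levenshtein-based fuzzy matching in pure Python. The scaling of distance into a 0 to 1 similarity and the default threshold of 0.85 are assumptions made for illustration, not recommended values.

```python
from typing import Iterable, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale the distance into [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def fuzzy_match(name: str, candidates: Iterable[str],
                threshold: float = 0.85) -> Optional[Tuple[str, float]]:
    """Return (best candidate, score), or None when the best score is below threshold."""
    best = max(candidates, key=lambda c: similarity(name, c), default=None)
    if best is None:
        return None
    score = similarity(name, best)
    return (best, score) if score >= threshold else None


print(fuzzy_match("ALPHA BRIDGE", ["ALPHABRIDGE", "BETAWORKS"]))
```

ALPHA BRIDGE and ALPHABRIDGE differ by a single deleted space, so the pair clears the threshold. Lowering the threshold would start admitting unrelated names.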
5. Embedding Similarity
The final strategy uses semantic representations. Instead of comparing characters, we compare meaning. Sentence embeddings allow systems to recognize that Advance Payment and Project Deposit may represent similar business concepts. Embedding similarity becomes particularly useful when dealing with free-form narratives.
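Embedding comparison usually comes down to cosine similarity between two vectors. In the sketch below the vectors are made-up three-dimensional placeholders, not the output of a real model, and "Late Fee" is a hypothetical third phrase added for contrast. A real system would obtain the vectors from a sentence-embedding model.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Placeholder vectors standing in for real sentence embeddings (invented values).
TOY_EMBEDDINGS = {
    "Advance Payment": [0.9, 0.1, 0.2],
    "Project Deposit": [0.8, 0.2, 0.3],
    "Late Fee": [0.1, 0.9, 0.1],
}

close = cosine_similarity(TOY_EMBEDDINGS["Advance Payment"],
                          TOY_EMBEDDINGS["Project Deposit"])
far = cosine_similarity(TOY_EMBEDDINGS["Advance Payment"],
                        TOY_EMBEDDINGS["Late Fee"])
print(close > far)
```

The two phrases share no words, so character-based matching would score them as unrelated. Only the vector comparison can place them near each other.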
Hybrid Resolution
In production, no single strategy is sufficient. A typical pipeline looks like:
NER Output
│
▼
Normalization
│
▼
Exact Match
│
▼
Alias Match
│
▼
Fuzzy Match
│
▼
Embedding Similarity
│
▼
Business Validation
Each stage resolves cases that the earlier, stricter stages missed. Business validation then confirms the candidate before anything downstream relies on it.
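The cascade can be sketched as a function that tries each strategy in order and stops at the first hit. This version covers only normalization, exact, alias, and fuzzy matching; embedding similarity and business validation are left out for brevity. All names, the fixed alias score, and the threshold are assumptions, and `difflib.SequenceMatcher` stands in for a Levenshtein implementation.

```python
from difflib import SequenceMatcher
from typing import Optional

MASTER = {"ALPHABRIDGE SOLUTIONS": "CUS-00002"}  # legal name -> ID (hypothetical)
ALIASES = {                                      # alias -> legal name (hypothetical)
    "ABS": "ALPHABRIDGE SOLUTIONS",
    "ALPHABRIDGE": "ALPHABRIDGE SOLUTIONS",
}
LEGAL_SUFFIXES = {"LTD", "INC", "LLC", "CORP"}


def normalize(name: str) -> str:
    """Upper-case, collapse whitespace, and drop one trailing legal suffix."""
    tokens = name.upper().replace(".", "").split()
    if len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens = tokens[:-1]
    return " ".join(tokens)


def _result(legal_name: str, method: str, score: float) -> dict:
    return {"customer_id": MASTER[legal_name],
            "match_method": method, "match_score": score}


def resolve(name: str, fuzzy_threshold: float = 0.85) -> Optional[dict]:
    """Try exact, then alias, then fuzzy matching; return None if nothing qualifies."""
    cleaned = normalize(name)
    if cleaned in MASTER:                         # stage 1: exact
        return _result(cleaned, "exact", 1.0)
    if cleaned in ALIASES:                        # stage 2: alias (assumed fixed score)
        return _result(ALIASES[cleaned], "alias", 0.96)
    known = list(MASTER) + list(ALIASES)          # stage 3: fuzzy over all known names
    best = max(known, key=lambda k: SequenceMatcher(None, cleaned, k).ratio())
    score = SequenceMatcher(None, cleaned, best).ratio()
    if score >= fuzzy_threshold:
        return _result(ALIASES.get(best, best), "fuzzy", round(score, 2))
    return None


for text in ["ALPHABRIDGE", "ALPHABRIDGE LTD", "ABS", "ALPHA BRIDGE"]:
    print(text, "->", resolve(text))
```

With these assumed tables, the four narrative variants from earlier all resolve to the same customer record, each through the cheapest stage that can handle it.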
Confidence Scores
Entity Resolution should never return only a match. It should also return confidence. Example:
{
"customer_id": "CUS-00002",
"match_method": "alias",
"match_score": 0.96
}
Confidence allows downstream systems to decide:
- High Confidence → Automatic Reconciliation
- Low Confidence → Human Review
Confidence is one of the most important features of production AI systems.
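Routing on confidence can be a single comparison. The cut-off of 0.90 and the route labels below are assumptions for illustration; a real threshold would be tuned against reviewed outcomes.

```python
AUTO_THRESHOLD = 0.90  # assumed cut-off, not a recommended value


def route(resolution: dict) -> str:
    """Choose the next step from the match score of a resolution result."""
    if resolution["match_score"] >= AUTO_THRESHOLD:
        return "automatic_reconciliation"
    return "human_review"


print(route({"customer_id": "CUS-00002", "match_method": "alias", "match_score": 0.96}))
print(route({"customer_id": "CUS-00002", "match_method": "fuzzy", "match_score": 0.62}))
```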
Why Resolution Enables Automation
Imagine two scenarios. Without Entity Resolution:
{
"COMPANY": "ALPHABRIDGE"
}
- Can we reconcile? No.
- Can we validate invoices? No.
- Can we update ERP? No.
- Can we trigger workflows? No.
Now consider:
{
"customer_id": "CUS-00002",
"contract_id": "CNT-2024-587",
"invoice_number": "MFG-INV-000157"
}
Everything changes. Business rules become possible. Automation becomes possible. Decision engines become possible. AI Agents become possible. Entity Resolution is the bridge.
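Once the identifiers are resolved, a business rule becomes an ordinary lookup. The sketch below checks a resolved record against master data; the tables, field names, and reason strings are all hypothetical.

```python
from typing import Tuple

# Hypothetical master data; field names and values are illustrative.
INVOICES = {
    "MFG-INV-000157": {"customer_id": "CUS-00002", "paid": False},
}
CONTRACTS = {
    "CNT-2024-587": {"customer_id": "CUS-00002", "active": True},
}


def can_reconcile(resolved: dict) -> Tuple[bool, str]:
    """Apply simple business rules to a resolved record; return (decision, reason)."""
    invoice = INVOICES.get(resolved["invoice_number"])
    if invoice is None:
        return False, "unknown invoice"
    if invoice["customer_id"] != resolved["customer_id"]:
        return False, "invoice belongs to a different customer"
    if invoice["paid"]:
        return False, "invoice already paid"
    contract = CONTRACTS.get(resolved["contract_id"])
    if contract is None or not contract["active"]:
        return False, "contract missing or inactive"
    return True, "ok"


print(can_reconcile({
    "customer_id": "CUS-00002",
    "contract_id": "CNT-2024-587",
    "invoice_number": "MFG-INV-000157",
}))
```

None of these checks can be written against the raw string "ALPHABRIDGE", because there is no key to look up.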
Building a Resolution Engine
The architecture we implemented looks like this:
NER Prediction
│
▼
Normalization
│
▼
Exact Matching
│
▼
Alias Lookup
│
▼
Fuzzy Matching
│
▼
Embedding Similarity
│
▼
Master Data Validation
│
▼
Resolved Business Entity
Each component has one responsibility. This modular architecture makes the system easier to improve over time.
Lessons Learned
The biggest surprise during this project was realizing that Entity Resolution was more difficult than training the transformer itself. Training a model is largely an engineering exercise. Building Entity Resolution requires understanding how the business operates. It requires domain knowledge. Master data. Business rules. Historical context.
In other words: NER learns language. Entity Resolution learns the business.
Conclusion
Most discussions around AI focus on extracting information. Enterprise automation requires understanding information. Named Entity Recognition identifies entities. Entity Resolution transforms those entities into trusted business objects. This transformation enables reconciliation, analytics, intelligent workflows, and autonomous decision-making.
Without Entity Resolution, enterprise AI remains a language model. With Entity Resolution, it becomes an operational system.
What's Next?
In Part 5, we'll build the Reconciliation Engine that combines:
- Named Entity Recognition
- Entity Resolution
- Business Rules
- Validation Logic
- Decision Intelligence
to automatically determine whether enterprise transactions can be reconciled without human intervention. We'll also discuss why rule engines still matter in the age of Large Language Models.