Behind the Screen Door: Can you be too sophisticated?

Years ago, I worked with a system that had a very extensive set of global synonym data. Every once in a while we caught it making matches it really shouldn’t have, so we got the vendor to turn some of the synonyms off, since our customers wanted an explanation that made sense to them.

From that, I wondered if AI/LLM-based systems (my “One Funnel” model) wouldn’t make things worse.

So, my prompt:

I am familiar with two basic kinds of false positive reduction for sanctions screening systems. One model relies on sophisticated name matching algorithms rather than whitelisting or rules-based processing to minimize false positive matches. I call that a “one funnel” model.

In contrast, a “two funnel system” uses a more basic set of matching algorithms, combined with user-built and maintained false positive reduction rules, multi-element score boarding, and/or white listing tools.

Assume, as a starting point, that this is a valid way to categorize the overwhelming bulk of systems offered by solutions providers.

Now, assume that third party data is in the Latin character set, but is global in nature (e.g. could represent parties from any country on Earth).

Assume that tokens that appear on sanctions lists also have alternative, benign meanings in other languages (e.g. “PRIMO” can mean “best” or “first”) but that, due to the low incidence of single-token sanctioned party names, the actual incidence of such collisions is very small.

Assume, too, that the system must be tuned with a significant bias toward minimizing false negatives, although there needs to be the capability to also reduce false positives in a significant way.

How do these considerations affect the choice of a false positive reduction model and how it is designed, configured, implemented and maintained?

And the response I got:

Based on your scenario—specifically the need to handle global data, the presence of rare “dictionary word” collisions, and a strict requirement to miss zero sanctions hits—the considerations heavily favor a “Two Funnel” system architecture.

Here is the analysis of how your assumptions dictate the design, configuration, implementation, and maintenance, with key industry terms explained for clarity.


1. Choice of Model: The “Two Funnel” Imperative

While “One Funnel” systems (often AI-driven models that try to decide “match or no match” in a single step) are efficient, your requirement to minimize False Negatives makes them risky.

  • The Problem with One Funnel: To ensure you don’t miss any bad actors in a “One Funnel” model, you have to tell the system to be very paranoid. This lowers the confidence threshold (the minimum similarity score required to trigger an alert). Without a secondary filter, this “paranoid” setting floods you with False Positives (innocent parties wrongly flagged as sanctioned) that the algorithm can’t easily explain away.
  • The Strength of Two Funnel: This architecture separates Recall from Precision.
    • Recall (The ability of the system to catch every possible bad actor): You handle this in Funnel 1 by casting a very wide net.
    • Precision (The ability of the system to filter out the noise and only present true risks): You handle this in Funnel 2 using specific logic rules.

Verdict: Because you cannot afford to miss a hit, you need the Two Funnel approach. It allows you to be “loose” in the initial scan to catch everything, and “strict” in the second phase to clean up the mess.
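The “loose, then strict” split described above can be sketched in a few lines. This is a minimal illustration, not any vendor’s implementation: the token-level fuzzy match, the 0.8 threshold, and the sample names are all assumptions made for the example.

```python
from difflib import SequenceMatcher

def funnel_one(name: str, sanctions: list[str], threshold: float = 0.8) -> list[str]:
    """Wide net: flag a list entry if any token pair clears a loose similarity threshold."""
    tokens = name.upper().split()
    return [entry for entry in sanctions
            if any(SequenceMatcher(None, t, e).ratio() >= threshold
                   for t in tokens for e in entry.upper().split())]

def funnel_two(name: str, candidates: list[str], stop_tokens: set[str]) -> list[str]:
    """Fine mesh: suppress candidates whose only exact token overlap is a benign word."""
    name_tokens = set(name.upper().split())
    kept = []
    for cand in candidates:
        overlap = name_tokens & set(cand.upper().split())
        if overlap and overlap <= stop_tokens:
            continue  # only benign dictionary tokens matched: suppress the alert
        kept.append(cand)
    return kept

sanctions = ["PRIMO SHIPPING", "IVAN PETROV"]
hits = funnel_one("Primo Logistics", sanctions)          # loose scan flags "PRIMO SHIPPING"
alerts = funnel_two("Primo Logistics", hits, {"PRIMO"})  # the rule suppresses the collision
```

Note the separation of concerns: Funnel 1 never gets stricter to reduce noise; only Funnel 2’s explainable rules do that work.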


2. Design Implications

Designing the system under these assumptions requires a “Wide Net, Fine Mesh” architecture.

  • Funnel 1 Design (The Wide Net):
    • Algorithm Selection: Since your data is global but uses the Latin alphabet (A-Z), simple “sounds-like” algorithms (phonetics) aren’t enough. You need algorithms focused on Edit Distance (calculating how many typos or character changes it takes to turn one name into another). This catches typos and variations in transliteration (the process of converting names from scripts like Arabic or Cyrillic into English characters).
    • Segmentation: You should design the system to treat people and companies differently. A collision with the word “PRIMO” is fine for a company (e.g., “Primo Logistics”) but suspicious for a person. The design must allow different rules for different entity types.
  • Funnel 2 Design (The Fine Mesh):
    • The “Weak Alias” Logic: Because you noted that single-word collisions (like “PRIMO”) are rare but possible, you should design a “Weak Alias” or “Stop List” module. This prevents the system from generating alerts solely on common dictionary words unless there is more corroborating data (like a matching country or date of birth).
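The edit distance the design calls for is, at its core, the classic Levenshtein dynamic program; the textbook sketch below is generic, not a specific vendor algorithm.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from this prefix of a to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]

levenshtein("kitten", "sitting")    # → 3 (the classic textbook example)
levenshtein("MOHAMED", "MOHAMMED")  # → 1: one inserted character
```

Transliteration variants of the same name tend to sit a small edit distance apart, which is why this family of algorithms catches them where phonetics alone may not.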

3. Configuration & Tuning

The global nature of your data and the “Zero False Negative” goal drive specific configuration choices.

  • Fuzzy Matching Thresholds:
    • To minimize False Negatives (missed hits), you must configure your matching thresholds loosely (e.g., flagging names that are only 80% similar). This accounts for the messy reality of global data.
    • Effect: This guarantees the system catches variations like “Mohammed,” “Mohamad,” and “Muhamad.”
  • Handling the “PRIMO” Assumption:
    • Since actual collisions with dictionary words are statistically rare, you do not need complex AI to figure out context.
    • Configuration: Instead, implement a Token Whitelist Rule.
      • Rule Logic: “If the matched token (the specific word or part of the name) is ‘PRIMO’, AND it is the only word that matches, THEN suppress the alert.”
      • This effectively neutralizes the issue without over-engineering the system.
  • Multi-Element Score-boarding:
    • Score-boarding is a method where the system assigns “points” to a match based on various factors.
    • Since the data is global, you cannot rely on name matching alone. You must configure geographic weighting. If the customer is in “Canada” and the sanctioned party is in “Iran,” the system should deduct points from the risk score, helping automate the dismissal of the alert.
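The whitelist rule and geographic score-boarding above could be combined roughly as follows. The point values, country codes, and function shape are illustrative assumptions, not a production scoring model.

```python
STOP_TOKENS = {"PRIMO"}  # benign dictionary words; illustrative only

def score_match(matched_tokens: set[str],
                customer_country: str,
                listed_country: str) -> int:
    """Assign points per corroborating element; suppress token-only collisions."""
    # Token Whitelist Rule: if the only matching tokens are benign words, no alert.
    if matched_tokens and matched_tokens <= STOP_TOKENS:
        return 0
    score = 10 * len(matched_tokens)   # each matched name token adds weight
    if customer_country == listed_country:
        score += 20                    # matching geography corroborates the hit
    else:
        score -= 10                    # a geographic mismatch deducts points
    return max(score, 0)

score_match({"PRIMO"}, "CA", "IR")               # → 0: suppressed by the whitelist rule
score_match({"PRIMO", "LOGISTICS"}, "IR", "IR")  # → 40: multi-token match plus geography
```

The appeal of this style is that every point on the scoreboard maps to a rule an analyst can read back to a regulator or customer.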

4. Implementation Strategy

Implementing this system requires a focus on cleaning data before the matching even begins.

  • Pre-Processing (The Hidden Funnel):
    • Because the data is global, implementation must include a normalization layer. This means stripping out accents, umlauts, and special characters (converting ñ to n, ç to c) so that the basic matching algorithms in Funnel 1 can compare “apples to apples.”
  • Testing for “Zero Miss” Bias:
    • You cannot just test random names. You must implement a “Sanctions Testing Deck”—a dataset of known sanctioned entities with intentionally corrupted spellings (e.g., testing “Usama bin Ladin” to see if it matches “Osama bin Laden”). The system is only ready for launch if Funnel 1 catches 100% of these variations.
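The normalization layer described above is a few lines with Unicode decomposition: split each accented character into its base letter plus combining marks, drop the marks, and fold case. A minimal sketch:

```python
import unicodedata

def normalize(name: str) -> str:
    """Strip accents and fold case so Funnel 1 compares like with like."""
    decomposed = unicodedata.normalize("NFKD", name)  # ñ becomes n + combining tilde
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.upper()

normalize("Señor Çelik")  # → "SENOR CELIK"
```

This runs before either funnel, which is why it is the “hidden” one: a miss here can create a false negative no downstream threshold can recover.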

5. Maintenance & Governance

The choice of a Two Funnel system shifts the operational burden from monitoring a “Black Box” AI to managing a set of transparent rules.

  • Rule Decay:
    • The “PRIMO” rule works today. But if the government sanctions a vessel named “PRIMO” tomorrow, your rule essentially creates a False Negative (a blind spot).
    • Maintenance Protocol: You must implement a “List Management” process. Every time a sanctions list is updated, an automated script must check if any new sanctioned names conflict with your existing “Good Guy” rules.
  • False Negative Tuning:
    • To ensure you aren’t missing things, you must perform “Below the Line” testing.
    • Process: Periodically look at customers who did not generate an alert. Run them against a secondary, more aggressive screening tool. If you find a missed match, you know you need to loosen the settings in Funnel 1.
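The list-management check in the maintenance protocol above amounts to a diff between newly listed names and the existing suppression rules. A sketch, with illustrative names and a token-level comparison chosen for the example:

```python
def rule_conflicts(new_list_entries: list[str], whitelisted_tokens: set[str]) -> set[str]:
    """Flag whitelisted tokens that now appear in newly sanctioned names."""
    conflicts = set()
    for entry in new_list_entries:
        conflicts |= set(entry.upper().split()) & whitelisted_tokens
    return conflicts

# If a vessel named "PRIMO" is added to the list, the old suppression
# rule becomes a blind spot and must be retired or narrowed:
rule_conflicts(["MV PRIMO", "IVAN PETROV"], {"PRIMO"})  # → {"PRIMO"}
```

Any token this check returns should trigger human review of the corresponding rule before the next screening run.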

Summary Table

  • Model Choice: Two Funnel. It allows you to keep the “Screening” loose (to catch everything) while using “Rules” to manage the resulting noise.
  • Algorithms: Must focus on Edit Distance (fuzzy matching) rather than phonetics to handle global spelling differences.
  • Linguistic Noise: Addressed via exclusion lists (lists of words to ignore) due to the low frequency of these collisions.
  • Risk Bias: False Negative intolerance dictates loose matching thresholds in Funnel 1 to ensure high Recall.
  • Maintenance: Focus shifts to Rule Hygiene (ensuring your rules don’t accidentally hide new sanctions).

Was my prompt exactly what I wanted to say? Nah, but it was pretty close. Yes, in our experience we had very few collisions between tokens that had meanings in multiple languages, but I think that’s more because matching is, largely, a multiple-token business (with the exception of cargo vessels, of course). So even if you hit one of those colliding tokens, you usually have to match something else in addition, and that reduces the likelihood you’ve got a match to review.

It also highlights a design consideration for matching systems. Do you need synonyms, semantic equivalents, etc., for an extensive set of common terms, or only for the tokens in sanctions listings? I suspect it’s the latter, which might call into question the risk/reward of LLMs in cases where the source data is more heterogeneous.

Leave a comment