Burning Questions: What goes into independent testing?

So, I asked Gemini:

What considerations should go into the design of a sanctions testing program? List sources and justify your choices

And it replied:

Designing a robust sanctions testing program requires balancing regulatory expectations (effectiveness) with operational reality (efficiency). The goal is to prove that your screening controls are working as intended and are aligned with your institution’s specific risk profile.

Below are the critical design considerations, categorized by the testing lifecycle, followed by authoritative sources and justifications.

1. Key Design Considerations

A. Governance and Scope

  • Independence: The testing function (e.g., Internal Audit or an independent third party) must be separate from the function responsible for setting the screening parameters to ensure objectivity.
  • Frequency & Triggers: Design the program to test not just annually, but also upon “trigger events” (e.g., new product launches, major system updates, or significant changes to sanctions lists).
  • End-to-End Coverage: Testing should not be limited to the software logic. It must cover the entire lifecycle: from data extraction (source systems) to the final disposition of an alert by an analyst.

B. Data Integrity (Inputs)

  • Completeness & Lineage: Verify that all relevant data (customer names, counterparties, vessel names, payment instructions) is actually making it from the source system to the screening engine. Dropped fields are a common failure point.
  • Data Quality: Assess the quality of the data being screened. Are there dummy values (e.g., “N/A”, “Unknown”) that might bypass screening logic?
  • List Management: Verify that the internal “Block Lists” (e.g., OFAC SDN, EU, UN) are updated in a timely manner. Test the “time-to-deployment” gap between a regulator’s update and your system’s update.
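The completeness and data-quality checks above can be illustrated with a minimal Python sketch. The field names, dummy tokens, and sample records here are illustrative assumptions, not a real schema or vendor setting:

```python
# Sketch of a pre-screening data-integrity check: flag records whose
# screening-relevant fields are missing or hold dummy values that could
# silently bypass matching logic. All names/values are hypothetical.
DUMMY_VALUES = {"", "n/a", "na", "unknown", "none", "xxx", "."}
REQUIRED_FIELDS = ["customer_name", "counterparty", "country"]

def integrity_issues(record: dict) -> list[str]:
    """Return the fields in a record that would weaken screening."""
    issues = []
    for field in REQUIRED_FIELDS:
        value = str(record.get(field, "")).strip().lower()
        if value in DUMMY_VALUES:
            issues.append(field)
    return issues

batch = [
    {"customer_name": "Acme Trading Ltd", "counterparty": "N/A", "country": "DE"},
    {"customer_name": "Unknown", "counterparty": "Foo GmbH", "country": ""},
]
for rec in batch:
    print(integrity_issues(rec))
```

In a real program this check would run on an extract pulled directly from the source system, not from the screening engine, so that dropped or defaulted fields are caught before they disappear into the tool.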

C. Screening Logic & Model Validation (Processing)

  • Fuzzy Matching & Thresholds: Test the sensitivity of your algorithm.
    • Exact Match: Does it catch “Iran”?
    • Fuzzy Match: Does it catch close variations like “Irán,” “Ir@n,” or common misspellings?
    • Justification: If the threshold is too high (e.g., 95%), you miss risks. If too low (e.g., 60%), you drown in false positives.
  • Good Guy/White Lists: Specific attention must be paid to “suppression rules.” Ensure that rules designed to auto-close false positives are not accidentally auto-closing true matches (e.g., a rule to ignore “Cuba, Missouri” must not ignore “Cuba” the country).
  • Segmentation: Ensure different business lines (e.g., Trade Finance vs. Retail Banking) have appropriate rules. Trade finance often requires stricter, keyword-based screening compared to retail customer databases.

D. Operational Effectiveness (Outputs)

  • Alert Adjudication: Test a sample of “Cleared” alerts to ensure the rationale for discounting them was sound and documented.
  • System Availability: Verify that the system processes transactions in real-time (if required) without timeouts that could allow a sanctioned transaction to slip through during an outage.
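The “Cleared alert” re-review above is easier to defend if the sample is drawn reproducibly, so auditors can re-draw exactly the same population. A minimal sketch, assuming hypothetical alert IDs and an arbitrary sample size:

```python
import random

# Sketch: draw a reproducible random sample of cleared alerts for
# re-adjudication. IDs, population size, and k are illustrative.
cleared_alerts = [f"ALERT-{i:05d}" for i in range(1, 1001)]

rng = random.Random(42)  # fixed seed -> the draw is auditable/repeatable
sample = rng.sample(cleared_alerts, k=25)
print(sample[:5])
```

Documenting the seed and the population snapshot alongside the test results is what turns “we spot-checked some alerts” into evidence an examiner can verify.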

2. Sources and Justifications

To defend your program design to regulators or auditors, you should cite the following hierarchy of sources:

1. OFAC (Office of Foreign Assets Control)

  • Source Document: A Framework for OFAC Compliance Commitments (2019).
  • Justification: This is the primary regulator for US sanctions. OFAC explicitly lists “Testing and Auditing” as one of the five essential components of a compliance program, and it cites failure to test a program’s effectiveness as a root cause of many enforcement actions. Building your design on OFAC’s own framework is your strongest mitigant in any penalty calculation.

2. The Wolfsberg Group

  • Source Document: Wolfsberg Guidance on Sanctions Screening (2019).
  • Justification: This represents the global banking industry’s “gold standard” for best practices. It provides specific technical guidance on “fuzzy matching,” data quality, and the limitations of screening that regulators often accept as a reasonable benchmark for commercial feasibility.

3. NYDFS (New York Department of Financial Services)

  • Source Document: Part 504 (Transaction Monitoring and Filtering Program Requirements).
  • Justification: Even if you are not NY-regulated, Part 504 is widely viewed as the most prescriptive and rigorous standard in the US. It explicitly mandates “end-to-end, pre- and post-implementation testing,” including data mapping and model validation. Adhering to Part 504 standards usually ensures you are over-compliant with federal standards.

4. FFIEC (Federal Financial Institutions Examination Council)

  • Source Document: BSA/AML Examination Manual – OFAC Section.
  • Justification: This is the “playbook” examiners use when they visit your institution. Designing your testing program to mirror the procedures in this manual ensures you are testing for exactly what the examiners will look for (e.g., validating that the bank processes list updates quickly).

Do you quibble with any of this? It seems like a reasonable starting point…