‘Early Enough’ to Stop Artificial Intelligence from Having Social Media’s Jew-Hatred Problem, ADL Says: The Continuing Trajectory of Bias Detection and Model Evolution

As the digital age matures, the Anti-Defamation League (ADL) has sounded a crucial, if cautiously optimistic, alarm: the period to proactively stop artificial intelligence from mirroring the entrenched hatred found on social media may be drawing to a close, but the window remains open. Fresh data from the ADL’s latest endeavor—the AI Index covering testing from August to October Two Thousand Twenty-Five—reveals a complex landscape where foundational models show progress but share a collective mandate for rigorous, ongoing ethical remediation.
The Continuing Trajectory of Bias Detection and Model Evolution
The release of the latest ADL AI Index is not a definitive verdict but rather a snapshot of a technology in relentless motion. Researchers conducted over 25,000 chats across six leading large language models (LLMs)—OpenAI’s ChatGPT, Google Gemini, xAI’s Grok, Meta’s Llama, Anthropic’s Claude, and DeepSeek—to probe their responses to antisemitic conspiracies, anti-Zionist tropes, and other extremist content. The methodology, which assessed everything from document summaries to image recognition, scored models higher when they refused to comply with antisemitic prompts and provided explanations for their refusal.
Recognition of Dynamic Model Updates and Fluid Results
A vital caveat accompanying the release of the AI Index data is the acknowledgment that the technology landscape is characterized by constant, rapid iteration. The very systems tested are, by their nature, “evolving,” meaning that the specific performance scores recorded during the August to October Two Thousand Twenty-Five testing window are not necessarily static benchmarks. This implies that the best-performing model one month might regress the next, underscoring the need for ongoing, continuous auditing rather than a one-time compliance check.
The Universal Requirement for Further Improvement Across the Board
Despite the measurable differences in performance, the overarching conclusion drawn from the initial Index findings was one of shared responsibility for improvement. The research indicated that while models varied significantly in their proficiency at detecting and effectively refuting harmful or false narratives, the collective assessment was that all models surveyed “require improvement when responding to harmful content”. No single platform achieved perfection, reinforcing the notion that the fight against algorithmic bias is an ongoing collaborative endeavor for the remainder of the year and beyond. For instance, while Claude earned the highest score at 80 out of 100 for its “exceptional” rebuttal of classic Jew-hatred and anti-Zionist conspiracy theories, the lowest-scoring model, Grok, earned only 21 points.
Specific Biases Identified in Earlier Stages of AI Development
The findings from late 2025 build upon a body of work that has consistently flagged systemic issues within the industry, confirming that bias is not an accident but a reflection of training data and alignment processes that require focused attention.
Pre-Index Findings on Prevalent Anti-Jewish and Anti-Israel Sentiment
Further context is provided by earlier research conducted in the initial part of Two Thousand Twenty-Five, which highlighted concerning trends even before the full Index was launched. That earlier work had already flagged significant anti-Jewish and anti-Israel biases within the four major foundational models of the time: GPT, Claude, Gemini, and Llama. This confirmed a systemic issue within the training data or initial alignment processes across the industry, not just an isolated problem in one or two systems. The March 2025 ADL report, which formed the basis for this earlier context, noted that all four models showed concerning responses to questions about anti-Semitism and Israel.
The Case of Llama and Specific Conspiracy Theories
That foundational research noted that the Llama model, in particular, demonstrated pronounced biases across a spectrum of topics touching on Jewish and Israeli subjects. As the only open-source model tested in that earlier cohort, Llama exhibited the most pronounced anti-Jewish and anti-Israel biases overall. Specific findings indicated that Llama scored the lowest on a question related to the “great replacement conspiracy theory”, a particularly noxious narrative. Furthermore, specific findings from that period indicated that certain models showed a concerning agreement with demonstrably false statements, such as the assertion that “Many Jews are involved in kidnapping,” even if the agreement was partial across the testing cohort.
Divergent Biases Against Israel in Leading Models
The earlier analysis also pointed to specialized areas where different models exhibited unique weaknesses. For example, both the GPT and Claude systems were found to display particular bias specifically directed against Israel. In the testing related to the ongoing conflict between Israel and Hamas, GPT was noted as scoring the lowest among the surveyed models when responding to questions framed around that conflict, indicating a difficulty in maintaining neutrality or providing balanced context on geopolitical flashpoints. This suggested that while Llama struggled more with general antisemitic tropes, some closed-source leaders showed specific, heightened weakness on current events related to the Jewish state.
The Future Regulatory and Development Landscape for Responsible AI
The collective findings, presented in this pivotal moment of Two Thousand Twenty-Five, serve as a loud call for greater transparency from the companies developing these foundational technologies. The transition from identifying bias to mandating correction is currently playing out across global legislative bodies and corporate procurement offices.
Advocacy for Greater Transparency in Model Architecture
Moving forward, to truly mitigate the risks associated with prejudice, the industry must become more open about the guardrails, filtering mechanisms, and data curation processes used to train these powerful entities. The demand for this transparency is escalating from multiple vectors. In the United States, state-level legislation like California’s Transparency in Frontier Artificial Intelligence Act (TFAIA) and the Generative Artificial Intelligence: Training Data Transparency Act (AB 2013) took effect on January 1, 2026, requiring developers of public-use models to publish high-level training data information. Simultaneously, federal procurement rules issued in late 2025 require agencies purchasing LLMs to demand model cards and evaluation artifacts by March 2026. Only through shared insight can external experts effectively contribute to the safety architecture.
Internationally, the European Union’s AI Act continues to set a high bar, with key transparency requirements for high-risk AI systems scheduled to phase in by August 2, 2026. This global regulatory divergence presents a compliance challenge for international firms, who may need to adopt the strictest global standards to avoid maintaining costly dual systems—one for the EU and one for a more “laissez-faire” domestic U.S. market, though recent U.S. state laws are beginning to counter that trend.
The Public Mandate for Continuous Safety Prioritization
The central message conveyed by the organization’s work is that stopping the social media-like hatred problem in AI is possible, but it is not inevitable. It requires constant pressure, both from consumer demand and regulatory bodies, to ensure that the drive for speed and capability does not perpetually outpace the commitment to ethical development. The year Two Thousand Twenty-Five is seen as the last truly ‘early’ opportunity to set an unshakeable precedent for prioritizing human dignity over raw computational power in the evolution of Artificial Intelligence. The ADL’s concurrent research, which shows that specially trained AI can effectively reduce user belief in antisemitic conspiracy theories through factual debunking, presents a powerful counter-narrative, suggesting that AI can be engineered not just to reflect the world’s biases, but to actively help correct them. The challenge for 2026 and beyond lies in making these safety measures—and their demonstrable effectiveness—a non-negotiable component of every model release.