Deep Papers

By Arize AI

Category: Mathematics

Subscribers: 15
Reviews: 0
Episodes: 60

Description

Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning. 


Episodes (newest first)

CUGA Agent: From Benchmarks to Business Impact of IBM's Generalist Agent (Feb 11, 2026)
TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture (Nov 24, 2025)
Meta AI Researcher Explains ARE and Gaia2: Scaling Up Agent Environments and Evaluations (Nov 10, 2025)
Georgia Tech's Santosh Vempala Explains Why Language Models Hallucinate, His Research With OpenAI (Oct 14, 2025)
Atropos Health’s Arjun Mukerji, PhD, Explains RWESummary: A Framework and Test for Choosing LLMs to Summarize Real-World Evidence (RWE) Studies (Sep 22, 2025)
Stan Miasnikov, Distinguished Engineer, AI/ML Architecture, Consumer Experience at Verizon Walks Us Through His New Paper (Sep 06, 2025)
Small Language Models are the Future of Agentic AI (Sep 05, 2025)
Watermarking for LLMs and Image Models (Jul 30, 2025)
Self-Adapting Language Models: Paper Authors Discuss Implications (Jul 08, 2025)
The Illusion of Thinking: What the Apple AI Paper Says About LLM Reasoning (Jun 20, 2025)
Accurate KV Cache Quantization with Outlier Tokens Tracing (Jun 04, 2025)
Scalable Chain of Thoughts via Elastic Reasoning (May 16, 2025)
Sleep-time Compute: Beyond Inference Scaling at Test-time (May 02, 2025)
LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection (Apr 18, 2025)
AI Benchmark Deep Dive: Gemini 2.5 and Humanity's Last Exam (Apr 04, 2025)
Model Context Protocol (MCP) (Mar 25, 2025)
AI Roundup: DeepSeek’s Big Moves, Claude 3.7, and the Latest Breakthroughs (Mar 01, 2025)
How DeepSeek is Pushing the Boundaries of AI Development (Feb 21, 2025)
Multiagent Finetuning: A Conversation with Researcher Yilun Du (Feb 04, 2025)
Training Large Language Models to Reason in Continuous Latent Space (Jan 14, 2025)
LLMs as Judges: A Comprehensive Survey on LLM-Based Evaluation Methods (Dec 23, 2024)
Merge, Ensemble, and Cooperate! A Survey on Collaborative LLM Strategies (Dec 10, 2024)
Agent-as-a-Judge: Evaluate Agents with Agents (Nov 23, 2024)
Introduction to OpenAI's Realtime API (Nov 12, 2024)
Swarm: OpenAI's Experimental Approach to Multi-Agent Systems (Oct 29, 2024)
KV Cache Explained (Oct 24, 2024)
The Shrek Sampler: How Entropy-Based Sampling is Revolutionizing LLMs (Oct 16, 2024)
Google's NotebookLM and the Future of AI-Generated Audio (Oct 15, 2024)
Exploring OpenAI's o1-preview and o1-mini (Sep 27, 2024)
Breaking Down Reflection Tuning: Enhancing LLM Performance with Self-Learning (Sep 19, 2024)
Composable Interventions for Language Models (Sep 11, 2024)
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges (Aug 16, 2024)
Breaking Down Meta's Llama 3 Herd of Models (Aug 06, 2024)
DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines (Jul 23, 2024)
RAFT: Adapting Language Model to Domain Specific RAG (Jun 28, 2024)
LLM Interpretability and Sparse Autoencoders: Research from OpenAI and Anthropic (Jun 14, 2024)
Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment (May 30, 2024)
Breaking Down EvalGen: Who Validates the Validators? (May 13, 2024)
Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models (Apr 26, 2024)
Demystifying Chronos: Learning the Language of Time Series (Apr 04, 2024)
Anthropic Claude 3 (Mar 25, 2024)
Reinforcement Learning in the Era of LLMs (Mar 15, 2024)
Sora: OpenAI’s Text-to-Video Generation Model (Mar 01, 2024)
RAG vs Fine-Tuning (Feb 08, 2024)
HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels (Feb 02, 2024)
Phi-2 Model (Feb 02, 2024)
A Deep Dive Into Generative's Newest Models: Gemini vs Mistral (Mixtral-8x7B)–Part I (Dec 27, 2023)
How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings (Dec 18, 2023)
The Geometry of Truth: Emergent Linear Structure in LLM Representation of True/False Datasets (Nov 30, 2023)
Towards Monosemanticity: Decomposing Language Models With Dictionary Learning (Nov 20, 2023)
RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models (Oct 18, 2023)
Explaining Grokking Through Circuit Efficiency (Oct 17, 2023)
Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior (Sep 29, 2023)
Skeleton of Thought: LLMs Can Do Parallel Decoding (Aug 30, 2023)
Llama 2: Open Foundation and Fine-Tuned Chat Models (Jul 31, 2023)
Lost in the Middle: How Language Models Use Long Contexts (Jul 26, 2023)
Orca: Progressive Learning from Complex Explanation Traces of GPT-4 (Jul 21, 2023)
Toolformer: Training LLMs To Use Tools (Mar 20, 2023)
Hungry Hungry Hippos - H3 (Feb 13, 2023)
ChatGPT and InstructGPT: Aligning Language Models to Human Intention (Jan 18, 2023)