Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning.
| Episode | Date |
|---|---|
| CUGA Agent: From Benchmarks to Business Impact of IBM's Generalist Agent | Feb 11, 2026 |
| TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture | Nov 24, 2025 |
| Meta AI Researcher Explains ARE and Gaia2: Scaling Up Agent Environments and Evaluations | Nov 10, 2025 |
| Georgia Tech's Santosh Vempala Explains Why Language Models Hallucinate, His Research With OpenAI | Oct 14, 2025 |
| Atropos Health's Arjun Mukerji, PhD, Explains RWESummary: A Framework and Test for Choosing LLMs to Summarize Real-World Evidence (RWE) Studies | Sep 22, 2025 |
| Stan Miasnikov, Distinguished Engineer, AI/ML Architecture, Consumer Experience at Verizon Walks Us Through His New Paper | Sep 06, 2025 |
| Small Language Models are the Future of Agentic AI | Sep 05, 2025 |
| Watermarking for LLMs and Image Models | Jul 30, 2025 |
| Self-Adapting Language Models: Paper Authors Discuss Implications | Jul 08, 2025 |
| The Illusion of Thinking: What the Apple AI Paper Says About LLM Reasoning | Jun 20, 2025 |
| Accurate KV Cache Quantization with Outlier Tokens Tracing | Jun 04, 2025 |
| Scalable Chain of Thoughts via Elastic Reasoning | May 16, 2025 |
| Sleep-time Compute: Beyond Inference Scaling at Test-time | May 02, 2025 |
| LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection | Apr 18, 2025 |
| AI Benchmark Deep Dive: Gemini 2.5 and Humanity's Last Exam | Apr 04, 2025 |
| Model Context Protocol (MCP) | Mar 25, 2025 |
| AI Roundup: DeepSeek's Big Moves, Claude 3.7, and the Latest Breakthroughs | Mar 01, 2025 |
| How DeepSeek is Pushing the Boundaries of AI Development | Feb 21, 2025 |
| Multiagent Finetuning: A Conversation with Researcher Yilun Du | Feb 04, 2025 |
| Training Large Language Models to Reason in Continuous Latent Space | Jan 14, 2025 |
| LLMs as Judges: A Comprehensive Survey on LLM-Based Evaluation Methods | Dec 23, 2024 |
| Merge, Ensemble, and Cooperate! A Survey on Collaborative LLM Strategies | Dec 10, 2024 |
| Agent-as-a-Judge: Evaluate Agents with Agents | Nov 23, 2024 |
| Introduction to OpenAI's Realtime API | Nov 12, 2024 |
| Swarm: OpenAI's Experimental Approach to Multi-Agent Systems | Oct 29, 2024 |
| KV Cache Explained | Oct 24, 2024 |
| The Shrek Sampler: How Entropy-Based Sampling is Revolutionizing LLMs | Oct 16, 2024 |
| Google's NotebookLM and the Future of AI-Generated Audio | Oct 15, 2024 |
| Exploring OpenAI's o1-preview and o1-mini | Sep 27, 2024 |
| Breaking Down Reflection Tuning: Enhancing LLM Performance with Self-Learning | Sep 19, 2024 |
| Composable Interventions for Language Models | Sep 11, 2024 |
| Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges | Aug 16, 2024 |
| Breaking Down Meta's Llama 3 Herd of Models | Aug 06, 2024 |
| DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines | Jul 23, 2024 |
| RAFT: Adapting Language Model to Domain Specific RAG | Jun 28, 2024 |
| LLM Interpretability and Sparse Autoencoders: Research from OpenAI and Anthropic | Jun 14, 2024 |
| Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment | May 30, 2024 |
| Breaking Down EvalGen: Who Validates the Validators? | May 13, 2024 |
| Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models | Apr 26, 2024 |
| Demystifying Chronos: Learning the Language of Time Series | Apr 04, 2024 |
| Anthropic Claude 3 | Mar 25, 2024 |
| Reinforcement Learning in the Era of LLMs | Mar 15, 2024 |
| Sora: OpenAI's Text-to-Video Generation Model | Mar 01, 2024 |
| RAG vs Fine-Tuning | Feb 08, 2024 |
| HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels | Feb 02, 2024 |
| Phi-2 Model | Feb 02, 2024 |
| A Deep Dive Into Generative's Newest Models: Gemini vs Mistral (Mixtral-8x7B) – Part I | Dec 27, 2023 |
| How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings | Dec 18, 2023 |
| The Geometry of Truth: Emergent Linear Structure in LLM Representation of True/False Datasets | Nov 30, 2023 |
| Towards Monosemanticity: Decomposing Language Models With Dictionary Learning | Nov 20, 2023 |
| RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models | Oct 18, 2023 |
| Explaining Grokking Through Circuit Efficiency | Oct 17, 2023 |
| Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior | Sep 29, 2023 |
| Skeleton of Thought: LLMs Can Do Parallel Decoding | Aug 30, 2023 |
| Llama 2: Open Foundation and Fine-Tuned Chat Models | Jul 31, 2023 |
| Lost in the Middle: How Language Models Use Long Contexts | Jul 26, 2023 |
| Orca: Progressive Learning from Complex Explanation Traces of GPT-4 | Jul 21, 2023 |
| Toolformer: Training LLMs To Use Tools | Mar 20, 2023 |
| Hungry Hungry Hippos - H3 | Feb 13, 2023 |
| ChatGPT and InstructGPT: Aligning Language Models to Human Intention | Jan 18, 2023 |