ibl.ai

ibl.ai is a generative AI education platform based in NYC. This podcast, curated by its CTO, Miguel Amigot, focuses on high-impact trends and reports about AI.

Listen on:

  • Apple Podcasts
  • YouTube
  • Podbean App
  • Spotify
  • Amazon Music
  • iHeartRadio

Episodes

Monday Jun 16, 2025

Summary of https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
This practical guide explains that agents are systems that use large language models (LLMs) to independently perform multi-step workflows by leveraging tools. It identifies suitable applications for agents in scenarios involving complex decisions, unstructured data, or unwieldy rule-based systems, and emphasizes that simpler LLM applications are not considered agents.
The document outlines the fundamental components of an agent as an LLM model, external tools for interaction, and explicit instructions. It also explores orchestration patterns, from single-agent systems to more complex multi-agent architectures, and stresses the importance of robust guardrails and planning for human intervention to ensure safe and reliable agent operation.
Agents are LLM-powered systems capable of independently accomplishing complex, multi-step tasks by managing workflow execution and leveraging tools to interact with external systems.
Agents are particularly well-suited for workflows involving complex decision-making, difficult-to-maintain rules, or heavy reliance on unstructured data, where traditional automation methods encounter friction.
The foundational components of an agent include the Model (the LLM for reasoning), Tools (external functions/APIs to take action), and Instructions (explicit guidelines for behavior).
Agent orchestration can follow Single-agent systems (using tools within a loop) or Multi-agent systems (coordinating specialized agents via a manager or peer-to-peer handoffs), often starting with a single agent and scaling up as complexity requires.
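
As a concrete illustration of the single-agent "tools within a loop" pattern, here is a minimal Python sketch; the message format, tool registry, and canned model stand-in are illustrative assumptions, not OpenAI's actual API.

```python
# Minimal sketch of the single-agent "tools in a loop" pattern.
# The message format, tool registry, and canned call_llm stand-in are
# illustrative assumptions, not OpenAI's actual API.

def get_weather(city: str) -> str:
    """Example tool; a real agent would call an external weather API."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def call_llm(instructions: str, history: list) -> dict:
    """Stand-in for the model call: requests a tool once, then finishes."""
    if not any(m["role"] == "tool" for m in history):
        return {"type": "tool_call", "tool": "get_weather",
                "arguments": {"city": "NYC"}}
    return {"type": "final_answer", "content": history[-1]["content"]}

def run_agent(instructions: str, task: str, max_steps: int = 10) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):  # a step budget is itself a simple guardrail
        action = call_llm(instructions, history)
        if action["type"] == "final_answer":
            return action["content"]
        result = TOOLS[action["tool"]](**action["arguments"])  # act via tool
        history.append({"role": "tool", "content": result})
    return "Stopped: step budget exhausted; escalate to a human."

print(run_agent("You are a helpful assistant.", "What's the weather in NYC?"))
```
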
Implementing Guardrails (such as relevance/safety classifiers and tool safeguards) and planning for Human Intervention (for failures or high-risk actions) are critical to ensure agents operate safely, predictably, and reliably.

Sunday Jun 15, 2025

Summary of https://www.gaiin.org/the-ai-labor-playbook
Advocates a fundamental shift in how organizations view and utilize generative AI, proposing it be treated as a new form of labor rather than simply a tool.
The author argues that success hinges on a conceptual change: recognizing AI as a workforce to be led and scaled, emphasizing the importance of strategic labor planning over mere technology procurement.
A core concept introduced is the "labor-to-token exchange," where prompts represent tasks delegated to AI and tokens are the units of work and cost. The paper stresses the need to train all employees to effectively lead AI labor through natural language chat interfaces, which are presented as the primary marketplace for this new workforce.
Finally, it highlights that organizational architecture and strategy should prioritize modular, open systems to ensure access to the best AI labor at competitive costs, ultimately aiming to amplify human capability and drive innovation rather than focusing solely on cost reduction.
AI is labor, not software. Organizations should shift from thinking about AI as a tool or product to procure, and instead treat it as a workforce or labor to be led, developed, and scaled. Prompts are tasks assigned to this AI labor market, and AI models are programmable workers that require oversight, guidance, and leadership.
Labor-to-token exchanges are fundamental. These exchanges convert traditionally human tasks into interactions with generative AI systems, measured and priced in tokens. This transforms labor into a fluid, scalable, and programmable form, enabling tasks previously not possible for computers, especially cognitive ones, to be delegated through natural language. The cost of an exchange is measured by the input and output tokens.
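
To make that token accounting concrete, here is a minimal Python sketch of pricing a single labor-to-token exchange; the per-token prices are hypothetical placeholders, not figures from the playbook.

```python
# Sketch of "labor-to-token exchange" accounting: a delegated task is
# priced by its input and output tokens. Prices are hypothetical.

PRICE_PER_INPUT_TOKEN = 0.000005   # placeholder: $5 per 1M input tokens
PRICE_PER_OUTPUT_TOKEN = 0.000015  # placeholder: $15 per 1M output tokens

def exchange_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one prompt/response exchange."""
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)

# Example: a drafting task delegated to AI labor.
print(f"${exchange_cost(input_tokens=2_000, output_tokens=800):.4f}")
```
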
AI labor amplifies human potential, rather than replacing it. The primary strategic shift is recognizing that this transformation is about doing more, doing new things, and unlocking latent capacity for innovation, not just cutting costs or headcount. Humans remain essential as orchestrators, supervisors, and integrators of AI labor, providing the creativity, ethical reasoning, and context that AI cannot replicate. The goal is to empower humans to amplify their thinking and enhance the enjoyment of their work.
Effective deployment requires strategic architectural and cultural changes. A major barrier is that directing AI labor is a new skill requiring training in communication, problem-solving, and system design. Organizations must avoid vendor lock-in and siloed AI within tools; instead, they should build open, modular systems, decoupling the AI labor interface (enterprise chat), the reasoning engine, the system integration (APIs), and the supervisory layer. Enterprise chat emerges as a crucial interface for accessing and assigning tasks to AI labor using natural language.
AI labor strategy must focus on empowering the workforce. The greatest returns come from distributing AI widely and training everyone to lead it effectively. Success requires overcoming fear and misunderstanding, creating champions, building learning into daily work, normalizing exploration, and emphasizing conversation and persistence. Teaching how to collaborate with AI labor, including prompt engineering and problem decomposition, is the new digital literacy essential for unlocking scale, creativity, and agility.

OpenAI: AI in the Enterprise

Saturday Jun 14, 2025

Summary of https://cdn.openai.com/business-guides-and-resources/ai-in-the-enterprise.pdf
Outlines OpenAI's approach to enterprise AI adoption, focusing on practical lessons learned from working with seven "frontier" companies. It highlights three key areas where AI delivers measurable improvements: enhancing workforce performance, automating routine tasks, and powering products with more relevant customer experiences.
The text emphasizes an iterative development process and an experimental mindset for successful AI integration, detailing seven essential strategies: start with rigorous evaluations, embed AI into products, invest early, customize models, empower experts, unblock developers, and set ambitious automation goals, all while keeping data security and privacy paramount.
Embrace an iterative and experimental approach: Successful companies treat AI as a new paradigm, adopting an iterative development approach to learn quickly, improve performance and safety, and get to value faster with greater buy-in. An open, experimental mindset is key, supported by rigorous evaluations and safety guardrails.
Start early and invest for compounding benefits: Begin AI adoption now and invest early because the value compounds through continuous testing, refinement, and iterative improvements. Encouraging organization-wide familiarity and broad adoption helps companies move faster and launch initiatives more efficiently.
Prioritize strategic implementation with evaluations: Instead of broadly injecting AI, start with systematic evaluations to measure how models perform against specific use cases, ensuring quality and safety. Align implementation around high-return opportunities such as improving workforce performance, automating routine operations, or powering products.
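
A minimal sketch of what such a systematic evaluation loop might look like in Python; the case format and the naive substring grader are simplifying assumptions, not OpenAI's methodology.

```python
# Minimal eval-harness sketch: score a model against a use case's
# expected outputs before rolling it out. The case format and the
# substring grader are simplifying assumptions.

def evaluate(model_fn, cases: list[dict]) -> float:
    """Return the fraction of cases where the model output passes."""
    passed = 0
    for case in cases:
        output = model_fn(case["input"])
        if case["expected"].lower() in output.lower():  # naive grader
            passed += 1
    return passed / len(cases)

cases = [
    {"input": "Classify: 'Refund my order'", "expected": "refund"},
    {"input": "Classify: 'Where is my package?'", "expected": "shipping"},
]
print(evaluate(lambda text: "refund request", cases))  # 0.5
```
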
Customize models and empower experts: Investing in customizing and fine-tuning AI models to specific data and needs can dramatically increase value, improve accuracy, relevance, and consistency. Getting AI into the hands of employees who are closest to the processes and problems is often the most powerful way to find AI-driven solutions.
Set bold automation goals and unblock developers: Aim high by setting bold automation goals to free people from repetitive tasks so they can focus on high-impact work. Unblock developer resources, which are often a bottleneck, by accelerating AI application builds through platforms or automating aspects of the software development lifecycle.

Saturday Jun 14, 2025

Summary of https://arxiv.org/pdf/2504.11436
Details a large-scale randomized experiment involving over 7,000 knowledge workers across multiple industries to study the impact of a generative AI tool integrated into their workflow. The researchers measured changes in work patterns over six months by comparing workers who received access to the AI tool with a control group.
Key findings indicate that the AI tool primarily influenced individual behaviors, significantly reducing time spent on email and moderately speeding up document completion, while showing no significant effect on collaborative activities like meeting time.
The study highlights that while AI adoption can lead to noticeable shifts in personal work habits, broader changes in job responsibilities and coordinated tasks may require more systemic organizational adjustments and widespread tool adoption.
A 6-month, cross-industry randomized field experiment involving 7,137 knowledge workers from 66 large firms studied the impact of access to Microsoft 365 Copilot, a generative AI tool integrated into commonly used applications like email, document creation, and meetings.
Workers who used the AI tool regularly spent 3.6 fewer hours per week on email, a 31% reduction from their pre-period average. Intent-to-treat estimates showed a 1.3 hour reduction per week. This time saving condensed email work, opening up almost 4 hours per week of concentration time and reducing out-of-hours email activity for regular users.
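
The gap between the two figures comes from partial uptake: intent-to-treat averages over everyone granted access, including non-users. A toy Python sketch of that dilution, with made-up numbers rather than the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data, not the study's: weekly email hours for control workers and
# for workers assigned Copilot access (only some use it regularly).
control = rng.normal(11.5, 2.0, 1000)
assigned = rng.normal(11.5, 2.0, 1000)
is_regular_user = rng.random(1000) < 0.4   # uptake is partial
assigned[is_regular_user] -= 3.6           # effect concentrates in users

# Intent-to-treat: compare by *assignment*, ignoring actual usage.
itt = control.mean() - assigned.mean()     # diluted by non-users, ~1.4h

# Contrast for regular users (what "3.6 fewer hours" refers to).
user_effect = control.mean() - assigned[is_regular_user].mean()

print(f"ITT: {itt:.1f}h; regular-user contrast: {user_effect:.1f}h")
```
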
While there was suggestive evidence that users completed documents moderately faster (5-25% faster for regular users), especially collaborative documents, there was no significant change in time spent in meetings or the types of meetings attended. There was also no change in the number of documents authored by the primary editor.
The observed changes primarily impacted behaviors workers could change independently, such as managing their own email inbox. Behaviors requiring coordination with colleagues or significant organizational changes, like meeting duration or reassigning document responsibilities, did not change significantly. This suggests that in the early adoption phase, individual exploration and time savings on solitary tasks were more common than large-scale workflow transformations.
Copilot usage intensity varied widely across workers and firms, but firm-specific differences were the strongest predictor of usage, explaining more variation than industry differences, pre-experiment individual behavior, or the share of coworkers with access to Copilot.

Friday Jun 13, 2025

Summary of https://link.springer.com/article/10.1007/s13347-025-00883-8
This academic paper argues from a Deweyan perspective that artificial intelligence (AI), particularly in its current commercial Intelligent Tutoring System form, is unlikely to democratize education.
The author posits that while proponents focus on AI's potential to increase access to quality education, a truly democratic education, as defined by John Dewey, requires cultivating skills for democratic living, providing experience in communication and cooperation, and allowing for student participation in shaping their education.
The paper suggests that the emphasis on individualization, mastery of curriculum, and automation of teacher tasks in current educational AI tools hinders the development of these crucial democratic aspects, advocating instead for public development of AI that augments teachers' capabilities and fosters collaborative learning experiences.
The paper argues that current commercial AI, especially Intelligent Tutoring Systems (ITS), is likely to negatively impact democratic education based on John Dewey's philosophy.
A Deweyan understanding of democratic education involves preparing students for democratic living, incorporating democratic practices, democratic governance, and ensuring equal access. The paper contrasts this with a narrow view often used by AI proponents, which primarily focuses on increasing access to quality education.
Current commercial educational AI tools are characterized by an emphasis on the individualization of learning, a narrow focus on the mastery of the curriculum, and the automation of teachers' tasks.
These characteristics are seen as obstacles to democratic education because they can deprive children of experiences in democratic living, hinder the acquisition of communicative and collaborative skills, habituate them to environments with little control, and reduce opportunities for intersubjective deliberation and experiencing social differences.
Increased reliance on AI from private companies also poses a threat by reducing public influence and democratic governance over education and creating environments where students have little say. While current AI poses challenges, the author suggests alternative approaches like using AI to augment teachers or for simulations could better serve democratic goals.

Friday Jun 13, 2025

Summary of https://www.mckinsey.com/~/media/mckinsey/business%20functions/quantumblack/our%20insights/open%20source%20technology%20in%20the%20age%20of%20ai/open-source-technology-in-the-age-of-ai_final.pdf
Based on a survey of technology leaders and senior developers, the document explores the increasing adoption of open source solutions within AI technology stacks across various industries and geographies.
It highlights that over half of respondents utilize open source AI in data, models, and tools, driven by benefits like performance, ease of use, and lower costs compared to proprietary alternatives. However, the report also acknowledges perceived risks associated with open source AI, including cybersecurity, regulatory compliance, and intellectual property concerns, and discusses the safeguards organizations are implementing to mitigate these issues.
Ultimately, the survey indicates a strong expectation for continued growth in the use of open source AI technologies, often in conjunction with proprietary solutions.
Open source AI is widely adopted and its use is expected to grow, with over 50 percent of respondents using it in the data, model, and tool areas of the tech stack. Seventy-five percent of respondents anticipate increasing their use of open source AI technologies in the next few years.
Key benefits driving the adoption of open source AI include lower implementation costs (60 percent of respondents) and lower maintenance costs (46 percent) compared to proprietary tools. Performance and ease of use are also top reasons for satisfaction. Developers value experience with open source tools for their careers and job satisfaction.
Despite the benefits, organizations perceive higher risks with open source AI, particularly regarding cybersecurity (62 percent of respondents), regulatory compliance (54 percent), and intellectual property (50 percent). Organizations are implementing safeguards like guardrails and third-party evaluations to manage these risks.
Organizations show a preference for partially open models (models with open weights but potentially non-OSI-approved licenses or limited data), which may be influenced by the performance of such models and the ability to self-host them for better data privacy and control.
The AI technology landscape is evolving towards a hybrid approach, with most organizations open to using a mixture of open source and proprietary solutions across their tech stack. Popular open source tools are often developed by large technology companies like Meta (Llama) and Google (Gemma).

Thursday Jun 12, 2025

Summary of https://www.scribd.com/document/855023851/BCG-AI-Agent-Report-1745757269
Outlines the evolution of AI Agents from simple applications to increasingly autonomous systems. It highlights the growing adoption of Anthropic's open-source Model Context Protocol (MCP) by major technology companies as a key factor in enhancing AI Agent reliability and safety.
The document underscores the need for continued progress in AI's reasoning, integration, and social understanding capabilities to achieve full autonomy. Furthermore, it discusses the emergence of product-market fit for agents in various sectors, while also addressing the critical importance of measuring and improving their effectiveness.
Finally, the report examines the role of MCP in enabling agentic workflows and the associated security considerations.
The open-source Model Context Protocol (MCP), launched by Anthropic, is rapidly gaining traction among major tech companies like OpenAI, Microsoft, Google, and Amazon, marking a shift in how AI Agents observe, plan, and act within their environments, thereby enhancing reliability and safety.
AI Agents are significantly evolving, moving beyond simple workflow systems and chatbots towards autonomous and multi-agent systems capable of planning, reasoning, using tools, observing, and acting. This maturity is driving a shift from predefined workflows to self-directed agents.
Agents are demonstrating growing product-market fit, particularly coding agents, and organizations are gaining significant value from agentic workflows through benefits such as reduced time-to-decision, reclaiming developer time, accelerated execution, and increased productivity.
AI Agents can now reliably complete tasks that take human experts up to a few minutes. Measuring their reliability and effectiveness is an ongoing focus, with benchmarks evolving to assess tool use and multi-turn tasks; full autonomy depends on advancements in areas like reasoning, integration, and social understanding.
Building and scaling agents involves implementing Agent Orchestration platforms and leveraging MCP to access data and systems; however, this expanded access introduces new security risks, such as malicious tools and tool poisoning, requiring robust security measures like OAuth + RBAC and isolating trust domains.
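
As an illustration of the RBAC portion of those safeguards, here is a minimal Python sketch gating agent tool calls by role; the roles and tool names are hypothetical.

```python
# Sketch of an RBAC gate in front of agent tool calls: each role may
# invoke only an allow-listed set of tools. Roles/tools are hypothetical.

ROLE_PERMISSIONS = {
    "support_agent": {"search_kb", "create_ticket"},
    "finance_agent": {"read_invoices"},
}

def invoke_tool(role: str, tool: str, run_fn, *args):
    """Run a tool only if the caller's role is permitted to use it."""
    if tool not in ROLE_PERMISSIONS.get(role, set()):
        raise PermissionError(f"role {role!r} may not call {tool!r}")
    return run_fn(*args)

# A support agent can search the knowledge base...
print(invoke_tool("support_agent", "search_kb",
                  lambda q: f"results for {q}", "refunds"))
# ...but calling "read_invoices" with that role would raise
# PermissionError, keeping trust domains isolated.
```
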

Thursday Jun 12, 2025

Summary of https://arxiv.org/pdf/2504.16902
Explores the critical need for secure communication protocols as AI systems evolve into complex networks of interacting agents. It focuses on Google's Agent-to-Agent (A2A) protocol, designed to enable secure and structured communication between autonomous agents.
The authors analyze A2A's security through the MAESTRO threat modeling framework, identifying potential vulnerabilities like agent card spoofing, task replay, and authentication issues, and propose mitigation strategies and best practices for secure implementation.
The paper also discusses how A2A synergizes with the Model Context Protocol (MCP) to create robust agentic systems and emphasizes the importance of continuous security measures in the evolving landscape of multi-agent AI.
Agentic AI and A2A Protocol Foundation: The emergence of intelligent, autonomous agents interacting across boundaries necessitates secure and interoperable communication. Google's Agent-to-Agent (A2A) protocol provides a foundational, declarative, identity-aware framework for structured, secure communication between agents, enabling them to discover capabilities via standardized Agent-Cards, authenticate, and exchange tasks.
A2A Core Concepts: The A2A protocol defines key elements including the AgentCard (a public JSON metadata file describing agent capabilities), A2A Server and Client (for sending/receiving requests), the Task (the fundamental unit of work with a lifecycle), Message (a communication turn), Part (basic content unit like text or files), and Artifact (generated outputs). Communication flows involve discovery, initiation (using tasks.send or tasks.sendSubscribe), processing, input handling, and completion, potentially with push notifications.
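
Based on that description, a hypothetical AgentCard might look like the following; the field names are illustrative assumptions rather than the normative A2A schema.

```python
import json

# Hypothetical AgentCard: a public JSON metadata file advertising an
# agent's capabilities for discovery. Field names are illustrative,
# not copied from the A2A specification.
agent_card = {
    "name": "invoice-processor",
    "description": "Extracts and validates fields from invoices",
    "url": "https://agents.example.com/invoice-processor",
    "capabilities": {"streaming": True},  # e.g. tasks.sendSubscribe-style flows
    "skills": [{"id": "extract_fields", "description": "Parse invoice PDFs"}],
    "authentication": {"schemes": ["bearer"]},
}
print(json.dumps(agent_card, indent=2))
```
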
MAESTRO Threat Modeling: Traditional threat modeling falls short for agentic AI systems. The MAESTRO framework (Multi-Agent Environment, Security, Threat, Risk, and Outcome), a seven-layer approach specifically for agentic AI, identifies threats relevant to A2A, including Agent Card spoofing, A2A Task replay, A2A Server impersonation, Cross-Agent Task Escalation, Artifact Tampering, Authentication & Identity Threats, and Poisoned AgentCard (embedding malicious instructions).
Key Mitigation Strategies: Addressing A2A security threats requires specific controls and best practices. Crucial mitigations include using digital signatures and validation for Agent Cards, implementing replay protection (nonce, timestamp, MACs), enforcing strict message schema validation, employing Mutual TLS (mTLS) and DNSSEC for server identity, applying strict authentication/authorization (RBAC, least privilege), securing artifacts (signatures, encryption), implementing audit logging, using dependency scanning, and applying strong JWT validation and secure token storage.
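
To ground the replay-protection item, here is a minimal Python sketch combining a nonce, a timestamp, and an HMAC; the shared key and in-memory nonce store are simplifying assumptions.

```python
import hashlib
import hmac
import secrets
import time

SHARED_KEY = b"demo-key"        # stands in for a properly provisioned secret
MAX_SKEW_SECONDS = 30
seen_nonces: set[str] = set()   # a real system needs a persistent, per-sender store

def sign_task(payload: bytes) -> dict:
    """Attach nonce, timestamp, and MAC so a replayed task is rejected."""
    nonce = secrets.token_hex(16)
    ts = str(int(time.time()))
    mac = hmac.new(SHARED_KEY, payload + nonce.encode() + ts.encode(),
                   hashlib.sha256).hexdigest()
    return {"payload": payload, "nonce": nonce, "ts": ts, "mac": mac}

def verify_task(msg: dict) -> bool:
    expected = hmac.new(SHARED_KEY,
                        msg["payload"] + msg["nonce"].encode() + msg["ts"].encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, msg["mac"]):
        return False                                   # tampered or forged
    if abs(time.time() - int(msg["ts"])) > MAX_SKEW_SECONDS:
        return False                                   # stale timestamp
    if msg["nonce"] in seen_nonces:
        return False                                   # replayed task
    seen_nonces.add(msg["nonce"])
    return True

msg = sign_task(b'{"task": "summarize"}')
print(verify_task(msg), verify_task(msg))  # True False: replay rejected
```
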
A2A and MCP Synergy: A2A and the Model Context Protocol (MCP) are complementary, operating at different layers of the AI stack. A2A enables horizontal agent-to-agent collaboration and task delegation, while MCP facilitates vertical integration by connecting agents to external tools and data sources. Their combined use enables complex hierarchical workflows but introduces security considerations at the integration points, requiring a comprehensive strategy.

Wednesday Jun 11, 2025

Summary of https://arxiv.org/pdf/2412.15473
Investigates whether student log data from educational technology, specifically from the first few hours of use, can predict long-term student outcomes like end-of-year external assessments.
Using data from a literacy game in Uganda and two math tutoring systems in the US, the researchers explore if machine learning models trained on this short-term data can effectively predict performance.
They examine the accuracy of different machine learning algorithms and identify some common predictive features across the diverse datasets. Additionally, the study analyzes the prediction quality for different student performance levels and the impact of including pre-assessment scores in the models.
Short-term log data (2-5 hours) can effectively predict long-term outcomes. The study found that machine learning models using data from a student's first few hours of usage with educational technology provided a useful predictor of end-of-school-year external assessments, with performance similar to models using data from the entire usage period (multi-month). This finding was consistent across three diverse datasets from different educational contexts and tools. Interestingly, performance did not always improve monotonically with longer-horizon data; in some cases, accuracy estimates were higher using a shorter horizon.
Certain log data features are consistently important predictors across different tools. Features like the percentage of problems completed successfully and the average number of attempts per problem were frequently selected as important features by the random forest model across all three datasets and both short and full horizons. This suggests that these basic counting features, which are generally obtainable from log data across many educational platforms, are valuable signals for predicting long-term performance.
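
A minimal Python sketch of this kind of model on those two features, using synthetic data rather than the study's logs:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Toy features mirroring the two consistently selected log features:
# percent of problems completed successfully and mean attempts per problem.
pct_success = rng.uniform(0, 1, n)
avg_attempts = rng.uniform(1, 5, n)
X = np.column_stack([pct_success, avg_attempts])
# Synthetic end-of-year score loosely driven by both features (not real data).
y = 0.7 * pct_success - 0.1 * avg_attempts + rng.normal(0, 0.1, n)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
for name, imp in zip(["pct_success", "avg_attempts"],
                     model.feature_importances_):
    print(f"{name}: importance {imp:.2f}")
```
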
While not perfectly accurate for individual students, the models show good precision at predicting performance extremes. The models struggled to accurately predict students in the middle performance quintiles but showed relatively high precision when predicting students in the lowest (likely to struggle) or highest (likely to thrive) performance groups. For instance, the best model for CWTLReading was accurate 77% of the time when predicting someone would be in the lowest performance quintile (Q1) and 72% accurate for predicting the highest (Q5). This suggests potential for using these predictions to identify students who might benefit from additional support or challenges.
Using a set of features generally outperforms using a single feature. While single features like percentage success or average attempts per problem still perform better than a baseline, machine learning models trained on the full set of extracted log features generally outperformed models using only a single feature. This indicates that considering multiple aspects of student interaction captured in the log data provides additional predictive power.
Pre-assessment scores are powerful indicators and can be combined with log data for enhanced prediction. Pre-test or pre-assessment scores alone were found to be strong predictors of long-term outcomes, often outperforming log data features alone. When available, combining pre-test scores with log data features generally resulted in improved prediction performance (higher R2 values) compared to using either source of data alone. However, the study notes that short-horizon log data can be a useful tool for prediction when pre-tests are not available or would take time away from instruction.

Wednesday Jun 11, 2025

Summary of https://documents1.worldbank.org/curated/en/099548105192529324/pdf/IDU-c09f40d8-9ff8-42dc-b315-591157499be7.pdf
This is a Policy Research Working Paper from the World Bank's Education Global Department, published in May 2025. Titled "From Chalkboards to Chatbots: Evaluating the Impact of Generative AI on Learning Outcomes in Nigeria," it details a study on the effectiveness of using large language models, specifically Microsoft Copilot powered by GPT-4, as virtual tutors for secondary school students in Nigeria.
The research, conducted through a randomized controlled trial over six weeks, found that the intervention led to significant improvements in English, digital, and AI skills among participating students, particularly female students and those with higher initial academic performance.
The paper emphasizes the cost-effectiveness and scalability of this AI-powered tutoring approach in low-resource settings, although it also highlights the need to address potential inequities in access and digital literacy for broader implementation.
Significant Positive Impact on Learning Outcomes: The program utilizing Microsoft Copilot (powered by GPT-4) as a virtual tutor in secondary education in Nigeria resulted in a significant improvement of 0.31 standard deviation on an assessment covering English language, artificial intelligence (AI), and digital skills for first-year senior secondary students over six weeks. The effect on English skills, which was the main outcome of interest, was 0.23 standard deviations. These effect sizes are notably high when compared to other randomized controlled trials (RCTs) in low- and middle-income countries.
High Cost-Effectiveness: The intervention demonstrated substantial learning gains, estimated to be equivalent to 1.5 to 2 years of 'business-as-usual' schooling. A cost-effectiveness analysis revealed that the program ranks among some of the most cost-effective interventions for improving learning outcomes, achieving 3.2 equivalent years of schooling (EYOS) per $100 invested per participant. When considering long-term wage effects, the benefit-cost ratio was estimated to be very high, ranging from 161 to 260.
Heterogeneous Effects Identified: While the program yielded positive and statistically significant treatment effects across all levels of baseline performance, the effects were found to be stronger among students with better prior academic performance and those from higher socioeconomic backgrounds. Treatment effects were also stronger among female students, which the authors note appeared to compensate for a deficit in their baseline performance.
Attendance Linked to Greater Gains: A strong linear association was found between the number of days a student attended the intervention sessions and improved learning outcomes. Based on attendance data, the estimated effect size was approximately 0.031 standard deviation per additional day of attendance. Further analysis predicts substantial gains (1.2 to 2.2 standard deviations) for students participating for a full academic year, depending on attendance rates.
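
A back-of-the-envelope Python check of that extrapolation, assuming the per-day effect scales roughly linearly; the session counts are inferred from the reported range, not stated in the paper:

```python
# Back-of-the-envelope: scale the reported ~0.031 SD-per-attended-day
# effect, assuming rough linearity holds over a full academic year.
SD_PER_DAY = 0.031

for days in (40, 70):  # assumed session counts at low vs. high attendance
    print(f"{days} days attended -> ~{SD_PER_DAY * days:.2f} SD")
# ~1.24 and ~2.17 SD, consistent with the paper's 1.2-2.2 SD projection.
```
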
Key Policy Implications for Low-Resource Settings: The findings suggest that AI-powered tutoring using LLMs has transformative potential in the education sector in low-resource settings. Such programs can complement traditional teaching, enhance teacher productivity, and deliver personalized learning, particularly when designed and used properly with guided prompts, teacher oversight, and curriculum alignment. The use of free tools and local staff contributes to scalability, but policymakers must address potential inequities stemming from disparities in digital literacy and technology access through investments in infrastructure, teacher training, and inclusive digital education.
