ibl.ai

ibl.ai is a generative AI education platform based in NYC. This podcast, curated by its CTO, Miguel Amigot, focuses on high-impact trends and reports about AI.

Listen on:

  • Apple Podcasts
  • YouTube
  • Podbean App
  • Spotify
  • Amazon Music
  • iHeartRadio

Episodes

Wednesday Feb 19, 2025

Summary of https://artificialanalysis.ai/downloads/china-report/2025/Artificial-Analysis-State-of-AI-China-Q1-2025.pdf
Artificial Analysis's Q1 2025 report analyzes the state of AI, particularly focusing on the advancements in language models from both the US and China. The report highlights that Chinese AI labs have significantly closed the gap in AI intelligence, now rivaling top US models.
Open-source models and reasoning capabilities are becoming increasingly common in China. The study also examines the impact of US export controls on AI accelerators and how companies like NVIDIA are adapting.
Specific NVIDIA and AMD hardware specifications are provided for various AI accelerators. The analysis includes a breakdown of leading AI firms in both countries, along with their respective AI strategies and funding.
Here are five interesting takeaways from the source:
Chinese AI labs have largely caught up to US AI labs in language model intelligence. Several Chinese models are now competitive with top US models, and Chinese labs can no longer be considered laggards.
Open weights models are closing in on frontier labs. Models from DeepSeek and Alibaba have approached o1-level intelligence. Chinese AI startups, supported by Big Tech firms and the government, have developed some of the world’s leading open weights models.
Reasoning models are becoming commonplace. Chinese competitors, led by DeepSeek, have largely replicated the intelligence of OpenAI's o1 reasoning models within months of their introduction. Several AI labs in China now have frontier-level reasoning models.
US export controls restrict the export of leading NVIDIA accelerators to China based on performance and density thresholds. The H20 and L20 fall below these thresholds and can be freely exported (a rough sketch of the threshold check follows this list).
Early 2025 has seen Chinese AI labs prolifically releasing frontier reasoning models, including Alibaba, DeepSeek, Moonshot, Tencent, Zhipu, and Baichuan.
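
To make the threshold discussion concrete, here is a minimal sketch of the headline checks from the October 2023 BIS rule, assuming Total Processing Performance (TPP) is computed as dense TFLOPS multiplied by the operation's bit length; the chip figures below are approximate, illustration-only numbers, not the report's data.

# Rough sketch of the US export-control thresholds for AI accelerators
# (October 2023 BIS rule, ECCN 3A090.a). Chip specs are approximate.

def tpp(tflops: float, bit_length: int) -> float:
    """Total Processing Performance: dense TFLOPS x bit length of the operation."""
    return tflops * bit_length

def is_export_controlled(tflops: float, bit_length: int, die_area_mm2: float) -> bool:
    t = tpp(tflops, bit_length)
    density = t / die_area_mm2  # performance density
    return t >= 4800 or (t >= 1600 and density >= 5.92)

# (dense FP16 TFLOPS, bit length, die area in mm^2) -- approximate figures
chips = {
    "H100": (989.0, 16, 814.0),
    "H20": (148.0, 16, 814.0),
}
for name, (tflops, bits, area) in chips.items():
    print(name, "controlled:", is_export_controlled(tflops, bits, area))
# The H100 lands well above the TPP threshold; the H20 falls below both checks.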

Tuesday Feb 18, 2025

Summary of https://genai.owasp.org/resource/llm-applications-cybersecurity-and-governance-checklist-english
The checklist provides guidance on securing and governing Large Language Models (LLMs) in various organizational contexts. It emphasizes understanding AI risks, establishing comprehensive policies, and incorporating security measures into existing practices.
The document aims to assist leaders across multiple sectors in navigating the challenges and opportunities presented by LLMs while safeguarding against potential threats. The checklist helps organizations formulate strategies, improve accuracy, and reduce oversights in their AI adoption journey.
It also includes references to external resources like OWASP and MITRE to facilitate a robust cybersecurity plan. Finally, the document highlights the importance of continuous monitoring, testing, and validation of AI systems throughout their lifecycle.
Here are five key takeaways regarding LLM AI Security and Governance:
AI and LLMs present both opportunities and risks. Organizations must weigh the risks of not adopting LLMs, such as competitive disadvantage and stagnating innovation, against the risks of using them.
A checklist approach improves strategy and reduces oversights. The OWASP Top 10 for LLM Applications Cybersecurity and Governance Checklist helps leaders understand LLM risks and benefits, focusing on critical areas for defense and protection. This list can help organizations improve defensive techniques and address new threats.
AI security and privacy training is essential for all employees. Training should cover the potential consequences of building, buying, or utilizing LLMs, and should be specialized for certain positions.
Incorporate LLM security into existing security practices. Integrate the management of AI systems with existing organizational practices, ensuring AI/ML systems follow established privacy, governance, and security practices. Fundamental security principles and an understanding of secure software review, architecture, data governance, and third-party assessments remain crucial.
Adopt continuous testing, evaluation, verification, and validation (TEVV). Establish a continuous TEVV process throughout the AI model lifecycle, providing regular executive metrics and updates on AI model functionality, security, reliability, and robustness. Model cards and risk cards increase transparency, accountability, and ethical deployment of LLMs.
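
To make the model-card idea concrete, here is a minimal illustrative sketch in Python; the field names and values are assumptions chosen for illustration, not a standard schema or an OWASP format.

# A minimal model-card structure that a TEVV process could keep updated
# across the AI model lifecycle. Field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    name: str
    version: str
    intended_use: str
    out_of_scope_uses: list = field(default_factory=list)
    known_risks: list = field(default_factory=list)   # feeds a companion risk card
    eval_results: dict = field(default_factory=dict)  # TEVV metrics per cycle
    last_validated: str = ""                          # date of the most recent TEVV run

card = ModelCard(
    name="support-assistant-llm",
    version="1.3.0",
    intended_use="Drafting internal customer-support replies",
    out_of_scope_uses=["legal advice", "medical advice"],
    known_risks=["prompt injection", "PII leakage"],
    eval_results={"jailbreak_resistance": 0.92, "factuality": 0.87},
    last_validated="2025-02-01",
)
print(card)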

Tuesday Feb 18, 2025

Summary of https://www.ets.org/human-progress-report.html
The 2025 ETS Human Progress Report explores the evolving landscape of education and career advancement across 18 countries. It reveals a rise in the Human Progress Index, highlighting improvements in education, skill development, and career growth but also emphasizes uneven progress.
The report underscores the growing importance of "evidential currency"—skills-based credentials—as a pathway to opportunity and success in a rapidly changing job market. Key findings suggest a significant concern among Gen Z regarding technological obsolescence and a strong global consensus on the necessity of continuous learning.
The report advocates for skills-based hiring practices, AI literacy, and partnerships between educational institutions, governments, and employers to build a more adaptable, equitable workforce.
Skills, especially AI literacy, are redefining work. By 2030, most people expect digital skill wallets and verified resumes to be the norm. Nearly two-thirds of people are seeking credentials in essential skills like AI literacy, problem-solving, creativity, communication, and technical skills.
Gen Z is worried about remaining relevant in the face of rapid technological changes driven by AI and automation. 65% of Gen Z workers express this concern.
Skills credentials, including those in AI, improve career trajectory: 86% of people say certifying their skills improves their chances of securing a better or higher-paying job and improves their overall career trajectory.
"Evidential Currency," especially regarding AI skills, is becoming essential for meeting competition expectations and breaking down systemic barriers. As the job market becomes more competitive, credentials and real-time skill assessments continue to rise in value. The demand for AI skills has increased significantly.
Continuous learning, particularly in AI and related fields, is essential. Across the 18 surveyed countries, over 80% of respondents agree that continuous learning is essential for success.

Tuesday Feb 18, 2025

Summary of https://www.nature.com/articles/s41562-024-02077-2
This research investigates how interactions between humans and AI can create feedback loops that amplify biases. The study reveals that AI algorithms, trained on slightly biased human data, not only adopt these biases but also magnify them.
When humans then interact with these biased AI systems, their own biases increase, demonstrating a concerning feedback mechanism. The researchers found this effect to be stronger in human-AI interactions than in human-human interactions, and that humans often underestimate the influence of AI on their judgments.
The study demonstrated that using an AI system like Stable Diffusion can increase social bias. Critically, the study shows that accurate AI can improve judgment, while flawed AI amplifies human biases.
Here are five key takeaways from the provided study on human-AI interaction:
AI systems can amplify biases present in human data. When AI algorithms are trained on data that contains even slight human biases, the algorithms not only adopt these biases but often amplify them.
Human interaction with biased AI increases human bias. Repeated interaction with biased AI systems leads humans to internalize and adopt the AI's biases, potentially creating a feedback loop where human judgment becomes increasingly skewed. This effect is stronger in human-AI interactions than in human-human interactions (a toy simulation of the loop follows these takeaways).
The perception of AI influences its impact. Humans may be more susceptible to bias from AI systems they perceive as superior or authoritative: participants who believed they were interacting with another human adopted less bias than those who knew the answers came from an AI.
Humans underestimate AI's biasing influence. People are often unaware of the extent to which AI systems affect their judgments, which can make them more vulnerable to adopting AI-driven biases.
Accurate AI improves human judgment. The study also demonstrated that interaction with accurate AI systems can improve human decision-making, suggesting that reducing algorithmic bias has the potential to enhance the quality of human judgment.
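
The feedback loop above lends itself to a toy simulation. The sketch below is my own illustration with made-up parameters, not the paper's model or data: an AI trained on slightly biased labels amplifies the bias, and repeated interaction pulls human judgments toward the AI.

import random

random.seed(0)
human_bias = 0.53   # humans choose option A 53% of the time (slight bias)
AMPLIFY = 1.5       # assumed: training amplifies the majority signal
LEARN_RATE = 0.1    # assumed: how strongly humans adopt the AI's tendency

for step in range(5):
    labels = [random.random() < human_bias for _ in range(10_000)]
    ai_bias = 0.5 + AMPLIFY * (sum(labels) / len(labels) - 0.5)  # amplified bias
    human_bias += LEARN_RATE * (ai_bias - human_bias)            # humans drift toward AI
    print(f"step {step}: human bias = {human_bias:.3f}, AI bias = {ai_bias:.3f}")
# Both values creep upward: a small initial bias compounds through the loop.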

Monday Feb 17, 2025

Summary of https://www.researchgate.net/publication/388234257_What_large_language_models_know_and_what_people_think_they_know
This study investigates how well large language models (LLMs) communicate their uncertainty to users and how human perception aligns with the LLMs' actual confidence. The research identifies a "calibration gap" where users overestimate LLM accuracy, especially with default explanations.
Longer explanations increase user confidence without improving accuracy, indicating shallow processing. By tailoring explanations to reflect the LLM's internal confidence, the study demonstrates a reduction in both the calibration and discrimination gaps, leading to improved user perception of LLM reliability.
The study underscores the importance of transparent uncertainty communication for trustworthy AI-assisted decision-making, advocating for explanations aligned with model confidence.
Here are five key takeaways:
Calibration and Discrimination Gaps: There's a notable difference between an LLM's internal confidence in its answers and how confident humans are in those same answers. Humans often overestimate the accuracy of LLM responses and are not good at distinguishing between correct and incorrect answers based on default explanations (a sketch of both gaps follows these takeaways).
Explanation Length Matters: Longer explanations from LLMs tend to increase user confidence, even if the added length doesn't actually improve the accuracy or informativeness of the answer.
Uncertainty Language Influences Perception: Human confidence is strongly influenced by the type of uncertainty language used in LLM explanations. Low-confidence statements lead to lower human confidence, while high-confidence statements lead to higher human confidence.
Tailoring Explanations Reduces Gaps: By adjusting LLM explanations to better reflect the model's internal confidence, the calibration and discrimination gaps can be narrowed. This improves user perception of LLM accuracy.
Limited User Expertise: Participants in the study generally lacked the expertise to accurately assess LLM responses independently. Even when users altered the LLM's answer, their accuracy was lower than the LLM's.
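
One simple way to operationalize the two gaps, as a sketch: measure calibration with expected calibration error (ECE) and discrimination with the probability that a correct answer receives higher confidence than an incorrect one (an AUC-style statistic). The metric choices and toy data below are illustrative assumptions, not necessarily the paper's exact formulation.

import numpy as np

def ece(conf, correct, bins=10):
    """Expected calibration error: |accuracy - mean confidence| per confidence bin."""
    conf, correct = np.asarray(conf, float), np.asarray(correct, float)
    edges = np.linspace(0.0, 1.0, bins + 1)
    err = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf >= lo) & (conf < hi)
        if mask.any():
            err += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return err

def auc(conf, correct):
    """Chance a correct answer is rated more confidently than an incorrect one."""
    conf, correct = np.asarray(conf, float), np.asarray(correct, float)
    pos, neg = conf[correct == 1], conf[correct == 0]
    diffs = pos[:, None] - neg[None, :]
    return (diffs > 0).mean() + 0.5 * (diffs == 0).mean()

correct    = [1, 1, 0, 1, 0, 0, 1, 1]                    # was the LLM's answer right?
model_conf = [0.9, 0.8, 0.4, 0.85, 0.3, 0.5, 0.7, 0.95]  # LLM's internal confidence
human_conf = [0.85] * 8   # users rate everything high, regardless of correctness

print("calibration gap:", ece(human_conf, correct) - ece(model_conf, correct))
print("discrimination gap:", auc(model_conf, correct) - auc(human_conf, correct))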

Monday Feb 17, 2025

Summary of https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5136877
This research paper explores the impact of Generative AI on the labor market. Drawing on a new survey, it finds that these tools are most commonly used by younger, more educated, and higher-income individuals in specific industries.
The study finds that approximately 30% of respondents have used Generative AI at work. It investigates the efficiency gains from using Generative AI and its role in job searches. The paper aims to measure the large-scale labor market effects of Generative AI and the wage structure impacts of such tools. Finally, the researchers intend to continue tracking Generative AI and its effect on the labor market in real-time.
Here are the key takeaways regarding the labor market effects of Generative AI, according to the source:
As of December 2024, 30.1% of survey respondents over 18 have used Generative AI at work since these tools became available to the public.
Generative AI tools are most commonly used by younger, more educated, and higher-income individuals, as well as those in customer service, marketing, and IT.
The survey found that workers use Generative AI during about one-third of their work week, averaging seven tasks per week, and that Generative AI helps workers complete tasks more quickly.
Workers using Generative AI spend approximately 30 minutes interacting with the tool to complete a task that they estimate would take 90 minutes without it, suggesting that Generative AI can potentially triple worker productivity (see the back-of-envelope arithmetic below).
The impact of LLMs can be a substitute for some forms of labor while also acting as a productivity-enhancing complement for other forms of labor.
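
The productivity claim follows from simple arithmetic on the survey's own headline numbers; the weekly aggregation below is a back-of-envelope illustration, and the paper may aggregate differently.

minutes_with_ai = 30       # reported time per task with the tool
minutes_without_ai = 90    # workers' estimate for the same task without it
tasks_per_week = 7         # reported average tasks assisted per week

speedup = minutes_without_ai / minutes_with_ai                   # 3.0x per task
saved = tasks_per_week * (minutes_without_ai - minutes_with_ai)  # minutes per week
print(f"per-task speedup: {speedup:.0f}x")
print(f"implied time saved: {saved} minutes (~{saved / 60:.0f} hours) per week")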

Monday Feb 17, 2025

Summary of https://arxiv.org/pdf/2502.02649
The paper argues against developing fully autonomous AI agents due to the increasing risks they pose to human safety, security, and privacy.
It analyzes different levels of AI agent autonomy, highlighting how risks escalate as human control diminishes. The authors contend that while semi-autonomous systems offer a more balanced risk-benefit profile, fully autonomous agents have the potential to override human control.
They emphasize the need for clear distinctions between agent autonomy levels and the development of robust human control mechanisms. The research also identifies potential benefits related to assistance, efficiency, and relevance, but concludes that the inherent risks, especially concerning accuracy and truthfulness, outweigh these advantages in fully autonomous systems.
The paper advocates for caution and control in AI agent development, suggesting that human oversight should always be maintained, and proposes solutions to better understand the risks associated with autonomous systems.
Here are five key takeaways regarding the development and ethical implications of AI agents, according to the source:
The development of fully autonomous AI agents—systems that can write and execute code beyond predefined constraints—should be avoided due to potential risks.
Risks to individuals increase with the autonomy of AI systems because the more control ceded to an AI agent, the more risks arise. Safety risks are particularly concerning, as they can affect human life and impact other values.
AI agent levels can be placed on a scale of decreasing user input and decreasing developer-written code: the more autonomous the system, the more human control is ceded (an illustrative sketch of such a scale follows these takeaways).
Increased autonomy in AI agents can amplify existing vulnerabilities related to safety, security, privacy, accuracy, consistency, equity, flexibility, and truthfulness.
There are potential benefits to AI agent development, particularly with semi-autonomous systems that retain some level of human control, which may offer a more favorable risk-benefit profile depending on the degree of autonomy and complexity of assigned tasks. These benefits include assistance, efficiency, equity, relevance, and sustainability.
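
The autonomy scale can be sketched as a ladder of who controls program flow at each level. The level names and one-line descriptions below are my own shorthand approximations of the kind of scale the paper describes, not its exact taxonomy.

# Illustrative ladder: lower levels keep control in developer-written code
# and user input; higher levels cede flow control to the model itself.
AGENT_LEVELS = [
    ("simple processor", "model output is displayed; humans control all program flow"),
    ("router",           "model picks which developer-written branch runs"),
    ("tool caller",      "model chooses which tools to call, and with what arguments"),
    ("multi-step agent", "model decides which step runs next and when to stop"),
    ("fully autonomous", "model can write and execute new code beyond preset constraints"),
]

for level, (name, what_the_model_controls) in enumerate(AGENT_LEVELS):
    print(f"level {level} ({name}): {what_the_model_controls}")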

Monday Feb 17, 2025

Summary of https://arxiv.org/pdf/2409.09047
This paper explores the effects of large language models (LLMs) on student learning in coding classes. Three studies were conducted to analyze how LLMs impact learning outcomes, revealing both positive and negative effects.
Using LLMs as personal tutors by asking for explanations was found to improve learning, while relying on them to solve exercises hindered it.
Copy-and-paste functionality was identified as a key factor influencing LLM usage and its subsequent impact. The research also demonstrates that students may overestimate their learning progress when using LLMs, highlighting potential pitfalls.
Finally, results indicated that less skilled students may benefit more from LLMs when learning to code.
Here are five key takeaways regarding the use of Large Language Models (LLMs) in learning to code, according to the source:
LLMs can have both positive and negative effects on learning outcomes. Using LLMs as personal tutors by asking for explanations can improve learning, but relying on them excessively to solve practice exercises can impair learning.
Copy-and-paste functionality plays a significant role in how LLMs are used. It enables solution-seeking behavior, which can decrease learning.
Students with less prior domain knowledge may benefit more from LLM access. However, those new to LLMs may be more prone to over-reliance.
LLMs can increase students’ perceived learning progress, even when controlling for actual progress. This suggests that LLMs may lead to an overestimation of one’s own abilities.
The effect of LLM usage on learning depends on balancing reliance on LLM-generated solutions and using LLMs as personal tutors, and can vary depending on the specific case.

Monday Feb 17, 2025

Summary of https://mitsloanedtech.mit.edu/ai/teach/ai-detectors-dont-work
AI detection software is unreliable and should not be used to police academic integrity. Instead, instructors should establish clear AI use policies, promote transparent discussions about appropriate AI usage, and design engaging assignments that motivate genuine student learning.
Thoughtful assignment design can foster intrinsic motivation and reduce the temptation to misuse AI. It is also important to employ inclusive teaching methods and fair assessments so all students have the opportunity to succeed. Ultimately, the source promotes the idea that human-centered learning experiences will always be more impactful for students.
Here are the key takeaways regarding AI use in education, according to the source:
AI detection software is unreliable and can lead to false accusations of misconduct.
It is important to establish clear policies and expectations regarding if, when, and how AI should be used in coursework, and communicate these to students in writing and in person.
Instructors should promote transparency and open dialogue with students about AI tools to build trust and facilitate meaningful learning.
Thoughtfully designed assignments can foster intrinsic motivation and reduce the temptation to misuse AI.
To ensure inclusive teaching, use a mix of assessment approaches to give every student an equitable opportunity to demonstrate their capabilities.

Monday Feb 10, 2025

Summary of https://assets.anthropic.com/m/2e23255f1e84ca97/original/Economic_Tasks_AI_Paper.pdf
This research paper uses data from four million conversations on the Claude.ai platform to empirically analyze how artificial intelligence (AI) is currently used across various occupational tasks in the US economy.
The study maps these conversations to the US Department of Labor's O*NET database to identify usage patterns, finding that AI is most heavily used in software development and writing tasks. The analysis also examines the depth of AI integration within occupations, the types of skills involved in human-AI interactions, and how AI is used to augment or automate tasks.
The researchers acknowledge limitations in their data and methodology but highlight the importance of their empirical approach for tracking AI's evolving role in the economy. The findings suggest AI's current impact is task-specific rather than resulting in complete job displacement.
Here are some surprising facts revealed by the analysis of AI usage patterns in the sources:
AI is not primarily used for automating entire job roles, but rather for specific tasks within occupations. While there is much discussion about AI replacing jobs, the data suggests that AI is more commonly used to enhance human capabilities on specific tasks. This is reflected in the finding that only about 4% of occupations use AI for at least 75% of their tasks (a toy sketch of this depth metric follows these takeaways).
The peak AI usage is in mid-to-high wage occupations, not in the highest wage brackets. It might be expected that AI would be adopted most in the highest-paying professions, but the analysis shows that occupations requiring considerable preparation, such as those needing a bachelor's degree, and those with mid-to-high salaries are seeing more AI use. This could be because these roles involve tasks that are well-suited to current AI capabilities.
AI is being used for both augmentation and automation almost equally. While there's a lot of focus on AI replacing human work, the study found that 57% of AI interactions showed augmentative patterns (enhancing human capabilities), while 43% demonstrated automation-focused usage (performing tasks directly). This reveals that AI is serving both as an efficiency tool and a collaborative partner.
Cognitive skills are highly represented in AI conversations, but not necessarily at an expert level. Skills like Critical Thinking, Reading Comprehension, and Writing are prevalent; however, the analysis only captures whether a skill was exhibited in the AI's responses, not whether that skill was central to the user's purpose or performed at an expert level. For example, active listening appears as a common skill because the AI rephrases user inputs and asks clarifying questions, not because users are seeking listening-focused interactions.
There is a clear specialization in how different AI models are used. For instance, Claude 3.5 Sonnet is more used for coding and software development, while Claude 3 Opus is preferred for creative and educational work. This suggests that different models are not interchangeable, but rather are being adopted to meet specific needs in the economy.
A significant portion of "non-work" interactions still mapped meaningfully to occupational tasks. For example, personal nutrition planning related to dietitian tasks, automated trading strategy development related to financial analyst tasks, and travel itinerary planning related to travel agent tasks. This suggests that AI is influencing a variety of tasks, even in informal contexts.
AI usage is not evenly distributed across all sectors. The study found the highest AI usage in tasks associated with software development, technical writing, and analytical roles. Occupations involving physical labor and those requiring extensive specialized training showed notably lower usage.
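
The depth-of-integration figures suggest a simple metric: for each occupation, the fraction of its O*NET tasks that appear at least once in AI conversations. The sketch below is an illustrative reconstruction with toy data, not Anthropic's actual pipeline or numbers.

# Given conversations already classified to O*NET task IDs, measure how
# deeply AI usage penetrates each occupation. All data here is made up.
occupation_tasks = {
    "Software Developers": ["t1", "t2", "t3", "t4"],
    "Dietitians": ["t5", "t6", "t7"],
}
observed_in_conversations = {"t1", "t2", "t3", "t4", "t5"}  # toy classifications

depth = {
    occ: sum(t in observed_in_conversations for t in tasks) / len(tasks)
    for occ, tasks in occupation_tasks.items()
}
share_high = sum(d >= 0.75 for d in depth.values()) / len(depth)

print(depth)  # per-occupation fraction of tasks with observed AI usage
print(f"{share_high:.0%} of occupations show AI usage on at least 75% of tasks")
# The report's analogous figure across real occupations is about 4%.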
