This Week in AI — Live Ranking

half the internet is terrified of AI. we are on the other half, bringing it into our daily lives and trying to understand it better.

we use AI in everything we do — so every monday we read the whole week of it and work out what actually happened. something real, from heavy users. it makes our day if what we produce makes someone find AI more interesting for their life.

read our version of what happened this week in AI. free.

read this week →

Hugging Face Daily Papers • 2 HR AGO / primary source

BrainJanus: A Unified Model for Understanding and Generation across Brain, Vision, and Language

A new neuroscience model, BrainJanus, unifies brain activity, vision, and language, enabling two-way conversions between them.

What happened

BrainJanus is the first model to integrate brain activity with visual and linguistic sensory inputs within a unified framework.
It allows bidirectional mapping, converting images or text to brain activity and vice versa, by quantizing continuous brain data into discrete 'tokens'.

Why it matters

This unified approach marks a significant advance in neuroscience, moving beyond separate encoding and decoding tasks.
It could lay foundational groundwork for future technologies that better understand and interact with human perception and thought.
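
For readers curious what "quantizing continuous data into tokens" can look like in general, here is a minimal sketch of nearest-codebook vector quantization. This is not BrainJanus's actual tokenizer: the codebook values, the two-dimensional toy samples, and the function names `quantize` and `dequantize` are all assumptions made purely for illustration.

```python
# Illustrative only: generic nearest-neighbour vector quantization,
# NOT the method from the BrainJanus paper.

def quantize(samples, codebook):
    """Return, for each sample, the index of the closest codebook vector."""
    tokens = []
    for x in samples:
        # squared Euclidean distance from this sample to every code
        dists = [sum((a - b) ** 2 for a, b in zip(x, code)) for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens

def dequantize(tokens, codebook):
    """Map token ids back to their (approximate) continuous vectors."""
    return [codebook[t] for t in tokens]

# Hypothetical toy codebook and "continuous" measurements.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
samples = [[0.1, -0.1], [0.9, 1.2], [-0.8, 0.7]]

tokens = quantize(samples, codebook)
reconstruction = dequantize(tokens, codebook)
print(tokens)
```

Once continuous measurements are replaced by integer ids like these, they can be fed to a sequence model the same way word tokens are, which is the general idea behind treating several modalities in one framework.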

— the story beneath

Hugging Face Daily Papers • 2 HR AGO / primary source

Xiaomi-GUI-0 Technical Report

Institution: Xiaomi Research | Authors: Wanxia Cao, Chengzhen Duan, Pei Fu, Pengzhi Gao, Niu Lian

AI summary (generated by Qwen/Qwen2.5-Coder-32B-Instruct): A native multimodal GUI agent trained in real-device environments demonstrates superior performance and stability compared to traditional benchmark-based approaches.

Hugging Face Daily Papers • 2 HR AGO / primary source

PolyFlow: Continuous Topology Embedding Flow Matching for Artist-style Mesh Generation

Tencent Hunyuan's new PolyFlow technology promises to greatly accelerate high-quality 3D mesh generation by enabling parallel processing.

What happened

Tencent Hunyuan unveiled 'PolyFlow,' a new 3D mesh generation technique that significantly speeds up model creation.
PolyFlow converts discrete mesh data into a continuous representation, allowing for parallel processing via a Transformer-based framework.

Why it matters

This could drastically reduce the time and resources needed for 3D content creation, potentially lowering barriers for businesses in virtual spaces.
Faster and more precise 3D model generation could lead to richer, more dynamic digital experiences in industries like gaming, design, and virtual commerce.

Hugging Face Daily Papers • 2 HR AGO / primary source

PhotoQuilt: Training-Free Arbitrary-Resolution Photomosaics via Bootstrapped Tiled Denoising

University of Toronto researchers developed PhotoQuilt, a new framework to efficiently generate high-resolution photo mosaics, overcoming diffusion mo

What happened

A University of Toronto team introduced PhotoQuilt, a training-free framework for generating high-resolution photo mosaics.
The system uses 'bootstrapped tiled denoising' to maintain both overall structure and individual tile detail.

Why it matters

This innovation addresses a key challenge in AI image generation, where previous models struggled with large-scale mosaic creation.
It demonstrates progress in AI's ability to handle complex image synthesis efficiently, potentially impacting future visual content creation tools.

Hugging Face Daily Papers • 2 HR AGO / primary source

AVTok: 1D Unified Tokenization for Holistic Audio-Video Generation

New AI technology, AVTok, unifies audio and video generation, making content creation more natural and synchronized.

What happened

AVTok, a new unified tokenizer, brings audio and video generation into a single tokenization scheme.
It uses a dual-stream transformer architecture to efficiently encode audio-visual pairs into one compact data representation.

Why it matters

This integration addresses the challenge of out-of-sync and unnatural content in AI-generated video with sound.
It's an important step toward future large-scale multimodal AI models that can handle audio and video together more effectively.

Hugging Face Daily Papers • 2 HR AGO / primary source

Evolution Fine-Tuning: Learning to Discover Across 371 Optimization Tasks

Institution: Minnesota NLP | Authors: Young-Jun Lee, Seungone Kim, Minki Kang, Alistair Cheong Liang Chuen, Zerui Chen

AI summary (generated by Qwen/Qwen2.5-Coder-32B-Instruct): Evolutionary fine-tuning enables large language models to develop cross-task problem-solving capabilities by learning from search trajectories, demonstrating improved performance on mathematical conjectures and optimization tasks.

— the rundown

MemLearner: Learning to Query Context memory for Video World Models

A new AI technique, MemLearner, improves video generation consistency by enhancing how models recall past information.

Hugging Face Daily Papers • 2 HR AGO / primary source

DOPD: Dual On-policy Distillation

A new AI learning method, DOPD, enhances how large AI models transfer knowledge by solving the 'privilege illusion' problem.

Hugging Face Daily Papers • 2 HR AGO / primary source

Dockerless: Environment-Free Program Verifier for Coding Agents

ByteDance researchers developed "Dockerless," a new system to verify AI-generated code without needing a traditional execution environment.

Hugging Face Daily Papers • 2 HR AGO / primary source

BlockPilot: Instance-Adaptive Policy Learning for Diffusion-based Speculative Decoding

A new research technique, BlockPilot, accelerates AI model inference by dynamically optimizing block sizes, improving speculative decoding.

Hugging Face Daily Papers • 2 HR AGO / primary source

LUMOS: A Semantic Operating-System Layer for Accessibility-Grounded AI Agents

University of Texas Dallas researchers developed LUMOS, a new layer enabling AI to interact with operating systems more efficiently by converting visu

Hugging Face Daily Papers • 5 HR AGO / primary source

Drop-Then-Recovery: How Redundant Are Vision-Language-Action Models?

Institution: LLM-Drop | Authors: Guoheng Sun, Kaixi Feng, Shwai He, Xiaochuan Gong, Yexiao He

AI summary (generated by Qwen/Qwen2.5-Coder-32B-Instruct): Research reveals that language backbones in Vision-Language-Action models are highly redundant for robotic manipulation tasks, while vision and action pathways are more critical, suggesting a need for deliberate capacity allocation in future architectures.

Hugging Face Daily Papers • 10 HR AGO / primary source

One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining

Yandex Research has found new optimization methods to make training large AI models more efficient by tackling gradient delay.

Hugging Face Daily Papers • 15 HR AGO / primary source

MirrorPPR: Exemplar-Based Portrait Photo Retouching

Institution: DENG Lab @ SJTU | Authors: Zhihong Liu, Zheng Li, Jiachun Jin, Siqi Kou, Yitao Jian

AI summary (generated by Qwen/Qwen2.5-Coder-32B-Instruct): An exemplar-based portrait retouching framework using a Diffusion Transformer with LoRA adaptation and self-augmented training data achieves superior quality and identity preservation.

Hugging Face Daily Papers • 8 HR AGO / primary source

Delayed Verification Destabilizes Multi-Agent LLM Belief: Instability Thresholds and Optimal Corrector Placement

New research finds that verification delays in multi-agent LLM systems can destabilize the agents' shared 'beliefs' and cause them to oscillate.

Hugging Face Daily Papers • 10 HR AGO / primary source

LLM Program Optimization via Retrieval Augmented Search

University of Pennsylvania researchers developed two new LLM-based methods, RAS and AEGIS, that significantly improve the performance of C++ and Python code.

Hugging Face Daily Papers • 12 HR AGO / primary source

SWE-Together: Evaluating Coding Agents in Interactive User Sessions

Meta's new SWE-Together benchmark evaluates AI coding assistants on interaction efficiency, not just final code accuracy.

Hugging Face Daily Papers • 12 HR AGO / primary source

How Good Can Linear Models Be for Time-Series Forecasting?

New research shows data preprocessing can be more effective and cheaper than complex AI models for time series forecasting.

Hugging Face Daily Papers • 27 HR AGO / primary source

A Gravitational Interpretation of Fine-Tuning Reversion

Institution: Mohamed Bin Zayed University of Artificial Intelligence | Authors: Samuele Poppi, Nils Lukas

AI summary (generated by Qwen/Qwen2.5-Coder-32B-Instruct): Post-alignment safety degradation arises from geometric properties of training history, where fine-tuning reversion follows a persistent direction defined by early training dynamics.

Hugging Face Daily Papers • 15 HR AGO / primary source

RocketSmith: Agentic Additive Manufacturing of High-Powered Rockets

Carnegie Mellon University researchers successfully used an AI system called RocketSmith to automate high-powered rocket design and manufacturing.

Hugging Face Daily Papers • 15 HR AGO / primary source

Weekly AI brief for practitioners

A practitioner-first AI newsletter by dera.ai — what changed, why it matters, what to try next

By dera.ai • Every Monday, free
