Topic

Multimodal AI

AI systems that understand text, images, audio, and video

Total Articles: 15
Coverage Period: 15 days (Oct 2025)
Last Updated: 7 days ago

All Articles (15)

DeepSeek AI Unveils DeepSeek-OCR: Vision-Based Context Compression Redefines Long-Text Processing - infoq.com

Chinese AI
DeepSeek
Multimodal AI
1 week ago, Asian AI Companies
DeepSeek-OCR Turns Text Into Vision, Slashing AI Costs - eWeek

Chinese AI
DeepSeek
Multimodal AI
1 week ago, Asian AI Companies
Ring’s CEO says his cameras can almost ‘zero out crime’ within the next 12 months

Jamie Siminoff has returned to Ring, the company he founded, with a renewed focus on its mission statement to "Make neighborhoods safer." Talking to The Verge ahead of the release of his new book Ding Dong, Siminoff says he believes the new wave of AI could finally help him fulfill that vision. "When I left,

Multimodal AI
1 week ago, The Verge
Sources: Multimodal AI startup Fal.ai already raised at $4B+ valuation

Fal.ai announced its previous round at a $1.5 billion valuation in July.

Funding & Business
Multimodal AI
1 week ago, TechCrunch
DeepSeek drops open-source model that compresses text 10x through images, defying conventions

DeepSeek, the Chinese artificial intelligence research company that has repeatedly challenged assumptions about AI development costs, has released a new model that fundamentally reimagines how large language models process information, and the implications extend far beyond its modest branding as an optical character recognition tool. The company's DeepSeek-OCR model, released Monday with full open-source code and weights, achieves what researchers describe as a paradigm inversion: compressin...

Chinese AI
DeepSeek
Developer Tools
Multimodal AI
1 week ago, VentureBeat
Google Fi to add AI-enhanced audio and RCS web messaging

The tech giant says calls will now use AI-enhanced audio to reduce background noise and improve voice clarity, even when speaking to someone on a landline or older device.

Multimodal AI
1 week ago, TechCrunch
DeepSeek-OCR: New Open-source AI Model Goes Viral On GitHub - Dataconomy

Chinese AI
DeepSeek
Developer Tools
Multimodal AI
1 week ago, Asian AI Companies
Deepseek's OCR system compresses image-based text so AI can handle much longer documents - the-decoder.com

Chinese AI
DeepSeek
Multimodal AI
1 week ago, Asian AI Companies
World's largest open-source multimodal dataset delivers 17x training efficiency, unlocking enterprise AI that connects documents, audio and video

AI models are only as good as the data they're trained on. That data generally needs to be labeled, curated and organized before models can learn from it effectively. One of the big missing links in the AI ecosystem has been the availability of a large, high-quality open-source multimodal dataset. That changes today with the debut of the EMM-1 dataset, which comprises 1 billion data pairs and 100M data groups across 5 modalities: text, image, video, audio and 3D point clouds. Multimo...

Enterprise AI
Funding & Business
1 week ago, VentureBeat
Uber is turning its app into an AI training ground

In its quest to become the ultimate app for “flexible work,” Uber today launched a new pilot that allows its US drivers and couriers to earn extra money by performing “microtasks” to train AI models. These tasks include recording voice audio, capturing and uploading images, and submitting documents in certain languages. The prompts will vary,

Consumer AI
Multimodal AI
1 week ago, The Verge
All Windows 11 PCs Will Get These Advanced Copilot AI Features

All Windows 11 users will soon be able to talk to the Copilot AI assistant more easily via voice, and Copilot Vision can understand the context of your screen.

Microsoft
Consumer AI
Developer Tools
1 week ago, Wired
Google’s AI video generator is getting better editing and more audio

Google is making videos created with the AI filmmaking tool Flow even more realistic — and harder to identify as AI-generated at first glance. The company announced Wednesday that users can add in and change the shadows and lighting of their AI videos. The expanded editing features in Flow are tied to the Veo 3.1

Multimodal AI
1 week ago, The Verge
Google releases new AI video model Veo 3.1 in Flow and API: what it means for enterprises

As expected after days of leaks and rumors online, Google has unveiled Veo 3.1, its latest AI video generation model, bringing a suite of creative and technical upgrades aimed at improving narrative control, audio integration, and realism in AI-generated video. While the updates expand possibilities for hobbyists and content creators using Google’s online AI creation app, Flow, the release also signals a growing opportunity for enterprises, developers, and creative teams seeking scalable, cust...

Consumer AI
Creative AI
2 weeks ago, VentureBeat
Google releases Veo 3.1, adds it to Flow video editor

Veo 3 already has edit features such as adding reference images to drive a character, providing the first and last frame to generate a clip using AI, and the ability to extend an existing video based on the last few frames. With Veo 3.1, Google is adding audio to all these features to make the clips more lively.

Multimodal AI
2 weeks ago, TechCrunch
The Startup Using AI to Translate Documents Into Data

If you’ve ever uploaded a picture of a receipt to an expense report or read a PDF of a book online, you’ve likely used optical character recognition, a decades-old technique that converts images of typed, handwritten or printed text into text that’s editable on a computer. OCR might not sound like the sexiest market. But it’s interesting enough for Andreessen Horowitz, one of the most prolific backers of young AI startups in the last two years, to lead a new round of funding for Reducto, a st...

Funding & Business
Multimodal AI
2 weeks ago, The Information