Author: Martin Anderson
-
The Challenge of Captioning Video at More Than 1fps
The ability of machine learning systems to recognize the events that occur inside a video is crucial to the future of AI-based video generation – not least because video datasets require accurate captions in order to produce models that adhere to a user’s request, and that do not excessively hallucinate. An example of a captioning
-
Why AI Video Sometimes Gets It Backwards
If 2022 was the year that generative AI captured a wider public’s imagination, 2025 is the year where the new breed of generative video frameworks coming from China seems set to do the same. Tencent’s Hunyuan Video has made a major impact on the hobbyist AI community with its open-source release of a full-world video
-
The Road to Better AI-Based Video Editing
The video/image synthesis research sector regularly outputs video-editing architectures, and over the last nine months, outings of this nature have become even more frequent. That said, most of them represent only incremental advances on the state of the art, since the core challenges are substantial. However, a new collaboration between China and Japan this week
-
Nearly 80% of Training Datasets May Be a Legal Hazard for Enterprise AI
A recent paper from LG AI Research suggests that supposedly ‘open’ datasets used for training AI models may be offering a false sense of security – finding that nearly four out of five AI datasets labeled as ‘commercially usable’ actually contain hidden legal risks. Such risks range from the inclusion of undisclosed copyrighted material to
-
Rethinking Video AI Training with User-Focused Data
The kind of content that users might want to create using a generative model such as Flux or Hunyuan Video may not always be easily available, even if the content request is fairly generic, and one might guess that the generator could handle it. One example, illustrated in a new paper that we’ll take
-
Enhancing the Accuracy of AI Image-Editing
Although Adobe’s Firefly latent diffusion model (LDM) is arguably one of the best currently available, Photoshop users who have tried its generative features will have noticed that it is not able to easily edit existing images – instead it completely substitutes the user’s selected area with imagery based on the user’s text prompt (albeit that
-
Shielding Prompts from LLM Data Leaks
Opinion: An interesting IBM NeurIPS 2024 submission resurfaced on Arxiv last week. It proposes a system that can automatically intervene to protect users from submitting personal or sensitive information into a message when they are having a conversation with a Large Language Model (LLM) such as ChatGPT. Mock-up examples used in a
-
Automating Copyright Protection in AI-Generated Images
As discussed last week, even the core foundation models behind popular generative AI systems can produce copyright-infringing content, due to inadequate or misaligned curation, as well as the presence of multiple versions of the same image in training data, leading to overfitting, and increasing the likelihood of recognizable reproductions. Despite efforts to dominate the generative
-
A Forensic Data Method for a New Generation of Deepfakes
Although the deepfaking of private individuals has become a growing public concern and is increasingly being outlawed in various regions, actually proving that a user-created model – such as one enabling revenge porn – was specifically trained on a particular person’s images remains extremely challenging. To put the problem in context: a key element of
-
The Future of RAG-Augmented Image Generation
Generative diffusion models like Stable Diffusion, Flux, and video models such as Hunyuan rely on knowledge acquired during a single, resource-intensive training session using a fixed dataset. Any concepts introduced after this training – referred to as the knowledge cut-off – are absent from the model unless supplemented through fine-tuning or external adaptation techniques like