Author: Martin Anderson

  • The Road to Better AI-Based Video Editing

    The video/image synthesis research sector regularly outputs video-editing* architectures, and over the last nine months, outings of this nature have become even more frequent. That said, most of them represent only incremental advances on the state of the art, since the core challenges are substantial. However, a new collaboration between China and Japan this week

  • Nearly 80% of Training Datasets May Be a Legal Hazard for Enterprise AI

    A recent paper from LG AI Research suggests that supposedly ‘open’ datasets used for training AI models may be offering a false sense of security – finding that nearly four out of five AI datasets labeled as ‘commercially usable’ actually contain hidden legal risks. Such risks range from the inclusion of undisclosed copyrighted material to

  • Rethinking Video AI Training with User-Focused Data

    The kind of content that users might want to create using a generative model such as Flux or Hunyuan Video may not always be easily available, even if the content request is fairly generic and one might assume that the generator could handle it. One example, illustrated in a new paper that we’ll take

  • Enhancing the Accuracy of AI Image-Editing

    Although Adobe’s Firefly latent diffusion model (LDM) is arguably one of the best currently available, Photoshop users who have tried its generative features will have noticed that it is not able to easily edit existing images – instead it completely substitutes the user’s selected area with imagery based on the user’s text prompt (albeit that
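
    Firefly itself is closed, but the mask-and-replace behaviour described above can be approximated with the open Hugging Face diffusers library. The sketch below is purely illustrative: the Stable Diffusion 2 inpainting checkpoint stands in for Firefly, and the file paths and prompt are assumptions.

    ```python
    # Minimal prompt-driven inpainting sketch with diffusers (illustrative only).
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    # Open-weights stand-in for Firefly's generative fill.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-inpainting",
        torch_dtype=torch.float16,
    ).to("cuda")

    image = Image.open("photo.png").convert("RGB")  # source image (hypothetical path)
    mask = Image.open("mask.png").convert("RGB")    # white pixels = region to replace

    # Everything under the mask is re-synthesised from the prompt, rather than
    # edited in place -- the substitution behaviour noted above.
    result = pipe(prompt="a red vintage car", image=image, mask_image=mask).images[0]
    result.save("edited.png")
    ```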

  • Shielding Prompts from LLM Data Leaks

    Opinion: An interesting IBM NeurIPS 2024 submission from late 2024 resurfaced on Arxiv last week. It proposes a system that can automatically intervene to protect users from submitting personal or sensitive information into a message when they are having a conversation with a Large Language Model (LLM) such as ChatGPT. Mock-up examples used in a
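
    As a rough sketch of the general idea – intercepting a prompt and masking obvious personal data before it ever reaches the model – the snippet below applies a naive regex pass. It is not the system proposed in the IBM submission; the patterns, placeholders and example message are assumptions for illustration.

    ```python
    # Naive pre-submission PII scrubbing sketch (illustrative only).
    import re

    PII_PATTERNS = {
        "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
        "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "PHONE": re.compile(r"\b\+?\d[\d\s().-]{7,}\d\b"),
    }

    def redact(prompt: str) -> str:
        """Replace matched spans with typed placeholders before sending to the LLM."""
        for label, pattern in PII_PATTERNS.items():
            prompt = pattern.sub(f"[{label}]", prompt)
        return prompt

    user_message = "My SSN is 123-45-6789; reach me at jane.doe@example.com"
    print(redact(user_message))
    # -> "My SSN is [SSN]; reach me at [EMAIL]"
    # The redacted text, not the original, would then be sent to the chat endpoint.
    ```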

  • Automating Copyright Protection in AI-Generated Images

    As discussed last week, even the core foundation models behind popular generative AI systems can produce copyright-infringing content, due to inadequate or misaligned curation, as well as the presence of multiple versions of the same image in the training data – which leads to overfitting and increases the likelihood of recognizable reproductions. Despite efforts to dominate the generative

  • A Forensic Data Method for a New Generation of Deepfakes

    Although the deepfaking of private individuals has become a growing public concern and is increasingly being outlawed in various regions, actually proving that a user-created model – such as one enabling revenge porn – was specifically trained on a particular person’s images remains extremely challenging. To put the problem in context: a key element of

  • The Future of RAG-Augmented Image Generation

    Generative diffusion models like Stable Diffusion, Flux, and video models such as Hunyuan rely on knowledge acquired during a single, resource-intensive training session using a fixed dataset. Any concepts introduced after this training – referred to as the knowledge cut-off – are absent from the model unless supplemented through fine-tuning or external adaptation techniques like
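
    For readers unfamiliar with the 'external adaptation' route, the toy sketch below shows the basic retrieval-augmented pattern: reference text fetched at inference time is prepended to the prompt, so the generator can draw on facts that post-date its knowledge cut-off. The corpus, scoring function and prompt template are assumptions, not anything from the paper under discussion.

    ```python
    # Toy retrieval-augmented prompting sketch (illustrative only).
    def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
        """Rank reference documents by naive word overlap with the query."""
        q = set(query.lower().split())
        return sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

    corpus = [
        "Hunyuan Video is a text-to-video foundation model released in late 2024.",
        "Flux is a family of text-to-image diffusion models from Black Forest Labs.",
        "Stable Diffusion 1.5 was trained on subsets of the LAION dataset.",
    ]

    query = "Which company makes the Flux diffusion models"
    context = "\n".join(retrieve(query, corpus))

    # The fixed-dataset model only 'knows' its training data; the retrieved
    # context supplies anything newer than the knowledge cut-off.
    prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    print(prompt)
    ```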

  • Towards LoRAs That Can Survive Model Version Upgrades

    Since my recent coverage of the growth in hobbyist Hunyuan Video LoRAs (small, trained files that can inject custom personalities into multi-billion parameter text-to-video and image-to-video foundation models), the number of related LoRAs available at the Civit community has risen by 185%. Despite the fact that there are no particularly easy or low-effort ways to
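
    For context, applying one of these LoRAs at inference time is typically a one-line affair; the sketch below uses the diffusers library with an SDXL base model as a lighter-weight stand-in for Hunyuan Video, and the LoRA filename and prompt are hypothetical.

    ```python
    # Minimal LoRA-loading sketch with diffusers (illustrative only).
    import torch
    from diffusers import DiffusionPipeline

    # Multi-billion-parameter base model (SDXL used here as a stand-in).
    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
    ).to("cuda")

    # The LoRA is a small file of low-rank weight deltas trained against one
    # specific base-model version -- which is why a version upgrade can break it.
    pipe.load_lora_weights("./loras", weight_name="my_character_lora.safetensors")

    image = pipe("portrait photo of the trained character, studio lighting").images[0]
    image.save("lora_test.png")
    ```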

  • The Secret Routes That Can Foil Pedestrian Recognition Systems

    A new research collaboration between Israel and Japan contends that pedestrian detection systems possess inherent weaknesses, allowing well-informed individuals to evade facial recognition systems by navigating carefully planned routes through areas where surveillance networks are least effective. With the help of publicly available footage from Tokyo, New York and San Francisco, the researchers developed an