Author: Martin Anderson

  • Even State-Of-The-Art Language Models Struggle to Understand Temporal Logic

    Predicting future states is a central challenge in computer vision research – not least in robotics, where real-world situations must be considered. Machine learning systems entrusted with mission-critical tasks therefore need an adequate understanding of the physical world. However, in some cases, an apparently impressive knowledge of temporal reality could be deceptive: a new paper from…

  • How to Train and Use Hunyuan Video LoRA Models

    This article will show you how to install and use Windows-based software that can train Hunyuan video LoRA models, allowing the user to generate custom personalities in the Hunyuan Video foundation model. [Video: https://www.unite.ai/wp-content/uploads/2025/01/loras-hunyuan-AE2.mp4 – examples from the recent explosion of celebrity Hunyuan LoRAs from the civit.ai community.] At the moment the two most…

  • Cooking Up Narrative Consistency for Long Video Generation

    The recent public release of the Hunyuan Video generative AI model has intensified ongoing discussions about the potential of large multimodal vision-language models to one day create entire movies. However, as we have observed, this is a very distant prospect at the moment, for a number of reasons. One is the very short attention window…

  • Estimating Facial Attractiveness Prediction for Livestreams

    To date, Facial Attractiveness Prediction (FAP) has primarily been studied in the context of psychological research, in the beauty and cosmetics industry, and in cosmetic surgery. It’s a challenging field of study, since standards of beauty tend to be national rather than global. This means that no single effective AI-based dataset is…

  • The Rise of Hunyuan Video Deepfakes

    Due to the nature of some of the material discussed here, this article will contain fewer reference links and illustrations than usual. Something noteworthy is currently happening in the AI synthesis community, though its significance may take a while to become clear. Hobbyists are training generative AI video models to reproduce the likenesses of people…

  • A Personal Take On Computer Vision Literature Trends in 2024

    I’ve been continuously following the computer vision (CV) and image synthesis research scene at Arxiv and elsewhere for around five years, so trends become evident over time, and they shift in new directions every year. Therefore, as 2024 draws to a close, I thought it appropriate to take a look at some new or evolving…

  • Bridging the Space Between in Generative Video

    New research from China is offering an improved method of interpolating the gap between two temporally distanced video frames – one of the most crucial challenges in the current race towards realism for generative AI video, as well as for video codec compression. In the example video below, we see in the leftmost column a ‘start’…

  • The Elusive Definition of ‘Deepfake’

    A compelling new study from Germany critiques the EU AI Act’s definition of the term ‘deepfake’ as overly vague, particularly in the context of digital image manipulation. The authors argue that the Act’s emphasis on content resembling real people or events – yet potentially appearing fake – lacks clarity. They also highlight that the Act’s…

  • Improving Green Screen Generation for Stable Diffusion

    Despite community and investor enthusiasm around visual generative AI, the output from such systems is not always ready for real-world usage; one example is that gen AI systems tend to output entire images (or a series of images, in the case of video), rather than the individual, isolated elements that are typically required for diverse…

  • Can AI World Models Really Understand Physical Laws?

    The great hope for vision-language AI models is that they will one day become capable of greater autonomy and versatility, incorporating principles of physical laws in much the same way that we develop an innate understanding of these principles through early experience. For instance, children’s ball games tend to foster an understanding of motion kinetics…