Abstract

Scalable Vector Graphics (SVG) is an important image format widely adopted in graphic design because of its resolution independence and editability. The study of generating high-quality SVG has continuously drawn attention from both designers and researchers in the AIGC community. However, existing methods either produce unstructured outputs at huge computational cost or are limited to generating monochrome icons with over-simplified structures. To produce high-quality and complex SVG, we propose OmniSVG, a unified framework that leverages pre-trained Vision-Language Models (VLMs) for end-to-end multimodal SVG generation. By parameterizing SVG commands and coordinates into discrete tokens, OmniSVG decouples structural logic from low-level geometry for efficient training while maintaining the expressiveness of complex SVG structure. To further advance the development of SVG synthesis, we introduce MMSVG-2M, a multimodal dataset with two million richly annotated SVG assets, along with a standardized evaluation protocol for conditional SVG generation tasks. Extensive experiments show that OmniSVG outperforms existing methods and demonstrates its potential for integration into professional SVG design workflows.
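
To make "parameterizing SVG commands and coordinates into discrete tokens" more concrete, here is a minimal sketch of one way such a scheme could look. Everything in it is an assumption for illustration: the 200×200 grid, the four-command vocabulary, the token layout, and the names `tokenize_path` and `detokenize` are hypothetical and are not taken from the OmniSVG code.

```python
import re

# Hypothetical vocabulary layout (an assumption, not OmniSVG's actual scheme).
GRID = 200                        # coordinates are quantized onto a GRID x GRID canvas
COMMANDS = ["M", "L", "C", "Z"]   # small subset of absolute path commands
COORD_OFFSET = len(COMMANDS)      # coordinate tokens start after the command tokens

# Matches either one of the supported command letters or a decimal number.
ITEM_RE = re.compile(r"[MLCZ]|-?\d+(?:\.\d+)?")


def tokenize_path(d: str) -> list[int]:
    """Turn path data into integer tokens: one per command, one per (x, y) pair."""
    tokens: list[int] = []
    pending: list[int] = []
    for item in ITEM_RE.findall(d):
        if item in COMMANDS:
            tokens.append(COMMANDS.index(item))
        else:
            # Quantize to the grid and clamp to the canvas bounds.
            pending.append(min(max(round(float(item)), 0), GRID - 1))
            if len(pending) == 2:
                x, y = pending
                # Fold the pair into a single token id.
                tokens.append(COORD_OFFSET + x * GRID + y)
                pending = []
    if pending:
        raise ValueError("path data ended with an unpaired coordinate")
    return tokens


def detokenize(tokens: list[int]) -> str:
    """Invert tokenize_path (up to the quantization applied above)."""
    parts = []
    for t in tokens:
        if t < COORD_OFFSET:
            parts.append(COMMANDS[t])
        else:
            x, y = divmod(t - COORD_OFFSET, GRID)
            parts.append(f"{x} {y}")
    return " ".join(parts)


tokens = tokenize_path("M 10 20 L 30 40 Z")
print(tokens)
print(detokenize(tokens))
```

In this sketch the command tokens carry the structure (move, line, curve, close) while the coordinate tokens carry the geometry, so a sequence model sees a flat list of integers instead of raw SVG text.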

Paper: https://arxiv.org/abs/2504.06263

Code: https://github.com/OmniSVG/OmniSVG/

Weights: https://huggingface.co/OmniSVG/OmniSVG

Project Page: https://omnisvg.github.io/

Demo: https://huggingface.co/spaces/OmniSVG/OmniSVG-3B

  • rizzothesmall@sh.itjust.works · 5 upvotes · 11 days ago

    I am very into this if it can take a non-vector graphic as input and work to that. OpenAI’s attempts at that have been complete dickfarts

    • paraphrand@lemmy.world · 2 upvotes · edited · 11 days ago

      This is the first time I’ve seen a model target SVG drafting. Anything you have seen previously about unicorns or whatever was just someone experimenting with interesting edge case usage of models not designed for this purpose.

      Feeding a language model a bunch of vector art does not seem productive to me. So it makes sense that something like GPT-4 sucks at it.

      • GenderNeutralBro@lemmy.sdf.org · 5 upvotes · 11 days ago

        Hard to judge quality when what we’re seeing is practically a pixel-perfect recreation. The tricky part of automated vectorization is detecting and plotting curves in such a way that it scales correctly. Bad implementations will use too many elements, or include straight lines that should be parts of curves, etc. Those errors would not be visible in those low-res rasterizations.
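
One rough way to check for those errors is to inspect the SVG itself instead of a rasterization. The sketch below counts path commands by type, so an output made mostly of short straight segments where curves are expected stands out. The helper name `path_command_counts` and the sample SVG are made up for illustration, and raw counts are only a crude proxy for quality.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

SVG_NS = "{http://www.w3.org/2000/svg}"

# All path command letters defined by SVG, in absolute and relative form.
COMMAND_RE = re.compile(r"[MmLlHhVvCcSsQqTtAaZz]")


def path_command_counts(svg_text: str) -> Counter:
    """Count path commands (folded to upper case) across all <path> elements."""
    root = ET.fromstring(svg_text)
    counts: Counter = Counter()
    for path in root.iter(f"{SVG_NS}path"):
        counts.update(c.upper() for c in COMMAND_RE.findall(path.get("d", "")))
    return counts


sample = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<path d="M0 0 L10 0 L10 10 C 10 20 20 20 20 10 Z"/>'
    "</svg>"
)
counts = path_command_counts(sample)
straight = counts["L"] + counts["H"] + counts["V"]
curved = counts["C"] + counts["S"] + counts["Q"] + counts["T"] + counts["A"]
print(dict(counts), "straight:", straight, "curved:", curved)
```

Comparing these counts between the model's output and a hand-drawn reference of the same image gives a quick signal about element bloat, though it says nothing about whether the curves are placed well.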

          • GenderNeutralBro@lemmy.sdf.org · 3 upvotes · 11 days ago

            Just gave it a try. I couldn’t get coherent results from img-to-svg with a few different tests of low-res pixel art and high-res cartoons. txt-to-svg also gave me incoherent blobs even with simple prompts. Something must be wrong there. Is it working for anyone else?

            I might just try installing it locally when I get home.

  • outhouseperilous@lemmy.dbzer0.com · 2 upvotes · 10 days ago

    Okay, you let me tie this into a spreadsheet or something to generate charts, and there’s finally a use case for this that I like.

    I’m not sure it’s worth needing a 5080 to make ultra-pretty graphs, but, you know, smoke ’em if you got ’em.