For anyone interested in recreating this, there is a very nice paper titled "Differentiable Drawing and Sketching" [arXiv](https://arxiv.org/abs/2103.16194) with an easy-to-use implementation ([github](https://github.com/jonhare/DifferentiableSketching)).


Also, "Differentiable Vector Graphics Rasterization" [github](https://github.com/BachiLi/diffvg)


Wow this is really cool! I had actually been looking for something similar but couldn't find it and thus made this.


This is a POC I had for neural rendering. The model is just trying to minimize the L2 distance between this output and a ground truth image (in this case the celeb dataset). What you are seeing are the validation steps during a training run. Try to follow a single shape as it converges. The shapes can start out in any formation but a 4x4 grid looks very interesting. There are lots of possibilities to expand on this concept. I am considering writing a short manuscript just to get the ideas out there.
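The objective can be sketched minimally like this (a NumPy stand-in with hypothetical names, not my actual code; the real setup uses a neural net and richer shape parameters):

```python
import numpy as np

def soft_circle(cx, cy, r, size=64, sharp=20.0):
    """One 'fuzzy' circle: a sigmoid of the distance to the boundary,
    so the canvas stays differentiable w.r.t. (cx, cy, r)."""
    ys, xs = np.mgrid[0:size, 0:size] / size
    d = np.sqrt((xs - cx) ** 2 + (ys - cy) ** 2)
    return 1.0 / (1.0 + np.exp(sharp * (d - r)))

def render(shapes, size=64):
    """Composite several soft shapes onto one canvas (max blend)."""
    canvas = np.zeros((size, size))
    for cx, cy, r in shapes:
        canvas = np.maximum(canvas, soft_circle(cx, cy, r, size))
    return canvas

def l2_loss(shapes, target):
    """What the model minimizes: pixel-wise L2 against the target image."""
    return np.mean((render(shapes, target.shape[0]) - target) ** 2)

target = render([(0.5, 0.5, 0.2)])          # stand-in for a ground-truth image
print(l2_loss([(0.5, 0.5, 0.2)], target))   # matching params give zero loss
print(l2_loss([(0.2, 0.2, 0.1)], target))   # off-target params give larger loss
```

Because every step is smooth, gradients flow from the pixel loss back into the shape parameters.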


I'm assuming the shapes are not discrete when computing the loss? I'd imagine they are fuzzy and extend to infinity, but then become discretised for the output, right?


Absolutely correct.
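Concretely, the discretization is just a threshold applied at output time; a toy 1-D sketch of the idea (not the actual code):

```python
import numpy as np

# A 1-D slice across a "fuzzy" shape edge: a sigmoid of signed distance.
xs = np.linspace(-1, 1, 9)
soft = 1.0 / (1.0 + np.exp(10.0 * xs))   # smooth, used when computing the loss
hard = (soft >= 0.5).astype(float)       # discretized only for the final output
print(soft.round(2))
print(hard)
```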


You should also show what the image with the continuous shapes looks like. Maybe they look closer to faces.


I love coming up with these differentiable analogs of discrete things. Well done.


Are you able to share the code?


It is still very much WIP. I will try to set something up soon! I can DM you once I do.


This is awesome man


I can't be the only one seeing this. https://youtu.be/dSGdbsf8UjQ


scrolled way too far for this


Cool stuff! Reading the comments here, I thought you might find these things interesting:

- [(Improved) SPIRAL](https://deepmind.com/research/publications/2019/Unsupervised-Doodling-and-Painting-with-Improved-SPIRAL) — a generative image drawing model using reinforcement learning.
- [NVDiffRast](https://nvlabs.github.io/nvdiffrast/) — a PyTorch / TensorFlow library for differentiable rasterization.
- [ES-CLIP](https://es-clip.github.io) — a framework that lets non-differentiable image rendering pipelines match reference images, or textual target descriptions encoded through CLIP, using evolution strategies.


The thumbnails of this video look particularly good; it's obviously producing enough information to match low-spatial-frequency visual information. It would be interesting to me if there's any advantage to doing it in stages: applying "smaller" shapes in some way to fill in details that the larger ones exclude, along the lines of how painters block out colours and move to finer brushstrokes over time. The only way I can think to do that, though, is adding a smooth cutoff term based on shape area in the loss, and then iteratively shrinking the cutoff.
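One way that smooth cutoff could look (pure sketch; the gating form and `softness` parameter are my guesses, nothing from the post):

```python
import numpy as np

def area_gate(area, cutoff, softness=0.02):
    """Smooth weight: ~1 for shapes with area above `cutoff`, ~0 below.
    Multiplying each shape's contribution to the loss by this gate, then
    shrinking `cutoff` over training, gives a coarse-to-fine schedule."""
    return 1.0 / (1.0 + np.exp(-(area - cutoff) / softness))

print(area_gate(0.30, cutoff=0.20))  # large shape: weight near 1
print(area_gate(0.05, cutoff=0.20))  # small shape: nearly ignored early on
print(area_gate(0.05, cutoff=0.01))  # after shrinking the cutoff: counts again
```

Since the gate is a sigmoid, it keeps the overall loss differentiable, unlike a hard area threshold.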


Not trying to discount research or your work, but I am not sure where the novelty is here, OP... I mean, this doesn't look like it accomplishes anything new that can't be done by even the most basic architecture, nor does it provide any insight into a phenomenon... something like this could be done by any flavor of GAN / VAE with the most basic of loss functions... also, even though calling it "differentiable 2D rendering" is not completely wrong, it would be equivalent to calling a cat a "proto-lion"...


Sometimes the application is just as important as the process. Machine learning is an extension of humanity's exploration into art and culture as much as it is about novel software architecture. I think this is a really fantastic demonstration of the perception of the human face reduced down to its most fundamental forms. It's a concept I would never have considered had it not been posted here and I think that that novelty alone entirely justifies its presence in this sub. Top work u/zimonitrome


The results are not super stunning, I know. These images aren't even generated from any distribution nor a generative model; they are just sample-to-sample for now. My aim was to represent images by simple primitives such as geometric shapes or lines, as opposed to the dense pixel outputs given by normal CNNs etc. The shapes that the model outputs can be drawn with vector graphics instead of raster graphics, but these vectors can also be rendered to a dense pixel n-d array in a differentiable process. I haven't seen many (any?) people do 2D neural rendering, but 3D neural rendering is big. I bet there are interesting "neural rendering in 3D, projected to 2D" projects that I haven't seen yet. An analog to this in 3D can be found at: https://arxiv.org/abs/1612.00404
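Since the shapes are parametric, the same parameters can be serialized straight to SVG instead of rasterized; roughly like this (circle primitives assumed purely for illustration):

```python
def shapes_to_svg(shapes, size=64):
    """Emit shape parameters (cx, cy, r in [0, 1]) as an SVG string,
    so one set of parameters drives both raster and vector output."""
    circles = "".join(
        f'<circle cx="{cx * size:.1f}" cy="{cy * size:.1f}" r="{r * size:.1f}"/>'
        for cx, cy, r in shapes
    )
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{size}" height="{size}">{circles}</svg>')

print(shapes_to_svg([(0.5, 0.5, 0.2)]))
```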


Don't worry, some of us had the expected "oh that's neat" reaction


You might also want to try a single primitive, like a geon.


That could be very useful. I have had trouble constructing arbitrary shapes, though. Maybe I will read up more on this. Sometimes just finding the right vocabulary goes a long way. "Geon" is a first for me. Thanks!


I'm unsure what part of this involves machine learning. This just looks like an optimization problem with the translation, scaling, and rotation of each shape as free parameters.


That is really cool. I wonder if there is something like that for engineering design. Thanks for sharing.


If Mondrian and Picasso had a child 👶


Is this an open source project by any chance? I'd love to look through the code and learn how you do things like this!


I was wondering, what would the senses be?


Everyone turns into a butt face


Those faces look like they were made of sausages


Nice results! Very intriguing abstraction! I did a similar attempt using differentiable rasterization (but in a very different context). I am doing it without the continuous relaxation, but explicitly discrete, although the algorithm is very stupid... https://luxxxlucy.github.io/projects/2021_terpret/index.html


Woah, looks like it could be useful even if it is "stupid" like you call it.


I'd be more interested in the reverse. Taking a face and transforming it into a series of shapes.


That is pretty much what is happening.


Perhaps I misunderstand, but in the video it starts at shapes and goes more towards faces. Is there something I'm missing?


Each "image" is produced by evaluating a model during different time steps in the training phase. The model takes an image as input and tries to estimate what shapes to output in order to re-create the input image.


"Neural networks" is a stupid name.


Haha faces