Origami folded by SPC audience, anonymous. Photo credit: Wild Horse

Generative Origami
August 5, 2024

This past weekend we participated in an SPC hackathon that required us to build with Llama. The true purpose was to get our early technical team to solve a problem together. We were named a finalist.

Problem Statement

A fun problem we looked to solve was Generative Origami: the ability to prompt for any origami design. One point of inspiration was Michal Kosmulski's posts. After speaking with a handful of ML professors and research scientists, we realized that this was a good problem statement, but a hard problem.

If I gave you a square sheet of paper today and asked you to fold a giraffe, without relying on memory or any references, it would be quite challenging. Now, if you had previously folded an elephant or a llama, perhaps you would be able to fold something that resembled a giraffe after multiple attempts.

Thinking from first principles, Generative Origami should be able to generate (or iterate on) any known human origami design or machine algorithmic design. But creating any net-new origami instructions would push the frontiers of AI reasoning with mathematics.

Computational Origami in 2024

Such an effort would be comparable to AI reasoning efforts around IMO problems. The challenge of origami is constrained by Euclidean geometry, the origami axioms, topology, computational geometry, group theory, and more. However, the mathematics shouldn't be the rate-limiting step.

Computationally, it can be expensive. For example, deciding whether a given crease pattern folds flat is NP-hard. But if set up properly, the search space of origami should not be larger than that of Go, the EXPTIME-complete game tackled by AlphaGo. With the right constraints, one could synthetically generate folds and then classify whether or not the final fold resembles a giraffe. However, this strict AI approach may not be the most efficient path.
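
To make the cost concrete: global flat-foldability is the hard part, but two well-known necessary conditions at a single interior vertex can be checked cheaply. Maekawa's theorem says the mountain and valley counts differ by exactly two, and Kawasaki's theorem says the alternating sum of the sector angles is zero. The Python sketch below is illustrative only; the function names and the input format (sector angles in degrees, crease labels "M" and "V") are assumptions, and passing both checks does not guarantee that a whole crease pattern folds flat.

```python
import math


def satisfies_maekawa(assignments):
    """Maekawa's theorem: mountain and valley counts differ by exactly 2."""
    mountains = assignments.count("M")
    valleys = assignments.count("V")
    return abs(mountains - valleys) == 2


def satisfies_kawasaki(sector_angles_deg, tol=1e-9):
    """Kawasaki's theorem: an even number of sectors whose alternating sum is 0."""
    if len(sector_angles_deg) % 2 != 0:
        return False
    # The sectors around an interior vertex of flat paper must total 360 degrees.
    if not math.isclose(sum(sector_angles_deg), 360.0, abs_tol=tol):
        return False
    alternating = sum(
        angle if i % 2 == 0 else -angle
        for i, angle in enumerate(sector_angles_deg)
    )
    return math.isclose(alternating, 0.0, abs_tol=tol)


def vertex_may_fold_flat(sector_angles_deg, assignments):
    """Necessary (not sufficient) local test for one interior vertex."""
    if len(sector_angles_deg) != len(assignments):
        return False
    return satisfies_kawasaki(sector_angles_deg) and satisfies_maekawa(assignments)
```

A generator of synthetic folds could use a filter like this to discard candidates early, before any expensive global check or giraffe classifier is run.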

Another path is to use existing foundation models. However, I conjecture that the majority of origami knowledge is in origami books, whose complete contents are not represented in Google Books. Advanced origami lives in unpublished trade magazines or is kept in hobbyists' archives, which are also not represented on the Internet. That means key parts of origami knowledge may not be properly represented in foundation models.

Zero-shot attempts to generate origami instructions today lead to failure. Ask ChatGPT or Claude for directions to fold a giraffe, and the generated text is nonsensical. You can barely follow the instructions, let alone make sense of them.

Few-shot attempts after uploading Oripa files yield some XML output in which you can adjust existing coordinates, but most outputs are not coherent, let alone foldable. Even with multiple prompts and XML uploads, none of the generated instruction images are coherent. The only images that are feasible are shapes or animals in the style of an origami fold.

Hackathon Results

Our hackathon suggested (without true evals or a loss function) that a naive implementation of fine-tuning could augment these models with new knowledge. We took a sample of Oripa files as the target output, generated the corresponding input data from Exa and Perplexity, and fine-tuned the model on the resulting pairs. If we are able to create true evaluations and loss functions, this would likely be a good basis for a paper worth submitting to conferences.*
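
The data preparation can be sketched roughly as follows. This is an illustration rather than the code we ran: the helper names, the "prompt" and "completion" field names, and the instruction wording are all hypothetical. The Oripa file is treated as opaque text, and the description is assumed to come from a search tool such as Exa or Perplexity.

```python
import json


def build_training_record(description, oripa_xml):
    """Pair a text description (input) with an Oripa file's contents (target)."""
    return {
        # Hypothetical instruction wording; any consistent template would do.
        "prompt": f"Generate an Oripa crease pattern for: {description.strip()}",
        # The file is kept as raw text; nothing here parses the XML.
        "completion": oripa_xml.strip(),
    }


def to_jsonl(records):
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


def write_jsonl(records, path):
    """Write the JSON Lines text to disk for a fine-tuning job to consume."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_jsonl(records) + "\n")
```

Keeping the target as the unmodified file contents means an evaluation could later be added on the output side, for example by checking whether a generated file loads in Oripa at all.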

We also attempted to introduce an algebraic notation for origami. To the best of our understanding, there is no international algebraic notation standard for origami folding. We cannot emphasize enough the importance of establishing one.

Kelvin Origami Algebraic Notation (KOAN)

Our proposal is as follows:

If origami experts worldwide are able to adopt an algebraic notation similar to that used in chess, and these experts are willing to participate in RLHF on that notation, we can then truly make progress toward solving Generative Origami.

Implications

There are great MIT courses covering the implications for self-folding robots, sheet-metal bending, and protein folding, but solving Generative Origami could also help with other niche problems that require visual reasoning, such as crochet, basket weaving, and pottery.

Economic Models

Most origami books are bound by 75 years of copyright. Arguably, what is most protected is the editing or curation of a set of designs, since origami started in the 7th century and has been passed on from generation to generation, culture to culture. The majority of modern origami designs can be folded in a video without violating copyright.

As mentioned earlier, the majority of novel, breakthrough origami is published in regional origami newsletters. It seems that the most advanced inventions are proudly shared in a video or picture, while the method is kept in personal archives or as personal know-how.

From our review, the most advanced origami folds are taught by video, with little notation or guidance. Ultimately, we propose that this project could be funded by crypto enthusiasts, with a ledger through which those who contribute a design are rewarded. The reward for those who push a helpful variation of, say, an elephant design to GitHub could be determined algorithmically.

Purpose of Origami

Ultimately, the goal is to provide widespread enjoyment of origami, which includes learning, teaching, and inventing. Without doubt, origami is a good medium to teach mathematical or computer science concepts.

It is also amazing to see the accomplishments made from the 1990s to the 2020s with the arrival of computer-aided origami design, mostly led by Robert Lang. But in this decade, we shall see how far Generative Artificial Intelligence can further the art of origami.

Art, Hobby, Creativity

Regardless of my advocacy for algebraic notation in origami, enthusiasts will continue to produce web pages, blog posts, and video tutorials about origami. The volume of this content will likely exceed that of books.

However, relying on this data set alone may not lead to meaningful gains. A lot of effort will be needed to structure the data and make it coherent. Furthermore, determining angles, vertices, and folds from an image, let alone from multiple frames of a video, is approximate at best and will lag the accuracy of true algebraic notation.

Algebraic notation is text with logical meaning, and even semantic embeddings, so it would fit well with foundation model training. With this progress, we hope to further origami art, origami hobby, and origami creativity.


* If there are any volunteers, please write to me as we grow our hobbyist team to ship Generative Origami.