Origami AI Revisited
In early 2023, I presented a roadmap for using AI in origami, so I thought I’d revisit the topic briefly and see what has changed during the last year.
One thing we can clearly see is how quick the progress is, especially in the hot field of generative AI. When I first tested origami-themed image generation less than two years ago, the images I got were of really poor quality most of the time and usually bore little resemblance to actual origami. Today, state-of-the-art models can generate quite convincing origami images, some of which could fool even a trained eye. Since origami is a narrow, very specific field of little commercial interest, it is not the focus of these models, so the possibilities would be much greater if a model were trained specifically on origami.
Overall, I think I would not change much in the roadmap I laid out in the above-mentioned post. Machine Learning models improve quickly, but the ways I see they could be used in origami are mostly the same as before. However, tools, both those used for applying AI models and those used for building them, have improved, which probably makes creating origami-specific models more approachable than before. One issue that often arises when trying to train a model for a specific task, and which I mentioned as a potential roadblock, is the amount of data needed for supervised learning (which happens to be necessary for many practical applications). Meanwhile, there are now companies you can outsource data labeling to. This costs money, of course, but it means one could build a dataset for origami-related AI much faster than a single developer working alone ever could. Another possibility would be setting up a data labeling platform and asking the origami community to volunteer the time needed to label the dataset. Both options make labeling even the large dataset needed for an origami-specific AI model seem much more feasible to me than it did a year ago.
On the other hand, the high computational costs of generative AI have driven most providers to introduce paid subscriptions for their state-of-the-art models, so getting free access to the really powerful tools seems to be no longer possible. There are, of course, also models you can self-host, but while powerful, they are often inferior to the most capable ones, due in part to the need to limit model size so that it fits into the memory typically available on consumer-grade GPUs.
Another factor I see gaining prominence is the possibility of fine-tuning existing models. As of early 2024, many commercial AI model providers support it, which opens new possibilities for creating domain-specific (e.g. origami-oriented) models within a reasonable budget. While fine-tuning has been around much longer, its broad availability in commercial offerings makes many tasks more feasible in practice.
As AI tools become more powerful, so do the controversies surrounding their use. Many things that were predicted as theoretical possibilities are becoming practical and cheap quite fast. These include people losing their jobs due to automation and deepfakes being used for political propaganda, or “just” for personalized scams. The dispute over the use of copyrighted materials in Machine Learning is gaining steam, and the first attempts at regulating the issue are under way.
On a lighter note, Sora, a new video generation model released just two days ago, uses a video featuring origami airplanes as its main website banner. The airplanes are oddly shaped, but the demonstration is still impressive.
Revisiting the roadmap for AI in origami, I considered what interesting thing I would do today in this field if I had a little more time, and came up with the following idea: I’d build an origami model recognizer which could provide the name and author of a model given its image. This task is entirely feasible and would provide real value to the origami community, similar to what the Spot the Creator Facebook group provides. Of course, no system is perfect, so I think we’d still have interesting discussions among human experts for the more difficult cases, but for common models I think the success rate could be quite decent.
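One common way to build such a recognizer is to embed each reference photo with a pretrained vision model and then find the nearest reference embedding for a query photo. The matching step can be sketched with toy three-dimensional vectors (a minimal sketch; the model names and all vector values below are made up, and real embeddings would have hundreds of dimensions):

```python
import math

# Toy stand-ins for image embeddings. In a real recognizer, each vector
# would come from a pretrained vision model applied to a photo of a
# folded model; the values here are invented for illustration.
REFERENCE_EMBEDDINGS = {
    "Crane (traditional)":         [0.9, 0.1, 0.0],
    "Flapping Bird (traditional)": [0.8, 0.3, 0.1],
    "Rose (Kawasaki)":             [0.1, 0.9, 0.2],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def recognize(query_embedding):
    """Return the reference label whose embedding is closest to the query."""
    return max(
        REFERENCE_EMBEDDINGS,
        key=lambda name: cosine_similarity(REFERENCE_EMBEDDINGS[name], query_embedding),
    )

print(recognize([0.88, 0.15, 0.02]))  # prints "Crane (traditional)"
```

A nearest-neighbor lookup like this has the nice property that adding a newly labeled model only requires adding its embedding to the reference set, with no retraining.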
That such a project is feasible is shown by the existence of Brickognize, a site that recognizes the exact model of a Lego brick from a picture. I used to be work colleagues with the author, so I know that while making the project work well required significant effort, it was doable in his spare time, and it would be even easier today since the available tooling is more advanced.
Most of the effort in such a project boils down to preparing the right set of training data. For an origami recognizer, I can quickly think of several ways of getting such data:
- Gilad’s Origami Database contains thousands of origami models, usually including the name, designer, and at least one image.
- Origami groups on flickr are a huge repository of origami images. While the data is not as neatly labeled as in Origami Database, image descriptions usually contain the necessary data, which would only have to be extracted. This would require some human input which could be outsourced to a commercial data labeling company, or crowdsourced to the origami community. To make things easier, I expect that one could use an off-the-shelf algorithm or a generic LLM with the appropriate prompt to roughly preprocess the data, reducing the amount of human work required.
- Posts with the hashtag #origami extracted from Instagram via the API could play a similar role.
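The preprocessing step mentioned above can be illustrated with a simple heuristic extractor. A real pipeline would send each description to an LLM with a prompt asking for the model name and designer; the regex below is only a rough stand-in that handles one common phrasing, and the example descriptions are invented:

```python
import re

# Rough stand-in for LLM-based preprocessing: pull a model name and
# designer out of a free-text image description. Only one common
# phrasing ("<name>, designed by <designer>") is handled here.
PATTERN = re.compile(
    r"^(?P<name>.+?),?\s+(?:designed|created)\s+by\s+(?P<designer>[^.,(]+)",
    re.IGNORECASE,
)

def extract_label(description):
    """Return (model_name, designer), or None if no pattern matches."""
    match = PATTERN.search(description.strip())
    if not match:
        return None  # leave this item for human labelers
    return match.group("name").strip(), match.group("designer").strip()

print(extract_label("Dragon, designed by Jane Doe. Folded from a 30 cm square."))
# prints ('Dragon', 'Jane Doe')
```

Descriptions the extractor cannot parse would fall through to the human labeling step, so even an imperfect preprocessor reduces the manual workload.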
There are caveats, though: copyright and terms of service. I am not a lawyer, so this would require a better analysis. Certainly, building an origami model recognizer should have the goal of helping the origami community rather than upsetting its members.
As for copyright, the case for a search engine (and this is what we’d be building here) seems much clearer than for e.g. generating images in a particular artist’s style. Search engines have existed for many years, so the rules seem mostly settled, and there is much less controversy about a search engine being able to find a piece of information than about generative AI producing new images based on copyrighted material. As for flickr and Instagram terms of service, one would have to check whether such use is permitted. Note that many images on flickr are licensed under Creative Commons licenses, which should make it much easier to identify what kind of use is allowed. Regarding Origami Database, we’d probably have to ask Gilad since the site lacks a clear terms of service page.
This was my quick update on AI in origami. Given how fast things move, I’ll probably be revisiting this topic more in the future.