By: abhijoshi
Date: Sept. 12, 2024, 10:30 a.m.
Tags: hackathon
I have always been curious about the power of Artificial Intelligence and Machine Learning. Back in 2020 I completed an online course by MIT, MIT 6.86x Machine Learning with Python. Aside from being one of the best courses I have ever done (I have written about this on another occasion), it delved deep into what machine learning actually is, and how the field has moved from things like support vector machines, which can loosely be thought of as a single-node neural network, to the more complex architectures we see today. This foundational understanding meant I was well equipped to understand machine learning, but it also numbed me to the applications a little bit, as I thought ML was only good at relatively simple things like recommendation, sorting, etc. Basically, I thought it was good at things that are mundane for humans, not necessarily creative things. However, with the recent boom in generative AI, and the seemingly magical and original creations from things like ChatGPT, DALL-E, etc., I was once again curious about how I could learn to make use of the newest advancements in the technology.
The Encode Club AI Hackathon therefore seemed like the perfect opportunity to explore that curiosity. The focus of the hack was quite broad, with bounties covering a large range of projects, from "adding value to the gaming and entertainment industry" to "Blockchain powered AI apps". However, all apps had to somehow use Artificial Intelligence to solve the particular problem at hand.
The idea of a hackathon is intimidating. Creating something from scratch, no matter what the timeline, always seems like a daunting task, but even more so when the timeline is 2 days. Not to mention that this was my first hackathon ever! I was really nervous. However, as soon as I got to the venue and met my teammates I was completely at home. Talking with my team reminded me why I wanted to join the hack in the first place: to meet like-minded people and create something innovative and fun together. Which is exactly what we did!
I had an idea to incorporate the technology of one of our sponsors, Stability.ai, into a fun game. What we aimed to do was create a game revolving around image generation using prompts. The basic idea would be to give the player an image, say one of an apple sitting on a table. The player would then need to recreate the image by writing a prompt that would generate an image using AI. We would then calculate the similarity between the two images and allow the user 3 to 5 tries to improve their score. The idea was simple and fun, and we went off trying to execute it.
I was chiefly in charge of the image similarity algorithm. This seemed simple to me on the surface, but as soon as I got started, the challenge turned out to be vast and deep. Image similarity detection is apparently a hard problem, and it does not yet have a fully accurate general solution. Approaches have evolved from trying to detect similarity in the actual data the image encodes, e.g. SSIM (Structural Similarity Index Measure), to trying to detect the features of a target image and seeing how close these features are to those of another image, e.g. SIFT (Scale-Invariant Feature Transform). SIFT is a technique that moves towards computer vision and machine learning proper, as it tries to identify which features uniquely characterize an image, and checks if these features are sufficiently present in another target image. The final approach would be to generate a prompt from the image and then do a more standard text-based comparison. This would involve a model that first creates a vector of words describing the image, and a secondary model that then compares those vectors to determine similarity or similar sentiment (we don't even have to compare sentiment, we can simply check whether similar keywords are present in both image vectors). Unfortunately, this approach seemed difficult to achieve in the timeframe of the hack, and so a combination of the SSIM and SIFT algorithms was used. SSIM would allow us to detect whether the general shapes in the images were similar, and the actual features of the images would then be compared using SIFT.
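To make the SSIM half of this more concrete, here is a minimal sketch of the SSIM formula in plain Python. It is not the code from the hack: it computes one SSIM value over the whole image rather than averaging over sliding windows as image libraries usually do, and it treats an image as a flat list of grayscale pixel values. The `combined_score` function, its `feature_match_ratio` input (a stand-in for whatever a SIFT matching step produces, e.g. the fraction of keypoints matched) and the equal weighting are all assumptions made for illustration.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """SSIM computed once over two equal-length lists of grayscale pixels.

    Simplification: real implementations average SSIM over local windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")

    n = len(x)
    # Standard stabilising constants from the SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2

    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def combined_score(ssim_value, feature_match_ratio, ssim_weight=0.5):
    """Hypothetical blend of an SSIM value and a feature-match ratio in [0, 1]."""
    ssim_01 = (ssim_value + 1) / 2  # rescale SSIM from [-1, 1] to [0, 1]
    return ssim_weight * ssim_01 + (1 - ssim_weight) * feature_match_ratio


if __name__ == "__main__":
    checkerboard = [0, 255, 0, 255]
    inverted = [255, 0, 255, 0]
    print(global_ssim(checkerboard, checkerboard))  # identical images
    print(global_ssim(checkerboard, inverted))      # structurally opposite
```

Note that nothing here looks at colour: the inputs are grayscale values, which is one reason a score built this way can miss differences a player would find obvious.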
In general, the algorithm we ended up with was middling. It tended to detect major differences between the images provided, but smaller details were lost at times, making it not very effective at scoring similarity when the generated image showed the correct object. Furthermore, I do not believe colour was taken into account at all, which is a major factor in determining the similarity of images in our case.
Another major challenge was integrating my Python-based code with the wider code base, which was written in TypeScript and deployed using Node.js. I decided that the most straightforward way to make this integration would be to expose a REST API that could be queried for the score of an image. I used the Python library FastAPI to accomplish this. It was the first time I had worked with it, but it allowed me to quickly make a simple API that exposes a POST endpoint, where the Node.js-based backend could send the two images that needed to be compared (as base64 encoded strings), and I would run my Python algorithm and send back a score. The whole system worked really well. I containerized both the Python code and the Node.js server, ran them as Docker containers on the same machine, and things worked like a charm. Plus, it allowed for a clean separation of the different functions of the app, and came close to a "microservice" setup where one container was responsible for only one thing and could be scaled up or down depending on load and usage requirements.
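As a rough illustration of that request flow, the sketch below shows the payload handling behind such an endpoint using only the Python standard library. It is not the code from the hack: the field names `target_image` and `generated_image`, the response shape and the placeholder `score_images` function are all assumptions. In a FastAPI service, logic like `handle_compare` would sit inside a function registered with a POST route decorator, with the framework parsing the JSON body.

```python
import base64
import json


def score_images(image_a: bytes, image_b: bytes) -> float:
    """Placeholder scorer (hypothetical): 1.0 for identical bytes, else 0.0.

    The real service would run the image similarity algorithm here.
    """
    return 1.0 if image_a == image_b else 0.0


def build_request(target: bytes, generated: bytes) -> str:
    """What the Node.js backend would send: two base64 strings in a JSON body."""
    return json.dumps({
        "target_image": base64.b64encode(target).decode("ascii"),
        "generated_image": base64.b64encode(generated).decode("ascii"),
    })


def handle_compare(body: str) -> dict:
    """Decode both images from the JSON body and return a score."""
    try:
        payload = json.loads(body)
        target = base64.b64decode(payload["target_image"], validate=True)
        generated = base64.b64decode(payload["generated_image"], validate=True)
    except (KeyError, TypeError, ValueError) as exc:
        # Missing field, wrong type, invalid JSON or invalid base64.
        return {"status": 400, "error": f"bad request: {exc!r}"}
    return {"status": 200, "score": score_images(target, generated)}


if __name__ == "__main__":
    print(handle_compare(build_request(b"same bytes", b"same bytes")))
    print(handle_compare('{"target_image": "!!!"}'))
```

Keeping the decoding and scoring in a plain function like this also makes it easy to test without starting a server.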
Finally, to end off the hack, I helped deploy the whole project to make it live. We deployed using a simple Docker-based architecture, with a container each for the frontend, the backend and the Python service. In the future, this setup could easily be transferred to Kubernetes (k8s) to make it more resilient and production ready, not to mention easier to maintain.
Overall, I enjoyed myself thoroughly and would definitely do something like this again. The environment of the hackathon was amazing: competitive and innovative, with ideas flowing and things being built. Apart from hard skills in development, I also learned a lot about how to collaborate with team members to deliver a project in a short amount of time, as well as meeting lifelong friends and collaborators who think like me!
Thank you Encode Club for organizing the hackathon, and thanks to my teammates for the amazing time working together to build something really cool!
The codebase for the app can be found here: Picasolana