Have you ever seen a snippet of code and not known what it does? Or written some code on paper and wanted to run it without retyping it? CodeSnap is here!
Just take a picture and upload it to our website! Our automated algorithm will parse the image, display the code's output, and give you a short summary of how the code works. Currently, CodeSnap accepts Java code meant to run inside the main method: everything captured in the picture must compile correctly when inserted directly into the main method, with no external libraries.
We used a variety of software to develop this application. The website was built with Flask on the backend and vanilla JavaScript on the frontend; we designed it in Figma and then implemented it with Bootstrap. The CodeSnap algorithm itself is largely a mix of Python and Bash scripting. Python code interfaces with the Google Cloud Vision API to read text from the input image using its OCR models. To overcome the drawbacks of these OCR models, we employed syntax-correction algorithms that fix pedantic syntax errors in the code (e.g., missing semicolons, mismatched curly braces); however, we intentionally left runtime errors unresolved for some of the applications listed below. The resulting JSON output is then automatically processed into a Java file, again using Python. An API call is then made to ChatGPT, prompting it for a summary of the Java code, and the final component uses Bash to compile and execute the Java file. A Bash script ties this pipeline together into a single executable. Finally, a Flask API server written in Python accepts requests at the /process-file endpoint and passes them to that shell script. This API is currently hosted on Google Cloud and can be reached with a JSON request to https://devfest2023-je62xqayyq-ue.a.run.app/process-file.
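To make the pipeline concrete, here is a rough Python sketch of the backend stages. The function names, prompt text, and brace/semicolon heuristics are illustrative rather than our exact implementation; the real pipeline also passes through a JSON intermediate from Vision and is driven end to end by a Bash script.

```python
# Illustrative sketch of the CodeSnap backend stages: OCR the photo,
# apply pedantic syntax fixes, wrap the snippet in a main method,
# compile and run it, and ask ChatGPT for a summary.
# Uses google-cloud-vision and the 2023-era openai client.
import subprocess
from pathlib import Path

import openai
from google.cloud import vision


def ocr_image(image_path: str) -> str:
    """Read raw text from the uploaded photo with Google Cloud Vision."""
    client = vision.ImageAnnotatorClient()
    image = vision.Image(content=Path(image_path).read_bytes())
    response = client.document_text_detection(image=image)
    return response.full_text_annotation.text


def fix_syntax(code: str) -> str:
    """Tiny example of the pedantic fixes we apply: add missing
    semicolons and re-balance curly braces that OCR tends to drop."""
    lines = []
    for raw in code.splitlines():
        line = raw.rstrip()
        if line and not line.endswith((";", "{", "}")):
            line += ";"
        lines.append(line)
    fixed = "\n".join(lines)
    fixed += "}" * max(0, fixed.count("{") - fixed.count("}"))
    return fixed


def wrap_in_main(snippet: str) -> str:
    """Drop the transcribed snippet into a minimal class so javac accepts it."""
    return (
        "public class Snippet {\n"
        "    public static void main(String[] args) {\n"
        f"{snippet}\n"
        "    }\n"
        "}\n"
    )


def compile_and_run(java_source: str, workdir: str = "/tmp/codesnap") -> str:
    """Compile and execute the generated Java file (the real pipeline does
    this step from a Bash script; subprocess is the Python equivalent)."""
    Path(workdir).mkdir(parents=True, exist_ok=True)
    (Path(workdir) / "Snippet.java").write_text(java_source)
    subprocess.run(["javac", "Snippet.java"], cwd=workdir, check=True)
    result = subprocess.run(
        ["java", "Snippet"], cwd=workdir, capture_output=True, text=True, timeout=10
    )
    return result.stdout


def summarize(java_source: str) -> str:
    """Ask ChatGPT for a short plain-English summary of the code."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "Summarize what this Java code does in two or three "
                       "sentences:\n" + java_source,
        }],
    )
    return response["choices"][0]["message"]["content"]


if __name__ == "__main__":
    source = wrap_in_main(fix_syntax(ocr_image("snippet.png")))
    print(compile_and_run(source))
    print(summarize(source))
```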
We ran into several obstacles throughout the hacking period. We explored many different OCR models but found that, in general, they transcribe code rather poorly. This makes sense: the models were trained on natural language, so syntax like a lone curly brace looks unusual to them and is more likely to be read as a comma or period, for example. We therefore had to develop syntax-correction algorithms to ensure the code compiled despite these minor issues. Another major challenge was connecting the frontend and backend. Once we had both ends working on their own, we found that integrating them was an even more daunting problem. We ran into countless issues, including CORS errors, finding a suitable deployment service, hosting, and sending and receiving large base64-encoded payloads. Nevertheless, we persevered and eventually came up with unique solutions that made this project possible.
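As an illustration of how those integration issues can be resolved, the sketch below shows a minimal /process-file handler with CORS enabled and a base64-encoded image in the JSON body. The "image" field name and the run_pipeline.sh script name are placeholders, not our exact code.

```python
# Minimal sketch of the /process-file endpoint: CORS enabled for the
# frontend, base64 image decoded from the JSON body, and the pipeline
# invoked through a single shell script.
# The "image" field and run_pipeline.sh are illustrative names.
import base64
import subprocess
import tempfile

from flask import Flask, jsonify, request
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # lets the frontend, served from another origin, call this API


@app.route("/process-file", methods=["POST"])
def process_file():
    payload = request.get_json()
    image_bytes = base64.b64decode(payload["image"])

    # Persist the upload so the shell pipeline can read it from disk.
    with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as f:
        f.write(image_bytes)
        image_path = f.name

    result = subprocess.run(
        ["./run_pipeline.sh", image_path], capture_output=True, text=True
    )
    return jsonify({"output": result.stdout})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```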
We are proud of our resiliency in persisting through all the challenges we faced. Given the short time allotted, we take pride in our ability to work together as a team and develop software with real-world applications.
Over the course of this project, we acquired numerous soft and hard skills, including teamwork, full-stack development, optical character recognition, and Unix programming.
In this first round of development, we demonstrated the promising potential of CodeSnap: it can read handwritten (or typed) Java main-method code and produce its output. In the future, we plan to expand this to interpreting entire class structures in Java, as well as parsing other languages including C, C++, Python, R, and Bash. We are most excited about developing OCR for Python and Bash, which are indentation-sensitive and therefore pose a far more difficult challenge.