
How Mythbusters helped us scale our RAG.

Engineering LLMs
April 4, 2023
Author: Alistair Pullen, Co-founder & CEO (@AlistairPullen)

When we first started working on Buildt late last year, Y Combinator (YC) made it very clear what we had to do first: build an MVP, no matter how bad, and get it out there.

For us as a team, making a scrappy MVP was no trouble at all. We’ve been doing it for years, although this product was much more difficult to turn around in such a short period of time. As a result, we designed everything with YC’s mantra in mind: “Do things that don’t scale”. Fortunately, or unfortunately, depending on the way you look at it, this approach didn’t last long.

When we launched our MVP in January, the reaction to the product was overwhelming. Our first Tweet had over 200k views within a few hours, signups to our extension spiked, and we were processing over 250 million OpenAI tokens per day (which we couldn’t afford). That’s when we first realised that we would have to introduce some scalability far earlier than we had anticipated.

During that manic period we also had a great deal of interest from larger organisations: CTOs, Engineering Managers and PMs were reaching out to give feedback, ask for features and onboard Buildt into their companies. It was clear at this point that we had to change our engineering focus to support these larger codebases and the demand we were getting.

Pulley’s CTO reached out to me personally on Twitter saying that he was keen to use Buildt within the organisation, but couldn’t get it to work. After closer inspection it was clear that their codebase was an order of magnitude larger than anything we’d indexed before and that we had to rethink a great deal of our product architecture. What follows is how we went about making the product work for Pulley, and how those changes allow for further enterprise expansion.

Novel solutions to index large codebases

Building a product like Buildt is far more technically challenging than we ever imagined. On the surface the problem sounds relatively simple: create some kind of vector index for a codebase, and perform searches on it. However, once you add further constraints such as high accuracy, high reliability, low latency and a privacy-centred design, the problem becomes far more complex. We made the conscious engineering decision to store the vector index on the end user’s device rather than in the cloud, to maximise privacy and reduce latency. This decision alone has created a huge amount of work to get the product working on very large codebases, as the performance of the end user’s device becomes a bottleneck.
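To make the "simple on the surface" version of the problem concrete, here is a minimal sketch of an in-memory vector index with an exhaustive cosine-similarity search. It is an illustration only, not Buildt's implementation: the `IndexedSnippet` type, the `search` function and the three-dimensional vectors are all hypothetical, and real embeddings have hundreds or thousands of dimensions.

```typescript
// Hypothetical shape of an indexed code snippet (illustration only).
interface IndexedSnippet {
  id: string;       // e.g. a file path identifying the snippet
  vector: number[]; // embedding of the snippet's text
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("dimension mismatch");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // treat zero vectors as dissimilar to everything
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exhaustive search: score every snippet, then keep the k best.
function search(
  index: IndexedSnippet[],
  query: number[],
  k: number
): { id: string; score: number }[] {
  return index
    .map((s) => ({ id: s.id, score: cosineSimilarity(s.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data: made-up paths and vectors.
const demoIndex: IndexedSnippet[] = [
  { id: "auth/login.ts", vector: [1, 0, 0] },
  { id: "billing/invoice.ts", vector: [0, 1, 0] },
  { id: "auth/session.ts", vector: [0.9, 0.1, 0] },
];

const hits = search(demoIndex, [1, 0, 0], 2);
console.log(hits.map((h) => h.id));
```

Every query here touches every vector, so the cost grows linearly with the size of the codebase. That is fine for a toy example, but it is exactly the kind of approach that stops working on a user's laptop once the index grows by an order of magnitude, which is why approximate indexes such as HNSW exist.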

When we started indexing enterprise-scale codebases these challenges compounded. The sheer number of vectors we were creating became difficult to store in our initial vector database, so it became clear we had to change. To add to our woes, because we are currently a VS Code extension we have to ensure that our product works across the board with no outside dependencies. That means we can’t simply use a local vector database such as Chroma, Weaviate or Redis, because they all rely on a Docker container running in the background, and we didn’t want Docker to be a dependency for Buildt to run. We wanted a simple plug-and-play solution that would just work no matter how large your codebase, whilst staying in the Node ecosystem.

Our initial solution was to use the library hnswlib-node, a high-performance vector index written in C++ with Node bindings. However, this brought an additional problem: architectural and platform dependency. Because the library is effectively a wrapper around compiled C++ code, installing it on a machine dynamically chooses which version to install based on that machine’s CPU architecture. VS Code extensions, however, are Webpacked in CI, so the hnswlib-node build that ships with the extension matches whatever architecture the CI is running on, rather than that of the client machine.

When we realised this we were a little disheartened: all of this in the name of privacy. It would have been so much easier from an engineering perspective to simply store all of these vectors in a Pinecone instance and call it a day, but we’d made it this far. It became clear we had two options: compile a different variant of the VS Code extension for every CPU architecture out there, or come up with an architecture-agnostic solution. As co-founders, we had a difference of opinion here. Our CTO, who has an Android background, was happy with architecture-specific variants of the product, as that is normal in the Android ecosystem. I hated the idea of having so many versions of the same extension, because in my mind the surface area for further bugs was so much higher, especially where reproducibility is concerned.

To solve this problem we took an approach I remember seeing in my childhood on the show ‘Mythbusters’ on the Discovery Channel: when the co-presenters had a difference of opinion on how something should be built, they would each spend a couple of days working on a prototype of their solution, and then compare the results. We did the same, meaning I had to somehow make this library architecture-agnostic. My initial instinct was to use WebAssembly (WASM). We already use WASM in the project, it is architecture-agnostic, and it brings the added benefit of running in the browser (which I’m sure our angel investor Amjad Masad of Replit will be delighted about). This is the solution we went with in the end, as it means we have a single version of the extension that we can ship to all CPU architectures and operating systems. We’re going to open-source our WASM version of the hnswlib-node library later in the week, as it may be useful for others who want a lightweight local vector DB with no external dependencies or architectural ties.

This was just one of the many challenges we faced. Parsing such large codebases also had to be rethought: we were doing much of it in a single pass, which meant memory usage became a problem. Chunking the problem helped, but further optimisation was also required. We also had to ensure that the backend was capable of serving multiple concurrent HTTPS streams of embeddings traffic, some of which had to stay open for far longer than streams normally do because of the size of the codebases involved. There are still some significant hurdles to overcome, but we have made huge progress over the past weeks. Pulley now has Buildt and is providing valuable feedback on their use cases, and we intend to onboard more companies of a similar size in the near future.
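The idea of chunking the work rather than doing it in a single pass can be sketched as below. This is a simplified illustration under assumptions, not our actual pipeline: `processInBatches` and the file names are hypothetical, and a real version would read, parse and embed each batch asynchronously.

```typescript
// Hypothetical helper: hands the items to `handleBatch` a fixed number at a
// time, so per-batch intermediate data (file contents, parse trees) can be
// released before the next batch starts.
function processInBatches<T>(
  items: readonly T[],
  batchSize: number,
  handleBatch: (batch: readonly T[]) => void
): void {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new Error("batchSize must be a positive integer");
  }
  for (let start = 0; start < items.length; start += batchSize) {
    handleBatch(items.slice(start, start + batchSize));
  }
}

// Toy usage: made-up file names; record how many files each batch held.
const filePaths = ["a.ts", "b.ts", "c.ts", "d.ts", "e.ts"];
const batchSizes: number[] = [];

processInBatches(filePaths, 2, (batch) => {
  // In a real pipeline this is where a batch would be parsed and embedded.
  batchSizes.push(batch.length);
});

console.log(batchSizes);
```

The trade-off is that peak memory is bounded by the batch size rather than by the size of the whole codebase, at the cost of a little extra bookkeeping between batches.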

Our next upgrade

There are still many things we need to work on, one of which is better AST traversal. Currently, snippets are extracted from the codebase at the AST level, which means we have to write an implementation for each programming language we support. We’re going to move away from this and try to produce a much more language-agnostic solution so that we a) don’t miss anything in your codebase, and b) support more languages out of the box.
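One common language-agnostic way to extract snippets, shown here purely as an illustration and not necessarily the approach we will adopt, is to split each file into overlapping windows of lines. The `Chunk` type, the `chunkByLines` function and the window sizes below are all hypothetical.

```typescript
// Hypothetical chunk record; line numbers are 1-based and inclusive.
interface Chunk {
  startLine: number;
  endLine: number;
  text: string;
}

// Split source text into windows of `windowSize` lines, where consecutive
// windows share `overlap` lines. No knowledge of the language is needed.
function chunkByLines(source: string, windowSize: number, overlap: number): Chunk[] {
  if (windowSize < 1 || overlap < 0 || overlap >= windowSize) {
    throw new Error("require windowSize >= 1 and 0 <= overlap < windowSize");
  }
  const lines = source.split("\n");
  const step = windowSize - overlap;
  const chunks: Chunk[] = [];
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + windowSize, lines.length);
    chunks.push({
      startLine: start + 1,
      endLine: end,
      text: lines.slice(start, end).join("\n"),
    });
    if (end === lines.length) {
      break; // the last window already reaches the end of the file
    }
  }
  return chunks;
}

// Toy usage: five lines, windows of three lines sharing one line.
const sample = ["line1", "line2", "line3", "line4", "line5"].join("\n");
const chunks = chunkByLines(sample, 3, 1);
console.log(chunks.map((c) => `${c.startLine}-${c.endLine}`));
```

Windows like these cover every line of every file, whatever the language, but they ignore function and class boundaries, so an approach along these lines trades some snippet quality for coverage.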

We also intend to bring private cloud to enterprises. This would give them a private, shared, remote repository of their embeddings, making it much easier for teams of engineers to work from the same corpus rather than having duplicated embeddings on each of their machines. Finally, we’re going to start utilising open-source embedding models, meaning we can provide an extra layer of security by self-hosting, as well as an on-prem solution that has already been requested by some of our enterprise partners.
