Disclaimer: At Tonic, our engineers are committed to staying current with the latest advancements in the tech industry, ensuring we deliver the best solutions for our clients. If you're considering integrating AI into your application or platform and want to better understand our approach, please reach out to learn more about our governance process and how to responsibly implement AI. The following article is an opinion piece based on an engineer's ventures in his personal projects. [Dramatic music] This is his story.
July is the month I give in to the vibes, let the vibes flow through me. I'm optimistic. I have hopes and dreams. I think maybe this time AI will be different. This time AI won't hurt me. It's ok, the hype on the internet is real. (And the internet wouldn't lie to me.) Hopes and dreams.
The task I've set myself is simple: For all my personal-project coding and dabbling this month I will use an LLM. I've tried my best to live up to this standard, but sometimes I can't help myself and I just have to change a variable here, maybe move a line down there. Maybe refactor that function. But I'm doing my best.
I have GitHub Copilot, and that'll be the lens through which most of my AI-assisted coding is viewed. I've played around with the different models offered by Copilot, but it seems that ol' reliable is Claude 3.7 Sonnet, so (unless a new model is released before the end of July) assume that's the LLM I'll be using for this experiment.
Now, a little background on me: I'm rounding the corner on my third trip around a decade-sized software development block (ahem, over 20 years of software development). I got my first taste of web development as a wee lad with Microsoft FrontPage. FrontPage was a gateway to experimenting with table layouts in Dreamweaver, and in college I got hooked on the party stuff, getting my first hit of Ruby on Rails. I knew I needed help when I lost ten hours to a Lua bender trying to get my Neovim config just so.
Needless to say, I've got some experience with the interwebz.
Now for my thoughts and feelings about my personally sanctioned Vibe Code Awareness Month: the meat and potatoes, if you will (and I will). I've got opinions, and I'm going to share 'em. I'm going to break my experience into two sections: the first focused on using AI in personal projects with legacy codebases, the second on greenfield projects.
To set the scene, this specific project is going on eight years of continuous development. Any proper legacy codebase is going to be full of unreadable garbage, but it's probably going to be unreadable garbage that you yourself wrote three years ago. (First rule of legacy codebases: don't git blame legacy codebases.) And that's ok. The second rule of legacy codebases is that all that code probably props up an important task, and any unintended change will turn into a barrage of bugs from parts of the application you didn't remember existed. So it's best to tread lightly.
And that leads me to my first lesson.
The first thing I learned by flipping the vibe-code switch, cranking it to 11, and turning it loose was that it doesn't tread lightly. As generative models, LLMs have a bias towards generating new code. When using agentic mode with it set to 'write', it tends to make a lot of changes rather quickly. If you're not paying close attention, unintended code might make its way into your changes. My recommendation for vibe coding in a legacy codebase: ask first, ask again, refine your prompt, ask a final time, and then finally flip the switch to 'write' and watch the magic happen.
Using agentic mode and letting the LLM grep your codebase and make changes can lead to a host of understanding problems, both on my end and its end. Reviewing code changes is hard, and copiloting with an LLM means your new responsibility is full-time code reviewer.
When you ask an LLM why it made a change, it LLMs up a post-hoc rationale for why it did what it did (it can't just be all like "because statistics dictate that this token comes after that token comes after that token ...", you get it). So when an LLM writes code, you have to be the one who reviews it, and you have to be the one who chooses a reason for it coming into being. You, as the engineer, need to ensure the change makes sense in the context in which it's being written, so that you're not overburdening the legacy code and existing functionality is maintained. And that is often just as difficult a task as writing the code in the first place.
So with every change the LLM makes you have a decision: understand on its behalf or lazily accept the code as is and move on.
… Ok, "lazily" might be a bit harsh.
But we need to talk about the output of these incredible models, because hot damn, they can be insanely good sometimes. This past month has shown me that they are really good at transforming code from A to B. Got an old class-based React component? Make it functional in minutes! Got a bunch of CSS randomly scattered across files? Centralize it with a few tippy-taps of the keyboard in the prompt. Got an old Java app that would actually get maintained if it were written in Python? Vibe code it until the integration test suite passes.
I've found it's rather useful to treat the LLM like a pair-programming intern buddy. In the prompt, switch it over to 'ask' (sometimes called 'plan') mode and over-communicate. Spending a little extra time being meticulous with your prompts can be helpful in guiding it to make just the right change. Asking it to research a broad topic in the codebase and return files and references for me to review is a good place to start. Then, based on that context, I ask it to dive in and actually make the change I want made.
I like to leave it in ask mode and have it come up with a to-do list for the change, show me where in the code it would make the change, review that, and then have it write. This way, I get two chances to review. I know, I know: "But Claude Code does this by default!" Don't trust it! I've been bitten multiple times by Claude creating its own plan and then changing it halfway through implementation. I've found it works better to have it spell out a plan for you in one prompt, then explicitly tell it to execute on that plan in the next. After a month, you'd think I'd be cool with just one-shotting it, but I have some trust issues, and they are not totally unfounded.
One time I challenged Claude to convert an entire React class component to a functional one. There was a function in the class component that wasn't used anywhere (classic legacy codebase!) and should have been deleted. Instead, the LLM decided to keep it and call the function in a new bit of code it added that caused a weird rendering bug. That was a whole hour lost trying to sort it out.
Another recurring issue: it has no handle on code style guides. It wings it every time, sometimes mixing and matching styles. "But wait," you're thinking, "this could just be fixed with .rules files!" And sure, it totally could, but .rules files are practically documentation, and we know how devs are with documentation, even on personal projects. (For any PM reading this article, devs are exceptional at writing documentation, we write the best documentation, just magnificent documentation, we never forget to update our documentation either.) And .rules files can't capture some of the nuance of style (vibes, maybe?) that a legacy codebase evolves into. If you've spent enough time in a legacy codebase, you know what I mean: old conventions that were normal seven years ago and probably should be maintained even if they're a bit odd by today's standards.
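For the record, here's the kind of thing a .rules file can capture (the exact filename varies by tool; Copilot, for instance, reads repository-level instructions from a markdown file, and other editors have their own equivalents). These conventions are made up for illustration, but you get the idea of what fits in one, and why the vibes don't:

```markdown
# Project conventions (illustrative example)

- Two-space indentation, double-quoted strings.
- New React components are functional with hooks; don't add new class components.
- Shared CSS lives in app/assets/stylesheets; don't inline styles in views.
- Prefer small, focused functions; don't add new dependencies without asking first.
```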
Another thing we need to discuss: mo' code, mo' problems. When you decide to write code, you're also creating an obligation to yourself to take some level of responsibility for what thou hath wrought into the world. (Or at least, you should take some responsibility.) The more code you write, the more code you have to maintain. With LLMs, I'm seeing a future that could teeter toward over-burdened and burnt-out devops because of the speed with which you can churn out code. And just because we can create more doesn't mean we should.
As LLMs decrease the barrier to entry on writing new code, we need to be even more vigilant in discerning what should be added to the codebase.
I like to putz around my terminal, pushing pixels here and there, writing the occasional Lua script, maybe building a quick Rails project to scratch an itch. This month, I took one of my new little side projects and tried to vibe code a clone of it. This exercise has been eye-opening, to say the least. I might even describe it as a bit of an existential crisis?
I have a small project management tool that I use to manage certain parts of my life. I have a bit of a weird obsession with managing tasks, see, I've been burdened with an inability to keep focus most of the time until I just can't stop keeping focus the rest of the time. If I don't write down my personal to-dos in a bullet journal, keep a list of work to-dos in a custom built app, keep all my writing and notes in yet another custom app, and manage all this broadly with another custom built project management app, then I'll get nothing done and will inevitably be found six hours deep into a YouTube rabbit hole learning about niche cultural groups or maybe I'll be found on a bike somewhere pedaling away from my responsibilities.
Anywho.
I built a custom project management application in Ruby on Rails that's got projects and tasks and to-dos and organizes them in a way that I find cool and pleasing. Everything is right where I put it, and I find it a joy to use. I've maybe dumped 30-40 hours into it, and I'm very proud of what I've created. This past weekend, I spent about four hours vibe coding my way to a decently functional clone. And, holy shit, I'm scared for my job.
Let me explain. I spent a few minutes writing out a project document that outlines the data model I wanted implemented, the tools it should be restricted to (I don't want it pulling in random gems), and some features it should have. I then told it to read the document, use a file called todo.md to keep itself organized, and fully turned it loose.
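For reference, that project document was nothing fancy. It looked something like this (details simplified and approximate, not the actual file):

```markdown
# Project management clone

## Stack (hard limits)
- Ruby on Rails 8, Hotwire, plain CSS
- Rails defaults only; do not pull in random gems without asking me first

## Data model
- User: signup/login, owns and is a member of projects
- Project: has members and tasks
- Task: belongs to a project, assigned to a user, has to-dos
- Todo: belongs to a task, can be checked off

## Features
- Invite people to a project, assign tasks and to-dos to users
- Export all my data, delete all my data

## Process
- Keep yourself organized in todo.md and check items off as you go
```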
Vibezzz cranked to over 9,000. In about 15 minutes it had run a bunch of Rails generators, built out the data model, come up with a full CSS style based on '80s retro Unix terminals, thrown together a signup flow, written about half the views, and even started up the Rails server so I could take a gander at what it had created. It was far from perfect, but in 15 minutes it had done what usually takes me an hour of hammering out Rails commands in the terminal.
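To give you a flavor of what came out the other side, the generated data model looked roughly like this. The names and associations here are illustrative, not the actual generated code:

```ruby
# app/models/project.rb -- roughly the shape of what the agent generated
class Project < ApplicationRecord
  belongs_to :owner, class_name: "User"
  has_many :memberships, dependent: :destroy
  has_many :members, through: :memberships, source: :user
  has_many :tasks, dependent: :destroy
end

# app/models/task.rb
class Task < ApplicationRecord
  belongs_to :project
  belongs_to :assignee, class_name: "User", optional: true
  has_many :todos, dependent: :destroy
end

# app/models/todo.rb
class Todo < ApplicationRecord
  belongs_to :task
end
```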
When I realized it hadn't built out all the views for some reason, I threw another prompt in there, and in about two minutes it had. I then spent the next three hours just reviewing the UI and throwing quick prompts into Copilot (aside: Zed's AI integration powered by Copilot is super sick) with bugs, and it just fixed them. Honestly, the biggest problem I ran into was being rate limited by Copilot.
Oh, rate limiting and building an application with no regard for general security concerns. So two problems. Oh, and LLMs don't seem to care about code reuse?
So rate limiting, security issues, and it's not DRY. It also doesn't seem to care about doing things the "new" way; i.e., it wouldn't do things the Rails 8 way. It was constantly using old patterns or creating new patterns that it then didn't apply, even when I fed it documentation on the Rails 8 updates. So we're being rate limited, have no security, nothing is DRY, and we keep seeing outdated or unused patterns.
So nothing is perfect. But the app itself worked surprisingly well given how little time I put into it. You could log in, view a list of projects, invite people to those projects, assign tasks and to-dos to different users, export all your data, delete all your data, and log out. It had the idea of a current_user, and for the most part, it was using it correctly.
That was, until it wasn't.
This was where a lot of the security issues stemmed from: having no other guidance, it just assumed all data should be available to anyone logged in. Was this an issue with me not setting up my prompts correctly? Probably. But if you didn't have some basic knowledge of securing data in a web application, you might not have thought to test that out and improve your prompt. Sure, I then prompted it to better secure those endpoints, but it required me knowing something about security to get it to that state. It also liked to write insane view files; it's the wild west in there, not a single partial rendered, every view a unique snowflake. The controllers and models fared a bit better, generally keeping things tight without too much cruft. It didn't seem to have any knowledge of updates to Hotwire and continued to write things like it was Rails 6. While this worked, it bothered me. But my general takeaway from the code was that it was, sigh, fine. Just fine.
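To make the security point concrete, here's the shape of the problem as a simplified before/after sketch (hypothetical controller, and it assumes the usual current_user helper, a require_login filter, and a projects association on User):

```ruby
# Roughly what the agent wrote: authentication, but no authorization.
class ProjectsController < ApplicationController
  before_action :require_login

  def show
    @project = Project.find(params[:id]) # any logged-in user can load any project
  end
end

# What it took another prompt (and knowing to ask for it) to get:
class ProjectsController < ApplicationController
  before_action :require_login

  def show
    # Scoping through current_user raises RecordNotFound for projects you shouldn't see.
    @project = current_user.projects.find(params[:id])
  end
end
```

That one-line difference is exactly the kind of thing you only think to prompt for if you already know it matters.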
Now, an afternoon vibe coding my way to a cool demo (not even MVP) is pretty impressive. But it does leave me worried about a few things.
First, what if these LLMs get much, much better? Second, what if these things don't get any better?
If these things get much, much better, I'm out of a job. These things are impressive and very useful today. What if we're just starting up the LLM S-curve? What if LLMs are only 20 or 50% as good as they're going to get? You just slap a good prompt in there, give it full root access to your machine, tell it to deploy out to the world, and profit. No need for any more software engineers; just turn the computer loose and it'll accomplish more than most engineering teams can. If you can dream it, you can have it deployed into the cloud in a matter of minutes. Maybe with all the shareholder value created, our LLM overlords will see fit to distribute the wealth among us peasants? Maybe.
Counterargument: what if these things don't get any better (or much better)? What if we're at the top of the technology S-curve? I'm worried about mediocre AI slop finding its way into our software applications, the applications and algorithms we depend on that shape our very lives. If this is as good as it gets, then the burden of ensuring this slop doesn't find its way into our most important systems falls on us as software engineers, developers, and programmers.
All the more reason to double-check your AI-generated code, to get other folks' eyes on AI-generated PRs, and to have a deep understanding of the code being deployed to your systems.
Innovative technology is still a team sport, one that requires people of different skillsets and knowledge bases to play on.
Because at the end of the day, the code that blames to me is my responsibility, whether I wrote it or not.
We'd love to see how we can make an impact for you. Let us know what you're working on.