Coding With AI Feels Like Magic—Until It Doesn’t
Assistants, Not Overlords: How AI Is Really Changing Developer Work
I probably don’t need to introduce AI to anyone at this point. Every other day, there’s a new AI tool making headlines, another version of a large language model (LLM) being released, and endless conversations about how AI is supposedly taking everyone’s job.
The pace of advancement in AI is wild - and whether you love it or hate it, it's something you simply can't ignore. (While I was working on this post, OpenAI released five (FIVE!) models 😀 - the GPT-4.1 family, o3, and o4-mini - and deprecated GPT-4.5-preview.)
Some people are all-in: “AI all the things!” Others remain skeptical: they tried a tool once (maybe ages ago), it didn’t work well, and they wrote off AI as overhyped garbage. But the truth is probably somewhere in between. Blindly embracing AI is irresponsible. Pretending it doesn't matter is a mistake.
As a developer, I’ll admit it’s a little scary to see what AI is already capable of. But it’s also exciting. Used wisely, these tools can seriously level up how we work and what we can build.
Will AI take our jobs? Maybe not (just yet?). But the person who knows how to use AI effectively? They might. Or the company that enables its team to harness AI tools? They could easily outpace the one that doesn’t.
In this post, I want to explore the current AI landscape - what’s out there, what’s working (and what’s not), and how you can start using AI to your advantage. I’ll also share some real examples from my own experience.
Where AI Falls Short
Finishing the work is still up to you
There was this tweet I saw a few months back that really stuck with me (because I have experienced the same): AI gets you about 70% of the way there, and the last 30% is still on you. And honestly, it still holds true.

A lot has changed in just a few months (I'd argue we're closer to 75/25 now), but the core idea hasn’t shifted.
AI can take you pretty far, sometimes even with a single prompt. You might get a full draft of code, a blog post, a marketing plan, whatever. But that last 25%? That’s where things start to fall apart. Bugs pop up. The AI rewrites sections it shouldn’t touch. It gets stuck in loops trying to "fix" something and makes things worse. And of course, it’s still on you to review, maintain, and deploy the result.
My AI Moment:
I was building a game where ChatGPT had to detect collisions between moving squares in JavaScript. When a collision happened, the squares were supposed to stop moving. It did detect the collision, but couldn't get them to stop. It wrote an overly complicated for loop for collision detection that never exited at the right time. After several rounds of trying to explain what I wanted, I gave up.
The Workaround:
Claude nailed the fix on the first try.
AI is not perfect, and that remaining 25% is the most frustrating part.
AI Forgets Things
LLMs rely on a concept called the context window - basically, the amount of text (in tokens) they can remember at once. If your input exceeds that limit, things start to get fuzzy… In coding tasks, if your whole codebase can’t fit into the context window, the AI won’t understand how everything fits together, and its responses will reflect that.
The result? Incomplete suggestions, inconsistent logic, or solutions that don’t actually work with the rest of your code.
My AI Moment:
I was working on a report with ChatGPT, going back and forth with specific formatting and content requirements. At first, it followed them pretty well. But after a while, it started ignoring the earlier instructions and only focused on the newer ones - like it had selective memory.
The Workaround:
I copied the last good version of the report into a new chat window and continued my fine-tuning there. Fresh context, clean slate. And it worked.
A larger context window generally means better accuracy, fewer hallucinations, and more coherent, longer responses and conversations. But of course, a longer context window also has trade-offs - increased memory usage, slower responses, and too much unrelated data in the context can degrade the output.
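If you want to stay ahead of that limit, you can count tokens before sending anything. Here's a minimal sketch using OpenAI's tiktoken library (the 128k limit and the file name are just example assumptions):

```python
# pip install tiktoken
import tiktoken

def fits_in_context(text: str, max_tokens: int = 128_000) -> bool:
    """Rough check: does this text fit in a model's context window?"""
    # cl100k_base is the encoding used by many recent OpenAI models
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text)) <= max_tokens

# Example: check a source file before pasting it into a prompt
with open("my_module.py") as f:  # hypothetical file
    print(fits_in_context(f.read()))
```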
Hallucinations
Sometimes AI just... makes stuff up. And it does it confidently.
You ask for a citation, it gives you one… with a legit-looking author and journal that don’t exist. You ask for a solution, and it invents an API or method that has never been real. This is known as hallucination, and it’s one of the most dangerous parts of working with AI.
Especially if you’re not already an expert in the topic you’re asking about.
My AI Moment:
I was asking ChatGPT some pretty specific Swift questions. It came back with clean, well-structured code - but the kicker? It used Apple APIs that never existed. Totally made-up functions, wrapped in confidence.
The Workaround:
Sometimes there isn’t one. You just have to think and find another solution. 😄
Nondeterministic Output
Ask the same question multiple times - you’ll likely get different answers. That’s because most LLMs are non-deterministic by default. While variety can be great for creative tasks, it’s a headache when you need consistent results, like in coding or debugging.
If you expect repeatable outcomes, you’ll need to apply extra layers: prompt engineering, temperature control, or even external validation steps to make things reliable.
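If you're calling a model through an API, you can at least reduce the variance yourself. A minimal sketch with the OpenAI Python SDK (the model name is an example, and `seed` is best-effort, not a hard guarantee):

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{"role": "user", "content": "Write a regex for ISO 8601 dates."}],
    temperature=0,  # minimize sampling randomness
    seed=42,        # best-effort reproducibility across calls
)
print(response.choices[0].message.content)
```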
Overdoing It
Ever ask an AI to change one thing, only to get a completely rewritten version of your text or code? Yup. Been there.
This is where AI’s eagerness becomes a liability. It tries to be helpful, but often over-corrects or reinterprets your request in ways you didn’t ask for. You’ll need to build defensive strategies—like validating outputs against a schema, running prompts multiple times and comparing results, or even using another AI to double-check the first one.
My AI Moment:
I was testing different LLMs for translation tasks. Some didn't support the language at all and just bailed. Others gave translations, but they were wrong. A few actually got the translation right… but then decided to explain why they made certain choices, adding commentary I never asked for.
The Workaround:
Be crystal clear in your prompt: tell the LLM to only output the translation and skip the extra fluff. And of course, always validate the result.
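One way to enforce "only the translation" is to ask for structured output and validate it in code. A minimal sketch with pydantic - the model name, prompt, and schema are my own illustration:

```python
# pip install openai pydantic
import json
from openai import OpenAI
from pydantic import BaseModel, ValidationError

class Translation(BaseModel):
    translation: str  # the only field we accept - no commentary allowed

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[{
        "role": "user",
        "content": 'Translate "Good morning" to Hungarian. '
                   'Respond with JSON like {"translation": "..."} and nothing else.',
    }],
    response_format={"type": "json_object"},  # request JSON-only output
)
try:
    result = Translation(**json.loads(resp.choices[0].message.content))
    print(result.translation)
except (ValidationError, json.JSONDecodeError, TypeError):
    print("Output didn't match the schema - retry or flag for review.")
```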
The Models Are Changing (Even If You Don't Notice It)
Even with a well-defined prompt, online LLMs can shift underneath you. Suddenly, that rock-solid output you were getting yesterday? It's weird, broken, or just different today.
My AI Moment:
I was using the ChatGPT API - probably an older, cheaper model like GPT-3 - for a simple find-and-replace task in text. For several days, it worked flawlessly. Then, out of nowhere, it started to fail: either it only replaced the first few instances, or it modified unrelated parts of the text. No warning, no version bump - just a silent model update that broke everything.
The Workaround:
Always validate the output. If possible, lock your code to a specific model version.
Always be defensive: monitor for regressions, and assume that even reliable prompts might break when the model evolves or gets replaced. Just because it worked yesterday doesn’t mean it will tomorrow.
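In practice, that defense can be as simple as pinning a dated model snapshot and running a canary test on a schedule. A sketch (the snapshot name and prompt are examples):

```python
from openai import OpenAI

client = OpenAI()
PINNED_MODEL = "gpt-4o-2024-08-06"  # dated snapshot, not the moving "gpt-4o" alias

def canary() -> None:
    """Fail loudly if a known-good prompt stops behaving as expected."""
    resp = client.chat.completions.create(
        model=PINNED_MODEL,
        messages=[{"role": "user",
                   "content": "Replace every 'cat' with 'dog': cat cat cat"}],
        temperature=0,
    )
    output = resp.choices[0].message.content
    assert "cat" not in output, f"possible model regression: {output!r}"

canary()  # run this in CI or on a schedule
```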
It Doesn’t Always Understand the Task
Sometimes the issue isn’t the AI, it’s us. We ask vague, overly complex, or poorly defined questions, and then we’re surprised when the result doesn’t meet expectations. The AI may latch onto part of the request and ignore the rest, leaving you with a half-baked solution or something completely off-track.
The fix? Break down your tasks. Decompose large problems into smaller, sequential steps. Give the AI clear, focused instructions. And when possible, have it validate its own output, for example:
“You received 5 items to process. Does your response include all 5?”
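You can also do that check in your own code rather than trusting the model's self-report. A toy sketch (the prompt and model name are illustrative):

```python
from openai import OpenAI

client = OpenAI()
items = ["alpha", "beta", "gamma", "delta", "epsilon"]

for attempt in range(3):  # retry instead of trusting a single shot
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user",
                   "content": "Uppercase each item, one per line:\n" + "\n".join(items)}],
        temperature=0,
    )
    lines = [l for l in resp.choices[0].message.content.splitlines() if l.strip()]
    if len(lines) == len(items):  # did the response cover all 5 items?
        print(lines)
        break
else:
    raise RuntimeError("Model never returned all items - escalate to a human")
```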
Where AI Does Magic
It’s easy to focus on where AI falls short, but let’s be real: sometimes it feels like straight-up magic. When it works well, it really works - and for engineering teams, the impact is already huge. Below, we’ll walk through some examples.
Ask AI for Help (or for Search)
When you’re stuck on a pesky bug, need a regex refresher, or want a second pair of eyes on your SQL, typing a quick prompt into ChatGPT can save you minutes or hours.
My AI Moment:
When tackling programming questions, I've found that AI assistants can provide answers much faster than traditional search methods. Tools like Perplexity offer concise, real-time responses with source citations, streamlining the information-gathering process. For more in-depth research, platforms like Claude or Gemini Deep Research can delve into topics and generate comprehensive reports or studies.
Caution (Hallucinations):
AI can confidently propose incorrect fixes. Always review generated snippets before merging into your codebase.
AI in the IDE
Tools like GitHub Copilot, Gemini in Android Studio, Continue with local and AWS Bedrock-hosted LLMs, or Sourcegraph Cody feel like autocomplete on steroids. They can:
Suggest entire code blocks as you type
Autocomplete functions using your project’s patterns
Predict your next steps (e.g., scaffolding a new REST endpoint)
Warn about potential bugs or security mis‑configurations
My AI Moment:
When you’re dealing with repetitive tasks, sometimes all it takes is one example. Show an AI assistant what you want - just once - and it can figure out the pattern and do the rest for you.
Caution (Data Privacy):
IDE plugins often send snippets to the cloud. Beware of exposing proprietary logic!
AI for Prototyping
Spinning up a quick UI mock or backend workflow no longer takes days. Prompt the AI to:
Scaffold a React component with Tailwind styles
Generate CRUD endpoints in your favorite framework
Mock up sample data fixtures for manual testing
My AI Moment: Check the vibe-coding section for more details 😉
Caution (Cost & Rate Limits): Rapid prototyping via API calls can rack up charges. Monitor usage if you’re not on an unlimited plan.
Prompt chaining
Prompt chaining turns a single LLM into a micro‑pipeline. For example:
Prompt #1: “Generate 100 test usernames.”
Prompt #2: “Filter out names shorter than 6 characters.”
Prompt #3: “Format the remaining as SQL insert statements.”
My AI Moment:
I once built a data‑cleanup flow where one prompt normalized CSV headers, another validated types, and a third generated corresponding JSON schemas.
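Under the hood, a chain like that is just ordinary code where each step's output feeds the next prompt. A minimal sketch (the prompts and model name are examples):

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

# Each prompt consumes the previous prompt's output
names = ask("Generate 100 test usernames, one per line.")
filtered = ask(f"From this list, keep only names with 6 or more characters:\n{names}")
sql = ask(f"Format these names as SQL INSERT statements for users(name):\n{filtered}")
print(sql)
```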
Context Is King
AI thrives on context. The more you feed it - file structures, recent commits, open pull requests - the fewer hallucinations you’ll see. Next-gen assistants (e.g., Cursor, Windsurf, Claude Code, or OpenAI Codex CLI) already pull live project metadata instead of relying on huge prompt dumps.
I asked ChatGPT to collect all the recent models and their context length.
Model Context Protocol (MCP)
The MCP is an open standard developed by Anthropic that enables AI models to interact directly with external tools, systems, and data sources. Instead of embedding extensive context into each prompt, MCP allows AI assistants to access structured data and functionalities through a standardized interface. This facilitates more efficient and scalable integrations between AI models and software environments.
Before MCP, there were already ways to let LLMs call into APIs, but MCP tries to standardize this.
MCP operates on a modular client–server architecture, where servers expose three core building blocks:
Resources: Provide access to static or queryable datasets (e.g., files, documents).
Tools: Expose invokable functions or APIs (e.g., "create task", "fetch database row").
Prompts: Offer context-aware text templates for AI interactions.
Example:
With an MCP server for WhatsApp, you can search and read your personal WhatsApp messages (including images, videos, documents, and audio messages), and even send messages through an LLM.
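To make the building blocks concrete, here's a tiny MCP server sketch using Anthropic's Python SDK - the task tool is a made-up example, not the WhatsApp server above:

```python
# pip install "mcp[cli]"
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tasks")  # server name is arbitrary

@mcp.tool()
def create_task(title: str, priority: str = "normal") -> str:
    """A Tool: an invokable function the LLM can call."""
    return f"Created task '{title}' with priority {priority}"

@mcp.resource("tasks://all")
def list_tasks() -> str:
    """A Resource: a queryable dataset the LLM can read."""
    return "No tasks yet"

if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio so a client (e.g., Claude Desktop) can connect
```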
Agent‑to‑Agent
Agent-to-Agent (A2A) communication is a protocol that enables multiple AI agents to discover, communicate, and collaborate with each other in a structured manner. Developed by Google, the A2A protocol facilitates interoperability among AI agents, allowing them to coordinate tasks, share information, and operate cohesively across diverse platforms and cloud environments.
In an A2A system, agents can assume specialized roles such as:
Planner Agent: Defines tasks and outlines strategies.
Worker Agent: Executes tasks and performs operations.
Reviewer Agent: Validates outputs and ensures quality control.
This structured collaboration enables the development of complex, multi-agent workflows that can adapt to various applications and industries.
Caution (Maturity): Still in early stages—expect occasional miscommunication between agents and longer runtimes.
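To see the role split without any protocol machinery, here's a toy sketch of the planner/worker/reviewer pattern in plain Python - it illustrates the division of labor, not Google's actual A2A wire format:

```python
def planner(goal: str) -> list[str]:
    """Planner Agent: break a goal into tasks (a real agent would use an LLM)."""
    return [f"research {goal}", f"draft {goal}", f"summarize {goal}"]

def worker(task: str) -> str:
    """Worker Agent: execute a single task."""
    return f"result of '{task}'"

def reviewer(result: str) -> bool:
    """Reviewer Agent: validate output before it's accepted."""
    return result.startswith("result")

for task in planner("quarterly report"):
    output = worker(task)
    print(f"accepted: {output}" if reviewer(output) else f"rejected: {task}")
```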
Local LLMs
Not everything has to run in the cloud. Local models (e.g., Meta’s LLaMA, Mistral) let you:
Keep code and data on‑prem
Avoid API rate limits or downtime
Fine‑tune on proprietary corpora
You can already hook local LLMs into existing solutions, thanks to Ollama and LM Studio and their OpenAI API compatibility layers.
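Thanks to that compatibility layer, pointing existing OpenAI-based code at a local model is often a two-line change. A sketch assuming Ollama is running locally with a pulled model:

```python
# Requires a running Ollama instance, e.g.: `ollama run llama3`
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # any non-empty string works locally
)
resp = client.chat.completions.create(
    model="llama3",  # whichever model you pulled locally
    messages=[{"role": "user", "content": "Summarize our retry policy."}],
)
print(resp.choices[0].message.content)
```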
Example: A financial‑services company can fine‑tune a tiny LLM on internal docs to answer compliance questions offline - no sensitive data ever left their network.
My AI Moment:
When working with sensitive data, having the IDE hooked into local LLMs is a game changer. You can still enjoy the assistance of the LLM while making sure your data never leaves your computer.
The New Hype: Vibe Coding
What is Vibe Coding?
Vibe-coding is a development approach where you communicate the "vibe" or essence of what you want to build to an AI assistant, which then generates a working prototype based on that vision. Instead of writing code line by line or even function by function, you describe the look, feel, and functionality you want—and the AI handles the implementation details.

Why Vibe Coding Matters
Traditional coding requires you to break down your vision into explicit instructions the computer can understand. Vibe-coding flips this on its head—you express your intent in natural language, and the AI translates it into functional code. This represents a massive shift in the development paradigm:
Lower Technical Barrier: People with great ideas but limited coding experience can now bring concepts to life
Rapid Prototyping: What might take days or weeks to build conventionally can be drafted in minutes
Focus on Outcomes: Developers can concentrate on what the software should accomplish rather than implementation details
Experimentation: Testing different approaches becomes significantly cheaper in terms of time and effort
Iterative Refinement: You provide feedback like "make the buttons more prominent" or "add a search functionality in the header," and the AI adjusts accordingly.
Manual Tweaking: Once you're satisfied with the general implementation, you might make manual adjustments to fine-tune the result.
Tools Leading the Vibe-Coding Revolution
Several platforms are pioneering this space:
v0 by Vercel: Creates web applications from natural language descriptions, generating both frontend and backend code.
Lovable: Focuses on UI generation with a strong emphasis on aesthetic and interactive elements, and also integrates with backend services (like Supabase)
Firebase Studio: Released recently, similar to the ones above, it lets you rapidly build full-stack solutions, and it integrates especially well with Firebase services. (I tried it right after the release date… it had some issues with Firebase services, but those have probably been fixed by now.)
Gemini Canvas, ChatGPT Canvas, Claude (Canvas?): Integrated right into the online chat solutions - from prompts, these can generate working web apps.
Cursor, Windsurf, and Claude Console: AI coding environments that blend traditional IDE capabilities with natural language interaction for creating components or entire applications.
The Limitations and Challenges
You should not miss this video from “Programmers are also human” on YouTube:
Like all AI-assisted development, vibe-coding isn't without its shortcomings:
The 70/30 (or 75/25) Rule: As mentioned earlier in this post, AI often gets you 75% of the way there, but that last 25% can still be challenging. Vibe-coding is no exception.
Technical Debt: The generated code might work, but could be inefficient or difficult to maintain.
Overreliance Risk: Depending too heavily on vibe-coding without understanding the underlying code can lead to problems when you need to debug or expand functionality.
Inconsistency: Sometimes AI just can’t figure out a solution; other times it overwrites your whole codebase…
My Story with Vibe Coding
What did I build?
I have built websites, React components, and online games - sometimes even from a single prompt.
Check out vibecoded.work, a vibe-coded site I built to collect cool projects made this way. (You can also find some of my projects there)
I also ask creators about the challenges they faced during development, because it’s not always as smooth as it looks.
👋🏻 Have you already (vibe-)coded something with AI?
Head over to vibecoded.work, add your project, and share your experience!
Fast Prototyping
AI tools have taken prototyping to the next level.
Cursor is pretty solid at handling multi-file projects (and supports MCPs), but I’ve still run into issues with it in the past. So I decided to explore web-based “Canvas”-style tools instead.
ChatGPT (4o):
It built something usable right off the bat. But once I started requesting edits, the quality tanked. The output got progressively worse with each tweak.
Gemini Advanced (2.5 Pro):
Surprisingly good. It delivered a working version on the first shot, and the edits afterward were pretty solid. It even pulled off some magical CSS animations I probably couldn't have written myself. (see above 😄)
Painful Fine-Tuning and Maintainability
ChatGPT (4o):
After the initial build, I asked it to fix a few bugs. Not only did it fail, it introduced more issues. Eventually, it started altering unrelated parts of the code. By the end, I had a broken mess that barely resembled the original project.
To be fair, OpenAI recently released GPT-4.1, which is supposed to be “made for developers.” Maybe I’ll give that a shot.
Gemini Advanced (2.5 Pro):
It handled most of my fine-tuning requests well. The edits were made in the right places. Once, it did go off the rails and started rewriting unrelated parts, but I figured I’d hit the context window limit. I copied the whole code into a new chat, and everything worked again.
Even when I asked it to add new logic - like Google Analytics tracking for specific events - it nailed it on the first try.
Maintainability? Well…
Is the code maintainable? Maybe for the AI (up to a point), but for a human… not so much.
Everything gets dumped into a single file.
No clear structure, no reusable components.
If you want to tweak something in the UI that’s repeated in other places, the AI often edits the same logic in every place instead of abstracting it properly.
Adding extra logic that falls outside the AI’s expertise will be difficult.
If I want to take this further, I might ask the AI to refactor it into a more structured React app… we will see how it goes.
Were the issues frustrating? Definitely.
Want to feel seen? Watch “Programmers are also human” on YouTube, and you’ll get it. 😄
Conclusion
Where AI (and Coding with AI) Might Take Us
If we use AI wisely, it can seriously speed up our workflows.
We can analyze data faster, structure messy input, generate content, build prototypes, and do it all in record time.
But let’s be real: we shouldn’t expect fully baked, production-ready solutions. AI is an assistant. It can augment your thinking, help shape ideas, and fill in gaps, but it still needs your direction.
The goal is to find that sweet spot where development feels almost effortless. Where the tools you use anticipate your needs and plug seamlessly into your creative flow.
This is AI Programming, Not Product Development
Is AI replacing developers? No.
Product development is more than writing code. It’s about trade-offs, priorities, scalability, user needs, security, edge cases… you get the picture.
Having a prototype running in a chat window is not the same as shipping a stable, scalable product.
Product-minded engineers know the difference.
You Still Own the Details
Security? Still your job.
Hosting and services? Your job.
Budgeting? Also yours.
AI doesn’t care about your cloud bill or whether your code leaks user data. You do.
And if your architecture is a tangled mess? AI can only help so much.
To truly benefit from AI in development, we need to structure our codebases with clean boundaries: modular, well-separated components that AI can safely touch.
This limits the risk, keeps things within the context window, and boosts relevance.
Software Engineers with AI Boost
Senior engineers:
AI can multiply your output if you use it wisely. You’ll prototype faster, dive into new languages with less friction, and test ideas more rapidly.
Play to its strengths: let it write documentation, generate boilerplate, help with tests, and take care of the boring stuff - so you can stay focused on the hard problems.
But the key skill is knowing when to stop, backtrack, and take the wheel again.
Junior engineers:
It’s a bit trickier. Don’t skip the fundamentals. Don’t assume AI is always right. Learn the basics: languages, concepts, architecture, and system design.
AI is powerful, but you need to know what it's doing and why.
In this new era, developers aren’t just coders. We're becoming curators, guides, and quality filters for what AI produces.
We’re the humans in the loop—and that still matters.
Don’t sleep on AI.
Be open-minded. Experiment. Know what tools are out there.
One way or another, AI will be part of your toolkit, just like it’s part of mine. (ChatGPT proofread and polished this article.)
Oh, and did I mention? All of these tools have free plans.
Better to start learning where it helps (and where it doesn’t) right now.