Garbage In, Garbage Out - Achieving Clean Code from AI

AI can write code for you in seconds, but can you trust it in production?

As any good developer knows, the quality of what you put into a system directly correlates with what you get out, whether that's data, time, tools or goals. That's why "Garbage In, Garbage Out" isn't just a cliché; it's a warning for AI-First development.

Although this applies to the whole Software Development Lifecycle (SDLC), I'll introduce this principle by showing how a simple coding task evolves from messy AI slop to production-ready, clean Java. I'll do this by:

  1. Improving the prompt.

  2. Adding project-specific context.

  3. Discussing how AI-First tools can help us further.

Along the way, I’ll also share how the Daemon AI-First Squad helps companies unlock the full potential of AI in their software delivery.

I'll be using Claude Sonnet 4.1 throughout for consistency. You may find that different models perform better or worse in certain areas.

 
Who this is for

This post is for developers who want to get more out of their AI coding tools, not just generating code faster, but generating the right kind of code. It’s also for tech leads and engineering managers interested in how AI fits into modern software delivery: powerful, yes, but still in need of human experience and judgement to keep quality high.

 


 
What we're dealing with

If you give AI a vague prompt, you'll get a vague, and often ugly, response.

Prompt:

I need some Java code that checks how strong a password is 

Result (redacted):

public class PasswordStrengthChecker {
    
    public enum StrengthLevel {
        VERY_WEAK, WEAK, MODERATE, STRONG, VERY_STRONG
    }
    
    public static class PasswordAnalysis {
        private final StrengthLevel strength;
        private final int score;
        private final String feedback;
        
        public PasswordAnalysis(StrengthLevel strength, int score, String feedback) {
            this.strength = strength;
            this.score = score;
            this.feedback = feedback;
        }
        
        public StrengthLevel getStrength() { return strength; }
        public int getScore() { return score; }
        public String getFeedback() { return feedback; }
        
        @Override
        public String toString() {
            return String.format("Strength: %s (Score: %d/100) - %s", 
                strength, score, feedback);
        }
    }
    
    public static PasswordAnalysis checkPassword(String password) {
        if (password == null || password.isEmpty()) {
            return new PasswordAnalysis(StrengthLevel.VERY_WEAK, 0, "Password cannot be empty");
        }
        
        int score = 0;
        StringBuilder feedback = new StringBuilder();
        
        score += checkLength(password, feedback);
        score += checkCharacterVariety(password, feedback);
        score += checkPatterns(password, feedback);
        score += checkCommonPasswords(password, feedback);
        
        StrengthLevel strength = determineStrength(score);
        
        return new PasswordAnalysis(strength, Math.min(score, 100), 
            feedback.length() > 0 ? feedback.toString() : "Password meets all criteria");
    }

  // 100 more lines of this

Does it work?
  • Yes, it compiles and runs, printing password strength and feedback.

  • It does catch obvious weaknesses (short length, common patterns, dictionary matches).

  • On the surface it looks feature-rich, but that’s deceptive.

If we set aside the specific business and security issues for now (the hardcoded common-password list, false positives, false negatives) and focus on the code quality and general issues, there are plenty of things that make this totally inappropriate to put up for review.

General Code Cleanliness Issues

  • Massive class: everything crammed into one file with multiple responsibilities.

  • Magic numbers and strings everywhere.

  • Regex soup: unreadable, unmaintainable patterns inline.

  • StringBuilder feedback: building English text inline makes it hard to localise or reuse.

  • Bloat & rigidity: The class tries to be a password policy engine, but isn’t extensible.

  • Feature envy: helper methods know too much about scoring.

  • Lack of separation of concerns: validation logic, scoring system, UI feedback all glued together.

  • Pseudo-cleverness: looks smart, but real-world password strength is better handled with vetted libraries (zxcvbn4j, Passay); see the sketch below.
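
For a sense of scale, here's roughly what the entire task collapses to with zxcvbn4j (a minimal sketch, assuming the zxcvbn4j dependency, package com.nulab.zxcvbn, is on the classpath):

import com.nulab.zxcvbn.Strength;
import com.nulab.zxcvbn.Zxcvbn;

public class VettedLibraryDemo {

    public static void main(String[] args) {
        Zxcvbn zxcvbn = new Zxcvbn();
        // measure() returns a 0-4 score plus human-readable feedback
        Strength strength = zxcvbn.measure("Password123!");
        System.out.println("Score: " + strength.getScore());
        System.out.println("Warning: " + strength.getFeedback().getWarning());
    }
}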

The code "works", but it gives a false sense of quality: verbose, complex, and riddled with smells that will make the codebase less readable and maintainable.

 


 
Step 1: Improving the prompt

In the first run, I asked the AI for "some Java code that checks how strong a password is". Unsurprisingly, it gave me a wall of procedural logic in a single class. Technically correct, but the kind of code you’d expect from a rushed junior dev.

So we can tighten the prompt. Instead of just describing the task, we can describe the qualities of the solution we want. Three things are worth covering when writing a prompt like this:

  • Scope

  • Clean code principles

  • Constraints

Prompt:

 // scope
Write clean, maintainable Java 21 code that checks if a password is strong.
Start with simple checks: minimum length, contains uppercase, lowercase, digit, and special character.
Return a simple result indicating if the password is strong or weak.

// principles
Use clean code principles:
- Small, single-responsibility methods and classes
- Clear, descriptive naming
- No magic numbers
- Proper error handling

// constraints
Don’t overcomplicate, but make it extensible.

Result:

 public enum PasswordStrength {
    STRONG,
    WEAK
}

public class PasswordValidator {
    
    private static final int MINIMUM_LENGTH = 8;
    private static final String SPECIAL_CHARACTERS = "!@#$%^&*()_+-=[]{}|;:,.<>?";
    
    public PasswordStrength validate(String password) {
        if (password == null) {
            throw new IllegalArgumentException("Password cannot be null");
        }
        
        return meetsAllRequirements(password) ? PasswordStrength.STRONG : PasswordStrength.WEAK;
    }
    
    private boolean meetsAllRequirements(String password) {
        return hasMinimumLength(password) &&
               containsUppercase(password) &&
               containsLowercase(password) &&
               containsDigit(password) &&
               containsSpecialCharacter(password);
    }
    
    private boolean hasMinimumLength(String password) {
        return password.length() >= MINIMUM_LENGTH;
    }
    
    private boolean containsUppercase(String password) {
        return password.chars().anyMatch(Character::isUpperCase);
    }
    
    private boolean containsLowercase(String password) {
        return password.chars().anyMatch(Character::isLowerCase);
    }
    
    private boolean containsDigit(String password) {
        return password.chars().anyMatch(Character::isDigit);
    }
    
    private boolean containsSpecialCharacter(String password) {
        return password.chars().anyMatch(ch -> SPECIAL_CHARACTERS.indexOf(ch) >= 0);
    }
}

public class PasswordValidatorDemo {
    
    public static void main(String[] args) {
        PasswordValidator validator = new PasswordValidator();
        
        String[] testPasswords = {
            "Password123!",
            "weak",
            "NoDigitsOrSpecial",
            "nocapitals123!",
            "NOLOWERCASE123!",
            "NoSpecialChar123",
            "Short1!",
            "VeryStrongP@ssw0rd!"
        };
        
        for (String password : testPasswords) {
            PasswordStrength strength = validator.validate(password);
            System.out.printf("Password: %-20s -> %s%n", password, strength);
        }
    }
}

This small change in phrasing transformed the output. By nudging the model to think like a clean-code reviewer, I got:

  • Smaller methods with clear names.

  • A separate enum (PasswordStrength) to clarify the result.

  • No magic numbers or hard-to-spot literals.

  • Proper error handling for null input.

  • A structure that’s easy to extend later with new rules (see the sketch below).
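
To make "easy to extend" concrete, here's a hypothetical extra rule (my addition, not part of the generated output) slotting into the same pattern:

// Hypothetical extension: reject passwords containing whitespace.
// A new rule is one small method plus one extra clause in meetsAllRequirements().
private boolean containsNoWhitespace(String password) {
    return password.chars().noneMatch(Character::isWhitespace);
}

private boolean meetsAllRequirements(String password) {
    return hasMinimumLength(password) &&
           containsUppercase(password) &&
           containsLowercase(password) &&
           containsDigit(password) &&
           containsSpecialCharacter(password) &&
           containsNoWhitespace(password);
}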

The interesting thing here is that I didn’t teach the model anything new; I just gave it the basic expectations a senior engineer would have during code review. The AI filled in the rest.

That’s the essence of Step 1: don’t just prompt for functionality, prompt for quality.

 


 
Step 2: Adding Context

Improving the prompt was enough to get us from "messy junior code" to something that looks a lot tidier. But in real projects, code doesn’t live in a vacuum. It exists in a particular context: your stack, company conventions and your existing classes.

Standard AI models won’t know your project setup unless you tell them. Some tools (e.g. Copilot for JetBrains) let you select which files to add to the context; if yours doesn’t, you need to paste existing code into the prompt. Otherwise, you’ll get code that looks fine in isolation but doesn’t match your patterns.

Prompt:

// scope
You are writing code for the security module in our Java 21 backend.
Implement a password strength validator that checks if a password is strong.
Start with simple checks: minimum length, contains uppercase, lowercase, digit, and special character.
Return a simple result indicating if the password is strong or weak.

// principles
// must add in the prompt if you can't select this in the tool.
Match the conventions shown in these files regarding code quality and frameworks:
- <paste UserService.java>
- <paste UserServiceTest.java>
- <paste InvalidRequestException.java>

// constraints
Don’t overcomplicate, but make it extensible.
Place the class under `com.daemon.security`.

Result (redacted tests and imports):

@Slf4j
@Service
public class PasswordValidator {
    
    private static final int MINIMUM_LENGTH = 8;
    private static final String SPECIAL_CHARACTERS = "!@#$%^&*()_+-=[]{}|;:,.<>?";
    
    public PasswordStrength validate(String password) {
        log.debug("Validating password strength");
        validateInput(password);
        return meetsAllRequirements(password) ? PasswordStrength.STRONG : PasswordStrength.WEAK;
    }
    
    private void validateInput(String password) {
        if (password == null) {
            log.warn("Password validation failed: null password provided");
            throw new InvalidPasswordException("Password cannot be null");
        }
        if (password.isEmpty()) {
            log.warn("Password validation failed: empty password provided");
            throw new InvalidPasswordException("Password cannot be empty");
        }
    }
    
    private boolean meetsAllRequirements(String password) {
        return hasMinimumLength(password) &&
               containsUppercase(password) &&
               containsLowercase(password) &&
               containsDigit(password) &&
               containsSpecialCharacter(password);
    }
    
    private boolean hasMinimumLength(String password) {
        return password.length() >= MINIMUM_LENGTH;
    }
    
    private boolean containsUppercase(String password) {
        return password.chars().anyMatch(Character::isUpperCase);
    }
    
    private boolean containsLowercase(String password) {
        return password.chars().anyMatch(Character::isLowerCase);
    }
    
    private boolean containsDigit(String password) {
        return password.chars().anyMatch(Character::isDigit);
    }
    
    private boolean containsSpecialCharacter(String password) {
        return password.chars().anyMatch(ch -> SPECIAL_CHARACTERS.indexOf(ch) >= 0);
    }
}
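
The InvalidPasswordException class itself wasn't shown in the output; presumably it mirrors the pasted InvalidRequestException.java. A plausible shape, purely as an assumption:

// Hypothetical reconstruction - the real class wasn't shown above.
public class InvalidPasswordException extends RuntimeException {

    public InvalidPasswordException(String message) {
        super(message);
    }
}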

 

Why this works:
By pointing the AI at real project files, we’re giving it a living style guide - it even generated unit tests! Instead of you spelling out tooling ("use Spring, Lombok, @Slf4j") and coding standards ("No magic numbers") each time, the model infers those conventions from the referenced code. This makes the result fit seamlessly into the project - no retrofitting required.
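
The generated tests were redacted above, but a representative sketch of their shape (my reconstruction, assuming JUnit 5) looks like this:

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

class PasswordValidatorTest {

    private final PasswordValidator validator = new PasswordValidator();

    @Test
    void returnsStrongWhenAllRequirementsAreMet() {
        assertEquals(PasswordStrength.STRONG, validator.validate("Str0ng&Pass"));
    }

    @Test
    void returnsWeakWhenSpecialCharacterIsMissing() {
        assertEquals(PasswordStrength.WEAK, validator.validate("NoSpecial123"));
    }

    @Test
    void throwsForNullInput() {
        assertThrows(InvalidPasswordException.class, () -> validator.validate(null));
    }
}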

⚠️ Warning
If the other classes you reference are messy, inconsistent, or vibe-coded, the AI will copy that style too. Garbage in, garbage out. Choose clean, representative files as your context examples.

 


 
Step 3: Context Isn’t Just Code

Adding context through reference files cleaned things up a lot. The AI picked up the use of Spring, Lombok, exception handling, and logging style without you spelling it out. That’s a huge step forward.

But notice what didn’t happen: it still didn’t suggest using a battle-tested library like Passay for password validation. Instead, it reinvented the wheel with a custom validator.

This is the real limitation of "code-only" context. The AI can mimic what it sees in your repo, but it has no broader awareness of your dependency ecosystem, your company’s preferred libraries, or the trade-offs you’d normally consider as an experienced developer.

This is where context-aware tooling makes the difference.

Tools like Cursor or Windsurf can ingest not just your code files, but your project configuration, build files, and dependency graph. With that wider lens, the AI could recognise:

  • Passay is already in use in another module.

  • Passay integrates cleanly with the Spring setup.

  • You should avoid duplicating core functionality with brittle home-rolled checks.
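
For comparison, here's roughly what the validator shrinks to once Passay does the heavy lifting (a minimal sketch, assuming the org.passay dependency; the rules mirror the requirements from Step 2):

import java.util.List;

import org.passay.CharacterRule;
import org.passay.EnglishCharacterData;
import org.passay.LengthRule;
import org.passay.PasswordData;
import org.passay.PasswordValidator;
import org.passay.RuleResult;

public class PassayStrengthChecker {

    private static final int MINIMUM_LENGTH = 8;
    private static final int MAXIMUM_LENGTH = 128;

    private final PasswordValidator validator = new PasswordValidator(List.of(
            new LengthRule(MINIMUM_LENGTH, MAXIMUM_LENGTH),
            new CharacterRule(EnglishCharacterData.UpperCase, 1),
            new CharacterRule(EnglishCharacterData.LowerCase, 1),
            new CharacterRule(EnglishCharacterData.Digit, 1),
            new CharacterRule(EnglishCharacterData.Special, 1)));

    public PasswordStrength check(String password) {
        // Passay evaluates every rule in one pass and reports failures via RuleResult
        RuleResult result = validator.validate(new PasswordData(password));
        return result.isValid() ? PasswordStrength.STRONG : PasswordStrength.WEAK;
    }
}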

Additionally, you can provide a context manifest file, like claude.md, that describes your project conventions, preferred libraries, coding standards, and common patterns. This lightweight, human-readable style guide can be shared across repos and projects, allowing the AI to produce code that is consistent with your ecosystem even for parts it has not seen, while aligning standards across the business.

Project: Daemon Backend
Language: Java 21
Frameworks: Spring Boot, Lombok
Testing: JUnit 5, Mockito
Logging: SLF4J + Lombok @Slf4j
Package conventions:
  - com.daemon.* for all internal classes
  - services under .service, controllers under .controller, repositories under .repository
Coding standards:
  - Small, single-responsibility classes
  - Clear descriptive method names
  - No magic numbers or strings
  - Use Optional instead of null where appropriate
  - Throw custom exceptions for business logic failures
Common libraries:
  - Passay for password validation
  - Jackson for JSON serialization
  - MapStruct for DTO mapping

The more of your ecosystem you let the AI see (frameworks, libraries, conventions), the better the results. Context here isn’t just about files; it’s about the shape of your whole codebase.

 


 
The AI-First Safety Net

Better prompts and context can turn AI from a novelty into a useful coding assistant. But the hardest problems aren’t in the steps above; they lie in making sure the inputs are trustworthy: clean codebases, consistent conventions, proper governance, and the right libraries. These are the areas where vibe-coders or junior teams often stumble.

Out of the box, AI won’t necessarily suggest tried-and-tested solutions like Passay. It won’t enforce your legal, security, or data rules. It will happily echo whatever you feed it, good or bad. Without experienced developers guiding it, AI can easily amplify existing mistakes instead of improving your code.

This is why experienced developers are indispensable in an AI-First workflow. They set the standards, review outputs critically, and decide when to trust the AI versus when to intervene. They shape the ecosystem so the AI can actually produce code that’s safe, maintainable, and aligned with real-world requirements.

That’s exactly what we focus on in the AI-First squad at Daemon: helping teams make AI a force multiplier, not a liability. If you’re wrestling with messy code, shaky governance, or unclear practices - reach out.

Because in the end, AI is only as good as the world you drop it into.

 
