Jan 25, 2026

Closing the Loop

I’ve been coding with agents a lot lately. Codex, Claude Code, that kind of thing. And I’ve noticed a pattern.

TL;DR: Jul automates the feedback loop for agent coding, keeps your work continuously synced, and records provenance (prompts, agent turns, manual edits) as traces, so you can audit how code came to exist, not just who committed it.

The agent writes code. Sometimes it commits, sometimes it doesn’t. I tell it to run tests. Sometimes it does, sometimes it forgets. Sometimes it runs some tests but not all of them. Sometimes it forgets to add tests entirely and the coverage drops without warning. I ask it to review its work. It finds issues. I tell it to fix them. It does. I ask it to commit. It does. I push. CI fails on GitHub because there was something the local tests didn’t catch.

It’s not that the agents are bad. They’re actually pretty good. But they’re inconsistent. And I find myself doing the same manual dance every time: did you commit? did you run the tests? all the tests? did you check coverage? okay now review it. okay now fix that. okay now commit again.

I wanted to automate this.


The Loop I Keep Running

The workflow I’ve landed on looks something like this: write code, checkpoint it, run CI, run a review, apply any fixes, checkpoint again, repeat until it’s ready, then promote to main.

The problem is that “run CI” and “run a review” are manual steps. I have to remember to do them. The agent doesn’t automatically get feedback on whether the tests passed. If I forget to run CI before pushing, I find out fifteen minutes later when GitHub Actions fails. By then I’ve context-switched to something else.

What I want is for the tooling to close the loop. When I save my work, CI should just run. A review should just happen. If there are issues, I should get suggestions I can apply with one command. And if I’m using an agent, it should get this feedback as structured data it can act on—not error messages I have to copy-paste into the chat.


The Handoff Problem

The other thing that’s been bugging me: I’ve been experimenting with cloud agents. Running Codex on a server, coding from my phone, picking up work on different devices.

The handoff is brutal.

Git was designed for “push when you’re done.” But I’m never done. I’m always in the middle of something. I want my work synced continuously, like Dropbox for code. Close my laptop, open my phone, everything’s there. Spin up a cloud agent, it picks up exactly where I left off.

You can push WIP commits, sure. But then you’re polluting your history. You can use branches, but then you’re managing branch sprawl across devices. You can stash, but stash doesn’t sync.


The Provenance Problem

Mitchell Hashimoto has been writing about something he calls “prompt blame.” As a maintainer of Ghostty, he reviews a lot of AI-assisted PRs. The problem: git blame tells you who wrote a line, but not how it came to exist. Was it thoughtfully written? Or was it generated by an agent and submitted without understanding?

His insight: “Current AI tools show all the machine diffs but hide the human ones. You lose the part where the human is actually thinking.”

This got me thinking about provenance more broadly. When I look at code six months from now, I’d like to know which lines came from which prompts, where I stepped in manually, what the agent tried before landing on the final version. Git wasn’t built to track any of that.


What I’m Building

So I started building something. I’m calling it Jul (줄, Korean for “line”—as in a line of code, or a line of work).

The core idea: automate the feedback loop I’m already doing manually, and sync everything continuously so handoff just works.

It’s built on Git. Same refs, same remotes, same hosting. GitHub, GitLab, whatever. But with a layer that adds:

Continuous sync. Your work is always backed up. Run jul sync and your current state gets pushed to a sync ref. You can always recover. You can always switch devices and pick up where you left off. (Eventually this could be automatic on save, but for now it’s a command.)
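
To make that concrete: jul sync is the whole interface right now. The ref layout in the comment below is a sketch, not a promise.

```
# back up whatever is on disk right now, committed or not
jul sync

# under the hood it's plain Git: the current state gets pushed to a
# sync ref on the normal remote (something like refs/jul/sync/<branch>,
# exact layout not final), so another device or a cloud agent can
# fetch it and resume
```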

Checkpoints with automatic feedback. When you checkpoint your work (Jul’s version of commit), CI runs automatically. A review agent analyzes your code and creates suggestions. Everything comes back as structured JSON. If you’re using an agent like Codex, it gets this feedback directly and can act on it.
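
Here's roughly the shape of that feedback. The field names and file paths below are illustrative; the exact schema is still settling.

```
jul checkpoint --json
# illustrative output (field names not final):
# {
#   "checkpoint": "a1b2c3d",
#   "ci":     { "status": "failed", "failing": ["tests/test_sync.py::test_resume"] },
#   "review": { "suggestions": [
#     { "id": "s-12", "file": "sync.py", "summary": "handle an empty remote on first sync" }
#   ] }
# }
```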

Traces for provenance. For the Hashimoto problem: every prompt, every agent turn, every manual edit can be captured as “traces” in a side history. jul blame can tell you not just which commit touched a line, but which prompt created it.
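
So blame can answer a different question. A sketch, with a made-up file and line range; the output format is still rough.

```
# standard Git: which commit and author last touched these lines
git blame -L 120,128 sync.py

# Jul: which trace produced them, i.e. the prompt and agent turn,
# or the manual edit, behind the same lines
jul blame sync.py
```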

Change-Ids that persist. A logical change gets an ID that survives amends, rebases, and squashes. You can jul diff Iab4f... to see the whole change. Eventually: jul revert Iab4f... to undo it, even after it’s been promoted to main.
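
The commands named above are the surface for this. A quick sketch; only the diff part exists so far, revert is planned.

```
# see everything that belongs to one logical change, across amends,
# rebases, and squashes
jul diff Iab4f...

# planned, not built yet: undo the whole change even after it has
# been promoted to main
jul revert Iab4f...
```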

The “agent” part of Jul is pretty modest right now:

  • It writes commit messages (not revolutionary, I know)
  • It runs reviews and creates suggestions (the main value)
  • It can try to resolve merge conflicts (experimental—you still need to review these carefully)
  • Eventually: maybe smarter reorganization of commits when you promote

It’s not magic. It’s automation of the stuff I was doing manually anyway.


The Closed Loop

With Jul, the workflow becomes:

checkpoint → CI runs → review runs → suggestions appear → apply or ignore → checkpoint again

If you’re using an agent, it can call jul checkpoint --json and get back structured feedback: what passed, what failed, what suggestions exist. It can apply suggestions with jul apply. It doesn’t need me to copy-paste error messages or parse terminal output.
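
Strung together, that's only a few commands. A rough sketch of the agent-facing loop; the JSON paths and the suggestion-id argument to jul apply are my shorthand here, not a stable contract.

```
# checkpoint and capture structured feedback
out=$(jul checkpoint --json)

# inspect what came back (field names illustrative)
echo "$out" | jq '.ci.status, .review.suggestions'

# apply a suggestion, then checkpoint again and repeat until clean
jul apply s-12
jul checkpoint --json
```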

And when I’m done for the day, I just… stop. My work is synced. Tomorrow, on a different device, I run jul sync and I’m exactly where I left off.


Positioning

Jul sits somewhere between JJ (Jujutsu) and Graphite.

Like JJ, it’s rethinking the local workflow—checkpoints instead of commits, continuous sync instead of push-when-ready. But unlike JJ, it’s not trying to replace Git. It’s a layer on top.

Like Graphite, it accepts that Git won and builds on it. It has stacked changes. It keeps refs/heads/* as the canonical published state. But unlike Graphite, it’s not focused on team code review workflows. It’s focused on the human+agent feedback loop.

Maybe the honest pitch is: Jul is to agents what Graphite is to teams.


Where It’s At

I’m still figuring this out. It’s already useful, and the core loop is there, but there are still rough edges, and I’m not ready to recommend it without caveats yet.

If you’ve found yourself doing the same “agent → tests → review → fix → commit → push → pray” dance, I’d love to hear what’s working (or failing) for you.

Because right now, we’re all just glue between the agent and the tooling. And I think the tooling can be better.
