#18 Jun 05, 26

"Just Ask Codex": Inside OpenAI's Own Working Culture | Romain Huet, OpenAI

Romain Huet: 00:00

We at OpenAI noticed that, like, virtually everyone at the company is using the Codex app. And Germany is actually one of the top five markets in the world for Codex momentum. Germany? Yeah. Those models are so good that we're still too shy with them. We're still not using them to the maximum of their abilities. It's very likely it could work for, like, one day, two days, maybe three days or more completely uninterrupted and come back to you with, I'm done. Everything is up to date, and I tested everything.

Christoph Magnussen: 00:43

Our whole podcast team drove and flew to Paris for this podcast. We have the chance to talk with Romain from OpenAI, who is responsible for the Codex app. The Codex app is currently one of the hottest applications when it comes to AI harnesses, that is, agent harnesses, the things you need in order to steer agents. That's the hot stuff right now when it comes to AI. Here at AI to the DNA you really get a look at the move from tool tourism to Champions League, because you'll now see how OpenAI itself uses this super app internally, an app that is now used by more than 4 million weekly users. And that has grown tenfold in Germany within the last few weeks. So we're really dealing with a hot new application in its early hours here. And you can be part of it now, so look forward to a deep dive from Paris on OpenAI Codex at AI to the DNA. I ask you to say the typical sentence even if I'm the host because this is what everyone knows from you from YouTube. What do you usually say?

Romain Huet: 01:42

I usually say something like, hey, everyone. I'm Romain from OpenAI.

Christoph Magnussen: 01:47

From OpenAI. Indeed. Romain, welcome to the podcast. We're in Paris at your office, and you're not only from OpenAI, you're one of the key people behind Codex. Yeah.

Christoph Magnussen: 02:00

That's what I want to chat with you about today.

Romain Huet: 02:03

Yeah. I'm excited. Thank you Christoph for having me and thank you for making the trip to Paris too.

Christoph Magnussen: 02:07

Very, very welcome. We had some very nice sightseeing last night. I took the team and said, like, we at least have to do a walk around the city. So it's very beautiful, and it was a good start.

Romain Huet: 02:17

That's awesome.

Christoph Magnussen: 02:18

The goal for this podcast for me is always to take people along and explain to them that we don't wanna be tool tourists jumping from tool to tool, but become more like a Champions League player when it comes to tools. And I want to challenge you right away with that. I made the first video about Codex last May, and that was when Codex was not an app but, like, a tool you were using internally. And I tried it and I was like, it didn't work the way I wanted it to. Then, when you released the Codex app, I opened it again.

Christoph Magnussen: 02:51

I used it with the model, I think, 5.2 and then 5.3-codex, and I was like, okay, this is working out. It was kind of the same moment I had with Claude in December, November. And then I realized with GPT-5.5, holy smokes, this is really working now. How is it that Codex is not just a better IDE for developers, but is now changing into a new layer of work?

Christoph Magnussen: 03:17

Is that planned or did it just happen?

Romain Huet: 03:19

Well, I think it's a great observation, by the way. We thought about, like, coding for a long time. Right? I think even the name Codex at OpenAI goes back, like, six years now, to when the first models were able to output code. And when we started this new generation of the Codex product a year ago, we always had this vision of agentic coding taking place in the cloud, like agentic delegation with agents being able to write code.

Romain Huet: 03:46

The reality is that the models were not quite good enough at the time. You could still do, like, pair programming, which was very popular in 2025. The idea of having an AI that you can talk to for architectural questions, maybe writing some files or some functions. But you could not really delegate complex features to build from scratch, let alone a complex refactoring of a code base, and now we are there. So we really made strong progress on our stack of models.

Romain Huet: 04:17

And as you pointed out, I think 5.2 and 5.3-codex were the models that marked the turning point. I could see that even the very best engineers at OpenAI were still writing some code late last year. And as of 5.2 and then GPT-5.3-Codex, this is the moment they realized that the models were so good that they did not have to write any code by hand either. And so that's when we really started to manifest this agentic coding, the model being able to take on very, very long tasks.

Romain Huet: 04:48

It could be like a five-minute task, but it could be like a five-hour task or even days of uninterrupted work. And as such, we also realized that, well, the interface we've been using to write code is no longer the correct one. Right? Because if you're pair programming in an IDE, it makes sense because you're talking to the model, you're outputting a function, and you're kind of in the flow. But if you start delegating tasks to a model that you really trust to do the right thing for many hours at once, you need something different. What we saw last year was terminal-based interfaces where you have like 10 or 12 tabs in your terminal, and each of those agents is trying to get your attention at some point.

Romain Huet: 05:28

We knew we had to build something better for this new world of delegating to agents, and that's how the Codex app came to be. It was really this realization that the models had become so good that we needed to have a command center to be able to, like, not just delegate tasks to agents, but also give them the right permissions, review their work, and really be in the seat of a manager of agents, if you will.

Christoph Magnussen: 05:52

The command center is a much better word. I used the word agent inbox last year to explain it, and it was very early days. And a lot of people in my company were so afraid of using the terminal, because we have psychologists at Blackboat, we have consultants, people who do enablement, and they were like, do I need to write code now? So what if you need to describe Codex, in a typical, let's say, German corporate company or Mittelstand, to someone who knows ChatGPT and says, well, I have ChatGPT, I ask ChatGPT.

Christoph Magnussen: 06:26

What should I do with Codex, because I don't write code? And you're not allowed to use the word code. You have to describe it to them in terms of work. Yeah.

Romain Huet: 06:35

I think if they've been used to ChatGPT, the great news is that they've already mastered the first step of, like, oh, I can talk to an AI and this model is very smart. And it's able to actually have a very personalized response for me. And maybe if they've used ChatGPT before, they may have connected their calendar or they may have connected their Gmail. You know? So they've had that personalization and that experience so far.

Romain Huet: 07:00

What is very magical with the Codex app for literally anyone now is that you have not just access to all of the services that you have online, but also with the right safeguards, of course, and approvals, you also have access to your computer. And what that means now is that any kind of everyday task that you do on a computer, the Codex app is able to do it for you. So for some people, that's writing code, and that's where we started. But for some others, that's like, well, I have presentations to make every time I meet a new prospect or a customer. Or maybe it's like I work in finance and I'm manipulating Excel sheets every day.

Romain Huet: 07:38

Or maybe it's like, you know, I'm working in recruiting and I need to have a pipeline of candidates that's triaged. All of these things are actually tasks that we all do on a computer, and most of them take time. They usually connect data sources from multiple places. They try to visualize things. There are actions to click on.

Romain Huet: 07:58

And then you realize that like something like Codex, which was initially made for coding, can be unlocking all of that and really give you like more power and like higher productivity for what you're trying to accomplish.

Christoph Magnussen: 08:11

I always try not to overhype apps unless I have a strong feeling, and I'm a heavy user, I would say, but you can challenge me on what defines a heavy user. But I observed throughout the last weeks before the podcast that usually the first app I open in the morning is Codex, also to prepare the podcast. So I have a very neat presentation ready on the phone. It's made in the Blackboat design. People listening to the podcast now cannot see it, but where is it? I'll show it later on YouTube.

Christoph Magnussen: 08:47

So it even uses the Blackboat design that we have. Very cool. Even day mode, night mode, you can switch and do all the stuff. And to me, the question is, since you are so strong when it comes to product, I mean, the job that you had before OpenAI was more developer related. But my feeling, and help me to understand it better, is that you are a very deep product person and at the same time a very deep engineering person, and this is how you bring this product to life. Is this the way you work on the app, or is it more testing, experimenting?

Christoph Magnussen: 09:28

How do you work on that?

Romain Huet: 09:29

Yeah. I think it's a great question. I think the Codex team is very much iterating quickly. It's part of the culture of that team: building quickly, but also iterating quickly based on user feedback. So you might have seen many of us on X every single day trying to engage with developers, with builders of any kind. And by the way, I think that's what I love about Codex.

Romain Huet: 09:50

It's the idea that, like, everyone is a builder. And maybe that's another way to have a shorter description for your previous question, which is, like, how would you describe Codex to anyone, when now anyone can build? And build can mean many things. For engineers, it means building apps or products.

Romain Huet: 10:07

But for everyone else, it could be like you're building something that

Christoph Magnussen: 10:10

That's actually a picture.

Romain Huet: 10:11

To your work. That's very exciting. Things that were usually limited, like, oh, I need to have access to an engineer to do this thing. Well, now you can.

Romain Huet: 10:20

And so the way we iterate on the Codex app is very much, on one side, benefiting from all of this feedback from the users that we talk to every day, but also, on the other side, trying to manifest the capabilities of the models that are changing so quickly. Right? It used to take us more than a year to build a new model at OpenAI. Mhmm. And you've seen the pace of change recently.

Romain Huet: 10:43

Right? Like now, on average, I believe since GPT-5 and 5.1, it's been like a model every five weeks. That's the pace now. And those models, you've noticed yourself, like when we went from 5.4 to 5.5, also in five weeks, these are not just incremental tiny changes. Like for some people, they've seen, like, wow.

Romain Huet: 11:03

I've noticed, like, glimpses of AGI there. Like, I'm noticing something that no other model was able to do before. And so we're trying to see how we can manifest these features in Codex while at the same time talking to users on the other side. And by the way, the reason why we try to make it so much better on everyday work is because we at OpenAI noticed that virtually everyone at the company is using the Codex app day in, day out. It went quite quickly far beyond the engineering team.

Romain Huet: 11:38

And so we noticed there was something there when our CFO, for instance, Sarah Friar, was using Codex nonstop. And I'm like, she's mastering, you know, how we do finances and Excel and all of that on Codex. And I'm like, there's surely something here for others to also benefit from.

Christoph Magnussen: 11:55

Do you train other teams, or is it just happening one on one in the kitchen?

Romain Huet: 12:00

I think there's a bit of both. Like, we do some enablement, but I think it's also natural curiosity. Of course, we have an edge when you have Codex teammates at the company; we often refer to each other as, like, we're Codex build, you know, given how passionate we are about Codex. If you walk around OpenAI HQ, very often, every few minutes, you would hear something like, just ask Codex. If someone has a crazy idea for a very complex task, people will be like, have you asked Codex?

Romain Huet: 12:30

Yeah. Let me ask Codex. And you would overhear that quite often. And so I think no matter which team you're on, we try to kind of blend people as much as we can. That's why we have a strong in-person culture at the office.

Romain Huet: 12:43

So we're naturally well equipped to get everyone on board, you know.

Christoph Magnussen: 12:49

When I have the feeling for a new software or tool, like back in the day, I mentioned Salesforce, for example. We had the same in the company. I always said, like, look, take a look in Salesforce. Is there something, so we don't need to talk? Or then later with Slack.

Christoph Magnussen: 13:04

And you see that people change their behavior. What I see now as a challenge, and I want to disconnect it a little bit from the tool, is that moving from the way we used to work since the ChatGPT moment, in terms of chat, answer, chat, answer, chat, answer, to delegating jobs to agents and then to coordinating an agent harness is by far more complex and more challenging. What would you explain to someone who says, like, help me to understand the difference between a pure chatbot and a harness? No matter if it's Codex or any other harness, and why is it so important for companies to have a close look into these harnesses to be prepared for the coming years?

Romain Huet: 13:49

Right. I think if you rewind like a couple years, right, when people were starting to deploy ChatGPT and chatbots at scale, if you remember the conversations at the time, they were like, oh, does the model hallucinate? Does it have enough in the training data to help my company? Or I should probably do, like, RAG in order to get my documents and my context into the flow of the conversations.

Romain Huet: 14:16

These were kind of the main questions at the time. What's very fascinating now is that thanks to reasoning progress first, the model has been able to call tools. And so when it's able to call tools, it's able to look up something on the Internet. It's able to look up something in one of your data sources within a company. And all of that, as we mastered and polished it over time, we refer to as the kind of harness of the model.

Romain Huet: 14:42

So it's the idea that, like, maybe two years ago, the battle was on the model layer, and we would just get higher and higher intelligence at every step of the way. Now when you look at a model like GPT-5.5, the intelligence is extremely high. I would even say that GPT-5.5 is already smarter than me and most of us at OpenAI. But what makes the difference between GPT-5.5 being a good employee versus someone else within the company is really how much context it has on how to do the job correctly.

Romain Huet: 15:14

How much access does it have to the context of all the documents we've written, the conversation with customers, the code base we have, the conventions we have in place, things like that. Mhmm. And so all of that refers to the harness. It's the idea that like, okay. There's the model layer, but on top of that, there is the environment that the model lives in.

Christoph Magnussen: 15:34

Mhmm.

Romain Huet: 15:34

The list of tools it's able to call, whether it's to read a file, write a file, whether it's connecting to the web for a search, or connecting to my datasets. And all of that harness, I think, is what matters when you want to really manifest this agentic capability of the model. Not just doing a task, writing code or writing a document, but the ability to cross check with multiple sources and check its own work. So for instance, in the case of writing code, what makes the model so good is, like, okay, I'm gonna write some code, but I'm gonna test that code. In fact, better yet, I'm gonna open it in the browser.

Romain Huet: 16:11

I'm gonna take a screenshot of what I've done. I'm gonna make sure it complies with the design, for instance. Well, no. Not quite yet. Okay.

Romain Huet: 16:18

Let me go back to it. Let me edit this piece of code. Let me recheck again. All of that, we essentially refer to as the harness: the ability to have a model, its environment, its tools, and all of that. And so to me, that's where we really push the frontier with a model like GPT-5.5.
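
The write, test, check, and retry cycle described here can be sketched in a few lines. This is only an illustration under stated assumptions, not the Codex harness: `run_harness`, `stub_model`, `run_tests`, and `check_style` are hypothetical names, and the "model" is a fixed script of two drafts so the example stays deterministic.

```python
from typing import Callable, Optional


def run_harness(
    model: Callable[[str, Optional[str]], str],
    tools: dict[str, Callable[[str], Optional[str]]],
    task: str,
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Loop: the model drafts, every tool checks the draft, feedback goes back."""
    feedback: Optional[str] = None
    for round_number in range(1, max_rounds + 1):
        draft = model(task, feedback)
        # Each tool returns None when satisfied, or a message describing the problem.
        problems = [msg for check in tools.values() if (msg := check(draft)) is not None]
        if not problems:
            return draft, round_number
        feedback = "; ".join(problems)
    raise RuntimeError(f"no passing draft after {max_rounds} rounds: {feedback}")


def stub_model(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for a model: buggy first draft, fixed second draft."""
    if feedback is None:
        return "def add(a, b): return a - b"
    return "def add(a, b): return a + b"


def run_tests(code: str) -> Optional[str]:
    """Tool 1: execute the draft and test it (safe only because the input is our stub)."""
    namespace: dict = {}
    exec(code, namespace)
    return None if namespace["add"](2, 3) == 5 else "add(2, 3) did not return 5"


def check_style(code: str) -> Optional[str]:
    """Tool 2: a trivial convention check, standing in for a design review."""
    return None if code.startswith("def add") else "function must be named add"


final_code, rounds = run_harness(
    stub_model, {"tests": run_tests, "style": check_style}, "write add"
)
```

The point of the sketch is that the quality of the result comes from the loop and the tools around the model, not only from the first draft.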

Romain Huet: 16:37

And that's why, when it comes to models and Codex, we also wanted to have a great foundation and make sure, like, ChatGPT itself, for instance, is now benefiting from that Codex harness already. The new agents we're building within ChatGPT are on top of the Codex harness. And what I find very compelling as a developer is that we open source all of this. So the Codex CLI, the interface in the terminal, that's open source.

Christoph Magnussen: 17:05

Open source.

Romain Huet: 17:06

Yeah. But this Codex harness that I refer to, that's also open source.

Christoph Magnussen: 17:10

Oh, wow.

Romain Huet: 17:11

And the Codex app server also connecting that harness to our API is also open source. And I think that's quite unique, and I think, frankly, we don't talk, like, enough about this.

Christoph Magnussen: 17:20

I shared it a couple of times, but I think many people don't realize that most of the other big players are closed source

Romain Huet: 17:28

Mhmm.

Christoph Magnussen: 17:28

Unless something leaks, then it's also open source, but, well, that's not planned then. And you open source it. You also shared the model instructions last year, in parts. I think not everything, because when you refer to the harness coordinating stuff, I think many of the things in the background are related to simple prompts getting more complex when it's many prompts. Right?

Christoph Magnussen: 17:53

Yep. Is it something you can also, like, get a glimpse into or is this pure secret how you do it?

Romain Huet: 17:58

No. I think what's very interesting now is that you have this harness layer and the models becoming so smart. So we now bring the harness back into the training as well. Now that we have this open source harness that people can inspect, and they can see what kind of tools we have in there and how things connect together, we also bring this back into the training.

Romain Huet: 18:21

The model is even better when we have these reinforcement learning techniques where the model gets a reward for achieving a task. We also do so with the right harness in place, so we get even better gains. But to your point on prompts, I think what I find fascinating now is how much, even just a year ago, we were still talking about prompting techniques or prompt engineering guidance. You know, if you wanted to get the best out of the model, usually, you had to kind of prompt the model a certain way. I remember even in the early days of coding, we would even say, like, oh, you are a staff senior engineer, you have a very high IQ.

Romain Huet: 19:00

And as such, you may notice differences in how the code is written.

Christoph Magnussen: 19:04

All of that

Romain Huet: 19:05

is really gone. What I love about the models today is that there's even less to learn to use them versus a year ago, and yet they are so much more powerful. When people ask me, like, oh, can I do this thing with Codex? I'm like, you can. Like, just talk to Codex the exact same way you just talked to me.

Romain Huet: 19:25

All of the same words. You don't even have to have your prompt perfectly written. Because the models are so smart. We don't have to give them, you know, as much, like, care in the prompting. You know?

Romain Huet: 19:37

So, yeah, I think that's very cool. I think we have other parts of the Codex suite where we sometimes have very specific prompts. So for instance, when we put Codex in the form of code review. Code review is really, really powerful with Codex because it's able to find bugs or vulnerabilities and things like that. Well, for this, we give Codex a very specific prompt.

Romain Huet: 19:58

Like, your job is to go after, like, what could potentially be a bug and how to stack rank them and how to give feedback on these bugs. And so by giving the model this different personality, it can behave differently. But as you talk to Codex for your everyday work, what I find magical is like I barely type anymore. I just dictate to Codex. And that's my prompt.

Romain Huet: 20:21

And oftentimes, people are surprised by this, especially if they're still in the kinda ChatGPT mode of thinking. But, like, when I start talking to Codex, I don't even know where I'm gonna end up sometimes. I'm just brainstorming out loud. I might change my mind a couple times over the course of that minute where I'm still talking to Codex for what I'm trying to accomplish, and yet I trust Codex to figure it out.

Christoph Magnussen: 20:44

So how would you describe the percentage? Is it 30% or 40% model intelligence and then 60% the smart ways of routing through different prompts in the system prompt? Or how would you

Romain Huet: 20:59

I think it's like part model intelligence, part context. I think it's really the context of being able to access these tools. Like, let's say, if it's a coding task, and if I'm asking Codex to build a new feature or, like, completely change the behavior of an existing part of my app, the intelligence of the model is pretty high, of course. But I think what matters is, alright, the user is asking me something here.

Romain Huet: 21:27

Let me look at the entire code base. Let me piece things together. Okay. I understand how the whole system works. The user's asking me to build this new feature, but I saw over there that they had the design system.

Romain Huet: 21:38

So sure enough, I can probably reuse these components over here and bring them over there. Like, that understanding of the whole picture by being able to read your files, connect your context. Same thing if it's like a if it's a kind of like a knowledge task almost. If I'm asking like, hey, you're gonna be my chief of staff every morning at 7AM. I wanna have a report of everything that matters to me.

Romain Huet: 22:03

All the emails I should be answering to first thing in the morning. I also want you to draft the response because you know my style of writing. You've seen me write before. You know what matters to me. We've talked enough about these documents and what matters for this coming quarter.

Romain Huet: 22:18

These are my priorities. And based on all of that, you know, the model will come up with something quite magical. And of course, it's part intelligence because the model is smart, but it really comes down to this connection to the context. Right? Like, the ability to connect to all of these tools to know what really matters to Christoph versus, like, the raw IQ of the model.

Christoph Magnussen: 22:39

And this is something that I'm observing a lot, and I was asking, I think pretty much since the video last year, how would we organize the data internally? And I really brainstormed a lot about it. And I think in March, I stumbled upon something I never realized, although I've been in tech for so long: that Google, all of Google, is basically one repository. So for people who are not technical, a repository is technically folders and text files where all the code is. Yep.

Christoph Magnussen: 23:08

I think 2,000,000,000 lines of code. And then I also stumbled upon a podcast where someone mentioned that you also have just one major repository at OpenAI. And I thought, like, that must be a very smart way to organize it. So what we did at Blackboat now is we have a huge repository called the Blackboat Operating System, where apps are running, internal apps are running, the external website is running, like, many things are running. Is this the way to organize your, I mean, I think of it in layers.

Christoph Magnussen: 23:39

Think in like one box is the repository where all the logic is. The second would be the layer of work conditions meaning databases, clients and stuff that you don't have in the repository. And then you have the agent layer working on that or the human layer. Yep. Because the artifact in the end, the fourth layer is just a PowerPoint presentation, the website, the code, the whatever it is.

Christoph Magnussen: 24:01

And you can then reconnect it to learn. So to me, this is the very simple way to think of it. Is it that clear or is it more messy? What is your best practice on that for a company that thinks, like, how do I need to change?

Romain Huet: 24:16

Yeah. No. I think you're pointing out all the right things. And I think monorepos have actually been quite popular for many years now at tech companies, even before AI was so good. And I think there are many properties that are elegant about a monorepo.

Romain Huet: 24:32

You oftentimes have, like, many, many teams working on different products, especially at large companies like Google that you mentioned and ours now, but many others. And I think oftentimes, you ship a new feature of an app, but it also requires an API update on a different service and everything. And the orchestration across dozens and dozens of repos is hard. So I think the monorepo has this nice property where all the engineers can work collectively on one code base, even though they might actually touch only a fraction of it. So interestingly, this was already becoming popular before AI, and AI and an army of agents are actually benefiting from that as well because they have the complete picture of the whole code base.

Romain Huet: 25:14

Right? If there's one API change to make and one front end change to make, it's all in one place so they can see the whole picture. But I think that's right. I think it's the idea that you have now empowered engineers, but also any kind of employee, to do more by delegating to agents. And it's really this idea of, like, okay, what is the best way to structure the company so that these agents can do their best work?

Romain Huet: 25:42

One thing for instance I love is that like our company relies a lot on Slack. And I'm always telling my teammates like, try to always avoid like private channels unless something is sensitive, of course, and has to be discussed in private. Because everything else, like, let's say, a new idea for a project or a new prototype we're building, if it's all in public channels, I can simply ask Codex. Hey. Catch me up on that project.

Romain Huet: 26:07

I have not had the chance in the past week to, like, figure out what was happening. Sure enough, like, dozens of people have chimed in and, like, started to create documents or maybe some artifact of code. Catch me up. What's the latest? What decisions have been made?

Romain Huet: 26:22

What are the decisions still to be made? And I think that's cool because Codex can access all of this context too.

Christoph Magnussen: 26:28

Is it that you have certain rules saying, like, okay. This was Romain, the human, and this was Romain, the agent? Or how do you do it?

Romain Huet: 26:37

You mean, like, within Slack?

Christoph Magnussen: 26:38

Yeah. Like, is there something where I'm like, I'll give you an example. I prepared I prepared an enablement internally that we will have within the next two weeks. Mhmm. And within one prompt, I prompted Codex and said, hey.

Christoph Magnussen: 26:50

Well, I'm thinking of doing an enablement for the whole team, building agent harnesses, understanding in each domain how it works. I want you to inform my colleague from HR and my colleague from training, write them the email in my style, prepare the presentation, make it in HTML, upload it to our internal sites tool, just share it with the two of them, even find the dates in the calendar, set everything, and then add a quick note at the end that this whole thing was done by Christoph, the agent.

Romain Huet: 27:21

Yeah. Yeah. So that they realize it. Exactly. So on Slack, that's already what's happening.

Romain Huet: 27:24

So for instance, last night, I received a Slack message from my colleague Jason, who was working on something over the weekend. And I saw this was a note sent by his agent, because that's already part of the signature on Slack. I was like, alright. So he had been working on this thing, and I just noticed that his agent, or rather Jason, prompted his agent to give me an update about it. But I think what's even more powerful is not so much the writing, because that's usually on Slack in a conversational manner.

Romain Huet: 27:50

It's pretty rare that we use, like, an agent for this except for, like, a small status update. Yeah. It's more for the context gathering. Right? Like, if I wanna do my very best work, I gotta build a very complex understanding of what's happening and keep the the latest in my mind.

Romain Huet: 28:06

And I think like Codex can really be magical for this.

Christoph Magnussen: 28:08

A lot of people pointed to AI and said, okay, we become more efficient. And behind closed doors, I did get the question from top-level people years ago: who can we let go if we do that with AI? And I said, like, I think it's the wrong question to ask, because you usually don't have one task connected to one FTE. That's a very mathematical way of thinking, and the world is not like that.

Christoph Magnussen: 28:34

I said work is way more messy. I refer to it and say, how about you leverage, truly leverage, the work? And I have the feeling that with agent harnesses exploding in the last months, we now have the tool to do that. But what we also need on the other side, once you give all the context and all the stuff, is trust and security built around it.

Christoph Magnussen: 28:54

And also, because, I mean, in the end, it's trust that people buy when they rely on a company. Tell us a little bit about how you build it into the product, because I see these little nuances, saying, okay, I can use the auto mode and I know Codex asks for this and that. I, for example, of course also know Cursor quite well and I know Claude Code quite well, Py and so on, even OpenClaw and Hermes-like agents. And we experiment a lot on what the model or the harness asks back when it comes to trust.

Christoph Magnussen: 29:29

How do you build it in the product because you guys go quite far with computer use or

Romain Huet: 29:35

what I

Christoph Magnussen: 29:36

read last night: even a closed and locked Mac that the model can use. But that's pretty new, I think, at least from what I saw on X, working on a Mac.

Romain Huet: 29:46

Oh, when someone was talking to us from a phone?

Christoph Magnussen: 29:49

Yeah. Yeah. Like from the phone and then still use my Mac even if it's at home, stuff like that. Yeah. You need a lot of built in trust and security.

Christoph Magnussen: 29:56

Yeah.

Romain Huet: 29:57

How does it work? I think there's a few ways that we we think about this. Well, first of all, like, we we care about safety. Like, you know, it's it's paramount to us. Like, we wanna make sure that everything we do and ship has to be, like, extremely safe for for the users.

Romain Huet: 30:10

That's why we also wanna put the technology in the hands of people as early and as often as we can. But in the case of Codex, first, it starts with the model intelligence. Right? With a model as smart as GPT 5.5, by default, the model is already very, very rigorous in its thinking and its chain of thought, in terms of using the right tools at the right time. But on top of that, you mentioned auto review.

Romain Huet: 30:35

I think that's what I recommend most people use. Actually, by default, when you are in this mode, the model will always operate in the sandbox. Right? And it will always execute these commands in the sandbox, which is kind of constrained to...

Christoph Magnussen: 30:48

Explain sandbox to someone who's not a developer. Yeah.

Romain Huet: 30:51

So it's the idea that, instead of having complete access to your entire OS and file system and everything, it will work on a limited scope Mhmm. By default. And only if and when the model or the agent needs more will it ask for your permission. Right? And that's very important to us.
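The "limited scope by default, ask when more is needed" idea can be illustrated with a minimal Python sketch. This is not how Codex implements its sandbox (the conversation only says it relies on operating system primitives); the function names `is_within_scope` and `request_access` and the `ask_user` callback are hypothetical, chosen only for illustration.

```python
from pathlib import Path
from typing import Callable


def is_within_scope(path: str, workspace: str) -> bool:
    """Return True if `path` resolves to a location inside `workspace`."""
    target = Path(path).resolve()
    root = Path(workspace).resolve()
    # Resolving first collapses ".." segments, so "workspace/../etc"
    # is correctly treated as outside the workspace.
    return target == root or root in target.parents


def request_access(path: str, workspace: str,
                   ask_user: Callable[[str], bool]) -> bool:
    """Allow in-scope paths silently; defer to the user for anything else."""
    if is_within_scope(path, workspace):
        return True
    # Hypothetical callback: a real agent would show an approval prompt here.
    return ask_user(path)
```

With this shape, `request_access("/tmp/proj/src/app.py", "/tmp/proj", ask_user)` never prompts, while a path outside `/tmp/proj` is only allowed if `ask_user` returns `True`.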

Romain Huet: 31:10

By the way, I think another interesting note on that point is that the way we designed the sandboxing uses a lot of the primitives that exist today on macOS. Right? Because it's a Unix-based OS and has a lot of the nice properties. But people were asking us a couple of months ago, what is taking you so long to ship the Codex app on Windows? And now we did.

Romain Huet: 31:34

Like, the Codex app works on Windows, but it took us a few weeks even though it's the same code base as the Mac app. Well, the one reason, the one and only reason Mhmm. We took longer was because we wanted to get Windows sandboxing right. The ability for the Windows users to also have the same properties and be able to have an AI agent that runs on their machine in a very safe way. And it turns out the Windows operating system does not have as many core primitives.

Romain Huet: 32:02

And so we really wanted to get it right. It's a very, very complex architecture. But we wanted to spend the time there to make sure that people don't end up like, oh, I installed the Codex app and now everything on my hard drive has been deleted. That would be a terrible experience. Right? So we wanted to get this sandboxing right.

Romain Huet: 32:18

And so that's the sandbox

Christoph Magnussen: 32:20

Mhmm.

Romain Huet: 32:20

Keyword right there. And you also mentioned computer use. So that's coming soon to Europe. It's the thing I've been most impressed by in recent weeks. I'll show you a demo later if you have not seen it yet.

Christoph Magnussen: 32:34

For the ones listening now, we will do the demo and show it on YouTube. So that's where you could find it then because I'm really excited about that.

Romain Huet: 32:40

Yeah. I think what is very magical about computer use is, why does it even exist? Why do you need such a thing? Well, we talked about the fact that you have dozens and dozens of plug-ins where you can connect your sources online. We've talked about the fact that it can access, with these safe guardrails, the files on your computer, whether it's writing code or an Excel sheet or something.

Romain Huet: 33:04

But what if you would like your agent to complete the task for you and there's no application for it and there's no API for it? But all there is to do, frankly, is hundreds of copy and paste operations across different applications that live on your laptop. Well, that's exactly where computer use comes into play. You can actually ask your agent and say, hey. Could you open my iOS simulator, for instance?

Romain Huet: 33:30

Can you test my iPhone app on every possible screen size? Please feel free to click around, test, take screenshots, all of this, because I have no time to do it by myself, but, you know, go ahead and try. If there are any bugs, we should fix them, and things like that. So it's very, very powerful because you can do QA. You can do all of these tasks, or even the toil that has to be part of your days.

Romain Huet: 33:51

Well, once again, if you're asking for such a thing, computer use would ask you, are you okay with me taking control of your iOS simulator right now? And you'll be like, yes, just for this time. Or yes, always, because that's an app that I don't really have any worries about. So at every step of the way, whether it's connecting to a data source online, reading your files locally, or accessing apps on your computer, we try to have all of the right defaults and the right guardrails so that the user is always in control to say, yes, I'm approving this agent to use this app or read my files.
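The "yes, just for this time" versus "yes, always" choice described here can be sketched as a small approval store. This is an illustrative model only, not the Codex implementation; `Decision`, `ApprovalStore`, and the `prompt` callback are hypothetical names.

```python
from enum import Enum
from typing import Callable, Set


class Decision(Enum):
    DENY = "deny"
    ALLOW_ONCE = "allow_once"      # "yes, just for this time"
    ALLOW_ALWAYS = "allow_always"  # "yes, always"


class ApprovalStore:
    """Remembers 'always' approvals; everything else is asked again."""

    def __init__(self, prompt: Callable[[str], Decision]) -> None:
        self._prompt = prompt          # hypothetical UI callback
        self._always: Set[str] = set()

    def authorize(self, resource: str) -> bool:
        if resource in self._always:
            return True                # approved earlier, no prompt needed
        decision = self._prompt(resource)
        if decision is Decision.ALLOW_ALWAYS:
            self._always.add(resource)
        return decision is not Decision.DENY
```

A one-time approval leaves no trace in the store, so the next request for the same app prompts again; only an "always" answer is persisted.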

Christoph Magnussen: 34:28

What are the heavy users you talk about doing at the moment? What did you see, and where are they? Give us a little bit of a view into numbers of growth as well, where you see Codex at the moment. I have a strong gut feeling, which I'll share with the company, because, of course, we are independent from all the players, but we have a very close look at all of them.

Christoph Magnussen: 34:49

And I'm currently pushing the company a lot into heavy users and and see that more and more of the people using it a lot use Codex, but like what do you observe when you see heavy users and where are they typically?

Romain Huet: 35:01

Yeah. It's a great it's a great question. I mean, like, we've been very fortunate to see the growth, like, since the beginning of the year. I think on average, like, since the beginning of this year, like, the Codex growth has been multiplied by eight. And Germany is actually one of the top five markets in the world for Codex Momentum.

Romain Huet: 35:17

Germany? Yeah. Germany has done 10x on Codex momentum. It's really, really fast.

Romain Huet: 35:22

I think AutoScout24, for instance, we were talking to them a week ago. They've now deployed Codex to many parts of their company. More than a thousand people use it, with great results. So I think what's very interesting around the momentum is to see, on one side, some of the very best engineers in the world, the most prolific engineers in the world, now kind of getting into Codex or getting into GPT 5.5 Mhmm. To achieve very complex things.

Romain Huet: 35:51

You know, of course, like Peter Steinberger from OpenClaw is one of the big Codex users, but I've seen many others like Mitchell Hashimoto from HashiCorp and and Ghostty also moving to Codex. Many, like, top leading open source developers moving to Codex. But then at the same time, you also see the momentum on companies starting to adopt it very fast. Right? Like, it took us only two weeks to go from, like, 3,000,000 users to 4,000,000 users on Codex.

Romain Huet: 36:17

And we can see the adoption of these

Christoph Magnussen: 36:19

2,000,000 users weekly. Yeah.

Romain Huet: 36:21

Wow. Codex now. And and when you look at GPT 5.5, the model alone, like, was the the fastest to adopt of any model launch. Like, you know, it was like three x more used in the first week than any of our past models. So we can really see that we now have a combination of a product like Codex and a model like GPT 5.5 that is delivering real real value for for people.

Romain Huet: 36:49

And we've seen, of course, engineers using it to delegate complete tasks to Codex, more and more complex tasks even. They completely rely on it. But I think what I find very exciting too is the number of people flocking to Codex and starting to use it for non-coding tasks. And so we've made the onboarding much easier as a result of that. Because, of course, if you're someone who's working on documents and spreadsheets and presentations, you don't really wanna see a git diff or a pull request. You're like, I don't even know what this word means.

Romain Huet: 37:26

I'm just trying to do work. And so we've completely reimagined the onboarding so that you have a very elegant interface. You can pure

Christoph Magnussen: 37:34

toggle and say, don't show me the code.

Romain Huet: 37:36

Yeah. As pure as a ChatGPT interface. Mhmm. But what I find very exciting is that the Codex app is able to morph itself depending on what you're trying to accomplish. Right?

Romain Huet: 37:46

If you're generating an image and you're like, hey, I would like to have a new, like, image for the podcast. I would like, hey, I would like to rethink the design. It should be something very visual. And by the time you love this design, maybe you're like, alright, GPT 5.5, take this design and make it happen. And boom, now it's like a coding task.

Romain Huet: 38:04

But maybe you're gonna be like, hey. I'd like to see the stats of viewership of my podcast, and all of a sudden, it should be a graph. You know? Like and so we're trying to make sure that, like, the artifacts morph themselves depending on what you're trying to accomplish. That to me is like the real exciting part of Codex.

Romain Huet: 38:18

It's an app that's gonna feel like universal for everyone.

Christoph Magnussen: 38:21

Yeah. I like I I I saw it when I prepared the video that you can even choose your own colors now. I'm like, oh, I remember that moment with Slack back in the day when that So that really gave me the hint that you kind of prepare it to be the the app. Are there any plans or anything you can share in order to make the experience, like, more one again or be more clear and say, like, people say, because you have agents now in ChatGPT, a little more playful. Not my personal style and preference, I have to say, but my people use it quite often.

Christoph Magnussen: 38:52

And then you have this more professional app. And when I explain it like I kind of do you know the German company Bosch? Mhmm. Of course. They have the green Bosch for at home, the small tiny ones.

Christoph Magnussen: 39:02

This is the ChatGPT. It looks nice. You can use it. It's easy. And there there's this blue line.

Christoph Magnussen: 39:08

More powerful, you can carry around, more professional. This is how I explain these two. Is there a plan to, like, more merge it or anything you can share?

Romain Huet: 39:17

Yeah. I think what I can share is, on one side, we have the great fortune to have like 900,000,000 active users of ChatGPT. And it's the brand that most people have basically equated with AI when they think about a chat interface to help them in their daily lives, personal lives, or work lives. It's a product that's very powerful, but still essentially connecting to your services online, and for which you have a chat interface. And then on the other side, you now have a very fast growing product with Codex that now has more than 4,000,000 users on a weekly basis, to do any kind of work.

Romain Huet: 40:03

But really this idea of like delegating task, Codex is like my my teammate doing my work. I can like control all of these agents in one place. And and you see that like on ChatGPT, you mentioned the agents. These are like agents you can share across the team. For now, Codex is still mostly single player.

Romain Huet: 40:21

Also, you know, you can create your agents. You can create your sub agents inside Codex, but you're not quite easily able to share them with your team.

Christoph Magnussen: 40:28

Yeah. We have them in the repo and then access to that.

Romain Huet: 40:32

We should we should do something easier there. So, obviously, like, we wanna make sure that there's a path onto which, like, users could benefit from both sides. Right? And I think there are still many questions in flight, but we want to make sure that like when people are at work or, you know, doing anything on a computer, the power of the Codex app and the momentum we're having with it that's so amazing right now should benefit to many more users. And how do we progressively reveal some of that power to the people who are using ChatGPT?

Romain Huet: 41:05

So it started already behind the scenes with a few steps. Right? Like, the like, of course, the models are now shared, by the way, which was not the case even two months ago. Right? Like, two months ago, we had coding specific lines of models.

Romain Huet: 41:20

We called it, like, GPT 5.3 Codex. Yeah. Meanwhile, on ChatGPT, you were maybe on GPT 5.2. So these were different lines of models. Now we've completely reunified that.

Romain Huet: 41:30

As of GPT 5.4, the exact same models are in both products.

Christoph Magnussen: 41:36

So we won't see a dedicated 5.5-codex?

Romain Huet: 41:40

No. We are now on the main line, where all of the reinforcement learning techniques we've done for coding, which it turns out also benefit any kind of task, are in the same model where we're also making the personality improvements and things like that as a kind of conversational partner. Now the second thing we've done behind the scenes is that most of the new features, even all new features now in ChatGPT, are built on top of this Codex harness I mentioned. So not just the model itself, but access to all of these tools.

Romain Huet: 42:12

And so we now have this great foundation where like two products are sharing the same models and the same harness. And I think the ultimate goal is to is to think through like, what is the very best, most powerful experience we can give to any user at any given time? And how do we make sure that like someone in ChatGPT, who's trying to accomplish some task, could have the power that we today only have in the Codex app, right? Because it has access to the files and things like this. So we're very much thinking through that.

Romain Huet: 42:44

I think there were a few steps along the way. But we're very excited to see the momentum on Codex, and we're very much thinking about how do we bring that power to many more people.

Christoph Magnussen: 42:52

I'm we I'm definitely part of that as I'm a little bit

Romain Huet: 42:56

You you are. You are. Codex, but it's really good.

Christoph Magnussen: 42:58

It's really good. I have to admit. I I wouldn't say it in a not sponsored podcast video because you may love it every day.

Romain Huet: 43:05

I appreciate that.

Christoph Magnussen: 43:05

I have one last nerd topic I want to cover. I'm thinking a lot about security within companies, and for the data nerds, there is this topic of SQL injection when it comes to databases, where you separate the data you put into the rows and baskets from the statements that structure the database. The term prompt injection, which was coined by, who was it actually? Simon Willison, I think. I believe so.

Romain Huet: 43:36

He talked about it very early.

Christoph Magnussen: 43:38

And it means, to explain it to the people listening, that when the model receives the message, it's not just the user message, but it always Yep. Receives the system prompt and the user message together as a so-called payload. Mhmm. And this is by design, I think, and it's very hard to change that. Meaning, is the only way to have guardrails in a company to build yourself a way to put extra prompts in, or are there other ways you put into place to solve that?

Christoph Magnussen: 44:14

Is there anything that's known from the research side to work on that?
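For listeners unfamiliar with the SQL side of the analogy, the separation of data from statement can be shown with a short, self-contained Python sketch using the standard `sqlite3` module. The table, the helper names `find_user_unsafe` and `find_user_safe`, and the payload are made up for illustration. The point of the comparison is that a prompt has no direct equivalent of the `?` placeholder: system prompt and user message reach the model as one stream of text.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Data is pasted into the statement, so it can change the statement.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # The "?" placeholder keeps the data separate from the statement.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
leaked = find_user_unsafe(conn, payload)   # the condition is always true
contained = find_user_safe(conn, payload)  # treated as a literal name
```

Here `leaked` contains every row in the table, while `contained` is empty because no user is literally named `x' OR '1'='1`.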

Romain Huet: 44:20

Yeah. I think there's a few things that that happened. Right? I think like before we had reasoning and the ability for the model to have a chain of thought Mhmm. It meant that like everything inside the prompt from the system prompt to the user message all the way down was gonna lead to the model creating tokens, right, as an output.

Romain Huet: 44:39

And every token was committed to. So there was no going back. Like, this was the response of the model, and you were kind of hoping for the best, and you could maybe inject something in there. What has really, really changed with the introduction of reasoning. Yeah.

Romain Huet: 44:53

So starting with, like, our first series with o1, o3 and then ultimately this entire GPT-5 family, is that the models have a chain of thought. And they're able to kind of like understand the query before committing to an output. And that has drastically lowered already the ability to do like injections and things like this because the model is like, wait a second. Why am I trying to do this thing? Is the user really asking for this?

Romain Huet: 45:18

Because this seems like not the right thing to accomplish. So all of that has already changed like a lot. So so that's one. And I think on top of that, I think when you're kind of like managing a product, a code base, or maybe like just like any kind of task at at a company, we we have like multiple products now that can help. Right?

Romain Huet: 45:39

Like I think, for instance, GPT 5.5 is outstanding at code reviews. So I really recommend companies who have not tried Codex yet or maybe they are using different kind of products already today. I think I usually tell them like, turn on code review. Don't even change all of your tools just yet if you're not convinced, but turn on code review and see what happens. More often than not, they're, like, shocked by what they're discovering.

Christoph Magnussen: 46:03

You posted that on X. Right? That you can now also put it in your Claude Code.

Romain Huet: 46:07

Yeah. I mean, why not? Right? That's my team who did this. We thought, well, you know, we're in this fortunate position where we live in an industry where the tools are so great.

Romain Huet: 46:16

Mhmm. And of course, Codex is an amazing tool, but there are some other great tools out there. And so we thought like, there's no point in trying to convince someone who likes their tool today to like force them to change. What I thought was more interesting is the ability for them to have a quick taste of what's on the other side. Right?

Romain Huet: 46:33

So if you're using another tool today, but you wanna have a taste of Codex, could it find some bugs? Could it find anything compelling in your code base that gives you a taste of what the model can do? Right? Especially something as powerful as GPT 5.5. We've had companies like Datadog, for instance, that actually replayed all of their past incidents with code review.

Romain Huet: 46:57

And they were able to notice that a pretty big fraction of these incidents could have been avoided by simply turning on code review. Codex would just have found these bugs. And I remember, I think it was, like, one incident out of four that would have been avoided, but this was three months ago. So that's three generations of GPT-5 models ago. I'm pretty sure by now 5.5 most likely would have caught more than half. You know?

Romain Huet: 47:22

And so I think that there are these things. And with GPT 5.5, we also encourage people to use it for code review, but also for scanning for vulnerabilities in their applications, making sure that everything is safe and robust. And for the most sophisticated actors, let's say in the banking system, if you really want to battle test your code base and your products, GPT-5.5 will sometimes refuse to red team something really hard, because you're literally putting yourself in the shoes of a hacker trying to hack your own product. For that, we also have GPT-5.5-cyber, which lets you actually red team. So you have to be a trusted actor on this one.

Christoph Magnussen: 48:12

It's a special product made for that.

Romain Huet: 48:14

Yeah. It's not so much that the capabilities are vastly different. That's not the case. It's more about like the refusals and what can it let you go do Mhmm. To bypass some safeguards because we trusted you as a safe actor.

Romain Huet: 48:26

You're actually trying to red team your own financial system, for instance. So that's one thing. We now have this umbrella that you might have seen a week ago, called Daybreak.

Christoph Magnussen: 48:36

Daybreak.

Romain Huet: 48:36

So this, collectively, is our suite of security products. So broadly speaking on the theme of security, if we were to summarize, the models are far more intelligent than they used to be. And as such, their chain of thought and their reasoning capabilities make it much less likely that injecting things in prompts leads to the wrong outcome. But also, when you start turning on a lot of these security features and really leveraging the models' capabilities for your own good, protecting your products, your features, your users, your database, you can really go a long way toward having something that's really robust.

Christoph Magnussen: 49:21

Nice. Very nice. Is there anything that comes to your mind where I have a blind spot, something I forgot to ask you about and should anticipate when I think of agents throughout the next year?

Romain Huet: 49:34

Wow. I mean, I could talk for hours about this topic, so it's hard to find one. Maybe one thing that has been my kind of spark of, like, feeling the AGI in the past couple of weeks, that we could close on given how big of a deal it is, though I think people have not really used it yet: it's this command that we introduced in Codex called slash goal. And I find it quite compelling because it's the simplest thing you could imagine for people listening.

Romain Huet: 50:08

At the core, the idea is to give Codex with this command goal a goal. So what does that mean? Well, you could have an extremely, extremely ambitious goal. Mhmm. And it could be like, hey.

Romain Huet: 50:21

I have these, like, 10 million lines of code that have not been touched in the past five years. I have no idea, but I'm pretty sure the dependencies are out of date. I'm pretty sure there are some security vulnerabilities in there. And I have no engineer to work on that, but I'm pretty worried. Can you go ahead and migrate this whole thing end to end and make sure it's completely polished?

Romain Huet: 50:44

That's the goal. And what will happen next is that GPT 5.5 would relentlessly pursue that goal until it's complete and would not be, like, interrupted along the way. It's very likely it could work for, like, one day, two days, maybe three days or more completely uninterrupted and come back to you with, I'm done. Everything is up to date, and I tested everything.

Christoph Magnussen: 51:08

I switched it on a couple of days ago. For people listening: for me, it was still a little bit of a hack, going into the Codex CLI Right. Switching on slash experimental, and then I could choose goals.

Romain Huet: 51:21

Right.

Christoph Magnussen: 51:22

And now I could tell it in the UI. So I'm I'm I'm pitching for you to to have a special toggle Yes. There, switching to goals and see that because what I find interesting, this is something that brings together psychology of humans being able to set goals, which is not an easy thing, to an intelligence that also does not know like, I know when the goal is reached, it needs to be, like, kind of measured. Right? And and I guess the model will then figure with some agents, am I there yet?

Christoph Magnussen: 51:55

Am I there yet? Am I there yet?

Romain Huet: 51:56

And once again, calling their tools to make sure the work is complete correctly. If not, they'll come back They

Christoph Magnussen: 52:00

come back.

Romain Huet: 52:01

And try again.

Christoph Magnussen: 52:01

Okay.

Romain Huet: 52:02

And so we are now making this, like goal obviously very elegant in the Codex app as well. You'll be able to like use goal as like Nice. Without leaving the Codex app. It already works today, but like Yeah. We're gonna make the UI even better.

Romain Huet: 52:14

But what I find fascinating about this is that for people listening, I think it's to me like one of these frontiers where like you're really starting to rethink what these models are able to accomplish. Like, many companies, I'm sure many of those listening, maybe all of them, have a backlog of bugs

Christoph Magnussen: 52:32

Oh, yeah.

Romain Huet: 52:32

That, like, you have you know you have to fix

Christoph Magnussen: 52:35

that you never touched because there's just nobody there who can do it.

Romain Huet: 52:37

Imagine there's, like, 50 bugs sitting in your backlog that you know you have to fix, but, you know, they are not that big a deal just yet, so no one was able to prioritize these bugs. Well, you can also connect, as we said, Codex to your tools. So maybe it's Linear, maybe it's Jira, maybe it's something else. But what if you say one evening, slash goal, Codex, can you pick up all of these bugs that we triaged this week, and they're all flagged with this particular label of, I don't know, UI polish or something like that. Crush them all.

Romain Huet: 53:11

I'm going to bed. I wanna wake up tomorrow morning with a bug-free application. I mean, Mhmm. Maybe it's not gonna be flawless yet, but chances are Codex will be able to completely solve a very large majority of these bugs. And I'm like, wow.

Romain Huet: 53:27

This is quite incredible. With just one command, it's like, Mhmm. Take this up. And Codex will gather the context, traverse the code base multiple times, check its own work, call some tools, and actually deliver that.
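The loop described in this part of the conversation (work on a task, check the result with tools, try again if it is not done) can be sketched in a few lines of Python. This is an assumption-laden illustration, not the actual slash goal implementation, which is not described in the conversation; `pursue_goal`, `attempt`, `verify`, and `max_attempts` are hypothetical names.

```python
from typing import Callable, List, Tuple


def pursue_goal(
    tasks: List[str],
    attempt: Callable[[str], str],       # hypothetical: the agent tries a fix
    verify: Callable[[str, str], bool],  # hypothetical: tests or tools check it
    max_attempts: int = 3,
) -> Tuple[List[str], List[str]]:
    """Work through every task, retrying until verification passes."""
    done: List[str] = []
    unresolved: List[str] = []
    for task in tasks:
        for _ in range(max_attempts):
            result = attempt(task)
            if verify(task, result):
                done.append(task)
                break
        else:
            # Ran out of attempts: report the task instead of claiming success.
            unresolved.append(task)
    return done, unresolved
```

The explicit `unresolved` list reflects the caveat raised in the conversation that the outcome may not be flawless: anything that never passes verification is handed back rather than silently marked as fixed.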

Christoph Magnussen: 53:43

I tested it on my Apple Reminders app, because I have a list of ideas for later that's beyond 100 points, and I gave it five of these, saying, goal, can you do something with that? And it worked quite well. I was quite amazed that it took the Apple reminders and said, oh, I'll pull it in and figure out a plan. But I only tested this for a couple of days. I'm not there yet to say I've done a really deep dive into that, but that's a good point. I almost forgot about that.

Romain Huet: 54:11

No. And I think one thing that I noticed for myself, and that I'm sure is the case for most listeners, is that those models are so, so good that we're still too shy with them. Mhmm. We're still not using them to the maximum of their abilities. Right?

Romain Huet: 54:28

And I think to me, like, setting an ambitious goal is a reminder, like, damn. Yes. That's true. We can actually send these very complex goals and these very complex tasks. And when you start seeing Codex and GPT 5.5 do so well with those, you're like, okay.

Romain Huet: 54:45

I can trust this model for much more than I thought. I can actually delegate now most of the toil in my day to day, preparing for meetings, catching me up. I can actually prepare customer conversations because I can let Codex do, like, trustworthy research on each of them. Mhmm. I trust the code being written more and more.

Romain Huet: 55:02

And now it frees up time for, like, okay. I wanna be more creative. I wanna think about more problems my customers have. Like, what else could I solve for them? And so it's really freeing up some of your time by building that trust so you can do more on the side, which I find compelling.

Christoph Magnussen: 55:17

So there was I I could go on for hours as you said with you and then continue that conversation. Or we do a second round next year and say where we are there, and then we compare. I would say that would be very interesting.

Romain Huet: 55:28

Yeah. I'm sure it might have to be sooner than a year. Yeah. Probably. With the pace of everything changing this fast.

Romain Huet: 55:34

You know, like, I'm sure things are gonna get very exciting in a few months.

Christoph Magnussen: 55:37

Very happy to come to Paris again and and Next

Romain Huet: 55:40

time, I'll meet you. Oh, perfect. Okay. I'm overdue to go to Germany. Okay.

Romain Huet: 55:43

I lost most of my German. I practiced in school, but

Christoph Magnussen: 55:46

Then let's let's get back to that. So thank you very much for giving us that insight and taking the time.

Romain Huet: 55:52

Of course. Yeah. Thanks, Christoph. Amazing conversation.

Christoph Magnussen: 55:55

And for everyone who's still excited and listening, we are now switching to a demo on YouTube, showing how Romain himself is using Codex, and you can switch to YouTube for that.

Romain Huet: 56:07

Thank you.

Hosts and Guests

Host

Christoph Magnussen

Founder & Managing Partner, Blackboat

Guest

Romain Huet

Head of Developer Experience, OpenAI