Robin and the Black Forest Labs team built one of the most globally downloaded open source AI models from their hometown in Germany. After his PhD, Robin could have gone anywhere, but he chose Freiburg (the sunniest town in Germany, for what it's worth) to build with focus over hype.
They recently released their next model family, Flux 2, capable of image synthesis, text-based editing, and for the first time, multi-reference image editing. This allows users to combine images from multiple sources and modify them with a single text prompt.
Alexandre caught up with Robin at Slush to talk about training runs at 2am, why unconventional choices matter, and building outside the noise.
Good to see you.

Good to see you.

Yeah. Last year I interviewed Robin here in Helsinki, so a lot has changed. What's been new with you?

What has changed? I mean, we've grown up as a company. We're still small, we're 50 people, but it feels so intense, it feels so insane. We have released more models in the meantime, we've started an office in San Francisco, and we are working towards our next release. It's our next base family of models. It will be able to do image synthesis, image editing based on text instructions, and for the first time really good multi-reference image editing, which means you are able to combine multiple images and then modify them based on the text prompt.

Yeah. When it's 2:00 AM and you guys are training these new models and these new paradigms and you're looking at your loss curve, what is it that you're actually looking for, and what does that feeling actually feel like?

What I ended up doing quite a lot, I don't know, when I did my PhD in this field, was I used to code in the evenings, and then you were happy to get a training run started on the GPUs. Then you went to sleep, because this takes a while, right? It takes multiple hours to train. In the morning you get up, and the first thing you do is log in and check the training logs, so the images that the model produces. And then immediately you see whether it works or not. Because that's a nice part, actually, about visual AI. It's such an intuitive thing. You immediately see: OK, this is working. We vibe quite a lot. We have our internal vibe channel, basically, where we... I mean, that's a huge release criterion, right? If the vibes are not right, you don't release the thing.

What's the favorite thing that you've actually built on Flux or generated out of Flux?

I'll show you. Yeah, it's my X profile pic. This is a gull wearing virtual reality goggles.
And I just like how it, I don't know, made the beak and integrated it well into the goggles.

So one thing I like in general about not being in San Francisco but in Freiburg is that you're out of the total hype zone, right? I actually like this ability to focus and, I don't know, not have to be at certain events all the time. That can be quite distracting.

Last but not least, we're in Helsinki again, and there are a lot of founders here. What's your message to all of them?

Do unconventional things, like starting a company out of your hometown. And do it more. And I think, yeah, be willing to embrace risk-taking.

That's an awesome message. Thank you so much for doing this. Congrats on everything so far, and I'm really looking forward to seeing the model.

Thank you.