An interview with AMD's Mike Clark, the Father of Zen — 'Zen Daddy' says 3nm Zen 5 is coming fast; also talks compact cores for desktop chips

AMD
(Image credit: AMD)

We interviewed Mike Clark, AMD's Corporate Fellow Silicon Design Engineer, during the company's recent Tech Day, where it unveiled the Zen 5 microarchitecture that powers the company's Ryzen 9000 and Ryzen AI 300 processors. 

Clark, known as the 'Father of Zen' or, depending on which AMD employee you ask, the 'Zen Daddy,' has worked on AMD's CPU architectures for 31 years. He was the lead architect of the first generation of Zen, which he unveiled at Hot Chips while the company was teetering on the edge of bankruptcy back in 2016. 

Over the last seven years, AMD has unveiled five generations of Zen, each delivering a double-digit improvement in instructions per clock (IPC). Clark has led Zen's development through all five generations, with a sixth in the hopper, transforming AMD from a struggling chipmaker into a stock market darling that has clawed back a significant amount of market share from Intel. AMD now has nearly twice the market cap of its long-time foe, and the architectures driven by Clark served as the fuel for that turnaround.

AMD's Zen 5 architecture will span both the 4nm and 3nm process nodes, powering the next generation of AMD's entire CPU product stack that spans from desktop and mobile PCs to its EPYC processors for the data center. Designing one cohesive underlying architecture to address all those markets is an incredible engineering feat. AMD is launching the 4nm Zen 5 chips at the end of this month, but it hasn't yet announced the timeline for the 3nm variants. Clark expanded on the challenges of designing Zen 5 for both the 4nm and 3nm processes concurrently, saying the two versions are basically arriving "on top of each other."

AMD has used its compact Zen 'c' cores, smaller cores designed for background tasks much like Intel's E-cores, to reduce cost and boost performance in its laptop processors. Unlike its competitor, however, AMD hasn't yet brought those cores to its mainstream and high-end desktop lineup. Zen 5c marks the second iteration of AMD's compact cores, but they are currently not planned for the mainstream Ryzen 9000 family. Even so, Clark said he thinks compact cores will come to future Ryzen desktop chips, and he expanded on the techniques the company uses for its unique implementation.

Intel has famously abandoned hardware acceleration support for high-performance AVX-512 instructions, but AMD's Zen 5 marks the debut of full AVX-512 acceleration for the Ryzen family. Unlike Intel, which has to reduce clock speeds when its processors run AVX-512 workloads, AMD says these powerful instructions will run at the same clock speeds as standard integer operations. Clark also explained how the company achieved that feat and said that its Zen 5c cores can run full AVX-512 as well.

Below is a lightly edited transcript of the key points of our conversation with Clark.

Will Zen 5c 'compact cores' come to high-end desktop PC chips?

AMD's approach to its compact Zen 5c cores is inherently different from Intel's approach with its E-cores. As with Intel's E-cores, AMD's Zen 5c cores are designed to consume less space on a processor die than the 'standard' performance cores while delivering enough performance for less demanding tasks, thus saving power and delivering more compute horsepower per square millimeter than was previously possible. But the similarities end there. Unlike Intel, AMD employs the same microarchitecture and supports the same features with its smaller cores.

With Zen 5, AMD has also designed the smaller compact cores to deliver nearly the same performance as the larger cores, thus preventing the faster Zen 5 cores from waiting on the compact cores during threaded workloads. Clark said that he expects AMD's compact cores to eventually come to the company's desktop processors, explained that AMD uses a thread placement technique to target certain workloads to the smaller cores, and expanded on how AMD has shrunk its standard cores to create Zen 5c.  

Tom's Hardware (TH): When you view Zen 5c compact cores, do you think they only have a place in power-constrained environments [mobile]? Could you see this coming over to desktop PCs, where power isn't a consideration?

Mike Clark (MC): [...] If we keep building the compact cores in the way that we talked about—which I think we will; I don't know why I said it a little more theoretically—the hard part is really making sure we hit the right frequency point so that it's balanced with however many [cores] you're going to put down. But let's say you're really good at that, then there's no reason not to put a compact core on a desktop.

Whether it's the same performance at a given core count to the customer and cheaper because there's less area used, or we can squeeze in even more cores on a desktop because of the compact cores. And we couldn’t leverage them [performance cores] anyway because they were TDP-constrained when you got out to that many cores, so you may as well have used a compact core. I think as we get more experienced with Windows and see that the scheduling does work, well, I think you'll see us, in desktop, using the compact cores to both get more cores and be more cost-effective. Because it's wasted area [for performance cores] because we can't run everything at that 5.7 GHz frequency.

TH: When using compact cores in a heterogeneous design, do you schedule workloads into those cores using some sort of thread placement?

MC: We don't have any hardware that can magically move cores or make it transparent to software, so we leverage software. We can build a table of capabilities of the different cores and dynamically update that table to give them feedback as things are going on so that they can manage where to place the core for a lightly threaded workload. [...] We expect both the classic cores and the throughput [Zen 5c] cores to keep up at the same level and not be burdened by the throughput core not really having enough compute. The algorithm runs at the order of the slowest cores, so those throughput cores can run at a pretty high frequency so that we can handle true multi-threaded workloads. But then when you have multi-processing, you need to be smart about where you place things.

You should test it. I haven't seen it, but you can run Teams, and you'll see it on the compact cores. You can open up your browser, and it'll go over to the performance cores because you need that burstiness. And then, when you're done, it'll go away; Teams will still stay on those compact cores, and you'll get the best of both worlds.
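
The capability-table idea Clark describes can be sketched in a few lines of Python. This is a hypothetical illustration only: the class names, the scores, and the placement rule are assumptions made for the example, not AMD's firmware interface or the actual logic of the Windows scheduler.

```python
from dataclasses import dataclass


@dataclass
class CoreInfo:
    core_id: int
    kind: str          # "classic" or "compact" (labels assumed for this sketch)
    perf_score: int    # hypothetical capability hint; higher means faster
    busy: bool = False


class CapabilityTable:
    """Toy table of per-core hints that the platform can update at runtime."""

    def __init__(self, cores):
        self.cores = {c.core_id: c for c in cores}

    def update(self, core_id, perf_score):
        # Models the "dynamically update that table" feedback step.
        self.cores[core_id].perf_score = perf_score

    def place(self, bursty):
        """Pick an idle core: the fastest for bursty work, the slowest otherwise.

        Ties are broken by the lowest core_id. Returns None if no core is idle.
        """
        idle = [c for c in self.cores.values() if not c.busy]
        if not idle:
            return None
        if bursty:
            chosen = min(idle, key=lambda c: (-c.perf_score, c.core_id))
        else:
            chosen = min(idle, key=lambda c: (c.perf_score, c.core_id))
        chosen.busy = True
        return chosen.core_id


if __name__ == "__main__":
    table = CapabilityTable([
        CoreInfo(0, "classic", 100),
        CoreInfo(1, "classic", 100),
        CoreInfo(2, "compact", 70),
        CoreInfo(3, "compact", 70),
    ])
    print("background task placed on core", table.place(bursty=False))
    print("bursty task placed on core", table.place(bursty=True))
```

In this toy setup, a steady background task such as a chat client lands on a compact core, while a bursty task such as a browser launch lands on a classic core, which mirrors the Teams and browser behavior Clark describes.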

TH: When you are looking at the standard core and shrinking it down while closely matching the performance capabilities so you don’t have thread dependency problems, how do you achieve that? Denser libraries, closer spacing?

MC: It's more of the latter; the library’s the same. [...] There are sort of logical blocks, and there are even subblocks, but to hit the high frequency in certain critical speed paths, we have to break the design down into small pieces, which we then do custom work on. But at the end of the day, it's a rectangle; things are further apart than they need to be, there's whitespace, and that's all to drive that high frequency. But then we say, ‘Okay, well, lower the max frequency.’ Then, we can combine blocks together; we don't need to do as much custom work, and it can pull the design in. It's now just naturally smaller because we utilize the space more. When it was bigger, there's extra logic for repeaters and stuff like that, there’s buffering, and that all gets removed.

It's amazing how much you can shrink the core at whatever target you picked to then find a bunch of area and power to get the squeeze out of it. It was really just because of what we had to do to get that high frequency. Now, you could say, ‘Well, why aren't you better at picking those small bundles?’ But we've been doing that for years, and we can't perfect the smaller blocks. It's just kind of in the nature of the design.

How Zen 5 runs at normal frequencies while running AVX-512 workloads

TH: You mentioned that Zen 5 runs AVX-512 instructions at the same clocks as standard instructions. Intel has struggled with this for a long time, and then they've done all kinds of things, like bifurcating AVX instructions into different classes denoted by power usage. Has Zen 5 employed any notable tweaks to keep the AVX-512 clocks high? What's your secret to success?

MC: Fundamental to what I would call our secret to our success is trying to introduce it at a point where it's more balanced with the rest of the machine. That’s so it doesn't look like such a one-off and so you don't have to treat it as such a one-off, which leads to all those problems. Now, it can obviously burn more power, but so could AVX-256. But it's better that things grow together. If you imagined us trying to put AVX-512 on Zen 2, we had just grown from AVX-128 to AVX-256 at that time. I just have this balance thing; that's what Zen is, and it's just so in balance.

Now, we've learned as well. Even on the integer side, our schedulers burn a lot of power. And so, on both sides, I think a lot of the trick, and I’m sure Intel's learned this too, is laying out the floor plan in a way that you're cognizant of where hotspots are going to be, knowing also that you never get everything right, so putting in sensors everywhere, but especially where you're worried. We've been good at getting those to work and using our firmware to manage that dynamically so that we can better respond. There are times when we do have to throttle it down because multiple cores are using it, and it’s more TDP-constrained. But that happens on the integer side, too.

TH: So frequencies would be pretty much in lockstep with integer?

MC: It’s just trying to sense it and react to it enough so that it's not, ‘Oh, this one guy [core] did it, and we took everyone down [frequency],’ and it’s not really that serious of a situation. So, it’s a management problem that we’ve grown to understand and deploy across the design, not just for AVX-512.
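
As a rough illustration of the "sense it and react" approach, the Python sketch below contrasts a per-core response with a blunt global one. Everything in it is assumed for the example: the temperature limit, the frequency step, and the floor and ceiling values are made-up numbers, and the function names are hypothetical rather than anything from AMD's firmware.

```python
def per_core_step(temps_c, freqs_mhz, limit_c=95.0, step_mhz=100,
                  floor_mhz=2000, ceil_mhz=5000):
    """One control step where each core reacts only to its own sensor."""
    new_freqs = []
    for temp, freq in zip(temps_c, freqs_mhz):
        if temp > limit_c:
            # This core is over the limit, so only this core steps down.
            new_freqs.append(max(floor_mhz, freq - step_mhz))
        else:
            # This core has headroom, so it steps back up toward the ceiling.
            new_freqs.append(min(ceil_mhz, freq + step_mhz))
    return new_freqs


def global_step(temps_c, freqs_mhz, limit_c=95.0, step_mhz=100,
                floor_mhz=2000, ceil_mhz=5000):
    """One control step where a single hot core slows every core down."""
    if any(temp > limit_c for temp in temps_c):
        return [max(floor_mhz, freq - step_mhz) for freq in freqs_mhz]
    return [min(ceil_mhz, freq + step_mhz) for freq in freqs_mhz]


if __name__ == "__main__":
    temps = [98.0, 80.0, 79.0, 81.0]      # only core 0 is hot
    freqs = [5000, 5000, 5000, 5000]
    print("per-core:", per_core_step(temps, freqs))
    print("global:  ", global_step(temps, freqs))
```

With one hot core, the per-core policy lowers only that core's frequency, while the global policy takes every core down, which is the "we took everyone down" outcome Clark says AMD tries to avoid.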

TH: When we look at the compact cores running AVX-512, do they run that at standard full data path, full 512-bit width, or do they run double-pumped AVX-256?

MC: We can do either. For what we’re launching today in Strix Point, the performance core and the compact core both have the AVX cut-down [AVX-256] because they're in a heterogeneous situation, and they're in a mobile platform where area is at a premium.

And while you could argue we could try to have it, we don't want software to have to try to deal with something like that. Even though we cut it down on the performance core, which helps the area, we can have more throughput cores at some level. But we could build a compact core for other markets, and I think you'll see that where we do have the full 512-bit data path as well because it's great for AI and vector workloads. Even if it's a more dense design, that doesn't mean it doesn't want great vector performance when it needs it.
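
To make the distinction in the question concrete, the Python sketch below models a 512-bit vector add two ways: once over all sixteen 32-bit lanes in a single pass, and once "double-pumped" as two passes over eight lanes each. It is a simplified model under stated assumptions, not a description of Zen 5's execution units: the function names are hypothetical, and the pass counts are illustrative rather than cycle-accurate.

```python
LANE_BITS = 32
LANE_MASK = (1 << LANE_BITS) - 1
LANES_512 = 512 // LANE_BITS   # sixteen 32-bit lanes
LANES_256 = 256 // LANE_BITS   # eight 32-bit lanes


def add_lanes(a, b):
    """Lane-wise 32-bit add with wraparound over equally sized vectors."""
    return [(x + y) & LANE_MASK for x, y in zip(a, b)]


def add_512_full(a, b):
    """Full 512-bit data path: all sixteen lanes in one pass."""
    assert len(a) == len(b) == LANES_512
    return add_lanes(a, b), 1


def add_512_double_pumped(a, b):
    """256-bit data path: the same operation split into two passes."""
    assert len(a) == len(b) == LANES_512
    low = add_lanes(a[:LANES_256], b[:LANES_256])
    high = add_lanes(a[LANES_256:], b[LANES_256:])
    return low + high, 2


if __name__ == "__main__":
    a = list(range(LANES_512))
    b = [LANE_MASK] * LANES_512
    full_result, full_passes = add_512_full(a, b)
    pumped_result, pumped_passes = add_512_double_pumped(a, b)
    print("results match:", full_result == pumped_result)
    print("passes:", full_passes, "vs", pumped_passes)
```

Both paths produce the same result, so software sees the same instruction set either way; what changes is how much work the hardware does per pass, which is the area-versus-throughput trade-off Clark describes.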

The biggest challenge of Zen 5 design

TH: What was the biggest challenge you encountered with Zen 5 development?

MC: It was actually dealing with two technologies [designing Zen 5 for both the 4nm and 3nm process technologies], especially a technology that the previous generation was in. And trying to do so much change, and therefore the unavoidable reality that in 4nm it's going to be [consume] more power than it's going to be in 3nm, no matter how smart we are. 

But we need that flexibility in our roadmap, and it makes sense. But still that was really hard to try to control having the two technologies and the features, and a feature that looks great in 3nm not looking so great in 4nm because of the power impact of the not-as-efficient transistor and how it affects the floorplan. Normally, we do the architecture in one, and then we port on the next one, and then you have a lot of time to deal in the floor plan with the two technologies. [...] It was just really challenging. But that gives Zen 6 a lot of room to improve.

And we're going to deliver 3nm here in short order with 4nm; basically, they're on top of each other. So the design teams are separate in building those, but we're trying to communicate and work together — it is still the same. We've tried to keep it simple for our own sanity. We have all these designs we have to validate and we have to build, and the more they're different, the more things just get out of control. It drives complexity.

That was a challenge, and one we love because, like I said, now that we've done it, we've learned a lot from it. We're going to be able to do it better the next time. That's what makes this job so fun: constantly learning, constantly new challenges, and new innovation.

Paul Alcorn
Managing Editor: News and Emerging Tech

Paul Alcorn is the Managing Editor: News and Emerging Tech for Tom's Hardware US. He also writes news and reviews on CPUs, storage, and enterprise hardware.

  • Giroro
    Does Jim Keller know AMD is trying to give Mike Clark credit as the creator of the Zen Architecture?
    Jim seems like the kind of guy who would sue over that kind of thing.
  • PaulAlcorn
    Giroro said:
    Does Jim Keller know AMD is trying to give Mike Clark credit as the creator of the Zen Architecture?
    Jim seems like the kind of guy who would sue over that kind of thing.
    This is a very common misconception. However, Mike Clark was the lead architect from day one. I was in the audience when he introduced Zen 1 at Hot Chips in 2016 as the lead architect. I even interviewed him backstage after the presentation. (Fun fact: I interviewed him in the same green room Steve Jobs used before his most famous presentations).

    https://www.tomshardware.com/news/amd-zen-cpu-microarchitecture,32540.html
    Keller, to my knowledge, worked on the Infinity Fabric. He made contributions there, but was not the lead architect.
  • bit_user
    Giroro said:
    Does Jim Keller know AMD is trying to give Mike Clark credit as the creator of the Zen Architecture?
    Yes.
    Ian Cutress: A few people consider you 'The Father of Zen', do you think you’d scribe to that position? Or should that go to somebody else?

    Jim Keller: Perhaps one of the uncles. There were a lot of really great people on Zen. There was a methodology team that was worldwide, the SoC team was partly in Austin and partly in India, the floating-point cache was done in Colorado, the core execution front end was in Austin, the Arm front end was in Sunnyvale, and we had good technical leaders. I was in daily communication for a while with Suzanne Plummer and Steve Hale, who kind of built the front end of the Zen core, and the Colorado team. It was really good people. Mike Clark's a great architect, so we had a lot of fun, and success. Success has a lot of authors - failure has one. So that was a success. Then some teams stepped up - we moved Excavator to the Boston team, where they took over finishing the design and the physical stuff, Harry Fair and his guys did a great job on that. So there were some fairly stressful organizational changes that we did, going through that. The team all came together, so I think there was a lot of camaraderie in it. So I won't claim to be the ‘father’ - I was brought in, you know, as the instigator and the chief nudge, but part architect part transformational leader. That was fun.

    Source: https://www.anandtech.com/show/16762/an-anandtech-interview-with-jim-keller-laziest-person-at-tesla
    Giroro said:
    Jim seems like the kind of guy who would sue over that kind of thing.
    I hope you're being sarcastic, because he seems pretty chill to me.
  • bit_user
    Thanks for the interview @PaulAlcorn !

    If anyone is interested in some more in-depth details about the Zen 5 architecture, Chips & Cheese also got some fairly forthcoming answers to their own set of rather in-depth questions:
    https://chipsandcheese.com/2024/07/15/a-video-interview-with-mike-clark-chief-architect-of-zen-at-amd/
    I thought a very interesting revelation was how the dual decoders function, both in single-threaded mode and SMT mode.

    There's also a very interesting point made about how the architecture needed to be overhauled, in order to support the next several generations of cores. It reminded me of another quote from Jim (emphasis added):
    Jim Keller: So when you build a new computer, and Zen was a new computer, there was already work underway. You build in basically a roadmap, so I was thinking about what we were going to do for five years, chip after chip. We did this at Apple too when we built the first big core at Apple - we built big bones. When you make a computer faster, there's two ways to do it - you make the fundamental structure bigger, or you tweak features, and Zen had a big structure. Then there were obvious things to do for several generations to follow. They've been following through on that.

    So at some point, they will have to do another big rewrite and change. I don't know if they started that yet. What we had planned for the architectural performance improvements were fairly large, over a couple of years, and they seem to be doing a great job of executing to that. But I've been out of there for a while - four or five years now.

    Source: https://www.anandtech.com/show/16762/an-anandtech-interview-with-jim-keller-laziest-person-at-tesla
    So, I think it's a pretty safe bet that Zen 5 is that "big rewrite and change". If you read/watch the C&C interview with Mike, that's sure what it sounds like. Also, consider that Zen 4 was cast as a more minor revision that I think focused mostly on the front end.
  • Makaveli
    PaulAlcorn said:
    This is a very common misconception. However, Mike Clark was the lead architect from day one. I was in the audience when he introduced Zen 1 at Hot Chips in 2016 as the lead architect. I even interviewed him backstage after the presentation. (Fun fact: I interviewed him in the same green room Steve Jobs used before his most famous presentations).

    https://www.tomshardware.com/news/amd-zen-cpu-microarchitecture,32540.html
    Keller, to my knowledge, worked on the Infinity Fabric. He made contributions there, but was not the lead architect.
    This is a huge misconception that people just parrot on the internet.

    Thanks for this post to set things straight.

    And AMD stop it I want to go Zen 5 and more so X3D version but if there is going to be a 3nm version.... I may wait.
  • bit_user
    Makaveli said:
    This is a huge misconception that people just parrot on the internet.
    I think the issue is that they want one name to attach to Zen. Jim Keller was around for much of its development, and definitely the most well-known among the principals involved, so he ends up getting attached to it.

    His actual title was:
    Corporate VP & Chief Architect
    AMD · Full-time 2012 - 2015 · 3 yrs

    Source: https://www.linkedin.com/in/jimbkeller/details/experience/ (requires sign-in)
    BTW, I think another big misconception is that Zen was instigated by Lisa Su. However, both the Zen project and Jim Keller's tenure pre-dated her role as CEO. I do think she probably deserves some credit for giving it the oxygen it needed to be successful, which included a couple big financial deals, but it was something she inherited rather than helped originate.

    PaulAlcorn said:
    Keller, to my knowledge, worked on the Infinity Fabric. He made contributions there, but was not the lead architect.
    I just took a look through the patents filed by James (B.) Keller, and all the ones at Advanced Micro Devices, Inc. were from his first stint there. So, I think that says he was in a more high-level, leadership role. And if he did contribute any novel ideas, he let the folks under him take the credit.

    Once you've got about a hundred patents, what's a few more?

    Makaveli said:
    And AMD stop it I want to go Zen 5 and more so X3D version but if there is going to be a 3nm version.... I may wait.
    There's always something better, not too far around the corner. You just have to decide when either the need is great enough, or the benefit of upgrading outweighs the costs. The advantage of upgrading sooner is that you get more time to enjoy the upgrade and it leaves more room for improvement between it and your next upgrade.
  • usertests
    so you may as well have used a combat core
    I too, want to see combat cores.

    Makaveli said:
    And AMD stop it I want to go Zen 5 and more so X3D version but if there is going to be a 3nm version.... I may wait.
    I don't see any confirmation of another desktop CPU using 3nm, or even Zen 5c.

    What's likely is that Zen 5c chiplets for Epyc are made on 3nm. Anything else, don't hold your breath.
  • mitch074
    bit_user said:
    There's always something better, not too far around the corner. You just have to decide when either the need is great enough, or the benefit of upgrading outweighs the costs. The advantage of upgrading sooner is that you get more time to enjoy the upgrade and it leaves more room for improvement between it and your next upgrade.
    True. However, you may be lucky with a generation that lasts "forever" - people who got a Sandy Bridge i7 were content for almost a decade, those with a Kaby Lake desktop much less so (no Win11 support, quad core when things soon required 6 hardware cores). Same if you got a Ryzen 1600 AF: you were good for quite some time, but the earlier models not as much - getting a 2700X in comparison was good! Zen2 was also a bit of a letdown, being midway between early Zen and the fantastic Zen3. On graphics cards, owners of Radeon HD 48x0 were happy for a long time, so were those who got a Geforce GTX 8800... Owners of Polaris GPUs were laughed at at the beginning, but after 2 GPU crashes, much less so - and so on.
  • bit_user
    mitch074 said:
    you may be lucky with a generation that lasts "forever"
    Yeah, I get that. It's often hard to tell when those products will land, however. Part of it is that something like Alder Lake arrives and seems revolutionary, but then Raptor Lake comes along and really makes the architecture sing. But, now that the reliability concerns started to mount, it's starting to look like Gen 12 really might have been one of those golden generations.

    mitch074 said:
    Zen2 was also a bit of a letdown, being midway between early Zen and the fantastic Zen3.
    Really? I don't recall anyone complaining about Zen 2, at the time. It's just that once Zen 3 launched, its predecessor lost a little of its luster.

    What I'd say is: if someone has an inside scoop on a new product launch that seems like a big leap, and they can afford to wait, then maybe do. But, if all you have to go on is some nebulous tidbits and the successor is decently far in the future, then probably don't put off an upgrade if you really need/want to do it sooner.

    BTW, @Makaveli I'm pretty sure the 3nm product is the Zen 5C chiplet. That explains why they're "basically arriving on top of each other." It wouldn't make sense otherwise, as bringing the N4P chiplet to market would mean following through on a product that's almost immediately obsolete.
    https://www.tomshardware.com/pc-components/cpus/amd-announces-3nm-epyc-turin-launching-with-192-cores-and-384-threads-in-second-half-of-2024-54x-faster-than-intel-xeon-in-ai-workload
  • Makaveli
    usertests said:
    I too, want to see combat cores.


    I don't see any confirmation of another desktop CPU using 3nm, or even Zen 5c.

    What's likely is that Zen 5c chiplets for Epyc are made on 3nm. Anything else, don't hold your breath.
    He didn't confirm anything but mentioned Zen 5 was made for both nodes.

    My gut tells me we will see desktop on 3nm in a refresh sometime in late 2025 :)