Cisco Optics Podcast Ep 61. Why some optics are good for AI and some aren’t (3/5)
Cisco Optics Podcast | March 17, 2025 | 00:12:03 | 16.62 MB

I’m sure it’s no surprise to you that AI has been steadily changing the world, but did you know that optics is a key part of its hardware infrastructure? To explain it, fortunately we have a seasoned product manager who knows both the switching side and the optics side. Lucky for us, he sits next to me at the office and agreed to chat about it.

In Episode 61, we continue our conversation with Paymon Mogharabi, Senior Product Manager at Cisco’s Optics team, also known as the Transceiver Modules Group. We go into more detail about AI datacenter hardware architectures and Ethernet.

Paymon Mogharabi is a networking industry and Cisco veteran of nearly three decades with Electrical Engineering degrees from UC Irvine and USC. After starting at Cisco as a Technical Assistance Center engineer, he became a Technical Marketing Engineer for Cisco's Catalyst switches. He then took product management positions for Cisco's Edge Services Router, Nexus data center switches, and UCS server products. He is now a Senior Product Manager in Cisco's Transceiver Modules Group and has sat next to me for the past 7 years, focusing on data center applications.

Related links
Cisco Optics-to-Device Compatibility Matrix: https://tmgmatrix.cisco.com/
Cisco Optics-to-Optics Interoperability Matrix: https://tmgmatrix.cisco.com/iop
Cisco Optics Product Information: https://copi.cisco.com/

Additional resources
Cisco Optics Podcast: https://optics.podcastpage.io/
Blog: https://blogs.cisco.com/tag/ciscoopticsblog
Cisco Optics YouTube playlist: http://cs.co/9008BlQen
Cisco Optics landing page: cisco.com/go/optics

Music credits
Sunny Morning by FSM Team | https://www.free-stock-music.com/artist.fsm-team.html
Upbeat by Mixaund | https://mixaund.bandcamp.com

[00:00:08] Hello everyone and welcome back to the Cisco Optics Podcast where we talk about pluggable optics for networks. I'm sure it's no surprise to you that AI has been steadily changing the world, but did you know that optics is a key part of its hardware infrastructure? To explain it, fortunately we have a seasoned product manager who knows both the switching side and the optics side. And lucky for us, he sits right next to me at the office and agreed to chat about it.

[00:00:34] In episode 61, we continue a conversation with Paymon Mogharabi, Senior Product Manager at Cisco's Optics team, also known as the Transceiver Modules Group. We go into more detail about AI data center hardware architectures and Ethernet. Paymon Mogharabi is a network industry and Cisco veteran of nearly three decades with electrical engineering degrees from UC Irvine and USC. After starting at Cisco as a Technical Assistance Center engineer, he became a technical marketing engineer for Cisco's Catalyst switches.

[00:01:03] He then took product management positions for Cisco's Edge Services Router, Nexus data center switches, and UCS server products. He is now a Senior Product Manager in Cisco's Transceiver Modules Group, and has sat next to me for the past seven years focusing on data center applications. And now join me as I talk with Paymon Mogharabi.

[00:01:22] Where am I going to spend my money? It's going to be majority in the compute, but also a good percentage in the network, and a good percentage of that will be optics.

[00:01:56] Can we back up a step? Because you were starting to talk about the east-west traffic, but I've seen diagrams where they have like a front end and a back end area. Can you explain that? Sure. So on the back end side, these are primarily GPU-based compute nodes, and this is all internal. They're not talking to the outside world.

[00:02:23] But ultimately, you do have to have some connection between your back end to the outside world. And that's where the front end of the network comes in. And that's where the north-south traffic is. So they're obviously not communicating any compute operations to the outside world. Are they just communicating results after they've finished a job? Are they just transmitting the results to the outside world?

[00:02:52] In a sense, a portion, yeah. It's the results that go to the outside world, yeah. Okay. So the training would happen at the back end. Perhaps maybe the inference, those areas might be more involved on the front end. Okay. So everything you were just talking about, was that specific to the back end or does that apply to the front end as well?

[00:03:19] It's back end. So all that infrastructure that I mentioned. Now, of course, the front end does come into the picture, but the bulk of what I talked about was on the back end side. Because that's what's really changed from a traditional compute environment to an AI environment. You're really looking at the back end becoming a more critical element.

[00:03:47] That's where the workloads reside. Okay. So back to the bottleneck question. I never understood. Where is the bottleneck? Is it the power consumption? A bottleneck would be on the bandwidth side. So let's say I have a back end network and I have my clusters. They have to communicate with each other.

[00:04:15] If they have to traverse a network, that means they have to leave the NIC, go to the switch, in some cases even go all the way up to the spine and come back to the switch.

[00:04:25] So these latencies become a factor. If you have to do this constantly, either your design is not ideal or your rack densities are such that you have no choice but to spread out your clusters among many, many racks.
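
The path described above (NIC to switch, sometimes up to the spine and back down) can be illustrated with a minimal sketch. It assumes a hypothetical two-tier leaf-spine fabric with one leaf switch per rack and uniform traffic between racks; the function names and the topology are illustrative assumptions, not something described in the episode.

```python
# Hypothetical two-tier leaf-spine model (an assumption for illustration):
# one leaf switch per rack, and a single spine tier above the leaves.

def switch_hops(src_rack: int, dst_rack: int) -> int:
    """Count the switches a packet crosses between two GPU nodes."""
    if src_rack == dst_rack:
        return 1  # NIC -> leaf -> NIC
    return 3      # NIC -> leaf -> spine -> leaf -> NIC


def average_hops(num_racks: int) -> float:
    """Average switch hops over all ordered rack pairs, assuming uniform traffic."""
    total = sum(
        switch_hops(a, b)
        for a in range(num_racks)
        for b in range(num_racks)
    )
    return total / (num_racks * num_racks)


if __name__ == "__main__":
    for racks in (1, 2, 4, 8):
        print(racks, "racks ->", average_hops(racks), "average switch hops")
```

Under these assumptions the average hop count rises toward three as the cluster is spread over more racks, which is the latency pressure being described.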

[00:04:50] The more racks you spread this across, the more you have to traverse the network to get to the adjacent racks and on and on. And I do want to point out that today, when you look at the back end network, it's primarily InfiniBand.

[00:05:10] It's all about InfiniBand. There are some reasons. It's the incumbent technology that goes back to the HPC world. At the same time, it's extremely fast. It's lossless. But the transition is happening. The transition is happening towards Ethernet.

[00:05:38] What's the basic difference between InfiniBand and Ethernet? Well, first off, InfiniBand is a closed architecture. It will not interoperate with the Ethernet world. It is strictly, in a sense, it's proprietary because it is one vendor that offers that. It's not very ecosystem friendly because you are bound by that. I see.

[00:06:08] And it's very expensive compared to Ethernet. And it's understandable because you are bound to one vendor. As you get into the Ethernet world, Ethernet has some catching up to do. There are some areas where Ethernet is catching up.

[00:06:32] But at the end of the day, there will be, of course, a need for InfiniBand. And at the same time, you will see a transition towards Ethernet, whether it's because of cost reasons, whether it's because of interoperability reasons. Or, you know, at the end of the day, you want to be able to have control over your infrastructure and not be bound to one vendor.

[00:07:02] Well, okay. So Ethernet, I guess, Ethernet is IEEE-defined, right? Yeah. So you get that ecosystem. Mm-hmm. I can totally see that just the inertia is one reason that Ethernet hasn't, like, taken over overnight.

[00:07:24] What are some of the things that the Ethernet folks are doing in order to make Ethernet more appropriate for this application? So as much as I know, there is RoCE version 2, that's RDMA over Converged Ethernet. So Ethernet uses that as a way to kind of become on par with InfiniBand. Mm-hmm. Mm-hmm.

[00:07:55] The other areas, maybe on the lossless side. So in a nutshell, there is an ultra, I think it's the ultra performance Ethernet forum [the Ultra Ethernet Consortium]. Oh, okay.

[00:08:15] And what they are doing in parallel is to create an Ethernet solution that is on par or even exceeds what InfiniBand does at a lower cost structure, cost point. Oh, okay. Can we talk about the properties? Like you said lossless is kind of a big deal for AI, right? Right.

[00:08:45] You mentioned latency earlier. Correct. Latency, lossless, there's bandwidth, there's security, there's power. So lossless, latency, and those things, in the pre-AI Ethernet standards, were those not prioritized? In the case of lossless, I would say in the storage world, Fibre Channel, it was critical. Oh, okay.

[00:09:15] Yeah. But in the Ethernet world, not as much. I mean, there's retransmissions that happen. Right. Right. But in the case of Fibre Channel storage, lossless is an absolute requirement. That much I can say. So why isn't Fibre Channel then considered as an option for these back end AI clusters?

[00:09:42] One thing I can say is that, at least on the Fibre Channel side, as far as I know, the max that they can go today is 64 gig. Oh. So they have 8, 16, 32, 64. And usually they'll go back three speeds. That's 64, 32, 16. And that's total bandwidth. It's not like per lane. Correct. Yeah. Yeah. But that clearly when it comes to the- It's not enough. It's nowhere near enough.

[00:10:12] Yeah. Okay. And that's as far as my background or knowledge goes when it comes to Fibre Channel. Okay. Yeah. I've always been an outsider in that world also. You had mentioned something about, was it an overview of the AI?
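
The "go back three speeds" remark can be sketched in a few lines. This is a minimal illustration that uses only the speed list quoted in the conversation (8, 16, 32, 64 gig); the function name and the `span` parameter are hypothetical, and the sketch is not a statement of what any particular Fibre Channel product supports.

```python
# Speeds as listed in the conversation, in gigabits per second.
FC_SPEEDS_GBPS = [8, 16, 32, 64]


def negotiable_speeds(top_speed, generations=FC_SPEEDS_GBPS, span=3):
    """Return the top speed plus the older speeds a port can fall back to.

    `span` is the total number of speeds covered, counting the top speed
    itself (an assumption based on the "64, 32, 16" example).
    """
    idx = generations.index(top_speed)   # raises ValueError for unknown speeds
    start = max(0, idx - span + 1)       # do not run off the front of the list
    return list(reversed(generations[start:idx + 1]))


if __name__ == "__main__":
    print(negotiable_speeds(64))  # the 64 gig case from the conversation
    print(negotiable_speeds(16))  # only two generations exist at or below 16
```

For the 64 gig case this reproduces the three speeds named in the conversation: 64, 32, and 16.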

[00:10:50] That was the third part of my conversation with Paymon Mogharabi. Next time we'll get into smart NICs and future growth affecting optics. We have a new website. It's called optics.podcastpage.io. You can either listen there or use the same podcast platform you've been using all along. Please subscribe. Better yet, leave a review, especially if you use Apple Podcasts. Remember, we're part of the Cisco Podcast Network where you can find other great Cisco podcasts too. We also have educational videos on YouTube.

[00:11:20] Just go to youtube.com and search on Cisco Optics. Thank you for listening. This is Pat Chou in technical marketing at Cisco Optics. The next episode is part four of my conversation with Paymon Mogharabi. Until next time. I'll see you next time.