7 Comments
Stephen Fossey:

There might be additional benefits to data center operators from adjusting chip clocking: over-clocking a GPU is generally thought to decrease its lifetime. For example: https://techreviewadvisor.com/how-overclocking-affects-your-gpus-lifespan/

Daniel King:

Another thought, Stephen:

Is there an underlying claim that wear and tear gets *so bad* that the chip will soon have to reduce voltage or clock frequency *anyway* (because transistor precision has been compromised)?

Daniel King:

This is an excellent point, with at least a few implications or corollaries to keep in mind.

1) It can be perfectly rational to sacrifice the “wall clock” lifespan of a GPU if that means getting more FLOPs out of it (see the sketch after this list).

2) NVIDIA’s tech improves so fast that by the tail end of a chip’s life, it may be preferable simply to use the new product.

3) A business advantage goes to the compute provider who’s willing to serve first and serve up those workloads fast, so I’ll be curious to see the lengths to which DCs go, even if/when that means wearing GPUs harder.
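
A rough, purely illustrative sketch of the trade-off in point 1 (every number below is a made-up assumption, not a measurement):

```python
# Back-of-the-envelope sketch of the FLOPs-vs-lifespan trade-off.
# Every number here is an illustrative assumption, not a measurement.

def discounted_work(throughput: float, lifespan_years: int,
                    discount_rate: float = 0.25) -> float:
    """Sum of each year's output, discounted: compute delivered sooner is
    worth more because newer, faster chips keep arriving (point 2)."""
    return sum(throughput / (1 + discount_rate) ** year
               for year in range(lifespan_years))

stock       = discounted_work(throughput=1.00, lifespan_years=5)  # stock clocks
overclocked = discounted_work(throughput=1.15, lifespan_years=4)  # +15% clocks, one year less life

print(f"stock:       {stock:.2f}")        # ~3.36
print(f"overclocked: {overclocked:.2f}")  # ~3.39
# With a high enough discount rate, the front-loaded output of the
# overclocked chip edges out the longer-lived stock configuration.
```

Undiscounted, the overclocked chip here actually delivers fewer total FLOPs (4.6 vs. 5.0 per unit of stock throughput); it's the discounting of future compute, i.e. point 2, that tips the economics.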

Stephen Fossey:

That question is beyond my expertise. I just know that high temperatures are bad for lifetimes, and that temperature cycling is bad because, in general, there’s a coefficient of thermal expansion mismatch between the chips and everything attached to them, like heat sinks.
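
For a sense of scale, a small illustrative calculation of that mismatch (CTE values are ballpark textbook figures; the temperature swing is an assumed example):

```python
# Thermal-cycling strain from a coefficient-of-thermal-expansion (CTE) mismatch.
# CTE values are ballpark textbook figures (per kelvin); the temperature
# swing is an assumed example.
CTE_SILICON = 2.6e-6   # silicon die
CTE_COPPER  = 17e-6    # copper heat spreader / heat sink

delta_T = 50.0  # kelvin swing per load cycle (assumed)

# For bonded materials, the constrained-expansion mismatch strain per cycle
# is roughly the CTE difference times the temperature swing.
mismatch_strain = (CTE_COPPER - CTE_SILICON) * delta_T
print(f"mismatch strain per cycle: {mismatch_strain:.1e}")  # ~7.2e-04

# Repeated cycles of this strain fatigue the joints between the layers,
# which is why frequent temperature cycling shortens lifetimes even when
# peak temperatures stay within spec.
```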

F. Ichiro Gifford:

Oh hey, a fellow traveler in AI-energy policy!

Daniel King:

Giddy to bump into the one and only Dr. Gifford out here in the wild.

Daniel King:

Aye aye, captain ⚓️🫡
