Fellow AI nerds, beware: Google Cloud glitch leaves Nvidia T4 GPUs off estimated bills for some virtual machines

Wow, cool, they look free to use... *checks invoice the next day* ...They most certainly were not free


If something seems too good to be true, it probably is. There appears to be a bug in the Google Cloud Platform online user interface that may lead engineers into thinking they're renting GPU-accelerated virtual machines for free, when, really, they're not.

Anyone hoodwinked by the glitch will realize the Compute Engine resource is not gratis, and is actually costing potentially several hundred dollars a month, the next time they look at their cloud bill. It's not going to bankrupt anyone, but it is something you may trip over, so consider this a heads-up. You may even encounter a similar gremlin in the future, or on another cloud platform. It can happen.

We found out about the bug from Soufian Salim, an AI engineer at French software startup Bee4win, who wanted to train a neural network model using Nvidia’s spanking new Tesla T4 GPUs. He had spun up a virtual machine instance from the Google Cloud marketplace – specifically, the AI Platform Deep Learning VM Image – and configured it to use a bunch of T4s to speed up calculations.

Here’s a screenshot of his settings that he shared with The Register:


Virtual machine settings ... Image credit: Soufian Salim

You can see that he’s configured the instance to use two CPUs with 13GB RAM, and four Tesla T4 GPUs, all for a reasonable price of $64.50 per month. The estimated charges tallied up on the right include the costs of using Google-hosted CPUs, memory, and a discount for being a frequent cloud customer, but there is no mention of the T4 GPUs. These should be listed on the page as they normally cost between $0.29 and $0.95 per hour per GPU, depending on the configuration and minus any discounts. Instead, nothing's shown, so they're free, er, right?

“Normally, a T4 should cost several hundred dollars [per month],” Salim told The Register earlier this week. “I think it's probably a [user interface] bug, according to documentation it should not be free. But I'm not 100 per cent sure. I have sent a bug report to the Google Cloud team."
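To see why "several hundred dollars" is the right ballpark, here's a minimal back-of-the-envelope sketch in Python using the on-demand rate range quoted above ($0.29 to $0.95 per hour per GPU). The 730-hour month and the four-GPU count match the configuration in the screenshot; real GCP bills vary with region, usage, and discounts, so treat this as illustrative arithmetic, not an official estimate:

```python
# Rough monthly cost for the T4s the pricing estimator omitted.
# Rates are the quoted on-demand range; actual billing varies by
# region, sustained-use discounts, and so on.
HOURS_PER_MONTH = 730  # 24 hours x ~30.4 days, a common pricing convention
NUM_GPUS = 4           # the count configured in Salim's instance

def monthly_gpu_cost(rate_per_hour, num_gpus=NUM_GPUS, hours=HOURS_PER_MONTH):
    """Estimate a month's charge for a set of identical GPUs."""
    return rate_per_hour * num_gpus * hours

low = monthly_gpu_cost(0.29)   # cheapest quoted T4 rate
high = monthly_gpu_cost(0.95)  # priciest quoted T4 rate
print(f"Four T4s: ${low:,.2f} to ${high:,.2f} per month")
```

Even at the bottom of that range, a single T4 running around the clock comes to roughly $200 a month, so an instance with four of them is a line item you would very much expect to see on the estimate.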

We tried to spin up our own deep-learning instance using the same settings, and found the same glitch, too. When we tried to use a different GPU model such as Nvidia's V100, a charge popped up on the estimated bill for the cloud-based hardware, and when we switched back to a T4, it disappeared again, as you can see from the screenshots below. That would suggest the T4s are free to rent, whereas the V100s are not.

Screenshot of Tesla V100 GPU instance on GCP

Cost of V100 estimated on the right...

Screenshot of an allegedly free Tesla T4 GPU instance on GCP

...but zilch when we used a T4

Google does offer free T4s on its Colaboratory platform. Here, developers can run specific AI models in Jupyter notebooks using Google’s cloud resources at no charge at all. Salim said he wasn’t using the Colab service, however, and neither were we. We're also aware that you can use T4s for free via certain non-AI promotions, such as within a BlazingSQL Colaboratory environment. These aren't production environments, though, and are for testing purposes, hence the freebie GPUs.

Salim told The Register he was training a neural network based on Google’s BERT language model and deployed his virtual server on Tuesday morning. It was left running for over a day and a half, and it seemed to be behaving normally, he said.

A bug too good to be true

Salim's inkling that it was simply a user interface bug, and that he would eventually be billed by a backend system for the T4 GPUs, was later confirmed when he discovered that Bee4win was indeed charged for the rented graphics processors despite the charge not appearing in the estimated costs. "As I suspected, we were billed for the GPU, at about 0.9$/hour. It was an UI error," he told The Register on Thursday.

And a day after spinning up our own virtual machine, we checked our Google Cloud billing page, and found, yup, El Reg would be charged, too, for the T4s. Rather than be able to gleefully announce to the world that we'd found a way to get free Nvidia Tesla GPUs in production cloud instances, we found the hourly costs had caught up with us. Instead of quaffing martinis on expenses tonight, we shall instead be drinking tap water as, well, the money's gone on Google Cloud.

So, if you happen to see the same bug, don't be tempted to spin up more T4 GPUs just because they look like they might be free, because they're not.

The problem hasn't been fixed, so be warned. A Google spokesperson told us on Friday: "We are aware that some customers are not seeing estimated charges for T4 GPUs on the marketplace web interface before creating virtual machines, and we are working to fix the pricing estimator." ®

