Student cluster-wrestlers face off in HPC battle: You and whose army? Um, China's

Yep, the People's Liberation Army is fielding a university team

HPC Blog With 20 university teams, Asia Supercomputer Community 2017 is the largest student cluster competition in the world. So it’s only natural that this story, which will give you a chance to meet the teams via video, will be the longest student cluster competition story in history.

In the videos, we’re talking to the kids on the first day of the competition. You’ll see essentially three separate moods in the videos:

1) Giddy optimism: everything is still possible, and the road to victory is ahead of them.

2) Guarded optimism: they’ve seen enough to know that this competition is really tough and they’re now a bit more wary. Still optimistic, but they know it’s going to be a slog to cross the finish line.

3) Depression/resignation: the team has seen some problems and knows that they’ve fallen behind. While they still might be able to catch up, reality is sinking in, and they’re starting to become philosophical about the experience.

Now let’s take a look at the teams, in no particular order, and see where they are…

Beihang University: After a bit of debate in the video, we decide that this is the third time that Team Beihang has competed in a student cluster competition.

As some of you might know, the top two Chinese teams in this competition get an automatic entry into the ISC17 cluster competition in Germany. But Beihang hasn’t finished in the top two slots yet, so they haven’t made this Teutonic trip. But maybe the third time is the charm, right?

Hong Kong Baptist University: This is one of the more fun teams in the competition. This is their second outing and one of the participants in that first competition in 2016 is now the coach – which is a nice bit of continuity. The team is feeling good at this stage of the competition, having just finished LINPACK. They’re a little bit disappointed in their LINPACK score, feeling they could have done better, but also feeling that they can return to Hong Kong with their heads held high.

We also talk to the team’s expert on PaddlePaddle (Parallel Distributed Deep Learning with a few extra capitals added), which is one of the toughest apps in the competition. Students have to build their own deep learning model of traffic and provide a prediction for traffic on a future date. Very interesting stuff.

Zhengzhou University: This is the first time this university is competing at an ASC (or any) cluster competition. I thought I had seen them before, but I was wrong – it was another “Z” team. When we’re talking to the team, they’re having trouble with their cluster, but I can’t figure out if it’s hardware or software trouble.

I’m thinking that it’s software trouble since three of their five nodes are down. These types of problems are somewhat typical at these competitions, with at least a team or two always having some sort of problem that leaves them crippled compute-wise. Hopefully they’ll get fixed up and back in the race.

Weifang University: This is another first-time team at the ASC competition. One thing I can tell you is that these guys are workers. As we see in the video, they’ve skipped meals in order to keep lashing their cluster to ever higher performance. That’s impressive.

However, they’re a small and young school without much of a reputation in HPC or the world of student clustering. Can they perform under pressure? Can they make a name for themselves at ASC17? Only time will tell.

University of Warsaw: Warsaw is yet another new team to cluster competitions. What they don’t have in experience, they make up for in personality. These kids were plenty of fun at the competition. When we first meet them, it’s early on the first day of the tourney. All is going well for the team, the hardware is working fine and they have all of their software. They sidestepped the problems caused by a not-so-robust facility internet connection by bringing all of their software on thumb drives just in case – nice move.

They also ran into a problem with the Chinese documentation of the MASNUM app, but overcame that as well. They’re a hardworking and optimistic bunch of guys. We’ll see if this soul-crushingly brutal competition beats that out of them.

Ural Federal University: This is the fourth time the team from Ural has competed at an ASC event. They’re a plucky team that has finished middle of the pack in previous competitions. In the video we discuss their current progress on the competition (they finished HPL early and were working on HPCG). They’re seeing some optimization problems on HPCG and probably wouldn’t be able to submit until later in the day.

The team is driving a five node cluster and, for the first time, GPUs – eight P100s to be exact. They were very pleasantly surprised to have the P100s and expect to see great results from them.

Taiyuan University of Technology: This is another veteran team, with the university having participated in the ASC competition a total of four times. However, most of the members of the team are new to cluster battle. The team has been preparing for six months, which I joke about in the video (it fell completely flat, of course, due to the language barrier.)

The team doesn’t quite know what to expect difficulty-wise in the competition, since it’s the first time competing for most of its members. So we’ll see what happens.

Sun Yat-Sen University: The team is currently testing their system when we catch up to them. They’re a little anxious about the applications and how their hardware is going to perform. They expect to see the most challenge with PaddlePaddle, the AI application that requires them to build their own model and teach it how to evaluate and predict traffic patterns.

The team is pushing iron with a cluster that has eight nodes with a double brace (8) of P100 accelerators, but they’re not sure how they’re going to use the GPUs at filming time.

Saint Petersburg State University: They are a brand new team from Russia, at least from a student standpoint. The team is led by an advisor who was the captain of their team at ASC13, which will be helpful. In the video, you’ll see me spotlight the young woman in charge of LAMMPS, which seems to have given her a bleak outlook on life. We later became good buddies, particularly after the drinking started at the closing banquet.

St Petersburg even has a designated funny guy on the team, whose main job is to keep things light and make them all laugh. In addition to this job, he is also in charge of the hardware and helping out on software tasks.

Shanghai Jiao Tong University: This is the sixth ASC competition for SJTU, although most of the team has only competed once or twice before. When we catch up to them, they’re still ironing out some problems with their cluster hardware and software, but seem to have a good handle on what to do. They seem pretty confident on the applications, with the possible exception of Falcon.

The team is fuelled by six compute nodes (plus a head node), backed by eight P100 accelerators. They have an above average amount of memory, with 256GB per compute node, which might give them an edge. We’ll see.

PLA Information Engineering University: I didn’t know that the PLA had its own universities, so this entry is a surprise for me. In the video, we discuss the tyranny of the time limit the students have on completing the benchmarks. You’ll also see me cracking myself up by asking my translator to ask them if they have my social security number, or if they could give me a list of my old passwords. How smart is it for me to joke around with the PLA? Probably not very smart.

We then discuss their use of NVIDIA K20s, which are almost older than my translator. The video ends with me posing with the team for a group picture. Very touching stuff.

Northwestern Polytechnical University: This is another university that has participated multiple times at the ASC competition. As can be heard in the video, it’s damned loud in that room, which makes it tough for us to communicate.

The team and I discuss their configuration, how they’re aiming for LINPACK with their shiny new P100s, and the advantages of the P100s vs. the K80s. This could be a team that’s ready to make a breakthrough.

National Tsing Hua University: This team from Taiwan is one that is familiar to cluster competition enthusiasts worldwide. They’ve jousted on the biggest cluster competition stages, in multiple SC events in the US, and multiple times at the ASC tourneys.

In the video, I interview a student who is making a strong statement with his hair – one that he says he can back up with his LAMMPS expertise. The team seems to be doing well, and who wouldn’t be with eight Xeon nodes plus eight P100 accelerators? Our pal is also in charge of the mystery application, which he thinks is "something like ParaView", which is a visualization application. In reality, the mystery application is Saturne, a fluid dynamics app. But it’s OK: at this point in the competition, he has some time to get up to speed on Saturne before he has to run it.

University of Miskolc: This team is from Hungary and they’ve competed in three previous competitions. While they haven’t finished in the money yet, they’re definitely improving their game. When we visit them, they’re working on various applications, including LAMMPS and Saturne. The team managed to hit 7 Tflop/s in their HPL, which is a pretty good score for a non-accelerated system. As we discuss in the video, it’s high time that this team gets some badass accelerators and goes to town with them.

The team is definitely challenged by the applications on the other platforms: MASNUM on the Sunway system and LAMMPS on the KNL system. But they seem to have their problems under control and are moving ahead.

Huazhong University of Science & Technology: We start the video with me commenting on the white peripheral high performance keyboard that one of the students is using. While this university has participated in a lot of cluster competitions, this team is entirely built with freshmen who are new to the world of high-end student clustering. The kids think that MASNUM is probably the hardest application this year, since it’s running on the world’s largest supercomputer (Sunway) which has an entirely different architecture than anything they’ve seen before.

The team is running an old school traditional CPU-only cluster, which will hurt them in LINPACK, but might be ok on the rest of the applications. We’ll see if they can return to their former glory.

Friedrich-Alexander-Universität: This team, known as Team FAU, hails from Germany and has competed in various previous ASC and SC competitions. Strangely enough, they have yet to compete in their home-country ISC competition, but maybe we’ll see them this year.

When we visit Team FAU, they’re in the midst of big problems. They can’t seem to get their InfiniBand working, which has major negative ramifications for the team. Without InfiniBand, they can’t do RDMA, meaning they can’t get performance out of their GPUs. It also means they can’t optimize the remaining applications. For now, they’re using Ethernet, and, not surprisingly, getting horrible results. As I say in the video, it saddens me to see so much hardware doing so little. It’s the kind of sadness that sticks with a guy. Hopefully they’ll turn things around and get back in the game.

Dalian University of Technology: This is the second time out for the team from Dalian University, but most of the students are new, with only one veteran anchoring the squad. So far so good for the team, and they’re performing very well according to their spokesperson. While not predicting a win, she says that she’s confident that they’ll do their best and will hopefully come up with competitive results.

This could be another team that’s ready to break through to elite status. They did a pretty solid job at ASC16, particularly on LINPACK, and a further year of seasoning could only help.

Now that you’ve met the teams, next up will be our mid-competition team check-in, followed by our analysis of the results and pictures/videos from the gala awards ceremony. Stay tuned for more ASC17 action…

