The Register® — Biting the hand that feeds IT

Comments on: ScaleMP creates $10k four-socket box out of thin air

errr... interbox latency?? 

Posted Tuesday 1st April 2008 17:55 GMT

spinlocks? Cache-line ping-pong with microsecond overhead?

Well, at least it's not over TCP <snurk>

to anonymous coward 

Posted Tuesday 1st April 2008 19:51 GMT

Sensible question, but it's a hybrid NUMA/COMA architecture, so the latency is hidden.

Best MPI latency is about 1us 

Posted Tuesday 1st April 2008 23:58 GMT

The best Mellanox ConnectX MPI latency I have seen is about 1.2 microseconds with a switch in between. The switch adds about 200 nanoseconds to the mix, so back to back (the configuration of the ScaleMP system) would be about 1 microsecond, assuming similar performance to MPI.
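For anyone who wants the arithmetic spelled out, here is a rough sketch in Python. The two figures are the ones quoted above; the function and constant names are made up for illustration, and the result only holds if the ScaleMP interconnect really does behave like MPI over ConnectX.

```python
# Figures quoted in the comment; the names are illustrative, not from any vendor tool.
SWITCHED_LATENCY_NS = 1200  # best observed ConnectX MPI latency through one switch
SWITCH_HOP_NS = 200         # approximate latency added by that switch


def back_to_back_latency_ns(switched_ns: int, switch_hop_ns: int, hops: int = 1) -> int:
    """Estimate direct-link latency by subtracting the switch hops."""
    return switched_ns - hops * switch_hop_ns


# 1200 ns - 200 ns = 1000 ns, i.e. about 1 microsecond back to back
print(back_to_back_latency_ns(SWITCHED_LATENCY_NS, SWITCH_HOP_NS))
```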

Certainly NUMA memory management in the OS helps, and I assume the COMA aspects are provided by the hypervisor. This could lead to some unhelpful double buffering, unless the OS kernel is aware of the underlying COMA caching.

This system sounds similar to what Virtual Iron was pitching in the past, but abandoned. Is there a relationship between ScaleMP and Virtual Iron??

Also, is this InfiniBand based approach what is under the covers of the larger ScaleMP systems?

Magellan 

Posted Wednesday 2nd April 2008 06:12 GMT

Nothing to do with Virtual Iron, but the key is that because it isn't a standard NUMA architecture such as SGI Altix, the latency question is not relevant. The COMA memory chunk caches gigabytes of data. Predictive algorithms prefetch data blocks so that when computation is needed they are already resident in memory, hence no IB latency issue. The net result is MPI codes running at the same speed as an IB cluster, plus support for large shared memory jobs, without the cluster management complexity.
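To illustrate the prefetch argument, here is a toy model in Python. It is only a sketch and not ScaleMP's actual algorithm: the next-block prefetch policy, the function name, and the assumption that a prefetch always completes before the next access are all invented for the example.

```python
def count_remote_stalls(accesses, prefetch_next: bool) -> int:
    """Count accesses that have to wait on a remote fetch.

    `accesses` is a sequence of block numbers. With `prefetch_next`, touching
    block b also pulls block b + 1 into the local cache in the background.
    """
    resident = set()
    stalls = 0
    for block in accesses:
        if block not in resident:
            stalls += 1              # block is not local: pay the interconnect latency
            resident.add(block)
        if prefetch_next:
            resident.add(block + 1)  # assumption: prefetch finishes before the next access
    return stalls


sequential = list(range(1000))
print(count_remote_stalls(sequential, prefetch_next=False))  # 1000 stalls
print(count_remote_stalls(sequential, prefetch_next=True))   # 1 stall
```

In this model a sequential scan stalls only on its first block once prefetching is on. How well a real predictor does depends on how predictable the access pattern is.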

As far as the IB is concerned, the system uses standard Mellanox IB, including ConnectX if it is shipping yet.

The larger, 8-32 socket system uses an internal IB switch to connect servers together.