Oracle 7400 storage drags down cloud storage firm
Customers: We've been down for weeks. Supplier: No you haven't.
Utility IT service supplier Flexiant has blamed intermittently slow service over the past two weeks on slow disk access on an Oracle ZFS-based 7400 storage array.
Communication between virtual machines on the servers and the 7400 storage has been slow. It has been claimed that some customers' servers have effectively been unusable for a week and a half, and that this is an outage, not a slowdown.
A note on the Flexiant website says: "Customers on the FlexiScale platform are currently suffering intermittent slow disk I/O … The cause of the intermittent slow I/O has been isolated to periodic gradual increases in latency on our storage platform. These increases take place over a period of approximately one hour, after which the platform returns to normal. Currently we are seeing these occur around twice a day. The problem was raised with our storage vendor, Oracle, two weeks ago when it was happening far less frequently."
Since then Oracle has said that a firmware bug was the cause and that the storage adapter cards were old. Flexiant carried out the firmware and adapter upgrades Oracle suggested on 29 October. They made a limited difference:
Unfortunately, whilst the upgrade has resulted in a reduction in the severity of the slowdowns, it appears not to have fixed the underlying issue; this implies that the underlying cause was not that previously identified by Oracle. The issue is at a ‘P1’ level within Oracle (their top priority for customer issues) and is currently being looked at by their engineering department. The two companies are working closely together to identify the root cause of the problem and to remedy the situation.
Flexiant is based in Livingston, Scotland, and provides cloud infrastructure software and services for hosting providers, data centre owners and telecommunications operators. Its flagship product is Extility, a licensed virtualisation platform, which also underpins FlexiScale, its public cloud offering.
CEO Alex Bligh said he was not aware of any user saying their servers had been unusable for a week and a half: "One large user has reported occasional periods of 20 minutes when I/O slows down. To say it has been lasting for two weeks would be untrue. I'm not aware of any user who has raised a [support] ticket saying they have had problems for two weeks."
Anyone with problems is encouraged to raise a support ticket. ®
COMMENTS
Quick get the Fishworks team to look at it!
Oh, they left, didn't they?
Run, Forrest, run...
This story basically confirms that "2 weeks of downtime" was unheard of in this industry until the Oracle/Sun 7000 series came along.
Personally I believe the claim, because I think ZFS is still not mature enough. We experienced a specific issue on a 7410C that was literally down for 3 days just trying to delete a dedup-enabled LUN. Yes, you heard me right: one click to delete a LUN brought the 7410 cluster down for 3 days. That is why I believe it could cause 2 weeks of downtime for other end users...
I guess the root cause will again turn out to be the ZFS module (akd)... then another software upgrade with new MAJOR bugs waiting to be found...
And then the sales team will tell you that you should never expect 99.8% uptime from this kind of product, and that you should set your expectations somewhere near five eights (88.888%) instead of five nines...
By the way, we have since moved off the 7410C and onto NetApp.
7410 and iSCSI do not mix well
From the specs on their website...."These provide iSCSI LUNs to the end-user operating systems, looking just like physical disks".
In our experience the 7410's external ZFS SLOG on SSD (Logzilla), which is designed to accelerate and quickly acknowledge sync writes, is soon overrun by iSCSI write traffic and then spends all of its time flushing to disk, slowing everything down.
Tread carefully with these devices and what part of the business you are willing to bet on them.
