Oracle 7400 storage drags down cloud storage firm
Customers: We've been down for weeks. Supplier: No you haven't.
Utility IT service supplier Flexiant has blamed intermittently slow service over the past two weeks on slow disk access on an Oracle ZFS-based 7400 storage array.
There has been slow communication between virtual machines on the servers and the 7400 storage. Some customers claim that their servers have effectively been unusable for a week and a half, and that this is an outage, not a slowdown.
A note on the Flexiant website says: "Customers on the FlexiScale platform are currently suffering intermittent slow disk I/O … The cause of the intermittent slow I/O has been isolated to periodic gradual increases in latency on our storage platform. These increases take place over a period of approximately one hour, after which the platform returns to normal. Currently we are seeing these occur around twice a day. The problem was raised with our storage vendor, Oracle, two weeks ago when it was happening far less frequently."
Since then Oracle has attributed the problem to a firmware bug and to ageing storage adapter cards. Flexiant carried out the upgrades Oracle suggested on 29 October. These made only a limited difference, Flexiant's note says:
"Unfortunately, whilst the upgrade has resulted in a reduction in the severity of the slowdowns, it appears not to have fixed the underlying issue; this implies that the underlying cause was not that previously identified by Oracle. The issue is at a ‘P1’ level within Oracle (their top priority for customer issues) and is currently being looked at by their engineering department. The two companies are working closely together to identify the root cause of the problem and to remedy the situation."
Flexiant is based in Livingston, Scotland, and provides cloud infrastructure software and services for hosting providers, data centre owners and telecommunications operators. Its flagship product is Extility, a licensed virtualisation platform, which its public cloud offering, FlexiScale, uses.
CEO Alex Bligh said he was not aware of any user saying their servers had been unusable for a week and a half: "One large user has reported occasional periods of 20 minutes when I/O slows down. To say it has been lasting for two weeks would be untrue. I'm not aware of any user who has raised a [support] ticket saying they have had problems for two weeks."
Anyone with problems is encouraged to raise a support ticket. ®