SaaS


Salesfarce to Failsforce: Salesforce database blunder outage enters day three as fix falters

El Reg tunes into customer conference calls to hear SVP of engineering apologize

Mon 20 May 2019 // 19:25 UTC

Three days on, Salesforce.com has yet to fully recover from an outage that began on Friday.

Fifteen hours and eight minutes after an errant database deployment script granted past and current users of the company's Pardot B2B marketing automation system full read and write access to all data, prompting the cloud CRM giant to disable all affected server instances, Salesforce declared victory on Saturday morning. And then immediately declared another emergency.

"Service disruption ended, 0104 PDT, May 18," the San Francisco tech titan said on its status website, only to immediately restart the clock with another notification, "Service disruption began, 0104 PDT, May 18."

Here's a summary of what happened: on Friday, the biz accidentally gave all users within current and former Pardot customers sysadmin-level access to all data, then pulled offline all instances running Pardot to prevent any information theft or tampering. Pulling the plug on these shared instances booted Pardot and non-Pardot customers off the Salesforce cloud: any customers sharing a Pardot-hosting instance lost access.

Then Salesforce wiped all access permissions for all affected users, and restored sysadmin-level access to customers' administrator accounts. Instances were gradually brought back online so admins could log in and repair user permissions by hand, allowing folks to get back to work as normal.

Over the weekend, Salesforce staff developed, tested, and ran a script that attempted to restore user permissions from backups, though this was not always successful. In some cases, it even went backwards, and regranted full read-write permissions to users.

Over the weekend, Salesforce held a series of conference calls to update customers on the status of repairs on the 105 affected instances. Come Monday morning, Salesforce functionality appears to have been restored for most organizations, though the tech goliath acknowledged its automated fixes and repair work haven't reached everyone.

Timeline

- A dodgy database script on Friday gave all current and past Pardot users sysadmin-level create-read-write-delete access to all data, which is a huge security and privacy blunder.

- Instances running Pardot were disabled to avoid any data theft or tampering, which kicked any customers using those instances – Pardot and non-Pardot – off Salesforce.

- Salesforce removed all access permissions from affected users and restored full permissions to customers' administrator accounts, allowing them to repair users' permissions. Some instances were fired back up so folks could log back in, others remained offline. Admins were encouraged to reassign their users' access permissions by hand to allow users to continue working.

- On Saturday, Salesforce was able to restore previous user permissions from a backup by running a script on one of the 105 affected instances.

- By Sunday, the permission restoration script had been run on the majority of instances, repairing user permissions with an "89 per cent" success rate – meaning about one in ten organizations still had their user rights wiped. Admins who still had permission issues were told to fix them themselves or contact Salesforce support.

- On Monday, Salesforce staff are still working to restore sandboxed instances. We're told GovCloud was not affected by the snafu.
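For readers who want the recovery sequence in concrete terms, the steps in the timeline can be sketched roughly as follows. This is an illustration only, not Salesforce's actual script: the `Org` class, the permission names, and the functions `lock_down`, `restore_from_backup`, and `over_granted` are all hypothetical, and real permission sets are far richer than a set of strings per user.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for sysadmin-level access; not a real Salesforce construct.
FULL_ACCESS = frozenset({"create", "read", "write", "delete"})


@dataclass
class Org:
    """Toy model of one customer org: admin names plus per-user permissions."""
    admins: set[str]
    permissions: dict[str, set[str]] = field(default_factory=dict)


def lock_down(org: Org) -> None:
    """Wipe every user's permissions, then hand full access back to admins only."""
    for user in org.permissions:
        org.permissions[user] = set()
    for admin in org.admins:
        org.permissions[admin] = set(FULL_ACCESS)


def restore_from_backup(org: Org, backup: dict[str, set[str]]) -> list[str]:
    """Restore non-admin users from a backup snapshot.

    Returns the users missing from the backup, who still need a manual fix.
    """
    unrestored = []
    for user in org.permissions:
        if user in org.admins:
            continue
        if user in backup:
            org.permissions[user] = set(backup[user])
        else:
            unrestored.append(user)
    return sorted(unrestored)


def over_granted(org: Org, backup: dict[str, set[str]]) -> list[str]:
    """List non-admin users who now hold more access than the backup recorded."""
    return sorted(
        user
        for user, perms in org.permissions.items()
        if user not in org.admins and not perms <= backup.get(user, set())
    )
```

The final check is the interesting part: comparing the post-restore state against the snapshot would flag any user who ends up with broader access than before the incident, which is the failure mode reported after the automated repair ran.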

In a conference call on Monday, May 20, at 0030 PDT, Anmol Bhasin, SVP of engineering, said Salesforce was still dealing with a few thousand trouble reports after said automated script failed to fully undo the permission snafu.

"On the last customer update, I had communicated that the initial fix that we had put in place for restoring functionality for the pre-incident state – restoring permission sets in particular – which we believe should have restored functionality for all the affected organizations was not successful in doing so," he said.

Bhasin apologized for the disruption, and offered his assurance that Salesforce is focused on fixing things at the highest level of the company and has devoted all available engineering resources toward resolving the problems.

Since then, there have been reports of instances going offline and then coming back online. The latest update from the cloud giant insists the automated permissions repair operation has been run on all production instances, but after that "a subset of users in affected orgs on the NA53, NA57, and NA59 instances had their permission levels reset again, which gave them broader data access than intended."

Customers on those instances are still experiencing problems on Monday morning.

In an email earlier today, Alex Brausewetter, CTO of Blue Canvas, told El Reg, "It's completely bonkers! From what we gather there are still hundreds if not thousands of customers affected. In one earlier call, they said they received thousands of complaints/support tickets after they ran they scripts that they thought would fix this issue. Salesforce has gone radio silent since yesterday night Pacific time and they just cancelled a bridge call that was scheduled for 0900 and moved it to 1030. Aside from that there's been no public communication."

The Register asked Salesforce to provide an update on the outage. A spokesperson merely pointed back to the published incident response webpage, which says that the issue is "ongoing."

Brausewetter has collected details from the calls into a Google Docs file, and shared the results. "It's totally unacceptable from this kind of service to leave customers in the dark like this," he told us. "For the customers affected by this permission problem, none of their users can log into the org or use Salesforce right now. And now the weekend is over…"
