AI biz borks US election spending data by using underpaid Amazon Mechanical Turks
Senate disclosure forms rife with errors from 'hi-tech' outfit
Captricity, a company that touts AI software capable of reading text better than people, has been blamed for a bumper crop of data entry errors that misrepresent what many US Senate candidates are actually spending for their campaigns.
According to a report published this week by the Center for Public Integrity (CPI), there are "errors in more than 5,900 candidate disclosures representing over $70 million, all of them traceable to the US government’s conversion of paper into electronic data."
For example, Bill Bledsoe, a Libertarian US Senate candidate, is listed on the US Federal Election Commission (FEC) website as having spent $613,638 on gasoline. His paper disclosure form declared $36.19 in gasoline expenses. But the FEC's paper-to-digital conversion process turned his campaign’s federal identification number, C00613638, into the dollar amount.
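A mix-up of this kind is the sort of thing a simple sanity check could flag before publication. The following is a minimal sketch in Python, not anything the FEC or its contractors are known to run; the function name and the idea of comparing the whole-dollar amount against the digits of the committee ID are our own assumptions for illustration.

```python
import re
from decimal import Decimal


def amount_matches_committee_id(amount: Decimal, committee_id: str) -> bool:
    """Hypothetical check: does the whole-dollar part of an expense equal
    the numeric part of the filer's committee ID?"""
    digits = re.sub(r"\D", "", committee_id)  # "C00613638" -> "00613638"
    if not digits or int(digits) == 0:
        return False  # nothing meaningful to compare against
    return int(amount) == int(digits)


# The figures reported for the Bledsoe filing:
print(amount_matches_committee_id(Decimal("613638"), "C00613638"))  # suspicious entry
print(amount_matches_committee_id(Decimal("36.19"), "C00613638"))   # amount on the paper form
```

A flagged record would still need a human to look at the scanned form; the check only narrows down where to look.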
CPI says the FEC uses software developed by a Maryland-based government contractor, AuroTech, to digitize paper disclosure forms, in conjunction with the data conversion services of a subcontractor, Oakland-based Captricity.
Less than three bucks an hour
Captricity digitizes the documents with its machine learning software and, the CPI says, crowdsources the OCR verification to Amazon Mechanical Turk workers, about a quarter of whom typically are located outside the US, at an estimated average cost of $2.44 an hour.
That's rather less than the $4.65 median wage for Mechanical Turkers, based on a 2016 United Nations study, and the $7.25 federal minimum wage. The poor pay, CPI speculates, may partly explain why the data entry was so badly bungled. The software clearly also deserves some blame.
The advocacy group also expressed concern that hackers or foreign entities could take advantage of this demonstrably fragile system to introduce further errors for their own ends.
CPI identified Captricity's Mechanical Turk account name, p9r, which it found in a Django initialization file in Captricity's GitHub repository. CPI includes a link to the file on its webpage but the link no longer works.
The file was removed, presumably, because it included a secret access token. Also, the database default configuration object in the file includes a password key with a value of "1234" beside the p9r username – hopefully this is boilerplate that's overridden elsewhere.
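For readers wondering what "overridden elsewhere" might look like, a common pattern is to keep credentials out of the repository altogether and read them from the environment. The sketch below is not Captricity's configuration: the environment variable names, the default database name and the PostgreSQL engine are all assumptions chosen for illustration.

```python
import os


def database_settings(env=os.environ):
    """Build a Django-style DATABASES dict from environment variables.

    DB_NAME, DB_USER, DB_PASSWORD, DB_HOST and DB_PORT are hypothetical
    variable names; the engine is an assumption, not taken from the file.
    """
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": env.get("DB_NAME", "app"),
            "USER": env["DB_USER"],          # raises KeyError if unset
            "PASSWORD": env["DB_PASSWORD"],  # never hard-coded in the repo
            "HOST": env.get("DB_HOST", "localhost"),
            "PORT": env.get("DB_PORT", "5432"),
        }
    }
```

In a settings module this would be used as `DATABASES = database_settings()`. Failing loudly on a missing password is deliberate: a crash at startup is easier to spot than a silent fallback to "1234".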
In 2016, Captricity touted its success helping the FEC digitize its data, claiming its AI tech could cut turnaround time for election filings by 90 per cent.
"With Captricity’s cloud-native Data-as-a-Service platform, the FEC is able to upload scanned filing reports into designated folders securely hosted in the cloud via Amazon S3," the company said in a blog post at the time. "The images are then automatically extracted from each folder and uploaded to Captricity. Deep learning algorithms sort and capture the data from all of the documents quickly, securely and with 99.9 percent accuracy."
The Register attempted to reach Captricity to discuss the CPI findings and was contacted via autoresponder by a sales rep eager to introduce us "to Captricity’s intelligent automation solution; trusted by MetLife, MassMutual, the FEC, and other enterprises."
We called the rep to discuss the CPI report and were told Captricity CEO Nowell Outlaw had previously referred an inquiry from CPI to the FEC. We said we'd still like to talk. Outlaw subsequently responded to say that per his contractual agreement with the FEC, he could not comment. He suggested the CPI report contained inaccuracies without indicating what those might be.
AuroTech didn't immediately respond to an inquiry.
The FEC didn't respond either. The agency told CPI that it was aware of disclosure form errors and has been working with its contractors to reduce them.
CPI notes that a rule change bringing electronic form filing to Senate campaign disclosures – already enjoyed by candidates for the House of Representatives and US President – is currently part of a spending bill working its way through Congress. The error rate for electronic filings is about two per cent, compared to 20 per cent for digitized paper filings, the group says. ®