Self-service-as-a-service startup Trifacta releases newest product version

No more programming tears, claims firm

Trifacta, the self-service-as-a-service data-wrangling business, has released the fourth version of its data preparation product suite.

The San Francisco-based business was founded four years ago, and has been commercial for the last two. It offers data preparation products which transform unstructured data into something analysable, even by non-techie data scientists.

Talking to The Register, Trifacta's CEO Adam Wilson claimed that more than 3,000 companies were now using Trifacta and that, in response to feedback from those customers, the business was changing how it allowed users to play with their data.

The core focus of the new release is to allow customers to build their own data preparation workflow. A large part of this is the addition of a tool called Builder to the Trifacta interface. Builder is designed to guide users through their data-wrangling tasks while providing greater room for independence.

Previously, Wilson explained, "as the users would touch the data we’d make recommendations and suggestions regarding cleaning it up." This was based on Trifacta's own machine learning algorithms and expectations of users' needs.

Now, with pattern profiling, users are able to visualise "common and anomalous text patterns that are automatically detected within each column. The addition of fuzzy join allows users to blend together disparate data sources with similar values but non-exact matches."
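For readers unfamiliar with the two ideas, here is a minimal Python sketch of what pattern profiling and a fuzzy join can look like in principle. It is not Trifacta's implementation: the function names, the digit-to-9 and letter-to-A shape encoding, and the 0.8 similarity threshold are all assumptions made for illustration, and the matching uses the standard library's `difflib` rather than anything the vendor has described.

```python
import re
from collections import Counter
from difflib import SequenceMatcher


def text_pattern(value: str) -> str:
    """Reduce a string to a coarse shape: digits become 9, ASCII letters become A.

    Hypothetical encoding chosen for this sketch; other characters are kept as-is.
    """
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", shape)


def profile_column(values):
    """Count how often each text pattern occurs in a column of strings."""
    return Counter(text_pattern(v) for v in values)


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_join(left, right, threshold=0.8):
    """Pair each left value with its most similar right value.

    A pair is kept only if its score reaches the (assumed) threshold.
    """
    if not right:
        return []
    pairs = []
    for l in left:
        best = max(right, key=lambda r: similarity(l, r))
        if similarity(l, best) >= threshold:
            pairs.append((l, best))
    return pairs


if __name__ == "__main__":
    phones = ["415-555-0199", "212-555-0100", "n/a"]
    # Rare patterns in the counts are the candidates for "anomalous" values.
    print(profile_column(phones))

    # Non-exact matches such as differing case or punctuation can still pair up.
    print(fuzzy_join(["Acme Corp", "Globex Inc"], ["ACME Corp.", "Initech"]))
```

In this sketch, a pattern that appears only once in a column's counts is what a profiler would flag as anomalous, and the threshold controls how loose a "similar but non-exact" match is allowed to be.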

Feedback from customers showed that "a lot of users wanted the ability to go in and build up their transformation logic themselves," Wilson said. The new Trifacta release is intended to meet the needs of those independent-minded customers, allowing them to author the wrangling themselves without having to learn to program in Wrangle, Trifacta's scripting language for transforming and cleaning data.

Cloud support has also been expanded, with Trifacta now able to be deployed in the cloud with Amazon Web Services, Google Cloud Platform and Microsoft Azure. For AWS users, Trifacta provides integration with Amazon S3 and Redshift as input and output sources, as well as deployment on EC2.

On the Google Cloud Platform, Trifacta supports Google Cloud Storage and BigQuery as input and output sources, data processing via Google Dataflow, and deployment on Google Compute Engine.

Microsoft's Azure cloud platform is also supported in the new version, with Trifacta adding support for deployment on Microsoft Azure HDI and allowing the integration of data from Azure Blob Storage.

The firm has also expanded support for creating live connections to common relational sources such as Microsoft SQL Server, MySQL, Oracle, PostgreSQL and Teradata. Trifacta says that unlike approaches that force customers to make copies of data prior to preparation, it creates a live connection, streaming in live data from external sources to incorporate directly into the wrangling process. v4 also includes the initial release of its connectivity API.

The future for Trifacta, Wilson told The Register, was cleaning streaming data. This is going to "take a lot of the tedious things and models" and remove them from customers' workflows, and make sure that they can "do in clicks what used to take weeks or months". ®
