Microsoft drops tools to help ingest and model big datasets

Microsoft is expanding the analytics and visualisation capabilities of its R tooling with the launch, late last week, of IDEAR and AMAR.

IDEAR (Interactive Data Exploration, Analysis and Reporting) and AMAR (the Automated Modelling and Reporting tool) are part of Redmond's newish Team Data Science Process.

IDEAR automates report generation in R, with interactivity provided by RStudio's Shiny library. Microsoft says IDEAR's key features are:

  • Automatic variable type detection – to help you suck a large data file into R without having to work out what variable types it holds in advance; and
  • Variable ranking and target leaker identification – to help researchers evaluate their data, and the relevance of data to their machine learning tasks.
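
To give a flavour of what those two features involve, here is a minimal Python sketch. It is not IDEAR's code (IDEAR is written in R), and everything in it is an assumption made for illustration: the function names, the sample data, the simple "parse it as a number or call it categorical" rule, and the 0.99 correlation threshold used to flag a suspected target leaker.

```python
import csv
import io
import math

# Hypothetical sample: "refund_issued" mirrors the target "churned" exactly.
SAMPLE = """age,city,income,refund_issued,churned
34,Leeds,41000.5,0,0
51,York,38250.0,1,1
29,Leeds,52000.0,0,0
45,Bath,47100.25,1,1
38,York,39900.0,0,0
62,Bath,45500.75,1,1
"""


def read_columns(text):
    """Read CSV text into a dict of column name -> list of raw strings."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return {name: [row[name] for row in rows] for name in rows[0]}


def infer_type(values):
    """Guess a column type from its string values (illustrative rules only)."""
    present = [v for v in values if v.strip() not in ("", "NA")]
    if not present:
        return "unknown"
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        return "categorical"
    return "integer" if all(n.is_integer() for n in numbers) else "numeric"


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_and_flag(columns, target, leak_threshold=0.99):
    """Rank number-like columns by |correlation| with the target.

    Each result is (name, score, suspected_leaker). A near-perfect score
    is treated as a hint that the column leaks the target.
    """
    ys = [float(v) for v in columns[target]]
    ranked = []
    for name, values in columns.items():
        if name == target or infer_type(values) not in ("integer", "numeric"):
            continue
        score = abs(pearson([float(v) for v in values], ys))
        ranked.append((name, score, score >= leak_threshold))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


columns = read_columns(SAMPLE)
detected = {name: infer_type(values) for name, values in columns.items()}
ranking = rank_and_flag(columns, target="churned")
```

Here `detected` maps each column to a guessed type, and `ranking` puts the column that duplicates the target at the top with its leaker flag set. A real tool would need to cope with dates, mixed columns and categorical predictors, which this sketch ignores.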

There's also a multi-dimensional data visualisation tool. As Microsoft puts it: “IDEAR projects the high-dimensional numerical matrix into a 2-D or 3-D principal component space. In 3-D principal component spaces, you can change the view angle to visualize the data in different perspectives, which may be helpful in revealing clustering patterns.”
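
The projection Microsoft describes is principal component analysis. As a rough illustration, the Python sketch below centres a small numerical matrix, builds its covariance matrix, and finds the top two components by power iteration with deflation. This is an assumed, textbook approach chosen to keep the example dependency-free; the article does not say how IDEAR computes its components, and the data and function names are made up.

```python
import math


def center(rows):
    """Subtract each column's mean so the data is centred on the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_component(matrix, iterations=500):
    """Power iteration: returns (eigenvalue, unit eigenvector) of the
    dominant eigenpair, assuming the start vector is not orthogonal to it."""
    d = len(matrix)
    v = [i + 1.0 for i in range(d)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, mv)), v


def pca_project(rows, k=2):
    """Project rows onto their first k principal components."""
    centered = center(rows)
    d = len(centered[0])
    cov = covariance(centered)
    components = []
    for _ in range(k):
        eigenvalue, v = top_component(cov)
        components.append(v)
        # Deflation: remove the component just found so the next pass
        # converges to the next-largest one.
        cov = [
            [cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
            for i in range(d)
        ]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centered]


# Hypothetical 4-D data with two obvious clusters in the first two columns.
DATA = [
    [2.0, 1.9, 0.5, 1.0],
    [2.2, 2.1, 0.4, 1.1],
    [1.9, 2.0, 0.6, 0.9],
    [8.0, 8.1, 0.5, 1.2],
    [8.3, 7.9, 0.7, 1.0],
    [7.8, 8.2, 0.4, 0.8],
]

projected = pca_project(DATA, k=2)
```

Each row of `projected` is a pair of coordinates that could be fed to a scatter plot. With this data the two clusters land on opposite sides of zero along the first component, which is the kind of clustering pattern the quote refers to.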

AMAR helps data science types train their machine learning models with hyper-parameter sweeping, and compare the importance of different variables in those models.
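
For readers unfamiliar with the terms, a hyper-parameter sweep simply tries every combination of settings on a grid and keeps the one that scores best on held-out data. The Python sketch below does that for a toy k-nearest-neighbour regressor, then estimates variable importance by dropping each feature in turn. None of this is AMAR's code: the model, the grid, the synthetic data and the drop-one-feature importance measure are all assumptions chosen to keep the example short.

```python
import itertools
import random


def knn_predict(train_x, train_y, query, k, p):
    """Average the targets of the k nearest rows (Minkowski distance, order p)."""
    nearest = sorted(
        (sum(abs(a - b) ** p for a, b in zip(row, query)), y)
        for row, y in zip(train_x, train_y)
    )
    return sum(y for _, y in nearest[:k]) / k


def validation_mse(train, valid, k, p, features):
    """Mean squared error on the validation rows, using only `features`."""
    train_x = [[x[j] for j in features] for x, _ in train]
    train_y = [y for _, y in train]
    errors = [
        (knn_predict(train_x, train_y, [x[j] for j in features], k, p) - y) ** 2
        for x, y in valid
    ]
    return sum(errors) / len(errors)


def sweep(train, valid, grid, features):
    """Evaluate every grid combination; return (best_params, best_mse, results)."""
    names = list(grid)
    results = []
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        mse = validation_mse(train, valid, features=features, **params)
        results.append((params, mse))
    best_params, best_mse = min(results, key=lambda item: item[1])
    return best_params, best_mse, results


def drop_feature_importance(train, valid, params, features):
    """Importance of a feature = how much the error rises when it is removed."""
    base = validation_mse(train, valid, features=features, **params)
    return {
        j: validation_mse(
            train, valid, features=[f for f in features if f != j], **params
        ) - base
        for j in features
    }


# Synthetic data: the target depends on feature 0; feature 1 is pure noise.
rng = random.Random(0)
rows = []
for _ in range(120):
    x = [rng.random(), rng.random()]
    rows.append((x, 3.0 * x[0] + rng.gauss(0, 0.05)))
train, valid = rows[:90], rows[90:]

GRID = {"k": [1, 3, 5, 9], "p": [1, 2]}
best_params, best_mse, results = sweep(train, valid, GRID, features=[0, 1])
importance = drop_feature_importance(train, valid, best_params, features=[0, 1])
```

The sweep covers all eight combinations of `k` and `p`, and `importance` shows the error jumping when feature 0 is removed but not when the noise feature goes. Exhaustive grids get expensive quickly as parameters are added, which is one reason to hand the job to a tool.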

The tools are at Redmond's Team Data Science Process GitHub repo. ®

