As more companies look to analytics and data-mining models to extract useful information from big data, a better way is needed to share these models between applications.
While there are well-established tools for building analytics and
data-mining models to help businesses spot fraudulent transactions or
recommend follow-up purchases to customers, plugging these models into
applications can be a painful process.
As more businesses call
upon these models to interrogate increasingly large datasets, it will
become necessary to have an easy way to export and share these models
between applications.
Sean Owen, director of data science at
Hadoop specialist Cloudera, expects the next big growth area in big data
will be in tools that make it simpler to share these models between
applications.
"It seems to be the common problem, the wheel that keeps getting reinvented by customers," he said.
"The
default thing to do is someone makes a model in [the statistical
modelling language] R and they say 'Here's a bunch of coefficients, go
program this into some Java code and use this on the website'.
"That requires some expertise on behalf of the developer too; it's very manual.
"They need something that the web service can ask in some standard simple way 'Here's a new data point, classify it for me'."
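The manual hand-off Owen describes might look something like the sketch below, shown here in Python rather than Java. It assumes the model is a logistic regression; the coefficient values, the field names `amount` and `foreign`, and the function names are all hypothetical, chosen only to illustrate how coefficients end up hard-coded in application code.

```python
import math

# Hypothetical coefficients, as if copied by hand from a model fitted in R.
# Any retraining of the model means editing and redeploying this file.
INTERCEPT = -2.0
COEFFICIENTS = {"amount": 0.5, "foreign": 1.0}


def fraud_probability(point):
    """Apply the hard-coded logistic regression to one data point."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * point[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


def classify(point, threshold=0.5):
    """Answer the 'classify it for me' question for a new data point."""
    return "fraud" if fraud_probability(point) >= threshold else "ok"
```

A call such as `classify({"amount": 2.0, "foreign": 1.0})` is the kind of simple query Owen has in mind; the drawback is that the model itself lives inside the code rather than in a shareable file.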
One candidate for a standardised way to share these models is the Predictive Model Markup Language (PMML) – an XML-based language for representing data mining and statistical models.
PMML
can represent not only the statistical techniques used to learn
patterns from data, such as artificial neural networks and decision
trees, but also pre-processing of raw input data and post-processing of
the model output.
A wide range of data mining tools can import or
export models as PMML, and the standard itself is developed by the Data
Mining Group, a vendor-led consortium whose members include IBM,
MicroStrategy, SAS and SPSS.
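To give a feel for what a PMML file contains, here is a minimal sketch that reads a small linear regression model and scores one data point using only Python's standard library. The model, its field names (`amount`, `hour`, `risk`) and its coefficients are invented for illustration, and the helper functions `load_regression` and `score` are hypothetical. The sketch handles only numeric predictors in a single regression table, so it is a toy reader and not a PMML-compliant scoring engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML 4.2 document describing a two-input linear regression.
PMML_DOC = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header/>
  <DataDictionary numberOfFields="3">
    <DataField name="amount" optype="continuous" dataType="double"/>
    <DataField name="hour" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="amount"/>
      <MiningField name="hour"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="amount" exponent="1" coefficient="0.5"/>
      <NumericPredictor name="hour" exponent="1" coefficient="-0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# Namespace used by PMML 4.2 documents.
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def load_regression(pmml_text):
    """Extract the intercept and numeric terms from the regression table."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    terms = [
        (p.get("name"), float(p.get("coefficient")), int(p.get("exponent", "1")))
        for p in table.findall("pmml:NumericPredictor", NS)
    ]
    return intercept, terms


def score(model, row):
    """Compute intercept + sum(coefficient * value ** exponent)."""
    intercept, terms = model
    return intercept + sum(coef * row[name] ** exp for name, coef, exp in terms)


model = load_regression(PMML_DOC)
print(score(model, {"amount": 4.0, "hour": 2.0}))  # prints 2.5
```

Because the coefficients live in the XML file and not in the application, swapping in a retrained model means replacing the file and leaving the scoring code untouched.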
Developing a standard way of representing and interacting with these models would be a "big deal" in the coming year, said Owen.
"You
would think there would be a server for this and there really
isn't. SAS has an expensive proprietary tool that does that and there's
one open source package that kind of does it," he said.
"If I've
got a model, surely I should be able to load it up in something and then
query it with standard APIs and client libraries? We need to
standardise and have a suite of mature solutions to do this."