Data scientists at organizations such as LinkedIn and Cisco are applying aspects of the scientific method to data mining and data analysis initiatives to try to make sure they get valid results.
Data science may not be a formal scientific discipline, but big data analytics teams increasingly are treating it like one, to help ensure that data science applications produce accurate and meaningful data.
For instance, LinkedIn Corp.’s data science team works with product managers, application developers and other business users to define quantitative metrics for analyzing tests of planned new features on the social networking company’s website.
“All that we do at LinkedIn is extremely metric-driven,” said Yael Garten, its chief of data science. She added that “hundreds of metrics” are in place, part of an analytics process designed to enable data-driven discussions about how features are faring in trial runs.
The process also includes elements intended to ensure that data scientists have substantial data to mine and analyze, Garten said in a presentation at the 2017 TDWI Leadership Summit in Las Vegas.
Tracking and logging data is part of job descriptions and performance reviews for developers, and executive approval is needed to launch new features without related data being logged. “We treat data as a first-class citizen,” she said.
In addition, data scientists, product managers and developers jointly create data requirements and schemas, which a data model review committee then checks to see whether the specified data will be successfully generated, Garten said.
And while feature tests are in progress, the data science team meets weekly with business executives and product teams to review metrics and big data analytics results.
LinkedIn, which is based in Mountain View, Calif., and was acquired by Microsoft in December, even uses scientific terms as part of the data science process. For instance, Garten referred to the feature tests as experiments, and she said the metrics are used to test hypotheses on how features will affect the activities of LinkedIn users on the site.
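To make the idea of testing a hypothesis against a metric concrete, here is a minimal sketch in Python. It is not LinkedIn's method, which the presentation did not detail. It assumes a single hypothetical metric (the share of users who engage with a feature), made-up counts for a control group and a test group, and a standard two-proportion z-test as the statistical check.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the null hypothesis that
    groups A and B share the same underlying rate."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts, invented for illustration only:
# 480 of 10,000 control users engaged vs. 540 of 10,000 test users.
z, p = two_proportion_z_test(480, 10_000, 540, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value would suggest the observed difference is unlikely to be chance alone. In practice, a team tracking hundreds of metrics would also need to account for running many such comparisons at once.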
A Scientific Measure In Data Science Work
Cisco is another organization that’s applying some scientific rigor to data science applications. Its corporate data science team has adopted a set of “open science” procedures, such as peer reviews of each other’s work, said Anu Miller, a senior data scientist at Cisco.
The Cisco team also adheres strictly to CRISP-DM, a data mining and data analysis methodology formally known as the Cross-Industry Standard Process for Data Mining. CRISP-DM, which was first developed in the late 1990s, outlines a six-phase process model for data analysts to follow. “We use it to guide our projects all the way through,” Miller said in another TDWI conference session. “We’re almost religious about this.”
In addition, Miller and her colleagues use a decision-modeling process, created by James Taylor, CEO of consultancy Decision Management Solutions, for tying big data analytics efforts to business decision-making.
The data science team measures the applications it’s working on against Taylor’s process to make sure there’s a good reason to do the analytics work, Miller said. “We ask each other all the time, ‘What business decision are you looking to support?’”
Applying aspects of the scientific method to big data analytics applications also helps to foster more teamwork among the data scientists, according to Miller. “Those things almost force you to be collaborative,” she said. “There are no unicorns on our team. We have to work together.”
Failure Is An Option For Big Data Analysts
Donald Farmer, who heads analytics at data management consultancy TreeHive Strategy in Woodinville, Wash., said effective data science applications also call for data analytics teams to be willing to experiment and to fail in their experiments, just as real scientists often do.
“Innovation involves a lot of failure, and you need to grasp it,” Farmer said. “If everything you do works, you’re not daring enough, and you’re not really being innovative.”
That point resonated with conference attendee Reuben Schooler, senior data engineering manager in the digital transformation group at Duke Energy Corp. in Charlotte, N.C. But, he said, getting agreement internally on a tolerance for big data analytics failure can be a hurdle, especially in an organization like his that’s in the early stages of building a big data architecture to support data mining and data science applications.
“It’s science – the test tube explodes as a rule,” Schooler said. “The trick is how to do that so it doesn’t have any backlash on your operational systems.”
Such issues are currently in play at Duke Energy, an electric utility and natural gas distributor that operates in six states.
Schooler said the digital transformation unit was set up alongside the company’s main IT department to push new ways of using technology, as part of an effort to become a more data-driven organization.
In connection with that, a data science and data analytics team was put in place “to guide priorities around what we want to do” in business intelligence operations, he added.
But exactly how the new data analytics process will function is still a work in progress, according to Schooler. Some basic steps need to be taken first, he said: for instance, deploying Hadoop and other big data technologies, and getting data seen as a full-fledged corporate asset throughout the company.