Reposted from LinkedIn.
A colleague of mine eloquently and rightly declares that “if one person or one team can solve a problem by themselves, it is probably not a problem worth solving.” Given this context, as a follow-on to my post “What Makes a Data Scientist - Part 1,” it is fitting to explore the role of teams in defining a Data Scientist. As discussed in the first post, the spectrum of skills that a Data Scientist needs to possess is too vast for a single person to be an expert in all of them. As part of a team, she will lead a group of individuals who have more depth than she does in some areas and less in others. How she uses that team, as well as her ability to reach across functional boundaries, will define the level of success she and her team actually attain.
Based on the spectrum of work a Data Scientist does, she can land in one of two organizations: an analytics team or a data team, both of which should have Data Scientists. The expectations for a Data Scientist vary depending on which side of the fence she lands on. The team with a data-focused Data Scientist is responsible for data architecture, data engineering, data intake, data transformation, ontology, metadata, statistical transformation, and descriptive analytics and reporting. A Data Scientist who lands on the analytics-focused team is responsible for descriptive analytics, model-based analytics, and data journalism. This separation is a critical point: if the Data Scientist building the model were also responsible for the foundational work on the data, her natural inclination would be to bring the data together only to solve her own problem. The outcome of having a separate team led by Data Scientists focused on the data is that the data can then be managed with the broader enterprise in mind and not just the problem at hand.
To be clear, one person is not expected to be an expert in all of these areas, regardless of which side of the fence she lands on. The expectation is that she has a solid understanding of the spectrum and depth in at least two or three areas, as well as the ability to execute (the concept of “E-shaped” skills). The Data Scientist cannot be the only one with an E-shaped skill set; her entire team must have this type of skill set as well. If you are familiar with Agile development concepts, think of the Data Scientist as the Data and Analytics Product Owner (PO) and the others as “Team Members.” With this structure you can build Agile teams of three or four individuals who together possess the necessary spectrum of skills and domain expertise.
Since my current focus is leading a data team, I am going to focus on the Data Scientist who is on the data side of the fence. In this example she must be able to lead and direct her team as well as work across organizational boundaries. Her role in this position is to understand the data and how these data apply to solving a complex modeling problem. Leveraging her domain expertise, she works with the analytics team to understand its problem and put it in the context of the larger organization. To do this, the team needs to architect and engineer the solution based on a set of use cases. While the solution is being architected and engineered, the team needs to identify, source, consume, and transform the data. The team also needs to ensure the data have appropriate context in the form of an ontology and metadata. This is important in building a reusable enterprise data asset. Essentially, under her guidance the team must prepare the data and publish them to the enterprise in such a way that they stand on their own. This last part is where her foundation in analytics comes in. Additionally, her depth of knowledge in statistical transformation, descriptive analytics, and reporting comes into play at this point.
Data sets are a commodity, but unlike other commodities, data sets are reusable. Even more importantly, the value of data multiplies with every use. Data gains value when leveraged by analytical models. These models simultaneously generate new data and drive decisions. A Data Scientist maximizes her value when she treats data as a reusable commodity for the enterprise. By building and nurturing these data assets, she enables reuse of the data for application development, other analytical questions, and reporting. Circling back to the statement “if one person or one team can solve a problem by themselves, it is probably not a problem worth solving,” having an Agile team built around the spectrum of data science skills is critical to addressing decisions that deliver real business value for an organization while simultaneously building a repertoire of reusable data assets.