How Dell’s Data Science Team Benefits from Agile Practices
Agile development doesn’t work for data science… at least, not at first, said Randi Ludwig, Dell Technologies’ director of Data Science. That’s because, in part, there is an uncertainty that’s innate to data science, Ludwig told audiences at the Domino Data Lab Rev4 conference in New York on June 1.
“One of the things that breaks down for data science, in terms of agile development practices, is you don’t always know exactly where you’re going,” Ludwig said. “I haven’t even looked at that data. How am I supposed to know where do I even start with that?”
Nonetheless, Dell uses agile practices with its data science team. What Ludwig has found is that the uncertainty is real but confined to the first part of the process, where data scientists collect the data, prove there’s value and obtain sign-off from stakeholders. To manage that first part, she suggested timeboxing it to three or four weeks.
“The uncertainty really only lies in the first part of this process,” she said. “What that agile looks like in the first half and then the second half of the process are different on a day-to-day basis for the team.”
After the uncertainty period, the rest of the data science process is more like software development, and agile becomes beneficial, she said.
Ludwig interwove how Dell implements agile practices in data science with the benefits the team reaps from those practices.
Benefits of Standups
First, standups should include anyone involved in a data science project, including data engineers, analysts and technical project managers, Ludwig said. Talking to each other on a regular basis runs counter to the isolated way data scientists tend to work, but it puts everyone on the same page and delivers value by adding context and avoiding rework. It also pays dividends because team members can step in for one another more easily than they can under the “lone wolf” approach to data science.
“Doing standups gives visibility to everybody else in the story,” she said. “That lack of context goes away just by talking to each other every day, and then if you actually write down what you talk about every day, you get other amazing benefits out of it.”
The standup doesn’t necessarily need to be every day, but it should be a recurring cadence that’s short enough that the project can’t go wildly afield, she added.
Benefits of Tickets
Documenting work in tickets is another key practice, she said. It is easy to do, it reduces single points of failure, and tickets are not an onerous form of documentation.
“Just the fact of having things written down and talking to each other every day is massively beneficial, and in my experience is not how data science teams organically develop most of the time,” she said.
In the second half of the data science process, teams can articulate more clearly what exactly they’re going to do, so tickets become possible. It’s important not to be too broad when writing tickets, however. Instead, break big ideas down into bite-sized chunks of work, she advised.
“‘I’m going to do EDA (exploratory data analysis) on finance data’ is way too broad. That’s way too big of a ticket. You’ve got to break those things down into smaller pieces,” she said. “Even just getting the team to articulate what are some of the things you’re going to look for: you’re going to look for missing values, you’re going to look for columns that are high-quality data, you’re going to look to see if there’s any correlations between some of those columns, so that you’re not bringing in redundant features.”
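To make the idea of ticket-sized EDA concrete, here is a minimal Python sketch of two of the checks Ludwig lists: counting missing values per column and flagging pairs of columns that are so strongly correlated that one is probably redundant. It is an illustration only, not Dell's code. The column names, the sample rows and the 0.95 threshold are all assumptions made up for this example.

```python
from itertools import combinations
from math import sqrt


def missing_counts(rows, columns):
    """Count missing (None) values in each column."""
    return {c: sum(1 for r in rows if r.get(c) is None) for c in columns}


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)


def redundant_pairs(rows, columns, threshold=0.95):
    """Return column pairs whose absolute correlation meets the threshold.

    Rows with a missing value in either column are skipped for that pair.
    The threshold is an arbitrary illustrative choice.
    """
    pairs = []
    for a, b in combinations(columns, 2):
        complete = [
            (r[a], r[b])
            for r in rows
            if r.get(a) is not None and r.get(b) is not None
        ]
        if len(complete) < 2:
            continue
        xs, ys = zip(*complete)
        if abs(pearson(xs, ys)) >= threshold:
            pairs.append((a, b))
    return pairs


# Hypothetical finance data: revenue_k is revenue in different units.
COLUMNS = ["revenue", "revenue_k", "headcount"]
ROWS = [
    {"revenue": 100.0, "revenue_k": 0.10, "headcount": 5},
    {"revenue": 200.0, "revenue_k": 0.20, "headcount": 3},
    {"revenue": 300.0, "revenue_k": None, "headcount": 9},
    {"revenue": 400.0, "revenue_k": 0.40, "headcount": 4},
]

if __name__ == "__main__":
    print(missing_counts(ROWS, COLUMNS))
    print(redundant_pairs(ROWS, COLUMNS))
```

Each function maps naturally to its own ticket ("count missing values in the finance table," "flag highly correlated columns"), which is the kind of breakdown the quote describes.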
Breaking work down this way also helps inform the team about the why and how of the models being built. There can also be planning tickets that capture questions that still need to be asked, she said.
Tickets become another form of data that can be used in year-end reviews and for the management of the team. For instance, one of Ludwig’s data scientists was able to demonstrate through tagged tickets how much time was spent on building data pipelines.
“Data scientists are not best at building data pipelines, you need data engineers for that,” Ludwig said. “This is a great tool because now I know that I need to either redistribute resources I have or go ask for more resources. I actually need more data engineers.”
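As a rough illustration of how tagged tickets can turn into that kind of evidence, the Python sketch below totals logged hours per tag and computes the share of time spent on one tag. The ticket fields (`title`, `tags`, `hours`), the tag names and the figures are hypothetical. The talk did not describe how Dell's tickets are structured or whether hours are logged on them.

```python
from collections import defaultdict


def hours_by_tag(tickets):
    """Sum logged hours for every tag across all tickets.

    A ticket with several tags contributes its hours to each of them.
    """
    totals = defaultdict(float)
    for ticket in tickets:
        for tag in ticket["tags"]:
            totals[tag] += ticket["hours"]
    return dict(totals)


def share_of_time(tickets, tag):
    """Fraction of all logged hours spent on tickets carrying the tag."""
    total = sum(ticket["hours"] for ticket in tickets)
    if total == 0:
        return 0.0
    tagged = sum(ticket["hours"] for ticket in tickets if tag in ticket["tags"])
    return tagged / total


# Hypothetical tickets; the schema is an assumption for this example.
TICKETS = [
    {"title": "Build ingestion job", "tags": ["data-pipeline"], "hours": 6.0},
    {"title": "Check missing values", "tags": ["eda"], "hours": 2.0},
    {"title": "Fix broken join", "tags": ["data-pipeline", "bug"], "hours": 4.0},
]

if __name__ == "__main__":
    print(hours_by_tag(TICKETS))
    print(f"{share_of_time(TICKETS, 'data-pipeline'):.0%} of time on pipelines")
```

A summary like this is only as good as the tagging discipline behind it, which is one reason to update tickets during the standup.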
Tickets can also be used to document problems encountered by the data science team. For instance, Ludwig was able to use tickets to show the database management team all the problems her team was encountering with a particular database, thus justifying improvements to that database.
It can be challenging to get team members to create tickets and keep them updated, she acknowledged, so she has everyone open GitHub during the standup so they can update the tickets then and there.
Benefits of a Prioritization Log
Tickets also allow the team to create a prioritization log, she said. That triggers a slew of benefits, such as providing the team with support when there is pushback from stakeholders about requests.
“This magical thing happens where now you have stuff written down, which means you have a prioritization backlog, you can actually go through all of the ideas and thoughts you’ve had and figure out how to prioritize the work instead of just wondering,” she said. “You actually foster much less contentious relationships with stakeholders in terms of new asks by having all of the stuff written down.”
Stakeholders will start to understand that for the team to prioritize their request, they need to do some homework, such as identifying what data should be used, what business unit will consume the output and what they think that output should look like.
Another benefit: It can keep data scientists from wandering down rabbit holes as they explore the data. Instead, they should bring those questions to the standup and decide as a team how to prioritize them.
“This helps you on your internal pipeline, as well as your intake with external stakeholders. Once they see that you have a list to work against, then they’re, ‘Oh, I need to actually be really specific about what I’m asking from you,’” she said.
Finally, there’s no more “wondering what the data science team is doing” and whether it will deliver benefits.
“One of the biggest concerns I’ve ever heard from leadership about data science teams is that they don’t know what your plan’s going to be, what are you going to deliver in 12 or 18 months, how many things I could learn between here that’s going to completely change whatever I tell you right now,” she said. “At least now you know that this investment has a path and a roadmap that’s going to continue to provide value for a long time.”
Benefits of Reviews and Retrospectives
“Stakeholders are just really convinced that people just disappear off into an ivory tower, and then they have no idea what are those data scientists doing,” Ludwig said.
There’s a lot of angst that can be eliminated just by talking with business stakeholders, which review sessions give you a chance to do. It’s important to take the time to make sure they understand what you’re working on, why and what you found out about it, and that you understand their business problem.
Retrospectives are also beneficial because they allow the data science team to reflect and improve.
“One of the things that I actually thought was one of the most interesting about data scientists or scientists at heart, they love to learn, they love to make things more efficient and optimize, but the number of teams that organically just decide to have retrospectives is very small, in my experience,” she said. “Having an organized framework of we’re going to sit down and periodically review what we’re doing and make sure we learn from it is an ad hoc thing that some people do or some people don’t. Just enforcing that regularly has a ton of value.”
Domino Data Lab paid for The New Stack’s travel and accommodations to attend the Rev4 conference.