Data Science Baby Steps
This article was written by Phillip Ng, the Director of Data Science at Sowen. Phillip helps organizations make end-to-end data science pipelines functional, modern, and user-friendly. He loves to delve deep into the details of how things work, and provide technical perspective on helping projects succeed.
TL;DR
Now is a great time to discover what you can do with the data in your organization. Gartner states in their Top Strategic Technology Trends 2024 Report that “By 2026, 30% of new apps will use AI to drive personalized adaptive user interfaces, up from under 5% today.” With the latest advancements in data science, machine learning, and application development, you can leverage cutting-edge models, deploy them into production, and provide value to your end-users faster and easier than ever. These intelligent applications will take some of your most time-consuming workloads off your plate. Let's take a look at what's possible for you!
The Uphill Climb
You might've heard that most data science or machine learning projects don't make it to production; in 2019, the ballpark figure was around 87%. Why is this the case? Looking at projects I worked on some years back, ranging from internal tools to user-facing applications, I can certainly see why.
You need to start by collecting and preprocessing the data, ensuring it's clean, relevant, and well-structured for machine learning. Following that, you need to select the appropriate algorithm for your specific problem and data type. There are a lot of models out there, with different architectures, use cases, benefits, and disadvantages.
Once the algorithm is chosen, you need to train it using a portion of your data, tweaking parameters and fine-tuning to achieve the best model performance, which could require many iterations, costing time and money. Then, you need to evaluate the model using validation techniques to measure its accuracy and generalization on new data. After successful validation, you must deploy the model into a production environment, constantly monitoring its performance and retraining it periodically with new data to maintain its accuracy and relevance in real-world scenarios.
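The loop described above (split your data, pick an algorithm, train, then evaluate on held-out data) can be sketched in a few lines. This is a minimal illustration using scikit-learn and its bundled iris dataset as stand-in data, not a production pipeline:

```python
# Minimal sketch of the train/validate loop: split, fit, evaluate.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out a validation set so we measure generalization, not memorization.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# "Select the algorithm": start with a simple, well-understood baseline.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# "Evaluate the model": score it on data it has never seen.
val_accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"Validation accuracy: {val_accuracy:.2f}")
```

In a real project, the "tweaking parameters" step sits between fit and evaluate, and the retraining loop repeats this whole sketch on fresh data.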
And on top of all that, you still needed a whole data infrastructure on the backend to support it, typically on-premises and requiring IT support to make any changes (cringe). Rinse and repeat for every other model. Plus, cross your fingers that you could fine-tune your model within your timeframe and without a big bill. It was a big feat even for large enterprises, especially to do it quickly, systematically, and at scale.
The Downhill Ride
How have things changed since then? With the recent advancements in the field of data science, especially large language models, more accessible machine learning models and services, and a growing ecosystem of tools, creating intelligent applications has become much simpler and more accessible; a small organization can easily get started with the right tools and training. If you've been at the outskirts of all the commotion, peering in with wonder and hesitation, now is a great time to venture into the mystical depths of data science. Here are some things going for you:
You have access to TONS of services where you can prototype your idea within an hour.
Models are getting more multi-modal, meaning they are more flexible with the type of data you might have, such as text, images, audio, etc.
You don't need to invest $$$ in a warehouse of servers or GPUs to get started; most ML services handle the hardware on their side, letting you focus on getting value quicker.
What this means is that you’ll be more prepared to deploy your own flavor of intelligent applications. What will you build?
The Journey Ahead
Now, you need a road map. Here are my 7 steps to ensure that you have the best experience with your data solution:
Identify your greatest pain point that most affects your bottom line.
Define the use case with clarity, with the user in mind.
Get the low-hanging fruit. Start small, really small.
Do it manually before you automate it.
Don't reinvent the wheel.
Build ONE thing REALLY well.
Make it useful AND accessible.
Imagine this: You’re an impact-driven organization and you regularly interact with the community to make it better. But you’ve been hesitant to adopt the latest tools because they seem intimidating. No worries, let's dive into how you might go about it, shall we?
Identify your greatest pain point that most affects your bottom line:
You're juggling a bucketload of valuable feedback from diverse sources: surveys, emails, and even handwritten notes from the local community. It's like having a treasure trove of insights, but it's buried under paperwork and stuck in digital silos. Your biggest headache? You can't easily transform this wealth of human wisdom into actionable strategies. This inefficiency hurts productivity and stifles your ability to make informed, impactful decisions, costing the organization time and money.
Define the use case with clarity, with the user in mind:
Meet Jane, our passionate customer support guru, who's drowning in unstructured feedback. She needs a system that can quickly sort and highlight urgent issues, enabling her to respond promptly. Bill, the data analyst, seeks structured, clean data for meaningful insights. And the big bosses? They want an eagle-eye view of trends. By DEEPLY understanding these users and their needs, we can prioritize building tools that cater to their most crucial demands.
Get the low-hanging fruit:
Start small, really small. We're not aiming to boil the ocean here. We'll pluck the low-hanging fruits first. It might be something as simple as creating a standardized feedback form that's user-friendly and captures essential data uniformly. Tackling the easy wins builds momentum and trust.
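A "standardized feedback form" can start as something as humble as a small data structure that every feedback source gets funneled into. This is a hypothetical sketch; the field names and allowed sources are illustrative, not prescriptive:

```python
# Hypothetical standardized feedback entry: every piece of feedback,
# whether from a survey, an email, or a transcribed handwritten note,
# is captured with the same essential fields.
from dataclasses import dataclass
from datetime import date

ALLOWED_SOURCES = {"survey", "email", "handwritten_note"}

@dataclass
class FeedbackEntry:
    source: str        # where the feedback came from
    received_on: date  # when it arrived
    message: str       # the feedback itself
    contact: str = ""  # optional follow-up contact

    def __post_init__(self):
        # Validate at the door so downstream analysis can trust the data.
        if self.source not in ALLOWED_SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")
        if not self.message.strip():
            raise ValueError("message cannot be empty")

entry = FeedbackEntry(
    source="survey",
    received_on=date(2024, 3, 1),
    message="The weekend workshop was great, but signup was confusing.",
)
```

Even this tiny amount of structure pays off later: Bill gets clean, uniform records, and Jane can filter by source or date instead of digging through piles.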
Do it manually FIRST before you automate it:
Before we bring in the tech magic, we roll up our sleeves and get hands-on with the data. We manually sift through the feedback, learning its nuances, and feeling the pulse of what our community is saying. This helps us truly understand the value hidden within. Sure, it's time-consuming and old-fashioned, but you must first understand the story behind the numbers—it's essential before we unleash the machines. That’s because if you make changes to your automated system down the road, it’ll be that much harder to course correct. Better to get it right the first time.
Don't reinvent the wheel:
Why spend ages trying to recreate things that already work? We'll start with existing models or APIs, like ChatGPT, for natural language processing. Once we've mastered these and proved their value to the team and to leadership, then we'll consider customization and expansion. In his 2023 talk, Opportunities in AI, Andrew Ng mentions how it used to take months to build something that you can now do in less than an hour by leveraging large language models (LLMs). That's how far we've come!
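"Not reinventing the wheel" here often means writing a good prompt rather than training a model. The sketch below only builds the prompt; the category list and function name are hypothetical, and sending the prompt to a provider (OpenAI, Anthropic, a local model) is a one-call swap on your side:

```python
# Hypothetical sketch: preparing a feedback item for triage by an LLM.
# We build the classification prompt locally; any chat-style LLM API
# can consume it.
CATEGORIES = ["urgent issue", "suggestion", "praise", "question"]

def build_classification_prompt(feedback: str, categories=CATEGORIES) -> str:
    choices = ", ".join(categories)
    return (
        "You are helping a community organization triage feedback.\n"
        f"Classify the following feedback as exactly one of: {choices}.\n"
        "Reply with the category only.\n\n"
        f"Feedback: {feedback}"
    )

prompt = build_classification_prompt(
    "The heating in the community hall has been broken for two weeks."
)
print(prompt)
```

Constraining the model to a fixed category list ("reply with the category only") keeps the output machine-readable, so the result can flow straight into Jane's queue without manual cleanup.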
Build ONE thing REALLY well:
When we build, we build for success. Take, for example, creating a sentiment analysis model that can process incoming feedback. It's not just about the model itself; it's about the entire ecosystem around it—privacy measures, bias detection, output validation, documentation of processes, and ensuring that the system is fortified against cyber threats. We want our solution to be a well-oiled machine.
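To make the sentiment-analysis example concrete, here is a deliberately tiny, lexicon-based scorer standing in for a real model. The word lists are illustrative only; a production system would use a trained model plus the safeguards described above (validation, bias checks, monitoring):

```python
# Toy lexicon-based sentiment scorer: a placeholder for a trained model.
POSITIVE = {"great", "helpful", "love", "excellent", "thanks"}
NEGATIVE = {"broken", "slow", "confusing", "frustrating", "bad"}

def sentiment(feedback: str) -> str:
    # Normalize: lowercase and strip trailing punctuation from each word.
    words = {w.strip(".,!?").lower() for w in feedback.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("The volunteers were great and so helpful!"))  # positive
print(sentiment("The signup page is confusing and slow."))     # negative
```

The point of the toy version is the interface, not the accuracy: once feedback flows through a `sentiment()`-shaped function, you can swap the internals for a real model later without touching the rest of the ecosystem.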
Make it useful AND accessible:
We don't just want a sophisticated system; we want a user-friendly, approachable interface. We're aiming for an application that even your newest intern could navigate with a smile. There are so many product-centric tools and teams out there, you can get a customized build for your application quickly and affordably. It's not just about what it can do; it's about how easy and delightful it is to use, a true testament to the user experience.

In a nutshell, we're taking baby steps with a grand vision in mind, YOUR organization’s grand vision. I look forward to seeing you on your next adventure into the data realm. 🙂