Data science is exploding in popularity, but what does it really mean?
Buzzwords are popping up everywhere, but how does data science actually add value to a business? This article explains the buzzwords, clears up common misconceptions, and shares what I have learned so far from working in the AI industry.
Here are the common buzzwords and their definitions, so that we are on the same page.
Glossary of Buzzwords
Artificial Intelligence can be defined as a branch of computer science that can simulate human intelligence. AI is implemented in machines to perform tasks that actually require human intelligence. (Tech4fresher)
Deep learning is a subset of machine learning, which is essentially a neural network with three or more layers. These neural networks attempt to simulate the behavior of the human brain — albeit far from matching its ability — allowing it to “learn” from large amounts of data. (IBM)
Machine learning involves computers discovering how they can perform tasks without being explicitly programmed to do so. It involves computers learning from data provided so that they carry out certain tasks. (Wikipedia)
In supervised learning, you train the machine using data which is well “labeled.” (Guru99)
Unsupervised learning is a machine learning technique, where you do not need to supervise the model. (Guru99)
In my previous role as a data scientist at TMX Group, a B2B data and analytics company that operates the Toronto Stock Exchange, I built a KPI dashboard for the C-suite, ran churn analysis on the products sold to institutional clients, and defined a customer segmentation model. These are common responsibilities of a data scientist. I will assume that you already have the basic technical skills (HackerRank is a great resource for interview prep), so I will focus on the business use cases.
Churn analysis is a common practice in data science. It measures the rate at which a company loses customers, with the goal of reducing it. Many businesses now run on a subscription model, so customers can cancel as soon as they stop liking the product. Churn can be minimized by assessing the product and how people use it. There are two types of churn, voluntary and involuntary, and many reasons why it occurs: switching to a competitor, closing an account, cancelling a subscription because of poor customer fit, missing functionality, and failure to achieve outcomes.
Acquiring new customers, especially institutional ones, is considerably more expensive than maintaining and upgrading existing customer relationships, so reducing churn is often better math than chasing revenue growth. Regardless of how effective marketing and sales are at acquiring customers, if your product cannot make people stay, you do not have a model that is sustainable or scalable in the long term. The more customers you churn, the more money you must spend finding new ones to recoup the lost business.
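To make the arithmetic concrete, here is a minimal sketch of a customer churn rate and of what it costs to replace the customers who left. The function names, the customer counts and the acquisition cost are all hypothetical and chosen only for illustration.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of the customers present at the start of the period who left."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


def replacement_cost(customers_lost: int, acquisition_cost: float) -> float:
    """Spend needed to acquire as many new customers as were lost."""
    return customers_lost * acquisition_cost


# Hypothetical month: 400 customers at the start, 20 of them cancelled,
# and an assumed acquisition cost of 5,000 per new customer.
monthly_churn = churn_rate(400, 20)
cost_to_replace = replacement_cost(20, 5000.0)

print(f"Monthly churn: {monthly_churn:.1%}")
print(f"Cost to replace churned customers: {cost_to_replace:,.0f}")
```

The second number is the point of the paragraph above: every customer who churns has to be paid for again just to stand still.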
A major project I worked on was analyzing churn data. Two things matter here: collecting the right data and translating the business question into mathematics. Useful inputs include customer engagement and usage, support tickets, competitor price points, and likelihood to upgrade. You want to understand customers’ behavioural patterns and the month in which they typically start churning. Essentially, the solution is to predict churn and flag high-risk customers so that the sales and customer success teams can target them. You can also analyze your products by pricing tier and compare them to industry standards. Other key metrics include ARPU (average revenue per user, or per unit), MRR (monthly recurring revenue), and ARR (annual recurring revenue). Surprisingly, I sometimes find a negative churn rate, which is great for the business because it means the expansion revenue from existing customers outweighs the revenue lost from existing customers over the same period.
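The revenue metrics above can be sketched in a few lines. This is an illustration under assumed numbers, not the calculation used at any particular company: the function names and all figures are hypothetical, and "lost" revenue here means cancellations plus downgrades from customers who existed at the start of the month.

```python
def net_revenue_churn(mrr_start: float, lost_mrr: float, expansion_mrr: float) -> float:
    """Net revenue churn for a period, as a fraction of starting MRR.

    A negative value means expansion revenue from existing customers
    outweighed the revenue lost from existing customers.
    """
    if mrr_start <= 0:
        raise ValueError("mrr_start must be positive")
    return (lost_mrr - expansion_mrr) / mrr_start


def arr_from_mrr(mrr: float) -> float:
    """Annual recurring revenue implied by the current monthly figure."""
    return mrr * 12


def arpu(mrr: float, active_customers: int) -> float:
    """Average monthly revenue per customer."""
    return mrr / active_customers


# Hypothetical month: 100,000 in starting MRR across 250 customers,
# 4,000 lost to cancellations and downgrades, 6,000 gained from upgrades.
rate = net_revenue_churn(100_000, 4_000, 6_000)
arr = arr_from_mrr(100_000)
revenue_per_customer = arpu(100_000, 250)

print(f"Net revenue churn: {rate:.1%}")
print(f"ARR: {arr:,.0f}")
print(f"ARPU: {revenue_per_customer:,.0f}")
```

With these made-up inputs the churn rate comes out negative, which is the situation described above: upgrades more than cover what was lost.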
Here is a dashboard that is similar to what I did (I cannot share the actual one I did because of company policy):
This leads me to my next point! Churn and understanding customers go hand in hand, so we must also create a customer segmentation model. There is a lot of synergy between the two models, so I typically build the two in parallel.
A customer segmentation model is a form of focused marketing plan that draws heavily on data science. Companies that use it want to target their customers better, which means they need to understand their customers, break them down into predictable chunks, and collect the proper data to build a machine learning model that can take in that data and make predictions. I would break the customers down by industry, by whether they are susceptible to churn, by the risk implied by how long they have been using the product, and by pricing tier, checking whether the low tiers get enough features and the high tiers are not charged too much. This information is qualitative and experience-based, so as a young data scientist I would talk to the heads of Product, Strategy, and Sales, who have worked with the clients for a long time, to get insights.
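As a rough sketch of the breakdown described above, the snippet below groups customers by industry and pricing tier and reports the churn rate of each segment. The records, field names and segment keys are hypothetical. A real segmentation model would use far more data and usually a clustering or classification algorithm rather than a plain group-by.

```python
from collections import defaultdict

# Hypothetical customer records, invented for illustration.
customers = [
    {"id": "c1", "industry": "banking", "tier": "high", "churned": False},
    {"id": "c2", "industry": "banking", "tier": "low", "churned": True},
    {"id": "c3", "industry": "banking", "tier": "low", "churned": True},
    {"id": "c4", "industry": "asset_management", "tier": "high", "churned": False},
    {"id": "c5", "industry": "asset_management", "tier": "low", "churned": False},
    {"id": "c6", "industry": "banking", "tier": "low", "churned": False},
]


def churn_by_segment(records, keys):
    """Return {segment: churn rate}, where a segment is a tuple of key values."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [customers, churned]
    for record in records:
        segment = tuple(record[key] for key in keys)
        counts[segment][0] += 1
        counts[segment][1] += int(record["churned"])
    return {segment: churned / total for segment, (total, churned) in counts.items()}


rates = churn_by_segment(customers, keys=("industry", "tier"))
for segment, segment_rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(segment, f"{segment_rate:.0%}")
```

Sorting by churn rate puts the riskiest segment first, which is a natural starting point for the conversations with Product, Strategy and Sales.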
Data science sounds complicated, but essentially all the social media companies and B2C e-commerce companies, and sometimes even B2B companies, use it to target their customers better, because this can significantly increase and scale their revenue while cutting a lot of marketing and advertising costs. That brings me back to the point of breaking customers down into predictable chunks, which is where feature selection and feature engineering come in.
For feature selection, you first need to identify the segmentation bases. Very commonly, people look at demographic, psychographic, geographic and behavioural ones. In my previous experience at the AI startup Advanced Symbolics, we did a lot of market research to help different companies with their target marketing. We scraped data from social media sites and used a conversation-based audience topic model to capture how people identify and what they focus on in their tweets. For feature engineering, you first need to identify all the possible features. “Features” is just a fancy term for variables: the way I think of it, a model is an equation, and the features of the model are its variables.
So first, think about what you are trying to predict in your specific case, such as customer behaviour in your industry, and which factors intuitively seem most likely to drive it. Then you can brainstorm other features that you think could work, and add calculations on top of the industry standards of demographic, psychographic, geographic and behavioural data. At my previous startup, which worked extensively on AI market research, we took social media data such as Twitter feeds and built conversation models and topic models to gather everything that potential customers talk or care about, then built our features around that. Feature selection lets us refine the model, and feature engineering essentially means using the existing variables in the data to engineer additional variables that we can test in the model.
Feature Engineering and Feature Selection
Feature engineering is the process of using domain knowledge to extract features (characteristics, properties, attributes) from raw data. A feature is a property shared by independent units on which analysis or prediction is to be done. Features are used by predictive models and influence results. (Wikipedia)
Feature selection is the process of reducing the number of input variables when developing a predictive model. It is desirable to reduce the number of input variables to both reduce the computational cost of modeling and, in some cases, to improve the performance of the model. (Machine Learning Mastery)
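The two definitions above can be illustrated together. In this sketch, the engineering step turns raw account totals into per-month rates, and the selection step keeps the features most correlated with churn. Ranking by absolute Pearson correlation is only one simple selection technique, chosen here as an assumption. The records, field names and the choice of k are all hypothetical.

```python
import math

# Hypothetical raw account records; every field and value is invented.
raw = [
    {"total_logins": 120, "tenure_months": 12, "support_tickets": 1, "churned": 0},
    {"total_logins": 10, "tenure_months": 5, "support_tickets": 4, "churned": 1},
    {"total_logins": 200, "tenure_months": 20, "support_tickets": 2, "churned": 0},
    {"total_logins": 9, "tenure_months": 3, "support_tickets": 3, "churned": 1},
    {"total_logins": 64, "tenure_months": 8, "support_tickets": 1, "churned": 0},
    {"total_logins": 24, "tenure_months": 6, "support_tickets": 5, "churned": 1},
]


def engineer(record):
    """Feature engineering: derive per-month rates from raw totals."""
    months = record["tenure_months"]
    return {
        "tenure_months": float(months),
        "logins_per_month": record["total_logins"] / months,
        "tickets_per_month": record["support_tickets"] / months,
    }


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def select_features(rows, target, k):
    """Feature selection: keep the k features most correlated with the target."""
    scores = {
        name: abs(pearson([row[name] for row in rows], target))
        for name in rows[0]
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


features = [engineer(record) for record in raw]
labels = [record["churned"] for record in raw]
selected, scores = select_features(features, labels, k=2)

print("Selected:", selected)
print("Scores:", {name: round(score, 3) for name, score in scores.items()})
```

On this toy data the two engineered rates score higher than raw tenure, which is the kind of result that makes the engineering step worth doing. With real data, a correlation filter should be checked against domain knowledge before any feature is dropped.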
Let’s talk about what goes into the features. People almost always look at demographics, and sometimes at socioeconomic class, which you can also infer from Twitter. Other features are geography, meaning where people are, and sentiment analysis, meaning whether they are positive, negative or neutral about a topic related to your company’s product. Gender, age and lifestyle are also important, and I have used them for segmentation as well. Those are good predictors.
It helps to understand precisely what feature engineering is: the process of using domain knowledge to extract features, such as characteristics, properties and attributes, from raw data. A feature is a property shared by independent units on which analysis or prediction can be made, and features are used in predictive models and influence the results. I’ll give you a few examples that I have worked on and seen in the industry, starting with an industry standard: the Netflix recommendation tool. Netflix is a B2C business that built algorithms to recommend the right movies for you, so that you have fewer decisions to make, find the app easier to use, stay on it longer and watch more. They need to treat recall as the most important metric here.
Netflix also uses ML to decide what new shows or movies to make, and it is known for content creation at scale. It has to do this through an excellent pipeline, with 195 million members across 190 countries and diverse tastes in entertainment. It needs to think about content marketing and studio production, and data science is essential in this pipeline: which existing titles are comparable, what audience size can be expected, and in which region. This ties back to the fundamental features that the industry typically uses: demographic, geographic and behavioural. What we are trying to predict is which movies people will like. Other platforms built on content creation, like Medium, Netflix, even Pornhub and YouTube, would definitely use a recommendation algorithm to help people find what they like and to help content creators reach the right target audience. That is a huge pain point that the traditional industry cannot solve, because I think there is an audience for everything.
But let’s say you want to scale your platform. You would need a few power users, or power content creators, who dominate and attract as many people onto the platform as possible. Someone might join Instagram because they want to follow Selena Gomez, or subscribe to a Netflix competitor like Disney+ because it has the movie Cruella. That is why audience sizing and identifying similar titles and genres are critical. Audience sizing is usually both a technical and a business problem, because the sample size always matters. You first need metadata, such as tags and summaries, which are plugged into learned embeddings; these in turn produce outputs like similar-title identification and audience sizing. In reality, you are using previous data to predict what type of movies people would like, in terms of similar titles, and how big the audience will be. As a very simple example, say you like rom-coms and absolutely loved the movie Friends with Benefits. Netflix sees that, and also sees that No Strings Attached is a similar movie. Audience sizing follows from this: once you have identified similar titles and genres, you take the audience size of Friends with Benefits and predict that the audience for the similar title will be about the same.
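The idea of going from similar titles to an audience estimate can be sketched as follows. This is not Netflix's method: plain tag sets stand in for learned embeddings, similarity is cosine similarity over binary tag vectors, and the estimate is a similarity-weighted average over the nearest titles. The catalog, the tags and the audience sizes (in arbitrary units) are invented for illustration.

```python
import math

# Hypothetical catalog. Tags and audience sizes are made up, not real data.
CATALOG = {
    "Friends with Benefits": {
        "tags": {"romance", "comedy", "friends-to-lovers", "new-york"},
        "audience": 100.0,
    },
    "Period Romance (hypothetical)": {
        "tags": {"romance", "drama", "historical"},
        "audience": 40.0,
    },
    "Space Thriller (hypothetical)": {
        "tags": {"sci-fi", "thriller", "space"},
        "audience": 70.0,
    },
}


def tag_similarity(a, b):
    """Cosine similarity between two titles treated as binary tag vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def similar_titles(new_tags, catalog, k=2):
    """Return the k catalog titles most similar to the new title's tags."""
    scored = [
        (title, tag_similarity(new_tags, info["tags"]))
        for title, info in catalog.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def estimate_audience(new_tags, catalog, k=2):
    """Similarity-weighted average audience of the k nearest titles."""
    nearest = similar_titles(new_tags, catalog, k)
    total_weight = sum(score for _, score in nearest)
    if total_weight == 0:
        return None  # nothing comparable in the catalog
    weighted = sum(score * catalog[title]["audience"] for title, score in nearest)
    return weighted / total_weight


# Tags for a new title resembling No Strings Attached (tags are assumptions).
new_title_tags = {"romance", "comedy", "friends-to-lovers", "los-angeles"}
neighbours = similar_titles(new_title_tags, CATALOG)
estimate = estimate_audience(new_title_tags, CATALOG)

print("Most similar:", neighbours)
print("Estimated audience:", estimate)
```

The estimate lands close to the audience of the most similar title, which matches the rom-com example above. When no catalog title shares any tag with the new one, the sketch returns None instead of guessing.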
Common Use Cases of AI and Data Science
When you think of AI, think about automation: AI is trying to take over more of the cognitive tasks that humans have been doing for a long time. Before, marketing agencies did market research by hand to target people. Now they essentially automate that process, consuming far more data and making better predictions. So think of AI, across its different use cases, as a way to learn from the past and make predictions about the future. In terms of results, you still have to decide whether you want the results to be precise, meaning most of what the model flags is correct, or to have high recall, meaning the model catches as many of the true cases as possible.
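The trade-off between precision and recall is easier to see with numbers. Below is a small sketch that computes both from true labels and model predictions. The labels are hypothetical: 1 could mean "this customer churned" or "this user liked the title".

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for truth, pred in pairs if truth == 1 and pred == 1)
    false_pos = sum(1 for truth, pred in pairs if truth == 0 and pred == 1)
    false_neg = sum(1 for truth, pred in pairs if truth == 1 and pred == 0)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical outcomes: four real positives, of which the model found two,
# plus one false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"Precision: {precision:.2f}")  # share of flagged cases that were right
print(f"Recall: {recall:.2f}")        # share of real cases that were caught
```

Flagging more cases usually raises recall and lowers precision. Which one to favour depends on whether a missed case or a false alarm costs the business more.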
In medtech, people are trying to automate the diagnosis of diseases and predict treatment success rates. In finance, they are trying to automate trading, with direct applications in high-frequency trading (HFT). In retail and B2B sales, they are looking at churn and customer segmentation. In politics, they are predicting election results and doing market research. Tools like NLP and sentiment analysis are useful across the board, because humans are no longer the only ones who can understand qualitative and emotionally driven data. Text is much richer in meaning than numbers alone, and processing it opens the door to massively scalable analysis, which I did at my previous AI startup.
This is original content from NewsBreak’s Creator Program.