Precision vs recall, which one is better for your model?

Anh Le

I often get this question in data science interviews because it is one of the most important concepts in data science, and also one of the most confusing. So I am writing this to explain it clearly.

Key Takeaways

  • Precision: Of everything you predicted as positive, how much was actually positive? Precision matters more when the cost of acting is high, but the cost of not acting is low.
  • Recall: Of everything that was actually positive, how much did you correctly predict? Recall matters more when the cost of acting is low, but the cost of not acting is high. (A short code sketch after this list makes both definitions concrete.)
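To make those two definitions concrete, here is a minimal sketch with made-up labels that computes both metrics with scikit-learn:

```python
from sklearn.metrics import precision_score, recall_score

# Made-up ground truth and model predictions (1 = positive, 0 = negative)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

# Precision: of the 3 predicted positives, 2 are actually positive
print("precision:", precision_score(y_true, y_pred))  # 2 / 3 ≈ 0.67

# Recall: of the 4 actual positives, 2 were predicted
print("recall:   ", recall_score(y_true, y_pred))     # 2 / 4 = 0.5
```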
Photo by Luke Chesser on Unsplash

It depends on the use case.

  • For rare cancer data modelling, anything that doesn’t account for false negatives is a crime: a missed cancer case is far more serious than a false alarm. Recall is a better measure than precision here.
  • For YouTube recommendations, false positives (recommending videos the viewer doesn’t want) are more of a concern, while missing one good video costs little. Precision is better here.

We have thousands of free customers registering on our website every week. The call center team wants to call them all, but that is impossible, so they ask me to select the ones with a good chance of becoming buyers (“high temperature” is how we refer to them). We don’t mind calling someone who is not going to buy (so precision is not essential), but every high-temperature customer must be in my selection, so none of them leaves without buying. That means my model needs high recall, no matter if the precision goes to hell.

Which is more important?

It depends on what the costs of each error are.

Precision involves direct costs; the more false positives you have, the more cost per true positive you have. If your costs are low, then precision doesn’t matter as much. For instance, if you have 1M email addresses, and it will cost $10 to send an email to all of them, it’s probably not worth your time to try to identify the people most likely to respond instead of just spamming all of them.

On the other hand, recall involves opportunity costs; you give up an opportunity every time you have a false negative. So recall is least important when the marginal value of an additional correct identification is small, e.g. there are many opportunities, there is little difference between them, and only a limited number can be pursued. For instance, suppose you want to buy apples. There are 100 apples at the store, and 10 of them are bad. If you have a method of spotting bad apples that misses 80% of the good ones, then you will identify about 18 good apples. Usually, a recall of 20% would be terrible, but if you only want five apples, then missing those other 72 apples doesn’t matter.
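To spell out that apple arithmetic (the counts are the made-up numbers from the example above):

```python
total_apples = 100
bad_apples = 10
good_apples = total_apples - bad_apples                # 90 good apples

recall = 0.20                                          # the method misses 80% of the good ones
good_apples_found = recall * good_apples               # 0.20 * 90 = 18
good_apples_missed = good_apples - good_apples_found   # 90 - 18 = 72

print(good_apples_found, good_apples_missed)           # 18.0 72.0
```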

Recall

So recall is most important when:

  • The number of opportunities is small (if there were only 10 good apples, then you would be unlikely to find 5 good ones with a recall rate of only 20%)
  • There are significant differences between opportunities (if some apples are better than others, then a recall rate of 20% is enough to get 5 good apples, but they aren’t necessarily going to be the best apples)
    OR
  • The marginal benefit of opportunities remains high, even for a large number of opportunities. For instance, while most shoppers won’t benefit from more than 18 good apples, the store would like to have more than 18 apples to sell.

Thus, precision will be more important than recall when the cost of acting is high, but the cost of not acting is low. Note that this is the cost of acting/not acting per candidate, not the “cost of having any action at all” versus the “cost of not having any action at all.” In the apple example, it’s the cost of buying/not buying a particular apple, not the cost of buying some apples versus the cost of not buying any apples; the cost of not buying a particular apple is low because there are lots of other apples. Since the cost of buying a bad apple is high, but the cost of passing up a delicious apple is low, precision is more important in that example. Another example would be hiring when there are a lot of similar candidates.

Recall is more important than precision when the cost of acting is low, but the opportunity cost of passing up a candidate is high. There’s the email example I gave earlier (missing out on a potential responder isn’t very costly, but sending an email to someone who doesn’t respond costs even less, so you just email everyone). Another example would be identifying candidates for the flu shot: give the flu shot to someone who doesn’t need it, and it costs a few dollars; don’t give it to someone who does need it, and they could die. Because of this, health care plans generally offer the flu shot to everyone, disregarding precision entirely.

Although recall may be more important than precision (or vice versa) in some situations, you need both to get an interpretable assessment of your model.

For instance, as noted by @SmallChess, a false negative is usually more disastrous than a false positive for preliminary diagnoses in the medical community. Therefore, one might consider recall to be the more important measurement. However, you could have 100% recall and still have a useless model: if your model always outputs a positive prediction, it will have 100% recall but be completely uninformative.

This is why we look at multiple metrics together, for example precision and recall side by side, or a combined score such as F1.
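Here is a minimal sketch, with made-up labels, of that degenerate “always predict positive” model: recall is perfect, but precision and F1 expose how useless it is.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Imbalanced, made-up ground truth: only 3 of 20 cases are actually positive
y_true = [1, 1, 1] + [0] * 17

# A useless "model" that predicts positive for everyone
y_pred = [1] * 20

print("recall:   ", recall_score(y_true, y_pred))     # 1.0  -- perfect recall
print("precision:", precision_score(y_true, y_pred))  # 0.15 -- most positive predictions are wrong
print("f1:       ", f1_score(y_true, y_pred))         # ~0.26 -- the combined score exposes the problem
```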

Accumulation has a great answer with more examples explaining when precision matters more than recall and vice versa.

Most of the other answers make a compelling case for the importance of recall, so I thought I’d give an example of the importance of precision. This is a completely hypothetical example, but it makes the case.

Let us say that a machine learning model is created to predict whether a certain day is a good day to launch satellites or not based on the weather.

  • If the model wrongly predicts that a good day to launch is a bad one (false negative), we miss the chance to launch. This is not such a big deal.
  • However, if the model predicts it is a good day but it is actually a bad day to launch the satellites (false positive), the satellites may be destroyed. The cost of the damage will be in the billions.

Precision

This is a case where precision is more important than recall.

I had a tough time remembering the difference between precision and recall until I came up with this mnemonic for myself:

PREcision is to PREgnancy tests as reCALL is to CALL center.
Photo by Camylla Battani on Unsplash

With a pregnancy test, the manufacturer must ensure that a positive result means the woman really is pregnant. People might react to a positive test by suddenly getting married or buying a house, and if many consumers got false positives and suffered huge costs for no reason, the manufacturer would lose its customers. I got a false-negative pregnancy test once, and it just meant it took a few more weeks before I found out I was pregnant…the truth ultimately became apPARENT. (Pun intended.)

Now picture a call center for insurance claims. Most fraudulent claims are phoned in on Mondays, after the fraudsters connect with collaborators and craft their made-up stories (“let’s say the car was stolen”) over the weekend. What’s the best thing for an insurance company to do on Mondays? They should probably tune their model to favour recall over precision. It is far better to flag more claims as positive (likely fraud) for further investigation than to miss some of the fraud and pay out cash that should never have been paid. A false positive (flagged for additional scrutiny as possible fraud, but the customer’s loss was real) can likely be cleared up by assigning an experienced adjuster, who can insist on a police report, request building security video, etc. A false negative (accepting a fraudster’s false claim and paying out in cash) is a pure loss to the insurance company and encourages more fraud.

F1 is great, but understanding how the test/prediction will be used is really important, because there’s always some risk of being wrong…you want to know how dire the consequences will be if you are wrong.

Email Spam detection: This is one of the examples where Precision is more important than Recall.

Quick Recap:

  • Precision: Of everything you predicted as positive, how much was actually positive?
  • Recall: Of everything that was actually positive, how much did you correctly predict?

Given the definitions above, in the case of spam email detection, one should be okay if a spam email (positive case) is left undetected and doesn’t go to the spam folder, but if an email is good (negative case), then it must not go to the spam folder. In other words, precision is more important: if the model predicts something is positive (i.e. spam), it had better be spam, or you may miss important emails.

When we have an imbalanced class and need the positive predictions to be trustworthy, precision is preferred over recall, because precision (true positives divided by all predicted positives) has no false negatives in its formula.

I took a simple example from Aurélien Géron’s book, Hands-On Machine Learning with Scikit-Learn and TensorFlow. Imagine that we want to make sure that our website blocker for our child only allows ‘safe’ websites to be shown.

In this case, a ‘safe’ website is a positive class. Here, we want the blocker to be certain that the website is safe, even if some safe websites are predicted to be part of the negative or unsafe class and are consequently blocked. That is, we want high precision at the expense of recall.

We want to investigate every potential safety risk in airport security, where a safety risk is a positive class. In this case, we will have high recall at the expense of precision (many bags with no safety hazards will be investigated).
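Both of these cases come down to where you set the classifier’s decision threshold: raising it trades recall away for precision, and lowering it does the opposite. Below is a minimal sketch of that trade-off on synthetic data (not the book’s actual example):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the positive class plays the role of "safe website"
X, y = make_classification(n_samples=2000, weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # predicted probability of the positive class

# A stricter (higher) threshold usually raises precision and lowers recall
for threshold in (0.5, 0.7, 0.9):
    y_pred = (proba >= threshold).astype(int)
    print(f"threshold={threshold:.1f}  "
          f"precision={precision_score(y_test, y_pred):.2f}  "
          f"recall={recall_score(y_test, y_pred):.2f}")
```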

Precision at the expense of Recall in Trading

I have seen this kind of trade-off in nearly every stock or Forex prediction engine I have written. It seems to be a general principle that attempts to improve precision (the ratio of winning trades) tend to reduce recall (trading frequency). A non-intuitive side effect is that lower recall reduces the sample size, making it harder to gain confidence from the law of large numbers. Another non-intuitive side effect is that fewer trades mean fewer opportunities to compound gains. In many conditions, the strategy that trades more often but with a somewhat higher error rate will produce a much higher net profit.

The exception is when a strong filter can deliver enough trades to benefit from the law of large numbers and compounding. Under those conditions, Kelly betting rules allow a larger investment per trade.

http://bayesanalytic.com/precision-vs-recall-a-net-trading-profit-perspective/
