TU Wien:Statistik und Wahrscheinlichkeitstheorie UE (Levajkovic)/Übungen 2023W/HW01.4
- Spam filter
One way to design a spam filter is to look at the phrases in an email. In particular, some phrases are more frequent in spam emails. Suppose that we have the following information: 30% of emails are spam; 1% of spam emails contain the phrase "filled with joy"; 0.2% of non-spam emails contain the phrase "filled with joy". Suppose that an email is checked and found to contain the phrase "filled with joy". What is the probability that the email is spam?
Proposed solution by Lessi
--Lessi 2024-02-07T13:04:11Z
s <- 0.3 # probability of spam
n <- 0.7 # non-spam
js <- 0.01 # prob of "filled with joy" within spam email
jn <- 0.002 # prob of "filled with joy" within non-spam email
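We are interested in the probability $P(\text{spam} \mid \text{joy})$, where "joy" denotes the event that the email contains "filled with joy".
$P(\text{joy})$ is obtained using the law of total probability: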
j <- js * s + jn * n
j
If 30% of emails are spam and 1% of those contain "filled with joy", then 0.3% of all emails are spam and contain that phrase. 70% of emails are not spam and 0.2% of those contain the phrase. Therefore 0.14% of all emails are not spam and contain that phrase. This means that 0.44% of all emails contain "filled with joy".
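Written as a worked equation, the counting argument is the law of total probability. This is only a sketch for orientation: "joy" is shorthand for the event that the email contains the phrase, and the R names `js`, `s`, `jn`, `n` from the code are noted alongside the corresponding terms.

```latex
\begin{aligned}
P(\text{joy})
  &= \underbrace{P(\text{joy} \mid \text{spam})}_{\texttt{js}}\,
     \underbrace{P(\text{spam})}_{\texttt{s}}
   + \underbrace{P(\text{joy} \mid \text{not spam})}_{\texttt{jn}}\,
     \underbrace{P(\text{not spam})}_{\texttt{n}} \\
  &= 0.01 \cdot 0.3 + 0.002 \cdot 0.7 \\
  &= 0.003 + 0.0014 \\
  &= 0.0044
\end{aligned}
```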
Now we can compute the probability of spam given the phrase, $P(\text{spam} \mid \text{joy})$, using Bayes' theorem:
(js * s) / j
The probability that an email containing "filled with joy" is spam is therefore $0.003 / 0.0044 \approx 0.682$, i.e. about 68.2%.
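The same calculation can be wrapped in a small function so that other priors or phrase frequencies can be tried. This is a minimal sketch, not part of the original solution: the function name `posterior_spam` and its argument names are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: posterior probability that an email is spam,
# given that it contains the phrase.
posterior_spam <- function(prior_spam, p_phrase_spam, p_phrase_nonspam) {
  # Law of total probability: overall chance that an email contains the phrase
  p_phrase <- p_phrase_spam * prior_spam +
    p_phrase_nonspam * (1 - prior_spam)
  # Bayes' theorem: share of phrase-containing emails that are spam
  (p_phrase_spam * prior_spam) / p_phrase
}

# Values from the exercise: 30% spam, 1% and 0.2% phrase frequencies
result <- posterior_spam(0.3, 0.01, 0.002)
print(round(result, 3))
```

With the exercise's numbers this reproduces the ratio `(js * s) / j` computed above. If the phrase were equally frequent in spam and non-spam emails, the function would simply return the prior, since observing the phrase would carry no information.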