TU Wien:Statistik und Wahrscheinlichkeitstheorie UE (Levajkovic)/Übungen 2023W/HW01.4

From VoWi
Spam filter

One way to design a spam filter is to look at the phrases in an email; in particular, some phrases are more frequent in spam emails. Suppose that we have the following information: 30% of emails are spam; 1% of spam emails contain the phrase "filled with joy"; 0.2% of non-spam emails contain the phrase "filled with joy". Suppose that an email is checked and found to contain the phrase "filled with joy". What is the probability that the email is spam?

This example is marked as solved.


Solution proposal by Lessi

--Lessi 2024-02-07T13:04:11Z

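s <- 0.3    # probability that an email is spam
n <- 0.7    # probability that an email is not spam (1 - s)
js <- 0.01  # probability of "filled with joy" in a spam email
jn <- 0.002 # probability of "filled with joy" in a non-spam email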

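Let $S$ denote the event "the email is spam", $N$ the event "the email is not spam" and $J$ the event "the email contains 'filled with joy'". We are interested in the probability $P(S \mid J)$ that an email is spam given that it contains the phrase. By Bayes' theorem,

$$P(S \mid J) = \frac{P(J \mid S)\,P(S)}{P(J)}.$$

The denominator $P(J)$ is obtained using the law of total probability:

$$P(J) = P(J \mid S)\,P(S) + P(J \mid N)\,P(N).$$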

j <- js * s + jn * n   # probability that an email contains the phrase (law of total probability)
j

If 30% of emails are spam and 1% of those contain "filled with joy", then 0.3 · 0.01 = 0.003 (0.3%) of all emails are spam and contain that phrase. 70% of emails are not spam and 0.2% of those contain the phrase, so 0.7 · 0.002 = 0.0014 (0.14%) of all emails are not spam and contain that phrase. This means that 0.003 + 0.0014 = 0.0044 (0.44%) of all emails contain "filled with joy".
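The natural-frequency argument above can be checked with a quick Monte Carlo simulation. This is a minimal sketch, not part of the original solution: it assumes emails are independent of each other, and the number of simulated emails (`n_emails`), the seed and the variable names are arbitrary illustrative choices.

```r
# Monte Carlo check of the natural-frequency argument (illustrative sketch)
set.seed(2024)                 # arbitrary seed, only for reproducibility
n_emails <- 1e6                # hypothetical number of simulated emails

# Spam status of each email: TRUE with probability 0.3
is_spam <- runif(n_emails) < 0.3

# Phrase probability depends on spam status: 1% for spam, 0.2% for non-spam
p_phrase <- ifelse(is_spam, 0.01, 0.002)
has_phrase <- runif(n_emails) < p_phrase

# Share of all emails that are spam and contain the phrase (close to 0.003)
mean(is_spam & has_phrase)
# Share of all emails that are not spam and contain the phrase (close to 0.0014)
mean(!is_spam & has_phrase)
# Share of all emails that contain the phrase (close to 0.0044)
mean(has_phrase)
# Among emails with the phrase, the share that is spam (close to 0.68)
mean(is_spam[has_phrase])
```

The simulated shares fluctuate slightly around the exact values; increasing `n_emails` makes them tighter.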

Now we can compute the probability that an email is spam given that it contains the phrase (Bayes' theorem):

(js * s) / j   # probability that an email containing the phrase is spam

The probability that an email containing "filled with joy" is spam is therefore 0.003 / 0.0044 = 15/22 ≈ 0.682, i.e. about 68.2%.
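The two steps of the solution (law of total probability, then Bayes' theorem) can be wrapped in a small R function so that other priors or phrase frequencies can be plugged in. The function name `posterior_spam` and its argument names are hypothetical, chosen only for this sketch, and it assumes exactly two classes (spam and non-spam).

```r
# Hypothetical helper: probability of spam given the phrase, two classes only
posterior_spam <- function(p_spam, p_phrase_given_spam, p_phrase_given_ham) {
  p_ham <- 1 - p_spam
  # Law of total probability: probability that an email contains the phrase
  p_phrase <- p_phrase_given_spam * p_spam + p_phrase_given_ham * p_ham
  # Bayes' theorem: probability of spam given that the phrase occurs
  (p_phrase_given_spam * p_spam) / p_phrase
}

posterior_spam(0.3, 0.01, 0.002)   # prints [1] 0.6818182
```

As a sanity check, if the phrase were equally frequent in both classes, e.g. `posterior_spam(0.3, 0.01, 0.01)`, the result equals the prior 0.3, because the phrase then carries no information about spam status.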