# Intro to Bayes

Week 8 | 1.1

### LEARNING OBJECTIVES

After this lesson, you will be able to:

• Understand the basics of Bayes' formula/relation
• Identify the "ingredients" of the Bayesian "world view" (i.e. the posterior, the prior, and the likelihood)
• Work through conceptual/mathematical exercises to build greater intuition and facility with Bayesian thinking

### STUDENT PRE-WORK

Before this lesson, you should already be able to:

• Explain and compute conditional probabilities
• Work with probabilities of events/sets

### INSTRUCTOR PREP

Before this lesson, instructors will need to:

• Read in / Review any dataset(s) & starter/solution code
• Generate a brief slide deck
• Prepare any specific materials
• Provide students with additional resources

### LESSON GUIDE

| Timing | Type | Topic |
|--------|------|-------|
| 5 min | Opening | Topic description |
| 10 min | Introduction | Topic description |
| 15 min | Demo | Topic description |
| 25 min | Guided Practice | Topic description |
| 25 min | Independent Practice | Topic description |
| 5 min | Conclusion | Topic description |

## Opening (5 mins)

"If [one] carries the umbrella and it does not rain, he is mildly inconvenienced. But if he does not carry the umbrella and it rains, he will suffer getting wet. A good Bayesian finds himself carrying an umbrella on many days when it does not rain. The policy actions of the last year were very much in this spirit. The Fed cut the interest rate to 1 percent to prevent the low probability outcome of spiraling deflation because it regarded that outcome as potentially very damaging while the alternative possible outcome of a rise of the inflation rate from 1.5 percent to 2.5 percent was deemed less damaging and more easily reversed."

• Martin Feldstein, former chairman of the Council of Economic Advisers under Ronald Reagan

What pellucidity can be gleaned from the above depths? Why is the noted Harvard economist talking about rain and umbrellas?

First and foremost, let's make something absolutely clear: despite what Dr. Feldstein has said above, Bayesianism is an all-inclusive, gender-neutral club (non-umbrella-user concerns notwithstanding). So rejoice: whether you go by he, she, or any other pronoun, you too can (and will) be a Bayesian.

Ready to learn Marty's lingo? All you have to do is memorize and understand one special formula (read on below), and all the secrets of the world will be revealed to you; you don't even have to join a carpet-cleaning cabal (for the Seinfeld fans out there).

## Introduction: The Bayes Formula (10 mins)

#### 1.1 Introduction to the Bayes Formula (Part 1 - The Numerator of Bayes Formula):

Behold, the Bayes formula, in its "almost" full generality for events:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Be at ease, though: although Bayesian analysis can be viewed as a 'highly mathematical' subject area, the above is the most "ugly" you'll see in these worksheets, roughly pre-calculus level math (or 'maths' for our British readers). In general I will attempt to shield your eyes from the cyclopean edifice of notational miasma that is mathematical syntax. Quite frankly, it's not really needed, and we can extract a good deal of the 'operational' knowledge from the formalism via intuition and some finger exercises. Let's get to taking this bad boy apart! We'll dissect the top of the formula first, the numerator $P(B \mid A)\,P(A)$, and start labeling things:

The first factor, $P(B \mid A)$, is what statisticians have named the 'likelihood function'. I don't think probabilists ever deigned to name it; naming such a trivial form would be beneath the level of the mathematician. That's ok, 'cause no one pays much attention to mathematicians anyway. The likelihood function is usually a probability model (e.g. Bernoulli, Poisson, etc.). The other part of the numerator is $P(A)$:

This is called the 'prior'. Basically, the prior is a distribution that represents one's beliefs about something occurring (e.g. it raining, becoming a millionaire, the sun rising tomorrow, etc.), so long as those beliefs form a proper probability distribution. A prior whose total mass can't be normalized to 1 (it blows up to infinity, is pathological, blah blah blah) is called an "improper prior". Suffice it to say, I don't think any normal person would ever assign the chance of an event occurring in their prior as infinity ... except maybe Kanye.
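To make those labels concrete, here's a minimal Python sketch, with the event and every number invented purely for illustration, that assembles the numerator (likelihood times prior), the denominator, and finally the posterior for a single event $A$ with evidence $B$:

```python
# Bayes' formula for events: P(A|B) = P(B|A) * P(A) / P(B)
# All numbers below are made-up assumptions, not data.

p_A = 0.3              # prior: P(A), belief that A occurs before seeing evidence
p_B_given_A = 0.8      # likelihood: P(B|A), chance of the evidence if A is true
p_B_given_not_A = 0.2  # chance of the evidence if A is false

# Numerator of the Bayes formula: likelihood * prior
numerator = p_B_given_A * p_A

# Denominator: total probability of the evidence, P(B)
p_B = p_B_given_A * p_A + p_B_given_not_A * (1 - p_A)

posterior = numerator / p_B
print(f"P(A|B) = {posterior:.3f}")  # -> P(A|B) = 0.632
```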

## Demo: Rain, Umbrellas, and Bayes on Events (15 mins)

So to start off, let's go back to Marty's comments on rain and umbrellas. This is an oft-used example for explaining Bayesian thinking/analysis, and it will serve as our first functional exemplar of elementary Bayesian analysis on events. Why events? Events are easily modeled as sets, and are thus among the simplest things on which we can perform a Bayesian analysis. Fear not, we will quickly move to something more useful (namely, counting numbers). Observe the following:

*(Figure: a two-layer probability tree for Marty's umbrella decision; the first layer of edges carries his prior on rain, the second carries the likelihoods, and the terminal ovals are the outcomes.)*

The above stylized view of decision-making models a simple Bayesian analysis process. Each edge of the tree carries a probability and leads directly to a node/oval representing a state of the world and what Marty thinks about it. However, given that Marty's an acolyte of the Chicago school of economics, we can safely assume that he will dutifully follow the output of his prior beliefs about whether it will rain (so this may not be a bad representation after all).

In the context of the language we've established thus far, the edges/lines before the first layer of ovals together represent the "prior", and the four edges after the first layer of ovals collectively construct the 'likelihood' mapping (this isn't technically a likelihood function/mapping, for reasons that will become clear later in this worksheet, but we'll use that word as a pedagogical tool to help embed the functional form in your head). These edges lead to the terminal set of ovals, each of which can be thought of as a representation of the whole numerator of the Bayes formula.
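To see those pieces in action, here's a numeric rendering of the tree as a minimal Python sketch. All probabilities are made-up assumptions, and the second layer is a hypothetical weather-forecast signal standing in for the likelihood edges:

```python
# A numeric version of the umbrella tree. First-layer edges are the prior
# over the weather; second-layer edges are the 'likelihood' of a forecast
# signal given each weather state. (All numbers are invented.)

prior = {"rain": 0.3, "no rain": 0.7}
likelihood = {  # P(signal | weather)
    "rain":    {"forecast rain": 0.8, "forecast dry": 0.2},
    "no rain": {"forecast rain": 0.1, "forecast dry": 0.9},
}

# Each terminal oval is a Bayes numerator: prior edge * likelihood edge.
joint = {
    (w, s): prior[w] * likelihood[w][s]
    for w in prior
    for s in likelihood[w]
}
for path, p in joint.items():
    print(path, round(p, 3))

# Normalizing over the paths that share a signal recovers a posterior,
# e.g. P(rain | forecast rain):
p_forecast_rain = joint[("rain", "forecast rain")] + joint[("no rain", "forecast rain")]
print(joint[("rain", "forecast rain")] / p_forecast_rain)  # -> ~0.774
```

Notice that the four terminal values sum to 1, and each one is exactly a prior edge times a likelihood edge, i.e. a numerator of the Bayes formula.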

### Very Brief Summary of Basic Probability Results:

Probability, like other subfields of mathematics, is dense, and you've probably forgotten some of the basic rules. Here's a couple of useful identities to memorize when thinking about events, likelihoods, and all the other fun stuff you're going to get into throughout this week and your future life as a data scientist:

| Rule | Notation | Written Description |
|------|----------|---------------------|
| Addition Rule | Assume $A$, $B$ are mutually exclusive: $P(A \cup B)=P(A)+P(B)$ | The probability of A or B is the probability of A plus the probability of B |
| Multiplication Rule | Assume $A$, $B$ are independent: $P(A \cap B)=P(A)P(B)$ | The probability of A and B is the probability of A multiplied by the probability of B |
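If you'd rather trust simulation than memory, here's a quick sketch that checks both identities empirically; the dice events are arbitrary choices on my part:

```python
import random

random.seed(42)
N = 100_000
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(N)]

# Addition rule needs mutually exclusive events:
# "first die shows 1" and "first die shows 2" can't both happen.
p_1_or_2 = sum(r1 in (1, 2) for r1, _ in rolls) / N
print(p_1_or_2, 1/6 + 1/6)  # both ~0.333

# Multiplication rule needs independent events:
# "first die shows 1" and "second die shows 6" don't affect each other.
p_1_and_6 = sum(r1 == 1 and r2 == 6 for r1, r2 in rolls) / N
print(p_1_and_6, (1/6) * (1/6))  # both ~0.028
```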

#### 1.3.1 Finger exercise using Bayes for statistics

Suppose we have a bag full of red and blue billiard balls. What is the probability that the ratio of red balls to blue balls is some parameter $r$?
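Here's one way to attack it numerically, a sketch under assumptions I'm adding to the prompt: we draw balls with replacement, and we work with the proportion of red, $p = \text{red}/(\text{red}+\text{blue})$, which determines the ratio via $r = p/(1-p)$. Lay down a grid of candidate proportions, weight each by how well it explains some observed draws, and normalize:

```python
import numpy as np

# Grid-approximate the posterior over p, the proportion of red balls,
# after observing (say) 7 red and 3 blue in 10 draws with replacement.
# The data and the flat prior are both illustrative assumptions.

p_grid = np.linspace(0.01, 0.99, 99)       # candidate proportions
prior = np.ones_like(p_grid)               # flat prior: all proportions equally likely
likelihood = p_grid**7 * (1 - p_grid)**3   # Binomial kernel for 7 red, 3 blue

posterior = likelihood * prior
posterior /= posterior.sum()               # normalize so the weights sum to 1

print("posterior mode:", round(p_grid[posterior.argmax()], 2))  # -> 0.7
```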

#### 1.3.2 Thought Exercise

We would ordinarily read the left-hand side of the formula

$$P(\theta \mid x_1, \dots, x_n) = \frac{P(x_1, \dots, x_n \mid \theta)\,P(\theta)}{P(x_1, \dots, x_n)}$$

as the probability of $\theta$ given a sequence of $x_i$, which just symbolize the data.

That makes sense; as we know from previous work, we can condition on the data, and the more data we get, the tighter the point estimate or the higher the confidence we can expect (in certain cases). Yet how would we interpret the likelihood function? If we were to read it directly, it would be the probability of the $x_i$'s given $\theta$, the parameter. Think about that for a minute. What does that really mean?
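One way to feel what it means is to fix the data and let $\theta$ vary. The sketch below uses a made-up toy dataset (10 coin flips, 7 heads) and evaluates the Binomial likelihood at a grid of $\theta$ values; note that the area under the curve is nowhere near 1, so the likelihood is not a probability distribution over the parameter:

```python
import numpy as np

heads, n = 7, 10  # fixed, made-up data: 7 heads in 10 flips

theta = np.linspace(0.05, 0.95, 19)            # grid of candidate parameter values
lik = theta**heads * (1 - theta)**(n - heads)  # likelihood of the SAME data at each theta

for t, lk in zip(theta, lik):
    print(f"theta={t:.2f}  likelihood={lk:.6f}")

# Crude Riemann sum over theta: nowhere near 1, because the likelihood is
# a function of the parameter, not a probability distribution over it.
step = theta[1] - theta[0]
print("area under the curve ~", lik.sum() * step)  # ~0.0008
```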

### 1.4 "Bayesians, Frequentists, lend me your ears" - Keynote speaker at the American Statistical Association

By now, you should have a sneaking suspicion about the difference between Bayesian and Frequentist statistics. The differences can be summarized pretty easily as follows:

1. Frequentist: Believes that the "true" distribution is fixed (but not known), and that we can learn more about it by sampling, testing for effects, and studying relevant parameters of the population.
2. Bayesian: Believes that data/information give us "insight" into the population, and that as we receive more data/information our view of the population is confirmed or revised; the parameter itself is treated as variable (see the sketch after this list).
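As a concrete contrast, here's a minimal sketch with made-up coin-flip data: the frequentist treats $\theta$ as fixed and reports a single point estimate, while the Bayesian treats $\theta$ as a random variable and reports an updated distribution over it:

```python
import numpy as np

heads, n = 7, 10  # made-up toy data: 7 heads in 10 flips

# Frequentist: theta is fixed but unknown; estimate it from the sample.
theta_hat = heads / n
print("frequentist point estimate:", theta_hat)  # -> 0.7

# Bayesian: theta is a random variable; update a whole distribution over it.
theta = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(theta)                      # flat prior belief
posterior = theta**heads * (1 - theta)**(n - heads) * prior
posterior /= posterior.sum()                     # normalize

print("Bayesian posterior mean:", round((theta * posterior).sum(), 3))  # -> ~0.667
```

Same data, two kinds of answer: a single number versus a distribution whose mean is pulled slightly toward the flat prior's center of 0.5.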

The Bayes view (which was actually developed by Laplace) is not only more practical/sensible, but its impact on modern science has been profound. Everything from simulation to computational biology to machine learning makes heavy use of the Bayesian approach. It's essentially a heuristic, and it fits neatly into an algorithmic approach or environment.

## Concluding thoughts, or as Jerry Seinfeld once said: "Good luck with all that..." (5 mins)

Some parting thoughts: we won't spend the next eternity discussing the nuances of the likelihood principle and its ramifications in statistics (probably hundreds of monographs have been written on that topic and related issues alone). In fact, this issue of "inverting" the conditional probability helped start a schism in the field of statistics that lasted over half a century, between the "Frequentist" and the "Bayesian" statisticians.

But the likelihood function merits at least this much explanation: it indicates that we can not only have an unknown parameter, as in classical frequentist statistics, where our job is usually to 'find' or estimate the unknown population parameter through sampling, but that the parameter can actually be 'random' (in fact, a random variable), and its definition/properties can be revealed through data, or samples.

Our next steps will be to take these concepts and this formalism, develop them further, and apply them to computing, which is where the real power of the Bayesian approach lies.