Comments (37)
- bvan: Nicely done. I have the same challenge with Bayesian stats and usually do not understand why there is such controversy. It isn’t a question of either/or, except in the minds of academics who rarely venture out into the real world, or have to balance intellectual purity with getting a job done. In the very first example, a practitioner would consciously have to decide (i.e. make the assumption) whether the number of sides on the die (n) is known and deterministic. Once that decision is made, the framework with which observations are evaluated and statistical reasoning applied will forever be conditional on that assumption... unless it is revised. Practitioners are generally OK with that, whether it leads to ‘Bayesian’ or ‘frequentist’ analysis, and move on.
- statskier: I went through grad school in a very frequentist environment. We “learned” Bayesian methods but we never used them much. In my professional life I’ve never personally worked on a problem that I felt wasn’t adequately approached with frequentist methods. I’m sure other people’s experiences are different depending on the problems you gravitate towards. In fact, I tend to get pretty frustrated with Bayesian approaches because when I do turn to them it tends to be in situations that are already quite complex and large. In basically every instance of that I’ve never been able to make the Bayesian approach work. It won’t converge, or the sampler says it will take days and days to run. I can almost always just resort to some resampling method that might take a few hours, but it runs and gives me sensible results. I realize this is heavily biased by basically only attempting it on super-complex problems, but it has sort of soured me on even trying anymore. To be clear, I have no issue with Bayesian methods. Clearly they work well and many people use them with great success. But I just haven’t encountered anything in several decades of statistical work that I found really required Bayesian approaches, so I’ve really lost any motivation I had to experiment with it more.
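The resampling fallback this commenter describes can be sketched as a percentile bootstrap. This is a minimal illustration on made-up data, not a reconstruction of anyone's actual workflow:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data standing in for whatever quantity is being estimated.
data = rng.exponential(scale=2.0, size=500)

def bootstrap_ci(sample, stat=np.mean, n_boot=10_000, alpha=0.05, rng=rng):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    n = len(sample)
    # Resample with replacement and recompute the statistic each time.
    reps = np.array([stat(rng.choice(sample, size=n, replace=True))
                     for _ in range(n_boot)])
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return lo, hi

lo, hi = bootstrap_ci(data)
print(f"mean = {data.mean():.3f}, 95% CI ≈ ({lo:.3f}, {hi:.3f})")
```

Unlike an MCMC sampler, this has no convergence diagnostics to babysit: the cost is just `n_boot` evaluations of the statistic, which is why it scales predictably on large problems.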
- oliver236: Nice writeup. Something that clicked for me reading this is how much the prior/likelihood/posterior dynamic mirrors transfer learning in deep learning. The prior is basically your pre-trained weights: broad knowledge you bring to the table before seeing any task-specific data. The likelihood is your fine-tuning step. And the Bernstein-von Mises result at the end is essentially saying "with enough fine-tuning data, your pre-training washes out." Obviously the analogy isn't perfect (priors are explicit and interpretable, pre-trained weights are not), but I think it's a useful mental model for anyone coming from an ML background who finds Bayesian stats unintuitive. Regularization being secretly Bayesian was the other thing that made it click for me. If you've ever tuned a Ridge regression lambda, you were doing informal prior selection.
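The "Ridge lambda is informal prior selection" point can be made concrete: the ridge solution is exactly the MAP estimate under a zero-mean Gaussian prior on the weights. A small numerical check on synthetic data (all the numbers here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.normal(size=(n, p))
w_true = rng.normal(size=p)
y = X @ w_true + rng.normal(scale=0.5, size=n)

lam = 3.0  # ridge penalty

# Frequentist view: minimise ||y - Xw||^2 + lam * ||w||^2.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Bayesian view: MAP estimate with Gaussian noise of variance sigma^2
# and prior w ~ N(0, (sigma^2 / lam) I) -- same normal equations.
sigma2 = 0.25
w_map = np.linalg.solve(X.T @ X / sigma2 + (lam / sigma2) * np.eye(p),
                        X.T @ y / sigma2)

print(np.allclose(w_ridge, w_map))  # prints True
```

Larger lambda corresponds to a tighter prior around zero, which is exactly the "stronger pre-training, weaker fine-tuning" end of the analogy above.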
- jhbadgerI think Rafael Irizarry put it best over a decade ago -- while historically there was a feud between self-declared "frequentists" and "Bayesians", people doing statistics in the modern era aren't interested in playing sides, but use a combination of techniques originating in both camps: https://simplystatistics.org/posts/2014-10-13-as-an-applied-...
- algolint: The frequentist vs. Bayesian debate often becomes more about "what can I compute easily?" than "what is the correct mental model?". With tools like Stan and PyMC getting better, the "computational cost" argument is weakening, but the "intuition cost" remains high. Most people are naturally frequentists in their day-to-day reasoning, and switching to a mindset of "probability as a degree of belief" requires a significant cognitive shift that isn't always rewarded with better results in simple business or engineering contexts.
- fumeux_fume: As a data scientist, I find applied Bayesian methods to be incredibly straightforward for most of the common problems we see, like A/B testing and online measuring of parameters. I dislike that people usually first introduce Bayesian methods theoretically, which can be a lot for beginners to wrap their heads around. Why not just start from the blissful elegance of updating your parameter's prior distribution with your observed data to magically get your parameter's estimate?
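For the A/B testing case this commenter mentions, the prior-update step really is that direct when a conjugate Beta prior is used: add conversions and non-conversions to the prior's two parameters. A sketch with hypothetical conversion counts:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical A/B data: conversions out of visitors per variant.
conv_a, n_a = 120, 1000
conv_b, n_b = 150, 1000

# Beta(1, 1) prior (uniform) updated by the binomial likelihood:
# posterior is Beta(1 + conversions, 1 + non-conversions).
post_a = (1 + conv_a, 1 + n_a - conv_a)
post_b = (1 + conv_b, 1 + n_b - conv_b)

# Posterior mean estimate of each conversion rate.
mean_a = post_a[0] / sum(post_a)
mean_b = post_b[0] / sum(post_b)

# P(B beats A), estimated by Monte Carlo from the two posteriors.
samples_a = rng.beta(*post_a, size=100_000)
samples_b = rng.beta(*post_b, size=100_000)
p_b_beats_a = (samples_b > samples_a).mean()

print(f"mean A ≈ {mean_a:.3f}, mean B ≈ {mean_b:.3f}, "
      f"P(B > A) ≈ {p_b_beats_a:.3f}")
```

The update is closed-form, so the "online measuring" case falls out for free: yesterday's posterior becomes today's prior, one addition per new observation.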
- jrumbut: The author makes a comparison to Haskell, which I think might be a little misleading. Haskell is a little more complicated to learn but also more expressive than other programming languages; this is where the comparison works. But where it breaks down is safety. If your Haskell code runs, it's more likely to be correct because of all the type system goodness. That's the reverse of the situation with Bayesian statistics, which is more like C++. It has all kinds of cool features, but they all come with superpowered footguns. Frequentist statistics is more like Java. No one loves it, but it allows you to get a lot of work done without having to track down one of the few people who really understand Haskell.
- hawtads: I think it would be interesting if frequentist stats could come up with more generative models. Current high-level generative machine learning models all rely on Bayesian modeling.
- lottin: > In Bayesian statistics, on the other hand, the parameter is not a point but a distribution.
To be more precise, in Bayesian statistics a parameter is a random variable. But what does that mean? A parameter is a characteristic of a population (as opposed to a characteristic of a sample, which is called a statistic). A quantity, such as the average number of cars per household right now. That's a parameter. To think of a parameter as a random variable is like regarding reality as just one realisation of an infinite number of alternate realities that could have been. The problem is we only observe our reality. All the data samples that we can ever study come from this reality. As a result, it's impossible to infer anything about the probability distribution of the parameter. The whole Bayesian approach to statistical inference is nonsensical.