Probability Modelling: Language and Physics

In two weeks’ time we will have Daniel Bernhardt from Facebook, London presenting to us at the Data Hide. He will spend the tutorial session beforehand discussing the details of spell checkers at Facebook.

To prepare for the 17th, this week we will have a discussion of n-grams and how to model language, and start thinking more about the spell checking problem that Daniel will talk about. We’ll have a simple review of probability, covering how to relate marginal and conditional probabilities and how probability can be used to represent sequences. In particular, you should refresh yourself on the sum rule of probability and the product rule of probability.
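
As a warm-up, here is a minimal sketch of how the two rules show up in a bigram (2-gram) model. The toy corpus, the function names and the use of plain relative frequencies (no smoothing) are all assumptions made for illustration; this is not the model Daniel will present. The sum rule recovers the marginal P(w1) by summing the joint P(w1, w2) over w2, and the product rule, rearranged, gives the conditional P(w2 | w1) = P(w1, w2) / P(w1).

```python
from collections import Counter

# Toy corpus, made up purely for illustration.
corpus = "the cat sat on the mat the cat ate".split()

# Count adjacent word pairs (bigrams) and estimate the joint
# probability P(w1, w2) by relative frequency.
pair_counts = Counter(zip(corpus, corpus[1:]))
total_pairs = sum(pair_counts.values())
joint = {pair: n / total_pairs for pair, n in pair_counts.items()}

# Sum rule: P(w1) = sum over w2 of P(w1, w2).
marginal = Counter()
for (w1, _w2), p in joint.items():
    marginal[w1] += p


def conditional(w2, w1):
    """Product rule rearranged: P(w2 | w1) = P(w1, w2) / P(w1)."""
    if marginal[w1] == 0:
        return 0.0  # w1 never starts a pair in this corpus
    return joint.get((w1, w2), 0.0) / marginal[w1]


def sequence_probability(words):
    """Bigram approximation: P(w1) times the product of P(w_i | w_{i-1})."""
    if not words:
        return 1.0
    prob = marginal[words[0]]
    for prev, cur in zip(words, words[1:]):
        prob *= conditional(cur, prev)
    return prob


print(conditional("cat", "the"))
print(sequence_probability(["the", "cat", "sat"]))
```

Any word pair that never occurs in the corpus gets probability zero here, which is one reason practical n-gram models add smoothing; that refinement is deliberately left out of this sketch.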

Think about this type of modelling and how it relates to Navier-Stokes. In particular, what are the abstractions in Navier-Stokes? What are the abstractions in n-gram models, and how are they applied to language?

Which model would you think of as “higher level”, and which as a “lower level” model?

Navier-Stokes is often thought of as a “physical model”. Can n-gram models be thought of in this way? If so, how and why? If not, why not?