In order to stress the fact that the probability density depends on the two parameters, we write it as $f_X(x; \theta_1, \theta_2)$. The joint probability density of the sample $(x_1, \ldots, x_n)$ is
$$f_X(x_1, \ldots, x_n; \theta_1, \theta_2) = \prod_{j=1}^{n} f_X(x_j; \theta_1, \theta_2)$$
because the joint density of a set of independent variables is equal to the product of their marginal densities (see the lecture on Independent random variables). The likelihood function is the joint density regarded as a function of the parameters,
$$L(\theta_1, \theta_2; x_1, \ldots, x_n) = \prod_{j=1}^{n} f_X(x_j; \theta_1, \theta_2),$$
and the log-likelihood function is its natural logarithm,
$$\ell(\theta_1, \theta_2; x_1, \ldots, x_n) = \ln L(\theta_1, \theta_2; x_1, \ldots, x_n) = \sum_{j=1}^{n} \ln f_X(x_j; \theta_1, \theta_2).$$
The log-likelihood function is typically used to derive the maximum likelihood estimator of the parameters.
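To make the definition concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the density $f_X$ is a Gaussian whose two parameters are the mean and the variance; the helper name log_likelihood and the sample values are hypothetical, not from the original text.

```python
# Minimal sketch: log-likelihood of a sample, assuming (for illustration only)
# a Gaussian density with parameters theta_1 = mean and theta_2 = variance.
import numpy as np
from scipy.stats import norm

def log_likelihood(sample, mean, variance):
    """Sum of per-observation log-densities, i.e. the log-likelihood of the sample."""
    return np.sum(norm.logpdf(sample, loc=mean, scale=np.sqrt(variance)))

sample = np.array([1.2, -0.3, 0.8, 2.1, 0.5])   # hypothetical observations
print(log_likelihood(sample, mean=0.0, variance=1.0))
```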
The estimator is obtained by solving
$$(\hat{\theta}_1, \hat{\theta}_2) = \operatorname*{arg\,max}_{\theta_1, \theta_2} \; \ell(\theta_1, \theta_2; x_1, \ldots, x_n),$$
that is, by finding the parameters that maximize the log-likelihood of the observed sample. This is the same as maximizing the likelihood function itself, because the natural logarithm is a strictly increasing function. One may wonder why the log of the likelihood function is taken.
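In practice this maximisation is often done numerically. The sketch below still assumes the illustrative Gaussian model from the previous snippet; the use of scipy.optimize and the variable names are my own choices, not something the text prescribes. Minimising the negative log-likelihood is equivalent to maximising the log-likelihood.

```python
# Minimal sketch: maximum likelihood by numerical optimisation of the
# negative log-likelihood (illustrative Gaussian model, hypothetical data).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

sample = np.array([1.2, -0.3, 0.8, 2.1, 0.5])

def negative_log_likelihood(params):
    mean, log_variance = params                 # optimise log-variance so it stays unconstrained
    scale = np.sqrt(np.exp(log_variance))
    return -np.sum(norm.logpdf(sample, loc=mean, scale=scale))

result = minimize(negative_log_likelihood, x0=np.array([0.0, 0.0]))
mean_hat, variance_hat = result.x[0], np.exp(result.x[1])
print(mean_hat, variance_hat)   # close to the sample mean and the (biased) sample variance
```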
There are several good reasons. To understand them, suppose that the sample is made up of independent observations, as in the example above. The log then turns the product of densities into a sum of per-observation terms. And if we go one step further and place a prior distribution on the thetas, maximising the posterior instead of the likelihood (maximum a posteriori estimation), a new term appears in our optimisation formula: the log of the prior!
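In symbols (my notation, writing $\theta$ for the whole parameter vector and $p(\theta)$ for the assumed prior; this is not spelled out in the original text), maximum likelihood and maximum a posteriori estimation compare as
$$\hat{\theta}_{\mathrm{MLE}} = \operatorname*{arg\,max}_{\theta} \sum_{j=1}^{n} \ln f_X(x_j; \theta),
\qquad
\hat{\theta}_{\mathrm{MAP}} = \operatorname*{arg\,max}_{\theta} \left[\, \sum_{j=1}^{n} \ln f_X(x_j; \theta) + \ln p(\theta) \,\right],$$
and $\ln p(\theta)$ is the new term.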
For example, if you assume your thetas are drawn from a Gaussian distribution (mean of 0, variance of 1), i.e. a Gaussian prior, you end up with an L2 regularisation on theta. Try it and be amazed! Notice that if instead we assume our thetas are drawn from a uniform distribution, the added term becomes a constant, so we get back to maximising the likelihood.
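To see where the L2 penalty comes from (a short check using the standard Gaussian density, with $d$ denoting the dimension of $\theta$):
$$\ln p(\theta) = \ln \mathcal{N}(\theta; 0, I) = -\tfrac{1}{2}\lVert \theta \rVert^2 - \tfrac{d}{2}\ln(2\pi),$$
so the MAP objective becomes $\sum_{j} \ln f_X(x_j; \theta) - \tfrac{1}{2}\lVert \theta \rVert^2$ plus a constant, i.e. the log-likelihood with an L2 penalty on $\theta$. With a uniform prior, $\ln p(\theta)$ is constant on its support, so it drops out of the arg max and we are back to plain maximum likelihood.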
Notice how engineering problems push us to find better notations or better optimisation procedures. Surprisingly, in machine learning the basic probability theory is often not that complicated to grasp, but the engineering feats needed to make it actually work are insane. In this case, one could rewrite our first equation so that the maximisation runs over the hyperparameters as well as over the thetas, as sketched below. The main problem is usually that the hyperparameters do not all live in the same mathematical space (integers vs. real numbers, etc.), so you must mix optimisation techniques.
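One plausible way to write that extended objective (my notation, not necessarily the exact form used in the original post, with $h$ collecting the hyperparameters and $p(\theta \mid h)$ the prior they induce):
$$\hat{\theta}, \hat{h} = \operatorname*{arg\,max}_{\theta,\, h} \left[\, \sum_{j=1}^{n} \ln f_X(x_j; \theta, h) + \ln p(\theta \mid h) \,\right].$$
Because $h$ may mix integers (say, a number of layers) with real-valued quantities, a single gradient-based optimiser rarely covers the whole arg max, hence the need to combine techniques such as grid or random search over $h$ with gradient-based optimisation over $\theta$.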
Anyway, if you read this far, thank you!
ML notes: Why the log-likelihood?
By Morgan

Some context
Machine learning is about modelling: you experience something and you wonder afterwards if you could have predicted it, or, even better, whether you could build something that would have predicted it for you.
It can be pictures, sounds, spreadsheets, etc., or even a mix. To handle those experiences, you define a model. You have a lot of choices here, but in the end you get an algorithm which covers initialisation (the past), learning (the present) and predictions (the future).

The modelling part
Choosing a family of models is already choosing something, which means that either you are choosing randomly or you have prior knowledge of the task at hand and are making an educated guess. But how do we do that?
The exponential can cause overflow, whereas taking the log makes this far less likely. Maximizing the product would be a horrible task; maximizing the sum, however, is quite doable.
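A tiny numerical illustration of that point (my own example with hypothetical numbers, not taken from the original text): the product of many small density values underflows to zero in double precision, while the sum of their logs stays perfectly representable.

```python
# Minimal illustration: products of many small densities underflow,
# while sums of their logs remain well-behaved.
import numpy as np

densities = np.full(1000, 1e-3)        # 1000 per-observation density values of 0.001

print(np.prod(densities))              # 0.0 -- underflows in double precision
print(np.sum(np.log(densities)))       # about -6907.76 -- finite and easy to optimise
```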