Squashing function (the sigmoid):

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
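A minimal sketch of the squashing function in Python. It assumes scalar inputs, and the name `sigmoid` is illustrative. The naive form `1 / (1 + exp(-z))` raises `OverflowError` in Python for large negative `z`, so the sketch branches on the sign of `z` and uses the algebraically equal form $e^{z} / (1 + e^{z})$ when $z < 0$.

```python
import math

def sigmoid(z: float) -> float:
    """Squash a real number into [0, 1] via 1 / (1 + e^{-z})."""
    if z >= 0:
        # e^{-z} <= 1 here, so math.exp cannot overflow.
        return 1.0 / (1.0 + math.exp(-z))
    # For z < 0, use e^{z} / (1 + e^{z}); e^{z} < 1, so again no overflow.
    ez = math.exp(z)
    return ez / (1.0 + ez)

print(sigmoid(0.0))      # 0.5
print(sigmoid(-1000.0))  # 0.0 (underflows quietly instead of raising OverflowError)
```

The symmetry $\sigma(-z) = 1 - \sigma(z)$ is a quick sanity check on any implementation.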

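Given some data $D = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n, y_n)\}$ with labels $y_i \in \{+1, -1\}$, our goal is to minimize the loss function $\ell(\mathbf{w})$, the negative log-likelihood of the data:

$$\ell(\mathbf{w}) = -\sum_{i=1}^{n} \log \sigma\left(y_i \mathbf{w}^\top \mathbf{x}_i\right) = \sum_{i=1}^{n} \log\left(1 + e^{-y_i \mathbf{w}^\top \mathbf{x}_i}\right)$$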
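A minimal sketch of evaluating this loss, assuming labels in $\{+1, -1\}$, feature vectors as plain Python lists, and no separate bias term (a bias can be folded in as a constant feature of 1). The helper names `log1pexp` and `logistic_loss` are hypothetical. Each term $\log(1 + e^{t})$ is computed in a numerically stable way so that large margins neither overflow nor lose precision.

```python
import math

def log1pexp(t: float) -> float:
    """Numerically stable log(1 + e^t)."""
    if t > 0:
        # log(1 + e^t) = t + log(1 + e^{-t}); e^{-t} < 1 avoids overflow.
        return t + math.log1p(math.exp(-t))
    return math.log1p(math.exp(t))

def logistic_loss(w, X, y):
    """Sum over i of log(1 + exp(-y_i * w^T x_i)), assuming y_i in {+1, -1}."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        margin = y_i * sum(w_j * x_ij for w_j, x_ij in zip(w, x_i))
        total += log1pexp(-margin)
    return total

X = [[1.0, 2.0], [1.0, -1.0]]
y = [1, -1]
print(logistic_loss([0.0, 0.0], X, y))  # 2 * log(2), about 1.3863
```

At $\mathbf{w} = \mathbf{0}$ every term equals $\log 2$, which gives an easy check of the implementation.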

The issue is that there is no closed-form solution for $\mathbf{w}$, but fortunately $\ell(\mathbf{w})$ is convex in $\mathbf{w}$, so any local minimum is also a global minimum. Thus we can use local search, namely gradient descent, to find the minimum of this function.
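A minimal sketch of plain gradient descent on this loss, assuming labels in $\{+1, -1\}$, a constant step size, and a fixed iteration count; the step size, iteration count, function names, and toy data are illustrative choices, not taken from these notes. It uses the gradient

$$\nabla \ell(\mathbf{w}) = \sum_{i=1}^{n} -y_i \, \sigma\left(-y_i \mathbf{w}^\top \mathbf{x}_i\right) \mathbf{x}_i.$$

```python
import math

def sigmoid(z):
    """Overflow-safe 1 / (1 + e^{-z})."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def dot(a, b):
    return sum(a_j * b_j for a_j, b_j in zip(a, b))

def loss(w, X, y):
    """sum_i log(1 + exp(-y_i w^T x_i)), written as -sum_i log sigma(y_i w^T x_i)."""
    return -sum(math.log(sigmoid(y_i * dot(w, x_i))) for x_i, y_i in zip(X, y))

def gradient(w, X, y):
    """sum_i -y_i * sigma(-y_i w^T x_i) * x_i."""
    g = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        coef = -y_i * sigmoid(-y_i * dot(w, x_i))
        for j, x_ij in enumerate(x_i):
            g[j] += coef * x_ij
    return g

def gradient_descent(X, y, step_size=0.1, num_iters=1000):
    """Repeat w <- w - step_size * grad(w), starting from w = 0 (hypothetical defaults)."""
    w = [0.0] * len(X[0])
    for _ in range(num_iters):
        g = gradient(w, X, y)
        w = [w_j - step_size * g_j for w_j, g_j in zip(w, g)]
    return w

# Toy 1-D data with a constant bias feature in position 0.
# The classes overlap, so the minimizer is finite.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, -1.0], [1.0, -2.0], [1.0, 1.5], [1.0, -1.5]]
y = [1, 1, -1, -1, -1, 1]

w = gradient_descent(X, y)
print("w =", w)
print("loss at w = 0:", loss([0.0, 0.0], X, y))
print("loss at learned w:", loss(w, X, y))
```

Because the loss is convex, a small enough constant step drives the gradient toward zero from any starting point. On linearly separable data, however, the loss has no finite minimizer and $\|\mathbf{w}\|$ keeps growing, which is why the toy data above is deliberately overlapping.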