So, you’ve got a data set, you say? Some points, and you’re not sure what to do with them. Maybe you have a set of x and y coordinates?

In my case, I had a set of values, each representing the number of followers a DJ has on SoundCloud. I had another set of values that represented a DJ’s cost per night.

I wanted to use this data set to predict, given an artist’s SoundCloud follower count, how much they might cost! So which set is the x and which is the y? And how do I go about this?

A great Rubyist once said, now we’re in math world, so let’s go to the Google.

I found these two sites, which combined helped me make my algorithm pretty easily.

The data we have (follower counts) is x and the data we are predicting (cost per night) would be y.

On this website, all you do is plug in your x, y coordinates and it creates four lines of best fit.

http://www.had2know.com/academics/regression-calculator-statistics-best-fit.html

4 Regression Equations
X independent, Y dependent

Linear: Y = AX + B:
Y = 0.2243X + 258.589
correlation = 0.9954

Exponential: Y = C(D^X):
Y = 349.6169(1.0001^X)
correlation = 0.8647

Power: Y = E(X^F):
Y = 31.8027(X^0.4464)
correlation = 0.8692

Logarithmic: Y = G + H(Ln(X)):
Y = -3363.0363 + 713.5463Ln(X)
correlation = 0.7238
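
I don’t know exactly how the calculator computes its coefficients, but one common way to get a power fit like Y = E(X^F) is to take the natural log of both x and y and run ordinary least squares on the logged points. Here is a minimal sketch of that approach. The method name `power_fit` and the sample points are my own inventions for illustration; the points are synthetic, not my DJ data.

```ruby
# Fit y = e * x**f by ordinary least squares on (ln x, ln y).
# Assumes every x and y is strictly positive.
def power_fit(xs, ys)
  log_xs = xs.map { |x| Math.log(x) }
  log_ys = ys.map { |y| Math.log(y) }
  n = xs.size.to_f

  mean_x = log_xs.sum / n
  mean_y = log_ys.sum / n

  # The slope of the line through the logged points is the exponent f.
  covariance = log_xs.zip(log_ys).sum { |lx, ly| (lx - mean_x) * (ly - mean_y) }
  variance   = log_xs.sum { |lx| (lx - mean_x)**2 }
  f = covariance / variance

  # The intercept is ln(e), so exponentiate to recover e.
  e = Math.exp(mean_y - f * mean_x)
  [e, f]
end

# Synthetic points generated from y = 2 * x**0.5 (not real data).
xs = [1.0, 4.0, 9.0, 16.0]
ys = xs.map { |x| 2.0 * x**0.5 }
e, f = power_fit(xs, ys)
puts "Y = #{e.round(4)}(X^#{f.round(4)})"
```

Because the sample points lie exactly on a power curve, the fit should hand back roughly the same two numbers that generated them. Real data will be noisier.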

Then you can use this site to graph them, which isn’t technically necessary, but it helped me see which type of curve seemed to make sense.

http://www.mathsisfun.com/data/grapher-equation.html
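
If you’d rather compare the curves in code than on a graph, something like this rough sketch would print each equation’s estimate side by side. The coefficients are copied from the calculator output above; the constant name `MODELS` and the follower counts are hypothetical, picked only for illustration.

```ruby
# The four fitted equations, with coefficients copied from the calculator output.
MODELS = {
  linear:      ->(x) { 0.2243 * x + 258.589 },
  exponential: ->(x) { 349.6169 * (1.0001**x) },
  power:       ->(x) { 31.8027 * (x**0.4464) },
  logarithmic: ->(x) { -3363.0363 + 713.5463 * Math.log(x) }
}.freeze

# Hypothetical follower counts, chosen only for illustration.
[500, 2_000, 10_000].each do |followers|
  estimates = MODELS.map { |name, model| "#{name}=#{model.call(followers).round(2)}" }
  puts "#{followers}: #{estimates.join(', ')}"
end
```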

It’s pretty neat and helped me discover the following method aka my predictive algorithm. (sdcl_followers is an attribute of a Dj.)

class Dj < ActiveRecord::Base
  # ...

  # Predicted cost per night, from a power fit (Y = E(X^F))
  # on SoundCloud follower count.
  def rate_get
    29.3447 * (sdcl_followers**0.4635)
  end
end
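
To poke at the method without booting Rails, here is a small stand-in. `DjSketch` is a hypothetical plain-Ruby struct, not my actual model, and the follower count is made up.

```ruby
# Plain-Ruby stand-in for the ActiveRecord model, so it runs without Rails.
DjSketch = Struct.new(:sdcl_followers) do
  # Same formula as Dj#rate_get.
  def rate_get
    29.3447 * (sdcl_followers**0.4635)
  end
end

dj = DjSketch.new(10_000) # hypothetical follower count
puts dj.rate_get.round(2)
```

Since the exponent is below 1, each extra follower adds less to the predicted rate than the one before it.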

All that said, this algorithm is pretty shit. It turns out SoundCloud follower count isn’t the greatest predictor of cost per night. I’ll need to take some other factors into account.

Hope this is helpful.