README.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78

# ft_linear_regression

A simple linear regression implementation using gradient descent to predict car prices based on mileage.

## Requirements

- Python 3
- matplotlib (for visualization only)
- pandoc (to generate HTML documentation)

```
pip install matplotlib
```

To generate the HTML version of this README (to see the equations):
```
pandoc README.md --mathml -s -o README.html
```

## Usage

### Train the model

```
python3 train.py <learning_rate> <iterations>
```

Example:
```
python3 train.py 1.0 1000
```

This trains the model on `data.csv` and saves the resulting parameters (θ0, θ1) to `thetas.csv`.

### Predict a price

```
python3 predict.py
```

Prompts for a mileage value and outputs the estimated price. Loops until Ctrl+C.
If no trained model is found, θ0 and θ1 default to 0.

### Visualize

```
python3 visualize.py
```

Displays a scatter plot of the dataset. If a trained model exists, the regression line is drawn on top.

## How it works

The model fits a linear function:

```
estimatePrice(mileage) = θ0 + θ1 * mileage
```

Parameters are found via gradient descent. The input data is normalized before training using min-max normalization, and the resulting thetas are denormalized afterward so they work directly on raw mileage values.

### Why normalization?

The two variables have very different scales: mileage ranges from ~22,000 to ~240,000 while prices range from ~3,600 to ~8,300. This causes the gradient for $\theta_1$ (which is multiplied by mileage) to be orders of magnitude larger than the gradient for $\theta_0$. No single learning rate can work well for both parameters simultaneously.

Min-max normalization scales each variable to $[0, 1]$:

$$x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

Without normalization, if you pick a learning rate small enough to prevent $\theta_1$ from overshooting, $\theta_0$ barely moves and needs millions of iterations. If you pick a larger learning rate so $\theta_0$ converges in a reasonable time, $\theta_1$ overshoots, oscillates, and diverges to infinity (NaN).

Normalization brings both gradients to the same scale, allowing gradient descent to converge efficiently with a single learning rate.

After training on normalized data, the thetas are converted back to work on raw values:

$$\theta_1' = \theta_1 \cdot \frac{p_{\max} - p_{\min}}{km_{\max} - km_{\min}}$$

$$\theta_0' = \theta_0 \cdot (p_{\max} - p_{\min}) + p_{\min} - \theta_1' \cdot km_{\min}$$