```python
# Vectorized version of gradient descent.
theta = theta * reg_param - alpha * (1 / num_examples) * (delta.T @ self.data).T
# We should NOT regularize the parameter theta_zero.
theta[0] = theta[0] - alpha * (1 / num_examples) * (self.data[:, 0].T @ delta).T
```

In the first code line, `theta` already includes `theta[0]`, so that line multiplies `theta[0]` by `reg_param` and subtracts its gradient. The second line then subtracts the bias gradient again from this already-updated value, so `theta[0]` ends up regularized anyway and its gradient is applied twice. I think it can be written like this instead:

```python
theta[0] -= alpha * (1 / num_examples) * (self.data[:, 0].T @ delta)
theta[1:] = theta[1:] * reg_param - alpha * (1 / num_examples) * (self.data[:, 1:].T @ delta)
```

Thanks!
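For anyone who wants to try this out, here is a minimal self-contained sketch of the fixed update. The function name `gradient_step`, the `reg_param` formula, and the array shapes are my assumptions inferred from the snippet above, not necessarily the repo's exact code:

```python
import numpy as np

def gradient_step(data, labels, theta, alpha, lambda_param):
    """One regularized gradient step where the bias theta[0] is NOT regularized.

    Assumed shapes (hypothetical, inferred from the snippet above):
      data:   (num_examples, num_features), first column all ones (bias)
      labels: (num_examples, 1)
      theta:  (num_features, 1)
    """
    num_examples = data.shape[0]
    delta = data @ theta - labels                        # prediction error
    reg_param = 1 - alpha * lambda_param / num_examples  # assumed weight-decay factor

    # Bias term: plain gradient step, no regularization.
    theta[0] -= alpha * (1 / num_examples) * (data[:, 0].T @ delta)
    # Remaining parameters: decay by reg_param, then subtract the gradient.
    theta[1:] = theta[1:] * reg_param - alpha * (1 / num_examples) * (data[:, 1:].T @ delta)
    return theta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])  # bias column + 2 features
    labels = rng.normal(size=(5, 1))
    theta = np.zeros((3, 1))
    print(gradient_step(data, labels, theta, alpha=0.1, lambda_param=1.0))
```

Since `theta[0]` and `theta[1:]` are disjoint slices, the two updates don't interfere with each other, so their order doesn't matter.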