Implementation of Flexible Conditional Density Estimator (FlexCode) in Python. See Izbicki, R.; Lee, A.B. Converting High-Dimensional Regression to High-Dimensional Conditional Density Estimation. Electronic Journal of Statistics, 2017 for details. Port of the original R package.
FlexCode is a general-purpose method for converting any conditional mean point estimator of
By the orthogonality property, the expansion coefficients are just conditional means
where the coefficients are estimated from data by an appropriate regression method.
git clone https://github.com/lee-group-cmu/FlexCode.git
pip install FlexCode[all]
Flexcode handles a number of regression models; if you wish to avoid installing all dependencies you can specify your desired regression methods using the optional requires in brackets. Targets include
- xgboost
- scikit-learn (for nearest neighbor regression, random forests)
import numpy as np
import scipy.stats
import flexcode
from flexcode.regression_models import NN
import matplotlib.pyplot as plt
# Generate data p(z | x) = N(x, 1)
def generate_data(n_draws):
x = np.random.normal(0, 1, n_draws)
z = np.random.normal(x, 1, n_draws)
return x.reshape((len(x), 1)), z.reshape((len(z), 1))
x_train, z_train = generate_data(10000)
x_validation, z_validation = generate_data(10000)
x_test, z_test = generate_data(10000)
# Parameterize model
model = flexcode.FlexCodeModel(NN, max_basis=31, basis_system="cosine",
regression_params={"k":20})
# Fit and tune model
model.fit(x_train, z_train)
model.tune(x_validation, z_validation)
# Estimate CDE loss
print(model.estimate_error(x_test, z_test))
# Calculate conditional density estimates
cdes, z_grid = model.predict(x_test, n_grid=200)
for ii in range(10):
true_density = scipy.stats.norm.pdf(z_grid, x_test[ii], 1)
plt.plot(z_grid, cdes[ii, :])
plt.plot(z_grid, true_density, color = "green")
plt.axvline(x=z_test[ii], color="red")
plt.show()
One particular realization of the FlexCode algorithm is FlexZBoost which uses XGBoost as the regression method. We apply this method to photo-z estimation in the LSST DESC DC-1. For members of the LSST DESC, you can find information on obtaining the data here.
import numpy as np
import pandas as pd
import flexcode
from flexcode.regression_models import XGBoost
# Read in data
def process_data(feature_file, has_z=False):
"""Processes buzzard data"""
df = pd.read_table(feature_file, sep=" ")
df["ug"] = df["u"] - df["g"]
df.assign(ug = df.u - df.g,
gr = df.g - df.r,
ri = df.r - df.i,
iz = df.i - df.z,
zy = df.z - df.y,
ug_err = np.sqrt(df['u.err'] ** 2 + df['g.err'] ** 2),
gr_err = np.sqrt(df['g.err'] ** 2 + df['r.err'] ** 2),
ri_err = np.sqrt(df['r.err'] ** 2 + df['i.err'] ** 2),
iz_err = np.sqrt(df['i.err'] ** 2 + df['z.err'] ** 2),
zy_err = np.sqrt(df['z.err'] ** 2 + df['y.err'] ** 2))
if has_z:
z = df.redshift.as_matrix()
df.drop('redshift', axis=1, inplace=True)
else:
z = None
return df.as_matrix(), z
x_data, z_data = process_data('buzzard_spec_witherrors_mass.txt', has_z=True)
x_test, _ = process_data('buzzard_phot_witherrors_mass.txt', has_z=False)
n_obs = x_data.shape[0]
n_train = round(n_obs * 0.8)
n_validation = n_obs - n_train
perm = np.random.permutation(n_obs)
x_train = x_data[perm[:n_train], :]
z_train = z_data[perm[:n_train]]
x_validation = x_data[perm[n_train:]]
z_validation = z_data[perm[n_train:]]
# Fit the model
model = flexcode.FlexCodeModel(XGBoost, max_basis=40, basis_system='cosine',
regression_params={"max_depth": 8})
model.fit(x_train, z_train)
model.tune(x_validation, z_validation)
# Make predictions
cdes, z_grid = model.predict(x_test, n_grid=200)