Data Based Problems

The DESDEO framework supports data-driven optimization problems. Some methods, such as E-NAUTILUS in desdeo-mcdm, find the most preferred solution from a provided dataset. Other methods, such as most of the EAs from desdeo-emo, require a surrogate model to be trained for each of the objectives. The desdeo_problem package supports both of these cases.

For data-based problems, use the data-specific objective and problem classes.

[1]:

import pandas as pd
import numpy as np

VectorDataObjective is an objective class that can handle data, as well as multi-objective evaluators.

The GaussianProcessRegressor used here is the same as the one in scikit-learn, with one small difference: the predict method has been replaced so that it returns uncertainty values (in the form of the standard deviation of the prediction) by default. It accepts hyperparameters in the same format as the scikit-learn class.

[2]:

from desdeo_problem import VectorDataObjective as VDO
from desdeo_problem.surrogatemodels.SurrogateModels import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
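
For comparison, the sketch below uses the plain scikit-learn GaussianProcessRegressor, where the standard deviation has to be requested explicitly with return_std=True. It is only meant to illustrate what "uncertainty values" means here; the toy data and variable names are assumptions for illustration, and the DESDEO wrapper itself is not used.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor as SklearnGPR
from sklearn.gaussian_process.kernels import Matern

# Toy training data (an assumption for this sketch): f1 = a + b.
rng = np.random.default_rng(0)
X_train = rng.random((30, 2))
y_train = X_train[:, 0] + X_train[:, 1]

# Plain scikit-learn model with the same kernel as in this notebook.
gpr = SklearnGPR(kernel=Matern(nu=1.5), random_state=0)
gpr.fit(X_train, y_train)

# In scikit-learn the standard deviation is returned only on request.
mean, std = gpr.predict(np.array([[0.5, 0.3]]), return_std=True)
print(mean, std)
```

The DESDEO variant described above returns this standard deviation without the extra argument.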

Creating some random data

The variables ‘a’ and ‘b’ are drawn uniformly at random between 0 and 1.

f1 = a + b

f2 = a * b

For data-driven problems, make sure that the input dataset is in the pandas DataFrame format, with the column names being the same as the variable/objective names.

[3]:

data = np.random.rand(100,2)

f1 = (data[:,0]+data[:,1]).reshape(-1,1)
f2 = (data[:,0]*data[:,1]).reshape(-1,1)

data = np.hstack((data, f1, f2))

X = ['a','b']
y = ['f1','f2']
datapd = pd.DataFrame(data, columns=X+y)
datapd.head()

[3]:

	a	b	f1	f2
0	0.720371	0.269914	0.990285	0.194438
1	0.460496	0.928888	1.389383	0.427749
2	0.173756	0.856071	1.029827	0.148747
3	0.700958	0.566548	1.267507	0.397127
4	0.027785	0.640366	0.668152	0.017793
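
Because the column names must match the variable and objective names, it can help to check a dataset before handing it to the data classes. The sketch below is one way to do that with pandas alone; check_columns is a hypothetical helper written for this example and is not part of desdeo_problem.

```python
import numpy as np
import pandas as pd


def check_columns(df, variable_names, objective_names):
    """Hypothetical helper: verify that every expected name is a column.

    Returns the DataFrame restricted to the expected columns, ordered as
    variables first and objectives second.
    """
    expected = list(variable_names) + list(objective_names)
    missing = [name for name in expected if name not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return df[expected]


# Small toy dataset built the same way as in the cell above.
rng = np.random.default_rng(0)
raw = rng.random((5, 2))
toy = pd.DataFrame(
    {
        "a": raw[:, 0],
        "b": raw[:, 1],
        "f1": raw[:, 0] + raw[:, 1],
        "f2": raw[:, 0] * raw[:, 1],
    }
)

checked = check_columns(toy, ["a", "b"], ["f1", "f2"])
print(list(checked.columns))
```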

Using VectorDataObjective class

The VectorDataObjective class takes as its input the data in a dataframe format and the objective names in a list.

[4]:

obj = VDO(data=datapd, name=y)

Training surrogate models

Pass the surrogate modelling technique and the model parameters to the train method of the objective instance.

If a single modelling technique is passed, model_parameters should be a dict (or None); the same technique and parameters are then used for all the objectives.

If multiple modelling techniques are passed, models should be a list of modelling techniques, and model_parameters should be a list of dicts. Both lists must have the same length as the number of objectives; the elements are matched to the objectives in order, one per objective.

[5]:

obj.train(models=GaussianProcessRegressor, model_parameters={'kernel': Matern(nu=1.5)})

E:\Projects\.virtualenvs\desdeo-problem\lib\site-packages\sklearn\gaussian_process\_gpr.py:616: ConvergenceWarning: lbfgs failed to converge (status=2):
ABNORMAL_TERMINATION_IN_LNSRCH.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  _check_optimize_result("lbfgs", opt_res)
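
The pairing rule for models and model_parameters described above can be sketched in plain Python. The function pair_models and the string placeholders for the modelling techniques are hypothetical and exist only to illustrate the rule; this is not how desdeo_problem is implemented internally.

```python
def pair_models(objective_names, models, model_parameters=None):
    """Hypothetical helper: map each objective to (model, parameters)."""
    if not isinstance(models, list):
        # Single technique: one dict (or None) is shared by all objectives.
        if model_parameters is None:
            model_parameters = {}
        if not isinstance(model_parameters, dict):
            raise TypeError("model_parameters must be a dict or None")
        return {name: (models, model_parameters) for name in objective_names}

    # Several techniques: both lists must have one entry per objective.
    if not isinstance(model_parameters, list):
        raise TypeError("model_parameters must be a list of dicts")
    if not (len(models) == len(model_parameters) == len(objective_names)):
        raise ValueError("models and model_parameters need one entry per objective")
    return dict(zip(objective_names, zip(models, model_parameters)))


# Placeholder strings stand in for the model classes.
pairs_single = pair_models(["f1", "f2"], "gpr", {"nu": 1.5})
pairs_multi = pair_models(["f1", "f2"], ["gpr", "lipschitz"], [{"nu": 1.5}, {}])
print(pairs_single)
print(pairs_multi)
```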

Using surrogate models to evaluate objective values

Use the obj.evaluate method to get predictions. Note that the use_surrogate argument must be set to True.

[6]:

print(obj.evaluate(np.asarray([[0.5,0.3]]), use_surrogate=True))

Objective Evaluation Results Object
Objective values are:
         f1    f2
0  0.800003  0.15
Uncertainity values are:
         f1        f2
0  0.000445  0.001171

E:\Projects\.virtualenvs\desdeo-problem\lib\site-packages\sklearn\base.py:450: UserWarning: X does not have valid feature names, but GaussianProcessRegressor was fitted with feature names
  warnings.warn(
E:\Projects\.virtualenvs\desdeo-problem\lib\site-packages\sklearn\base.py:450: UserWarning: X does not have valid feature names, but GaussianProcessRegressor was fitted with feature names
  warnings.warn(
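
Since the data were generated from known functions, the prediction above can be sanity-checked by hand. The short sketch below compares the printed surrogate values with f1 = a + b and f2 = a * b at the point (0.5, 0.3); the numbers are copied from the output shown above, and the variable names are chosen for this example only.

```python
import numpy as np

x = np.array([[0.5, 0.3]])

# True objective values from the generating functions.
true_values = np.column_stack((x[:, 0] + x[:, 1], x[:, 0] * x[:, 1]))

# Values copied from the printed evaluation results above.
predicted = np.array([[0.800003, 0.15]])
reported_std = np.array([[0.000445, 0.001171]])

# Absolute prediction error, compared with the reported standard deviation.
abs_error = np.abs(predicted - true_values)
within_one_std = abs_error <= reported_std
print(abs_error)
print(within_one_std)
```

For this point, both errors are smaller than the reported standard deviations.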

[7]:

obj._model_trained

[7]:

{'f1': True, 'f2': True}

Creating data problem class

For now, creating the objective class yourself should be bypassed. Use the DataProblem class directly, with the data in a dataframe.

The DataProblem class provides a train method which trains all the objectives sequentially. The input arguments of this train method are the same as those of the train method of the VectorDataObjective class.

To make sure that the evaluate method uses the surrogate models for evaluations, pass the use_surrogate=True argument.

[8]:

from desdeo_problem import DataProblem

[9]:

maximize = pd.DataFrame([[True, False]], columns=['f1','f2'])
prob = DataProblem(data=datapd, objective_names=y, variable_names=X, maximize=maximize)

[10]:

prob.train(GaussianProcessRegressor)

[11]:

print(prob.evaluate(np.asarray([[0.1,0.8], [0.5,0.3]]), use_surrogate=True))

Evaluation Results Object
Objective values are:
[[0.89999816 0.08000015]
 [0.80000041 0.15000012]]
Constraint violation values are:
None
Fitness values are:
[[-0.89999816  0.08000015]
 [-0.80000041  0.15000012]]
Uncertainity values are:
[[6.48833930e-06 6.48833930e-06]
 [4.85016202e-06 4.85016202e-06]]

E:\Projects\.virtualenvs\desdeo-problem\lib\site-packages\sklearn\base.py:450: UserWarning: X does not have valid feature names, but GaussianProcessRegressor was fitted with feature names
  warnings.warn(
E:\Projects\.virtualenvs\desdeo-problem\lib\site-packages\sklearn\base.py:450: UserWarning: X does not have valid feature names, but GaussianProcessRegressor was fitted with feature names
  warnings.warn(
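
In the output above, the fitness values equal the objective values with the sign flipped for the objective marked as maximized (f1). The sketch below reproduces that relationship with NumPy on the printed numbers. It is inferred from the output rather than taken from the library source, so treat it as an illustration only.

```python
import numpy as np

# Objective values copied from the printed evaluation results above.
objective_values = np.array(
    [[0.89999816, 0.08000015],
     [0.80000041, 0.15000012]]
)

# Same flags as in the maximize DataFrame: f1 is maximized, f2 is minimized.
maximize = np.array([True, False])

# Maximized objectives are negated so that every column can be minimized.
sign = np.where(maximize, -1.0, 1.0)
fitness = objective_values * sign
print(fitness)
```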

Lipschitzian models

[12]:

from desdeo_problem.surrogatemodels.lipschitzian import LipschitzianRegressor

[13]:

prob = DataProblem(data=datapd, objective_names=y, variable_names=X)

[14]:

prob.train(LipschitzianRegressor)

[15]:

print(prob.evaluate(np.asarray([[0.1,0.8], [0.5,0.3]]), use_surrogate=True))

Evaluation Results Object
Objective values are:
[[0.9        0.08094064]
 [0.8        0.16315449]]
Constraint violation values are:
None
Fitness values are:
[[0.9        0.08094064]
 [0.8        0.16315449]]
Uncertainity values are:
[[8.88178420e-16 4.52040286e-02]
 [9.43689571e-16 5.29605172e-02]]
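
As background, the sketch below shows generic Lipschitz interpolation: if a function is known to have Lipschitz constant L, every data point yields an upper and a lower bound at a new point, and the midpoint and half-width of the tightest bounds give a prediction and an uncertainty. This is a textbook construction written for this example; the function name, the Euclidean distance, and the hand-picked constant are assumptions, and it is not claimed to match how LipschitzianRegressor is implemented.

```python
import numpy as np


def lipschitz_predict(X_train, y_train, X_new, lipschitz_constant):
    """Generic Lipschitz interpolation sketch (hypothetical helper)."""
    # Pairwise Euclidean distances, shape (n_new, n_train).
    dist = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    # Tightest upper and lower bounds implied by the Lipschitz condition.
    upper = np.min(y_train[None, :] + lipschitz_constant * dist, axis=1)
    lower = np.max(y_train[None, :] - lipschitz_constant * dist, axis=1)
    prediction = (upper + lower) / 2
    uncertainty = (upper - lower) / 2
    return prediction, uncertainty


# Toy data for f1 = a + b, whose Euclidean Lipschitz constant is sqrt(2).
rng = np.random.default_rng(0)
X_toy = rng.random((50, 2))
y_toy = X_toy[:, 0] + X_toy[:, 1]

pred, unc = lipschitz_predict(X_toy, y_toy, np.array([[0.5, 0.3]]), np.sqrt(2))
print(pred, unc)
```

With a valid Lipschitz constant, the true value always lies within the prediction plus or minus the uncertainty.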
