Data Based Problems
The DESDEO framework supports data-driven optimization problems. Some methods, such as E-NAUTILUS in desdeo-mcdm, find the most preferred solution from a provided dataset. Other methods, such as most of the EAs in desdeo-emo, require a surrogate model to be trained for each objective. The desdeo_problem package supports both of these cases.
For data-based problems, use the data-specific objective and problem classes.
[1]:
import pandas as pd
import numpy as np
VectorDataObjective is an objective class that can handle data, as well as multi-objective evaluators.
The GaussianProcessRegressor used here is the same as the one in scikit-learn, with one small difference: its predict method has been replaced so that, by default, it also returns uncertainty values (the standard deviation of the prediction). It accepts hyperparameters in the same format as the scikit-learn class.
[2]:
from desdeo_problem import VectorDataObjective as VDO
from desdeo_problem.surrogatemodels.SurrogateModels import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
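To make the "prediction plus standard deviation" output concrete, here is a minimal NumPy sketch of Gaussian-process regression with a Matern kernel (nu=1.5). It is not the desdeo_problem or scikit-learn implementation: the hyperparameters are fixed instead of optimized, and the function names matern32 and gp_predict are hypothetical.

```python
import numpy as np


def matern32(A: np.ndarray, B: np.ndarray, length_scale: float = 0.5) -> np.ndarray:
    """Matern kernel with nu=1.5 between the rows of A and the rows of B."""
    # Pairwise Euclidean distances, scaled by the (fixed, assumed) length scale.
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)) / length_scale
    s = np.sqrt(3.0) * d
    return (1.0 + s) * np.exp(-s)


def gp_predict(X_train, y_train, X_new, length_scale=0.5, noise=1e-8):
    """Posterior mean and standard deviation of a zero-mean GP (hypothetical helper)."""
    K = matern32(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_s = matern32(X_train, X_new, length_scale)  # shape (n_train, n_new)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # This kernel has prior variance k(x, x) = 1; clip tiny negative round-off.
    var = np.clip(1.0 - (v ** 2).sum(axis=0), 0.0, None)
    return mean, np.sqrt(var)
```

The standard deviation is close to zero at the training points and grows towards the prior value of 1 far away from the data, which is the kind of uncertainty information the modified predict method adds.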
Creating some random data
The variables ‘a’ and ‘b’ are drawn uniformly at random from [0, 1), and the objectives are f1 = a + b and f2 = a * b.
For data-driven problems, make sure that the input dataset is a pandas DataFrame whose column names match the variable and objective names.
[3]:
data = np.random.rand(100,2)
f1 = (data[:,0]+data[:,1]).reshape(-1,1)
f2 = (data[:,0]*data[:,1]).reshape(-1,1)
data = np.hstack((data, f1, f2))
X = ['a','b']
y = ['f1','f2']
datapd = pd.DataFrame(data, columns=X+y)
datapd.head()
[3]:
| | a | b | f1 | f2 |
|---|---|---|---|---|
| 0 | 0.720371 | 0.269914 | 0.990285 | 0.194438 |
| 1 | 0.460496 | 0.928888 | 1.389383 | 0.427749 |
| 2 | 0.173756 | 0.856071 | 1.029827 | 0.148747 |
| 3 | 0.700958 | 0.566548 | 1.267507 | 0.397127 |
| 4 | 0.027785 | 0.640366 | 0.668152 | 0.017793 |
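Because the column names are what link the data to the variables and objectives, a quick check before building a problem can save a confusing error later. The sketch below uses a hypothetical helper, split_by_columns, that is not part of desdeo_problem; it splits a DataFrame like datapd into its variable and objective parts and fails loudly if a name is missing.

```python
import pandas as pd


def split_by_columns(df: pd.DataFrame, variable_names, objective_names):
    """Return (variables, objectives) DataFrames; raise if any name is not a column."""
    wanted = list(variable_names) + list(objective_names)
    missing = [name for name in wanted if name not in df.columns]
    if missing:
        raise ValueError(f"columns not found in the DataFrame: {missing}")
    return df[list(variable_names)], df[list(objective_names)]
```

For example, split_by_columns(datapd, X, y) would return the ‘a’/‘b’ columns and the ‘f1’/‘f2’ columns.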
Using the VectorDataObjective class
The VectorDataObjective class takes the data as a DataFrame and the objective names as a list.
[4]:
obj = VDO(data=datapd, name=y)
Training surrogate models
Pass the surrogate modelling technique and the model parameters to the train method of the objective instance.
If only one modelling technique is passed, model_parameters should be a dict (or None), and it is used for all the objectives.
If multiple modelling techniques are passed, models should be a list of modelling techniques and model_parameters a list of dicts. Both lists must have the same length as the number of objectives, and the elements are used in order: the i-th model and the i-th dict train the i-th objective.
[5]:
obj.train(models=GaussianProcessRegressor, model_parameters={'kernel': Matern(nu=1.5)})
E:\Projects\.virtualenvs\desdeo-problem\lib\site-packages\sklearn\gaussian_process\_gpr.py:616: ConvergenceWarning: lbfgs failed to converge (status=2):
ABNORMAL_TERMINATION_IN_LNSRCH.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
_check_optimize_result("lbfgs", opt_res)
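The rules for models and model_parameters can be summarised as a small dispatch function. This is a sketch of the described behaviour under our own naming (pair_models_with_objectives is hypothetical), not the code inside desdeo_problem; treating a None entry as an empty dict is an assumption of the sketch.

```python
from typing import Any, List, Optional, Sequence, Tuple, Union


def pair_models_with_objectives(
    objective_names: Sequence[str],
    models: Union[Any, List[Any]],
    model_parameters: Union[None, dict, List[Optional[dict]]] = None,
) -> List[Tuple[str, Any, dict]]:
    """Return one (objective name, model class, parameters) triple per objective."""
    n = len(objective_names)
    if not isinstance(models, list):
        # One technique: reuse it, and the single dict (or None), for every objective.
        if model_parameters is not None and not isinstance(model_parameters, dict):
            raise TypeError("with a single model, model_parameters must be a dict or None")
        models = [models] * n
        model_parameters = [model_parameters] * n
    elif not isinstance(model_parameters, list):
        raise TypeError("with a list of models, model_parameters must be a list of dicts")
    if len(models) != n or len(model_parameters) != n:
        raise ValueError("need exactly one model and one parameter dict per objective")
    # Pair the i-th model and the i-th parameter dict with the i-th objective.
    return [
        (name, model, params or {})
        for name, model, params in zip(objective_names, models, model_parameters)
    ]
```

With the call in cell [5], every objective would be paired with GaussianProcessRegressor and the same {'kernel': Matern(nu=1.5)} dict.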
Using surrogate models to evaluate objective values
Use the obj.evaluate method to get predictions, and pass use_surrogate=True so that the trained surrogate models are used.
[6]:
print(obj.evaluate(np.asarray([[0.5,0.3]]), use_surrogate=True))
Objective Evaluation Results Object
Objective values are:
f1 f2
0 0.800003 0.15
Uncertainity values are:
f1 f2
0 0.000445 0.001171
E:\Projects\.virtualenvs\desdeo-problem\lib\site-packages\sklearn\base.py:450: UserWarning: X does not have valid feature names, but GaussianProcessRegressor was fitted with feature names
warnings.warn(
E:\Projects\.virtualenvs\desdeo-problem\lib\site-packages\sklearn\base.py:450: UserWarning: X does not have valid feature names, but GaussianProcessRegressor was fitted with feature names
warnings.warn(
[7]:
obj._model_trained
[7]:
{'f1': True, 'f2': True}
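Conceptually, evaluating with surrogates asks each objective's trained model for a prediction and a standard deviation, then collects them into the "objective values" and "uncertainty values" shown above. The sketch below illustrates that idea with a hypothetical stand-in model (ConstantStdModel) and a hypothetical helper (evaluate_with_surrogates); neither is desdeo_problem API.

```python
import numpy as np


class ConstantStdModel:
    """Hypothetical stand-in for a trained surrogate: exact function, fixed std."""

    def __init__(self, func, std):
        self.func = func
        self.std = std

    def predict(self, X):
        mean = self.func(X)
        return mean, np.full_like(mean, self.std)


def evaluate_with_surrogates(models: dict, X: np.ndarray):
    """Stack each objective's mean and std into (n_points, n_objectives) arrays."""
    means, stds = [], []
    for name, model in models.items():  # dict order fixes the column order
        mean, std = model.predict(X)
        means.append(np.ravel(mean))
        stds.append(np.ravel(std))
    return np.column_stack(means), np.column_stack(stds)


models = {
    "f1": ConstantStdModel(lambda X: X[:, 0] + X[:, 1], 1e-3),
    "f2": ConstantStdModel(lambda X: X[:, 0] * X[:, 1], 2e-3),
}
values, uncertainty = evaluate_with_surrogates(models, np.asarray([[0.5, 0.3]]))
```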
Creating a data problem class
For now, skip creating the objective classes and use the DataProblem class directly with the data in a DataFrame.
The DataProblem class provides a train method that trains all the objectives sequentially. Its input arguments are the same as those of the train method of the VectorDataObjective class.
To make sure that the evaluate method uses the surrogate models, pass the use_surrogate=True argument.
[8]:
from desdeo_problem import DataProblem
[9]:
maximize = pd.DataFrame([[True, False]], columns=['f1','f2'])
prob = DataProblem(data=datapd, objective_names=y, variable_names=X, maximize=maximize)
[10]:
prob.train(GaussianProcessRegressor)
[11]:
print(prob.evaluate(np.asarray([[0.1,0.8], [0.5,0.3]]), use_surrogate=True))
Evaluation Results Object
Objective values are:
[[0.89999816 0.08000015]
[0.80000041 0.15000012]]
Constraint violation values are:
None
Fitness values are:
[[-0.89999816 0.08000015]
[-0.80000041 0.15000012]]
Uncertainity values are:
[[6.48833930e-06 6.48833930e-06]
[4.85016202e-06 4.85016202e-06]]
E:\Projects\.virtualenvs\desdeo-problem\lib\site-packages\sklearn\base.py:450: UserWarning: X does not have valid feature names, but GaussianProcessRegressor was fitted with feature names
warnings.warn(
E:\Projects\.virtualenvs\desdeo-problem\lib\site-packages\sklearn\base.py:450: UserWarning: X does not have valid feature names, but GaussianProcessRegressor was fitted with feature names
warnings.warn(
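In the output above, the fitness of f1 (marked for maximization) is the negated objective value, while f2 is unchanged, so every fitness column can be minimized. A minimal sketch of that sign convention, consistent with the printed values (to_fitness is a hypothetical name, not the library's code):

```python
import numpy as np


def to_fitness(objective_values: np.ndarray, maximize) -> np.ndarray:
    """Flip the sign of maximized objectives so that every column is minimized."""
    multiplier = np.where(np.asarray(maximize, dtype=bool), -1.0, 1.0)
    return objective_values * multiplier  # broadcasts the multiplier over the rows
```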
Lipschitzian models
[12]:
from desdeo_problem.surrogatemodels.lipschitzian import LipschitzianRegressor
[13]:
prob = DataProblem(data=datapd, objective_names=y, variable_names=X)
[14]:
prob.train(LipschitzianRegressor)
[15]:
print(prob.evaluate(np.asarray([[0.1,0.8], [0.5,0.3]]), use_surrogate=True))
Evaluation Results Object
Objective values are:
[[0.9 0.08094064]
[0.8 0.16315449]]
Constraint violation values are:
None
Fitness values are:
[[0.9 0.08094064]
[0.8 0.16315449]]
Uncertainity values are:
[[8.88178420e-16 4.52040286e-02]
[9.43689571e-16 5.29605172e-02]]
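A Lipschitzian surrogate relies on the assumption that the objective changes no faster than a constant L per unit distance. That gives an upper and a lower bound at any point, and a natural prediction is their midpoint with half the gap as the uncertainty. The sketch below is a generic version of that idea; it assumes Euclidean distances and estimates L as the largest observed slope, and LipschitzianRegressor may make different choices. The function names are hypothetical.

```python
import numpy as np


def lipschitz_constant(X: np.ndarray, y: np.ndarray) -> float:
    """Largest observed slope |y_i - y_j| / ||x_i - x_j|| over distinct pairs."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    dy = np.abs(y[:, None] - y[None, :])
    mask = d > 0
    return float((dy[mask] / d[mask]).max())


def lipschitz_predict(X_train, y_train, X_new, L=None):
    """Midpoint of the Lipschitz upper/lower bounds, and half their gap."""
    if L is None:
        L = lipschitz_constant(X_train, y_train)
    # Distances from every new point to every training point: (n_new, n_train).
    d = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=-1)
    upper = (y_train[None, :] + L * d).min(axis=1)
    lower = (y_train[None, :] - L * d).max(axis=1)
    return (upper + lower) / 2.0, (upper - lower) / 2.0
```

When the bounds meet, as they do for simple linear data, the uncertainty is zero up to round-off, which is one way near-zero values like those reported for f1 can arise.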