{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Based Problems\n",
    "\n",
    "The DESDEO framework provides handling of data-driven optimization problems. Some methods, such as E-NAUTILUS in `desdeo-mcdm`, find the most preffered solution from a provided dataset. Other methods, such as most of the EA's from `desdeo-emo`, require a surrogate model to be trained for each of the objectives. The `desdeo_problem` provides support for both of these cases. \n",
    "\n",
    "For data based problems, use the data specific objective/problem classes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "VectorDataObjective is an objective class that can handle data, as well as multi-objective evaluators.\n",
    "\n",
    "The GaussianProcessRegressor here is same as the one in scikit-learn with one small difference. The predict method has been replaced to return uncertainity values (in the form of standard deviation of the prediction) by default. It supports hyperparameters in the same format as the sklearn method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from desdeo_problem import VectorDataObjective as VDO\n",
    "from desdeo_problem.surrogatemodels.SurrogateModels import GaussianProcessRegressor\n",
    "from sklearn.gaussian_process.kernels import Matern"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating some random data\n",
    "\n",
    "'a' and 'b' are randomly generated between 0 and 1.\n",
    "\n",
    "f1 = a + b  \n",
    "f2 = a * b\n",
    "\n",
    "For data-driven problems, make sure that the input dataset is in the pandas DataFrame format, with the column names being the same as the variable/objective names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a</th>\n",
       "      <th>b</th>\n",
       "      <th>f1</th>\n",
       "      <th>f2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.720371</td>\n",
       "      <td>0.269914</td>\n",
       "      <td>0.990285</td>\n",
       "      <td>0.194438</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.460496</td>\n",
       "      <td>0.928888</td>\n",
       "      <td>1.389383</td>\n",
       "      <td>0.427749</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.173756</td>\n",
       "      <td>0.856071</td>\n",
       "      <td>1.029827</td>\n",
       "      <td>0.148747</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.700958</td>\n",
       "      <td>0.566548</td>\n",
       "      <td>1.267507</td>\n",
       "      <td>0.397127</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.027785</td>\n",
       "      <td>0.640366</td>\n",
       "      <td>0.668152</td>\n",
       "      <td>0.017793</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          a         b        f1        f2\n",
       "0  0.720371  0.269914  0.990285  0.194438\n",
       "1  0.460496  0.928888  1.389383  0.427749\n",
       "2  0.173756  0.856071  1.029827  0.148747\n",
       "3  0.700958  0.566548  1.267507  0.397127\n",
       "4  0.027785  0.640366  0.668152  0.017793"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = np.random.rand(100,2)\n",
    "\n",
    "f1 = (data[:,0]+data[:,1]).reshape(-1,1)\n",
    "f2 = (data[:,0]*data[:,1]).reshape(-1,1)\n",
    "\n",
    "data = np.hstack((data, f1, f2))\n",
    "\n",
    "X = ['a','b']\n",
    "y = ['f1','f2']\n",
    "datapd = pd.DataFrame(data, columns=X+y)\n",
    "datapd.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using VectorDataObjective class\n",
    "\n",
    "The `VectorDataObjective` class takes as its input the data in a dataframe format and the objective names in a list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "obj = VDO(data=datapd, name=y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Training surrogate models\n",
    "\n",
    "Pass the surrogate modelling technique and the model parameters to the train method of the objective instance.\n",
    "\n",
    "If only one modelling technique is passed, the `model_parameters` should be a dict (or None) and this will be used for all the objectives.\n",
    "\n",
    "If multiple modelling techniques are passed, `models` should be the list of modelling techniques, and `model_parameters` should be a list of dicts. The length of these lists should be the same as the number of objectives and each list element will be used to train one objective in order."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "E:\\Projects\\.virtualenvs\\desdeo-problem\\lib\\site-packages\\sklearn\\gaussian_process\\_gpr.py:616: ConvergenceWarning: lbfgs failed to converge (status=2):\n",
      "ABNORMAL_TERMINATION_IN_LNSRCH.\n",
      "\n",
      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
      "  _check_optimize_result(\"lbfgs\", opt_res)\n"
     ]
    }
   ],
   "source": [
    "obj.train(models=GaussianProcessRegressor, model_parameters={'kernel': Matern(nu=1.5)})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using surrogate models to evaluate objective values\n",
    "\n",
    "Use the obj.evaluate method to get predictions. Note that `use_surrogates` should be true."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Objective Evaluation Results Object \n",
      "Objective values are: \n",
      "         f1    f2\n",
      "0  0.800003  0.15\n",
      "Uncertainity values are: \n",
      "         f1        f2\n",
      "0  0.000445  0.001171\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "E:\\Projects\\.virtualenvs\\desdeo-problem\\lib\\site-packages\\sklearn\\base.py:450: UserWarning: X does not have valid feature names, but GaussianProcessRegressor was fitted with feature names\n",
      "  warnings.warn(\n",
      "E:\\Projects\\.virtualenvs\\desdeo-problem\\lib\\site-packages\\sklearn\\base.py:450: UserWarning: X does not have valid feature names, but GaussianProcessRegressor was fitted with feature names\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "print(obj.evaluate(np.asarray([[0.5,0.3]]), use_surrogate=True))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'f1': True, 'f2': True}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "obj._model_trained"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating data problem class\n",
    "\n",
    "Creating the objective class should be bypassed for now, use `DataProblem` class directly with the data in a dataframe.\n",
    "\n",
    "The `DataProblem` provides a `train` method which trains all the objectives sequentially. The input arguments for this train method is the same as that of the `VectorDataObjective` class.\n",
    "\n",
    "To make sure that the `evaluate` method uses the surrogate models for evaluations, pass the `use_surrogate=True` argument."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "from desdeo_problem import DataProblem"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "maximize = pd.DataFrame([[True, False]], columns=['f1','f2'])\n",
    "prob = DataProblem(data=datapd, objective_names=y, variable_names=X, maximize=maximize)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "prob.train(GaussianProcessRegressor)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Evaluation Results Object \n",
      "Objective values are: \n",
      "[[0.89999816 0.08000015]\n",
      " [0.80000041 0.15000012]]\n",
      "Constraint violation values are: \n",
      "None\n",
      "Fitness values are: \n",
      "[[-0.89999816  0.08000015]\n",
      " [-0.80000041  0.15000012]]\n",
      "Uncertainity values are: \n",
      "[[6.48833930e-06 6.48833930e-06]\n",
      " [4.85016202e-06 4.85016202e-06]]\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "E:\\Projects\\.virtualenvs\\desdeo-problem\\lib\\site-packages\\sklearn\\base.py:450: UserWarning: X does not have valid feature names, but GaussianProcessRegressor was fitted with feature names\n",
      "  warnings.warn(\n",
      "E:\\Projects\\.virtualenvs\\desdeo-problem\\lib\\site-packages\\sklearn\\base.py:450: UserWarning: X does not have valid feature names, but GaussianProcessRegressor was fitted with feature names\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "print(prob.evaluate(np.asarray([[0.1,0.8], [0.5,0.3]]), use_surrogate=True))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lipschitian models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "from desdeo_problem.surrogatemodels.lipschitzian import LipschitzianRegressor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "prob = DataProblem(data=datapd, objective_names=y, variable_names=X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "prob.train(LipschitzianRegressor)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Evaluation Results Object \n",
      "Objective values are: \n",
      "[[0.9        0.08094064]\n",
      " [0.8        0.16315449]]\n",
      "Constraint violation values are: \n",
      "None\n",
      "Fitness values are: \n",
      "[[0.9        0.08094064]\n",
      " [0.8        0.16315449]]\n",
      "Uncertainity values are: \n",
      "[[8.88178420e-16 4.52040286e-02]\n",
      " [9.43689571e-16 5.29605172e-02]]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "print(prob.evaluate(np.asarray([[0.1,0.8], [0.5,0.3]]), use_surrogate=True))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "desdeo-problem",
   "language": "python",
   "name": "desdeo-problem"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
