{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "multinomial_model.ipynb", "provenance": [], "collapsed_sections": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3 (Spyder)", "language": "python3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "oeFDTihBDTnb" }, "source": [ "# Multinomial Logit" ] }, { "cell_type": "markdown", "metadata": { "id": "f2K2FpPHbdzT" }, "source": [ "This is a step-by-step guide on how to estimate Multinomial Logit models using the `xlogit` package. You can interactively execute the code in this guide by opening it in Google Colab using the following link:\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/arteagac/xlogit/blob/master/examples/multinomial_model.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "id": "is9MSL-AkK9G" }, "source": [ "## Install `xlogit` package" ] }, { "cell_type": "markdown", "metadata": { "id": "HEpWCRkFRm5t" }, "source": [ "Install `xlogit` using `pip` as shown below." 
] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "36ZQw8iIkDib", "outputId": "4c3ef40d-20a7-4fc3-d10c-b282ab498c3a" }, "source": [ "!pip install xlogit" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Collecting xlogit\n", " Downloading https://files.pythonhosted.org/packages/60/5f/9bc576d180c366af77bc04e268536e9e34be23c52a520918aa0cb56b438e/xlogit-0.1.3-py3-none-any.whl\n", "Requirement already satisfied: numpy>=1.13.1 in /usr/local/lib/python3.7/dist-packages (from xlogit) (1.19.5)\n", "Requirement already satisfied: scipy>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from xlogit) (1.4.1)\n", "Installing collected packages: xlogit\n", "Successfully installed xlogit-0.1.3\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "4gVLebey57-t" }, "source": [ "## Route Choice Dataset" ] }, { "cell_type": "markdown", "metadata": { "id": "6TZFv__H3NmV" }, "source": [ "This dataset contains the choices of 151 commuters among three home-to-work route alternatives: arterial, rural, and freeway roads. The dataset was taken from Example 13.1 of the book \"Statistical and Econometric Methods for Transportation Data Analysis\" [(Washington et al., 2011)](https://engineering.purdue.edu/~flm/StatEconBook.htm)." ] }, { "cell_type": "markdown", "metadata": { "id": "pMd_kflT6s_U" }, "source": [ "### Read data" ] }, { "cell_type": "markdown", "metadata": { "id": "RzpPDpOue-iB" }, "source": [ "We start by importing the data using pandas and renaming the columns of interest (choice, distance, male, and vehicle model). In addition, we create a column with the name of the alternatives, a column that uniquely identifies every observation in the dataset, and a column with the vehicle age (`vehage`) computed from the vehicle model year. Note that this dataset is in long format." 
] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "id": "BiYw1Rbo6NIO", "outputId": "2e32651a-e256-4cc5-bab9-747bd93be82a" }, "source": [ "import pandas as pd\n", "import numpy as np\n", "# The file has no header row; `add_prefix` names the columns x0, x1, ...\n", "# (the `prefix` argument of `read_csv` is not available in pandas >= 2.0)\n", "df = pd.read_csv(\"https://engineering.purdue.edu/~flm/StatEcon-Files/Ex13-1.txt\",\n", " sep=\"\\t\", header=None).add_prefix(\"x\")\n", "df.rename(columns={'x0': 'choice', 'x6': 'dist', 'x10': 'male', 'x14': 'vehmodel'},\n", " inplace=True) # Rename columns of interest\n", "df['alt'] = np.tile(['arterial', 'rural', 'freeway'], len(df)//3) # Add column with alternatives\n", "df['ids'] = np.repeat(np.arange(len(df)//3), 3) # Add column with unique ids\n", "df['vehage'] = 86 - df['vehmodel'] # Vehicle age relative to model year 86\n", "df" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " choice x1 x2 x3 x4 x5 ... vehmodel x15 x16 alt ids vehage\n", "0 1 1 0 0 460 14 ... 86 0 28 arterial 0 0\n", "1 0 0 1 0 440 7 ... 86 0 28 rural 0 0\n", "2 0 0 0 1 130 7 ... 86 0 28 freeway 0 0\n", "3 1 1 0 0 595 13 ... 85 0 27 arterial 1 1\n", "4 0 0 1 0 515 13 ... 85 0 27 rural 1 1\n", ".. ... .. .. .. ... .. ... ... ... ... ... ... ...\n", "448 0 0 1 0 325 10 ... 84 1 24 rural 149 2\n", "449 1 0 0 1 200 5 ... 84 1 24 freeway 149 2\n", "450 0 1 0 0 900 14 ... 83 1 18 arterial 150 3\n", "451 0 0 1 0 550 7 ... 83 1 18 rural 150 3\n", "452 1 0 0 1 220 7 ... 83 1 18 freeway 150 3\n", "\n", "[453 rows x 20 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 23 } ] }, { "cell_type": "markdown", "metadata": { "id": "u0OvCmYp6fKi" }, "source": [ "### Create model specification" ] }, { "cell_type": "markdown", "metadata": { "id": "vm_yhbPn4GBo" }, "source": [ "The following code creates the model specification by adding to the dataframe the variables that the specification requires. Each new variable is simply the product of an existing variable and a dummy variable for one of the alternatives. 
Note that the specification in the code below corresponds to the following utility functions:" ] }, { "cell_type": "markdown", "metadata": { "id": "_ZJwXpOw5jfa" }, "source": [ "\\begin{equation}\n", "\\begin{array}{lllll}\n", "V_{arterial} & = & \\quad \\beta_3DIST_{arterial} \\\\\n", "V_{rural} & = \\beta_1ASC_{rural} & + \\beta_4DIST_{rural} & + \\beta_6VEHAGE_{rural} \\\\\n", "V_{freeway} & = \\beta_2ASC_{freeway} & + \\beta_5DIST_{freeway} & + \\beta_7VEHAGE_{freeway} & + \\beta_8MALE_{freeway} \n", "\\end{array}\n", "\\end{equation}" ] }, { "cell_type": "code", "metadata": { "id": "eC5Shb7S6eLE" }, "source": [ "# Alternative specific constants\n", "df['asc_rural'] = np.ones(len(df)) * (df['alt'] == 'rural')\n", "df['asc_freeway'] = np.ones(len(df)) * (df['alt'] == 'freeway')\n", "\n", "# Distance\n", "df['dist_arterial'] = df['dist'] * (df['alt'] == 'arterial')\n", "df['dist_rural'] = df['dist'] * (df['alt'] == 'rural')\n", "df['dist_freeway'] = df['dist'] * (df['alt'] == 'freeway')\n", "\n", "# Vehicle age\n", "df['vehage_rural'] = df['vehage'] * (df['alt'] == 'rural')\n", "df['vehage_freeway'] = df['vehage'] * (df['alt'] == 'freeway')\n", "\n", "# Male driver\n", "df['male_freeway'] = df['male'] * (df['alt'] == 'freeway')" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "AAINE_sp6irt" }, "source": [ "### Estimate model" ] }, { "cell_type": "markdown", "metadata": { "id": "w8zT7oWs_nz9" }, "source": [ "After creating the model specification, we can use `xlogit` to estimate the model as follows:" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ek-eceEl5-QQ", "outputId": "788668e8-1c68-4904-8a62-782c8121def6" }, "source": [ "from xlogit import MultinomialLogit\n", "# Names must match the dataframe columns created in the specification above\n", "varnames = ['asc_rural', 'asc_freeway', 'dist_arterial', 'dist_rural',\n", " 'dist_freeway', 'vehage_rural', 'vehage_freeway', 'male_freeway']\n", "model = MultinomialLogit()\n", 
"model.fit(X=df[varnames], y=df['choice'], varnames=varnames,\n", " ids=df['ids'], alts=df['alt'])\n", "model.summary()" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Estimation time= 0.0 seconds\n", "---------------------------------------------------------------------------\n", "Coefficient Estimate Std.Err. z-val P>|z|\n", "---------------------------------------------------------------------------\n", "asc_rural 2.8131275 1.1504189 2.4453072 0.0416 * \n", "asc_freeway -2.6868817 2.2817329 -1.1775619 0.398 \n", "dist_arterial -0.1229102 0.0240053 -5.1201358 4.14e-06 ***\n", "dist_rural -0.1773579 0.0279818 -6.3383224 1.3e-08 ***\n", "dist_freeway -0.0956391 0.0369681 -2.5870738 0.0295 * \n", "vehage_rural 0.1236721 0.0535597 2.3090535 0.057 . \n", "vehage_freeway 0.2268642 0.0755401 3.0032267 0.00969 ** \n", "male_freeway 0.5990000 0.6202114 0.9657996 0.499 \n", "---------------------------------------------------------------------------\n", "Significance: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n", "\n", "Log-Likelihood= -94.440\n", "AIC= 204.881\n", "BIC= 229.019\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "Zf6sYPvjfoeP" }, "source": [ "## Swissmetro Dataset" ] }, { "cell_type": "markdown", "metadata": { "id": "BOWB3Lffg5Qc" }, "source": [ "In this example, we estimate a Multinomial Logit model in which each alternative has a different utility specification. The swissmetro dataset is an SP/RP survey dataset widely used in Biogeme and PyLogit examples. The dataset is available at http://transp-or.epfl.ch/data/swissmetro.dat, and [Bierlaire et al. (2001)](https://transp-or.epfl.ch/documents/proceedings/BierAxhaAbay01.pdf) provide a detailed discussion of the data as well as its context and collection process. Note that the dataset is available in wide format; therefore, we need to convert it to long format for `xlogit`." 
] }, { "cell_type": "markdown", "metadata": { "id": "n4No84MAeFOM" }, "source": [ "### Read data" ] }, { "cell_type": "markdown", "metadata": { "id": "mUemwG5YjaGg" }, "source": [ "The dataset is imported to the Python environment using `pandas`. Then, two types of samples are filtered out: those with a trip purpose other than commute or business, and those with an unknown choice. The original dataset contains 10,729 records, but after filtering, 6,768 records remain for the following analysis. Finally, a new column that uniquely identifies each sample is added to the dataframe, and the `CHOICE` column, which originally contains a numerical coding of the choices, is mapped to labels that are consistent with the alternatives in the column names. " ] }, { "cell_type": "code", "metadata": { "id": "4jqERhnWhGCc", "colab": { "base_uri": "https://localhost:8080/", "height": 444 }, "outputId": "f1ed82ab-6ea7-4e96-b2dc-bc9284ef5cd1" }, "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "df_wide = pd.read_table(\"http://transp-or.epfl.ch/data/swissmetro.dat\", sep='\\t')\n", "\n", "# Keep only observations for commute and business purposes that contain known choices\n", "# (`.copy()` avoids modifying a view of the original dataframe in the lines below)\n", "df_wide = df_wide[(df_wide['PURPOSE'].isin([1, 3]) & (df_wide['CHOICE'] != 0))].copy()\n", "\n", "df_wide['custom_id'] = np.arange(len(df_wide)) # Add unique identifier\n", "df_wide['CHOICE'] = df_wide['CHOICE'].map({1: 'TRAIN', 2: 'SM', 3: 'CAR'})\n", "df_wide" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " GROUP SURVEY SP ID ... CAR_TT CAR_CO CHOICE custom_id\n", "0 2 0 1 1 ... 117 65 SM 0\n", "1 2 0 1 1 ... 117 84 SM 1\n", "2 2 0 1 1 ... 117 52 SM 2\n", "3 2 0 1 1 ... 72 52 SM 3\n", "4 2 0 1 1 ... 90 84 SM 4\n", "... ... ... .. ... ... ... ... ... ...\n", "8446 3 1 1 939 ... 130 64 TRAIN 6763\n", "8447 3 1 1 939 ... 80 80 TRAIN 6764\n", "8448 3 1 1 939 ... 80 64 TRAIN 6765\n", "8449 3 1 1 939 ... 80 104 TRAIN 6766\n", "8450 3 1 1 939 ... 100 80 TRAIN 6767\n", "\n", "[6768 rows x 29 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 2 } ] }, { "cell_type": "markdown", "metadata": { "id": "-GRMhgM2eIPz" }, "source": [ "### Reshape data" ] }, { "cell_type": "markdown", "metadata": { "id": "r9OxW-yNhcal" }, "source": [ "Given that `xlogit` requires the dataset to be provided in long format, we reshape the dataset using the `wide_to_long` utility provided by `xlogit`. This function takes as input the column that uniquely identifies each sample (`id_col`), the name of the column in which to save the alternatives (`alt_name`), the list of alternatives (`alt_list`), the columns that vary across alternatives (`varying`), the separator between the alternative name and the variable name (`sep`), and whether the alternative names are a prefix in the column names (`alt_is_prefix`). The `wide_to_long` function fills with `NaN` the variables that are not defined for certain alternatives (e.g., `SEATS` and `HE`). Depending on your specification needs, you can ignore the `NaN` values or replace them with zeros. In this case, we replace them with zeros using the `empty_val` parameter. Additional details about the `wide_to_long` function can be found in [xlogit's documentation](https://xlogit.readthedocs.io/en/latest/api/utils.html)." 
] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "id": "1KM-BvFvhWed", "outputId": "06502368-e8d6-4de3-be02-50d2db983472" }, "source": [ "from xlogit.utils import wide_to_long\n", "\n", "df = wide_to_long(df_wide, id_col='custom_id', alt_name='alt', sep='_',\n", " alt_list=['TRAIN', 'SM', 'CAR'], empty_val=0,\n", " varying=['TT', 'CO', 'HE', 'AV', 'SEATS'], alt_is_prefix=True)\n", "df" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " custom_id alt TT CO HE ... INCOME GA ORIGIN DEST CHOICE\n", "0 0 CAR 117 65 0 ... 2 0 2 1 SM\n", "1 0 SM 63 52 20 ... 2 0 2 1 SM\n", "2 0 TRAIN 112 48 120 ... 2 0 2 1 SM\n", "3 1 CAR 117 84 0 ... 2 0 2 1 SM\n", "4 1 SM 60 49 10 ... 2 0 2 1 SM\n", "... ... ... ... .. ... ... ... .. ... ... ...\n", "20299 6766 SM 53 17 30 ... 2 0 1 2 TRAIN\n", "20300 6766 TRAIN 128 16 30 ... 2 0 1 2 TRAIN\n", "20301 6767 CAR 100 80 0 ... 2 0 1 2 TRAIN\n", "20302 6767 SM 53 21 30 ... 2 0 1 2 TRAIN\n", "20303 6767 TRAIN 108 13 60 ... 2 0 1 2 TRAIN\n", "\n", "[20304 rows x 23 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 3 } ] }, { "cell_type": "markdown", "metadata": { "id": "-lInWLqXk0m8" }, "source": [ "We then scale some variables as per the [examples in Biogeme and PyLogit](https://github.com/timothyb0912/pylogit/blob/master/examples/notebooks/Main%20PyLogit%20Example.ipynb). The time and headway variables are converted to hours; the cost is divided by 100 and set to zero for train and swissmetro travellers who hold an annual pass (`GA`) or whose ticket is paid by their employer (`WHO == 2`); and the new variables `single_luggage`, `multiple_luggage`, `regular_class`, and `train_survey` are created to accommodate the model specification, as shown below:" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 444 }, "id": "7HirQJUhkTH6", "outputId": "03945233-8a3f-407a-f634-93b2c80abed3" }, "source": [ "# Scale travel time and headway to hours, and divide the cost by 100\n", "df['time'] = df['TT'] / 60.0 \n", "df['headway'] = df['HE'] / 60.0 \n", "df['cost'] = df['CO'] / 100.0\n", "\n", "# Set the train and swissmetro cost to zero for travellers with an annual pass (GA)\n", "# or whose ticket is paid by their employer (WHO == 2)\n", "annual_pass = (df['alt'].isin(['TRAIN', 'SM'])) & ((df[\"GA\"] == 1) | (df[\"WHO\"] == 2))\n", "df[\"cost\"] = df[\"cost\"] * (~annual_pass)\n", "\n", "# Travellers carrying a single piece of luggage\n", "df[\"single_luggage\"] = (df[\"LUGGAGE\"] == 1).astype(int)\n", "\n", "# Travellers carrying more than one piece of luggage\n", "df[\"multiple_luggage\"] = (df[\"LUGGAGE\"] == 
3).astype(int)\n", "\n", "# Travellers travelling in classes other than First class\n", "df[\"regular_class\"] = 1 - df[\"FIRST\"]\n", "\n", "# Travellers who responded to the survey while on a train\n", "df[\"train_survey\"] = 1 - df[\"SURVEY\"]\n", "df" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
" ], "text/plain": [ " custom_id alt TT ... multiple_luggage regular_class train_survey\n", "0 0 CAR 117 ... 0 1 1\n", "1 0 SM 63 ... 0 1 1\n", "2 0 TRAIN 112 ... 0 1 1\n", "3 1 CAR 117 ... 0 1 1\n", "4 1 SM 60 ... 0 1 1\n", "... ... ... ... ... ... ... ...\n", "20299 6766 SM 53 ... 0 0 0\n", "20300 6766 TRAIN 128 ... 0 0 0\n", "20301 6767 CAR 100 ... 0 0 0\n", "20302 6767 SM 53 ... 0 0 0\n", "20303 6767 TRAIN 108 ... 0 0 0\n", "\n", "[20304 rows x 30 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 4 } ] }, { "cell_type": "markdown", "metadata": { "id": "L26tLi_C2v8-" }, "source": [ "### Create model specification" ] }, { "cell_type": "markdown", "metadata": { "id": "dubKfhWYpam7" }, "source": [ "By operating on the dataframe columns, highly flexible utility specifications can be modeled in `xlogit`. As shown below, alternative-specific constants or coefficients can be included in the specification by creating new columns whose values depend on the alternative. This flexibility even allows specifying one or multiple coefficients per alternative or group of alternatives." 
] }, { "cell_type": "code", "metadata": { "id": "pA-zQcLcpcAF" }, "source": [ "# Create model specification\n", "# Alternative Specific Constants\n", "df['asc_train'] = np.ones(len(df))*(df['alt'] == 'TRAIN')\n", "df['asc_sm'] = np.ones(len(df))*(df['alt'] == 'SM')\n", "\n", "# Travel cost (One coefficient per alternative)\n", "df['cost_train'] = df['cost']*(df['alt'] == 'TRAIN')\n", "df['cost_sm'] = df['cost']*(df['alt'] == 'SM')\n", "df['cost_car'] = df['cost']*(df['alt'] == 'CAR')\n", "\n", "# Travel time (One coefficient for train and sm and other for car)\n", "df['time_train_sm'] = df['time']*((df['alt'] == 'TRAIN') | (df['alt'] == 'SM'))\n", "df['time_car'] = df['time']*(df['alt'] == 'CAR')\n", "\n", "# Headway (One coefficient per alternative, except for car)\n", "df['headway_train'] = df['headway']*(df['alt'] == 'TRAIN')\n", "df['headway_sm'] = df['headway']*(df['alt'] == 'SM')\n", "\n", "# Seat config (Coefficient only for swissmetro)\n", "df['seatconf_sm'] = df['SEATS']*(df['alt'] == 'SM')\n", "\n", "# Train survey (One coefficient shared by train and swissmetro)\n", "df['survey_train_sm'] = df['train_survey']* ((df['alt'] == 'TRAIN') | (df['alt'] == 'SM'))\n", "\n", "# Regular class (Coefficient only for train, despite the `_sm` suffix in the column name)\n", "df['regular_class_sm'] = df['regular_class']*(df['alt'] == 'TRAIN')\n", "\n", "# Luggage (Coefficient only for car)\n", "df['single_lug_car'] = df['single_luggage']*(df['alt'] == 'CAR')\n", "df['multip_lug_car'] = df['multiple_luggage']*(df['alt'] == 'CAR')" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "POyPLthmeX9Z" }, "source": [ "### Estimate model" ] }, { "cell_type": "markdown", "metadata": { "id": "pHpj44RxoShn" }, "source": [ "The swissmetro dataset contains unbalanced choice situations (i.e., some individuals do not have observations for all alternatives). The `avail` option enables estimation with such datasets. 
`avail` takes, for each row, a value indicating whether the corresponding alternative is available in that choice situation (here, the `AV` column).\n", "Once the model specification is complete, the model is estimated as follows:" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "qXvKwbEloaDm", "outputId": "d2ab4c53-4d50-4408-f935-133a8376c68e" }, "source": [ "from xlogit import MultinomialLogit\n", "\n", "varnames=['asc_train', 'asc_sm', 'time_train_sm', 'time_car', 'cost_train',\n", " 'cost_sm', 'cost_car', 'headway_train', 'headway_sm', 'seatconf_sm',\n", " 'survey_train_sm', 'regular_class_sm', 'single_lug_car',\n", " 'multip_lug_car']\n", "model = MultinomialLogit()\n", "model.fit(X=df[varnames],\n", " y=df['CHOICE'],\n", " varnames=varnames,\n", " alts=df['alt'],\n", " ids=df['custom_id'],\n", " avail=df['AV'])\n", "model.summary()" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Estimation time= 0.1 seconds\n", "---------------------------------------------------------------------------\n", "Coefficient Estimate Std.Err. z-val P>|z|\n", "---------------------------------------------------------------------------\n", "asc_train -1.2929512 0.1237556 -10.4476139 2.43e-24 ***\n", "asc_sm -0.5026152 0.1032927 -4.8659312 5.87e-06 ***\n", "time_train_sm -0.6990098 0.0396510 -17.6290608 8.13e-67 ***\n", "time_car -0.7229887 0.0442625 -16.3340968 1.18e-57 ***\n", "cost_train -0.5618773 0.0807075 -6.9618968 2.59e-11 ***\n", "cost_sm -0.2816843 0.0417373 -6.7489902 1.1e-10 ***\n", "cost_car -0.5139009 0.0970406 -5.2957304 6.66e-07 ***\n", "headway_train -0.3143519 0.0505955 -6.2130370 3.49e-09 ***\n", "headway_sm -0.3773753 0.1652542 -2.2836046 0.0589 . 
\n", "seatconf_sm -0.7824379 0.0758912 -10.3100010 9.91e-24 ***\n", "survey_train_sm 2.5424946 0.0921336 27.5957408 1.5e-157 ***\n", "regular_class_sm 0.5650259 0.0652226 8.6630441 4.93e-17 ***\n", "single_lug_car 0.4227658 0.0611684 6.9115077 3.66e-11 ***\n", "multip_lug_car 1.4141058 0.2373032 5.9590672 1.62e-08 ***\n", "---------------------------------------------------------------------------\n", "Significance: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n", "\n", "Log-Likelihood= -5159.258\n", "AIC= 10346.517\n", "BIC= 10441.996\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "AEJIluSUqL9r" }, "source": [ "The estimates are identical to those provided by PyLogit (Brathwaite and Walker 2018) as shown in [this link](https://github.com/timothyb0912/pylogit/blob/master/examples/notebooks/Main%20PyLogit%20Example.ipynb) " ] }, { "cell_type": "markdown", "metadata": { "id": "4Rur4sCUfMbi" }, "source": [ "## Fishing Dataset" ] }, { "cell_type": "markdown", "metadata": { "id": "REmsMhK8e41H" }, "source": [ "The following example illustrates the estimation of a Multinomial Logit model for choices of 1,182 individuals for sport fishing modes using `xlogit`. The goal is to analyze the market shares of four alternatives (i.e., beach, pier, boat, and charter) based on their cost and fish catch rate. [Cameron (2005)](http://cameron.econ.ucdavis.edu/mmabook/mma.html) provides additional details about this dataset. The following code illustrates how to use `xlogit` to estimate the model parameters." ] }, { "cell_type": "markdown", "metadata": { "id": "CUDXAA26kOfK" }, "source": [ "### Read data" ] }, { "cell_type": "markdown", "metadata": { "id": "JquXmr1xQo-C" }, "source": [ "The data to be analyzed can be imported to Python using any preferred method. In this example, the data in CSV format was imported using the popular `pandas` Python package. 
However, it is worth highlighting that `xlogit` does not depend on the `pandas` package, as `xlogit` can take any array-like structure as input. This represents an additional advantage because `xlogit` can be used with any preferred dataframe library, and not only with `pandas`." ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "id": "B5JFuzuIkIig", "outputId": "01b98073-5441-44fd-a656-6b5870582554" }, "source": [ "import pandas as pd\n", "df = pd.read_csv(\"https://raw.github.com/arteagac/xlogit/master/examples/data/fishing_long.csv\")\n", "df" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>id</th><th>alt</th><th>choice</th><th>income</th><th>price</th><th>catch</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1</td><td>beach</td><td>0</td><td>7083.33170</td><td>157.930</td><td>0.0678</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>boat</td><td>0</td><td>7083.33170</td><td>157.930</td><td>0.2601</td></tr>\n",
"    <tr><th>2</th><td>1</td><td>charter</td><td>1</td><td>7083.33170</td><td>182.930</td><td>0.5391</td></tr>\n",
"    <tr><th>3</th><td>1</td><td>pier</td><td>0</td><td>7083.33170</td><td>157.930</td><td>0.0503</td></tr>\n",
"    <tr><th>4</th><td>2</td><td>beach</td><td>0</td><td>1249.99980</td><td>15.114</td><td>0.1049</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>4723</th><td>1181</td><td>pier</td><td>0</td><td>416.66668</td><td>36.636</td><td>0.4522</td></tr>\n",
"    <tr><th>4724</th><td>1182</td><td>beach</td><td>0</td><td>6250.00130</td><td>339.890</td><td>0.2537</td></tr>\n",
"    <tr><th>4725</th><td>1182</td><td>boat</td><td>1</td><td>6250.00130</td><td>235.436</td><td>0.6817</td></tr>\n",
"    <tr><th>4726</th><td>1182</td><td>charter</td><td>0</td><td>6250.00130</td><td>260.436</td><td>2.3014</td></tr>\n",
"    <tr><th>4727</th><td>1182</td><td>pier</td><td>0</td><td>6250.00130</td><td>339.890</td><td>0.1498</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>4728 rows × 6 columns</p>\n",
"</div>
" ], "text/plain": [ " id alt choice income price catch\n", "0 1 beach 0 7083.33170 157.930 0.0678\n", "1 1 boat 0 7083.33170 157.930 0.2601\n", "2 1 charter 1 7083.33170 182.930 0.5391\n", "3 1 pier 0 7083.33170 157.930 0.0503\n", "4 2 beach 0 1249.99980 15.114 0.1049\n", "... ... ... ... ... ... ...\n", "4723 1181 pier 0 416.66668 36.636 0.4522\n", "4724 1182 beach 0 6250.00130 339.890 0.2537\n", "4725 1182 boat 1 6250.00130 235.436 0.6817\n", "4726 1182 charter 0 6250.00130 260.436 2.3014\n", "4727 1182 pier 0 6250.00130 339.890 0.1498\n", "\n", "[4728 rows x 6 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 7 } ] }, { "cell_type": "markdown", "metadata": { "id": "HOgCue_r_69x" }, "source": [ "### Estimate the model" ] }, { "cell_type": "markdown", "metadata": { "id": "Me6W8tAjQte6" }, "source": [ "Once the data is in the `Python` environment, `xlogit` can be used to fit the model, as shown below. The `MultinomialLogit` class is imported from `xlogit`, and its constructor is used to initialize a new model. The `fit` method estimates the model using the input data and estimation criteria provided as arguments to the method's call. 
The arguments of the `fit` method are described in [`xlogit`'s documentation](https://xlogit.readthedocs.io/en/latest/api/).\n" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "JnYczrXDksg5", "outputId": "ef8a7f6e-9467-4dbc-e431-2c90d4127a16" }, "source": [ "from xlogit import MultinomialLogit\n", "\n", "varnames = ['income', 'price', 'catch']\n", "model = MultinomialLogit()\n", "model.fit(X=df[varnames],\n", " y=df['choice'],\n", " varnames=varnames,\n", " isvars=['income'],\n", " ids=df['id'],\n", " alts=df['alt'],\n", " fit_intercept=True)\n", "model.summary()" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Estimation time= 0.0 seconds\n", "---------------------------------------------------------------------------\n", "Coefficient Estimate Std.Err. z-val P>|z|\n", "---------------------------------------------------------------------------\n", "_intercept.boat 0.5273413 0.2017396 2.6139700 0.0264 * \n", "_intercept.charter 1.6943827 0.2096186 8.0831712 1.2e-14 ***\n", "_intercept.pier 0.7779899 0.2051062 3.7931076 0.000622 ***\n", "income.boat 0.0000894 0.0000473 1.8906643 0.134 \n", "income.charter -0.0000333 0.0000480 -0.6936462 0.627 \n", "income.pier -0.0001276 0.0000467 -2.7335425 0.0192 * \n", "price -0.0251161 0.0015240 -16.4800784 5.89e-54 ***\n", "catch 0.3578374 0.0985344 3.6315992 0.00113 ** \n", "---------------------------------------------------------------------------\n", "Significance: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n", "\n", "Log-Likelihood= -1215.138\n", "AIC= 2446.275\n", "BIC= 2486.875\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "22Ly9IcJQPMQ" }, "source": [ "## Heating Dataset" ] }, { "cell_type": "markdown", "metadata": { "id": "HVA-R5U0T9m-" }, "source": [ "For this example, we use the Heating dataset from R's mlogit package (Croissant 2020), which contains choices of heating systems in California houses. The dataset contains 900 observations with 8 explanatory variables. More information can be found at https://cran.r-project.org/web/packages/mlogit/vignettes/e1mlogit.html." ] }, { "cell_type": "markdown", "metadata": { "id": "zTJ27gpReSAd" }, "source": [ "### Read data" ] }, { "cell_type": "code", "metadata": { "id": "wmIl7b0iSse5", "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "outputId": "a4558c22-484a-4c45-e3d1-4eeb4fad0655" }, "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "df_wide = pd.read_csv(\"https://raw.github.com/arteagac/xlogit/master/examples/data/heating_wide.csv\")\n", "df_wide" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>idcase</th><th>depvar</th><th>ic.gc</th><th>ic.gr</th><th>ic.ec</th><th>ic.er</th><th>ic.hp</th><th>oc.gc</th><th>oc.gr</th><th>oc.ec</th><th>oc.er</th><th>oc.hp</th><th>income</th><th>agehed</th><th>rooms</th><th>region</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1</td><td>gc</td><td>866.00</td><td>962.64</td><td>859.90</td><td>995.76</td><td>1135.50</td><td>199.69</td><td>151.72</td><td>553.34</td><td>505.60</td><td>237.88</td><td>7</td><td>25</td><td>6</td><td>ncostl</td></tr>\n",
"    <tr><th>1</th><td>2</td><td>gc</td><td>727.93</td><td>758.89</td><td>796.82</td><td>894.69</td><td>968.90</td><td>168.66</td><td>168.66</td><td>520.24</td><td>486.49</td><td>199.19</td><td>5</td><td>60</td><td>5</td><td>scostl</td></tr>\n",
"    <tr><th>2</th><td>3</td><td>gc</td><td>599.48</td><td>783.05</td><td>719.86</td><td>900.11</td><td>1048.30</td><td>165.58</td><td>137.80</td><td>439.06</td><td>404.74</td><td>171.47</td><td>4</td><td>65</td><td>2</td><td>ncostl</td></tr>\n",
"    <tr><th>3</th><td>4</td><td>er</td><td>835.17</td><td>793.06</td><td>761.25</td><td>831.04</td><td>1048.70</td><td>180.88</td><td>147.14</td><td>483.00</td><td>425.22</td><td>222.95</td><td>2</td><td>50</td><td>4</td><td>scostl</td></tr>\n",
"    <tr><th>4</th><td>5</td><td>er</td><td>755.59</td><td>846.29</td><td>858.86</td><td>985.64</td><td>883.05</td><td>174.91</td><td>138.90</td><td>404.41</td><td>389.52</td><td>178.49</td><td>2</td><td>25</td><td>6</td><td>valley</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>895</th><td>896</td><td>gc</td><td>766.39</td><td>877.71</td><td>751.59</td><td>869.78</td><td>942.70</td><td>142.61</td><td>136.21</td><td>474.48</td><td>420.65</td><td>203.00</td><td>6</td><td>20</td><td>4</td><td>mountn</td></tr>\n",
"    <tr><th>896</th><td>897</td><td>gc</td><td>1128.50</td><td>1167.80</td><td>1047.60</td><td>1292.60</td><td>1297.10</td><td>207.40</td><td>213.77</td><td>705.36</td><td>551.61</td><td>243.76</td><td>7</td><td>45</td><td>7</td><td>scostl</td></tr>\n",
"    <tr><th>897</th><td>898</td><td>gc</td><td>787.10</td><td>1055.20</td><td>842.79</td><td>1041.30</td><td>1064.80</td><td>175.05</td><td>141.63</td><td>478.86</td><td>448.61</td><td>254.51</td><td>5</td><td>60</td><td>7</td><td>scostl</td></tr>\n",
"    <tr><th>898</th><td>899</td><td>gc</td><td>860.56</td><td>1081.30</td><td>799.76</td><td>1123.20</td><td>1218.20</td><td>211.04</td><td>151.31</td><td>495.20</td><td>401.56</td><td>246.48</td><td>5</td><td>50</td><td>6</td><td>scostl</td></tr>\n",
"    <tr><th>899</th><td>900</td><td>gc</td><td>893.94</td><td>1119.90</td><td>967.88</td><td>1091.70</td><td>1387.50</td><td>175.80</td><td>180.11</td><td>518.68</td><td>458.53</td><td>245.13</td><td>2</td><td>65</td><td>4</td><td>ncostl</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>900 rows × 16 columns</p>\n",
"</div>
" ], "text/plain": [ " idcase depvar ic.gc ic.gr ... income agehed rooms region\n", "0 1 gc 866.00 962.64 ... 7 25 6 ncostl\n", "1 2 gc 727.93 758.89 ... 5 60 5 scostl\n", "2 3 gc 599.48 783.05 ... 4 65 2 ncostl\n", "3 4 er 835.17 793.06 ... 2 50 4 scostl\n", "4 5 er 755.59 846.29 ... 2 25 6 valley\n", ".. ... ... ... ... ... ... ... ... ...\n", "895 896 gc 766.39 877.71 ... 6 20 4 mountn\n", "896 897 gc 1128.50 1167.80 ... 7 45 7 scostl\n", "897 898 gc 787.10 1055.20 ... 5 60 7 scostl\n", "898 899 gc 860.56 1081.30 ... 5 50 6 scostl\n", "899 900 gc 893.94 1119.90 ... 2 65 4 ncostl\n", "\n", "[900 rows x 16 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 9 } ] }, { "cell_type": "markdown", "metadata": { "id": "bma2TRUleTyy" }, "source": [ "### Reshape data" ] }, { "cell_type": "markdown", "metadata": { "id": "co24t943VCqW" }, "source": [ "The dataset is available in wide format. Since `xlogit` requires the data in long format, we convert it as shown below:" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "id": "Dd4hkXnQS7u-", "outputId": "8c96665d-c357-4299-9e24-453674a2c105" }, "source": [ "from xlogit.utils import wide_to_long\n", "df = wide_to_long(df_wide, id_col='idcase', alt_name='alt', varying=['ic', 'oc'],\n", " alt_list=['ec', 'er', 'gc', 'gr', 'hp'], sep='.', empty_val=0)\n", "df" ], "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>idcase</th><th>alt</th><th>ic</th><th>oc</th><th>depvar</th><th>income</th><th>agehed</th><th>rooms</th><th>region</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>1</td><td>ec</td><td>859.90</td><td>553.34</td><td>gc</td><td>7</td><td>25</td><td>6</td><td>ncostl</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>er</td><td>995.76</td><td>505.60</td><td>gc</td><td>7</td><td>25</td><td>6</td><td>ncostl</td></tr>\n",
"    <tr><th>2</th><td>1</td><td>gc</td><td>866.00</td><td>199.69</td><td>gc</td><td>7</td><td>25</td><td>6</td><td>ncostl</td></tr>\n",
"    <tr><th>3</th><td>1</td><td>gr</td><td>962.64</td><td>151.72</td><td>gc</td><td>7</td><td>25</td><td>6</td><td>ncostl</td></tr>\n",
"    <tr><th>4</th><td>1</td><td>hp</td><td>1135.50</td><td>237.88</td><td>gc</td><td>7</td><td>25</td><td>6</td><td>ncostl</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>4495</th><td>900</td><td>ec</td><td>967.88</td><td>518.68</td><td>gc</td><td>2</td><td>65</td><td>4</td><td>ncostl</td></tr>\n",
"    <tr><th>4496</th><td>900</td><td>er</td><td>1091.70</td><td>458.53</td><td>gc</td><td>2</td><td>65</td><td>4</td><td>ncostl</td></tr>\n",
"    <tr><th>4497</th><td>900</td><td>gc</td><td>893.94</td><td>175.80</td><td>gc</td><td>2</td><td>65</td><td>4</td><td>ncostl</td></tr>\n",
"    <tr><th>4498</th><td>900</td><td>gr</td><td>1119.90</td><td>180.11</td><td>gc</td><td>2</td><td>65</td><td>4</td><td>ncostl</td></tr>\n",
"    <tr><th>4499</th><td>900</td><td>hp</td><td>1387.50</td><td>245.13</td><td>gc</td><td>2</td><td>65</td><td>4</td><td>ncostl</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>4500 rows × 9 columns</p>\n",
"</div>
" ], "text/plain": [ " idcase alt ic oc depvar income agehed rooms region\n", "0 1 ec 859.90 553.34 gc 7 25 6 ncostl\n", "1 1 er 995.76 505.60 gc 7 25 6 ncostl\n", "2 1 gc 866.00 199.69 gc 7 25 6 ncostl\n", "3 1 gr 962.64 151.72 gc 7 25 6 ncostl\n", "4 1 hp 1135.50 237.88 gc 7 25 6 ncostl\n", "... ... .. ... ... ... ... ... ... ...\n", "4495 900 ec 967.88 518.68 gc 2 65 4 ncostl\n", "4496 900 er 1091.70 458.53 gc 2 65 4 ncostl\n", "4497 900 gc 893.94 175.80 gc 2 65 4 ncostl\n", "4498 900 gr 1119.90 180.11 gc 2 65 4 ncostl\n", "4499 900 hp 1387.50 245.13 gc 2 65 4 ncostl\n", "\n", "[4500 rows x 9 columns]" ] }, "metadata": { "tags": [] }, "execution_count": 10 } ] }, { "cell_type": "markdown", "metadata": { "id": "qV9gAmnHed2f" }, "source": [ "### Estimate the model" ] }, { "cell_type": "markdown", "metadata": { "id": "8ozNDq8CVmwy" }, "source": [ "We now import `MultinomialLogit` from xlogit and estimate the model as shown below:" ] }, { "cell_type": "code", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "JJJL5oeq1rRh", "outputId": "7e03a602-6df0-4c30-c5bb-abd72c3fded7" }, "source": [ "from xlogit import MultinomialLogit\n", "\n", "varnames = ['ic', 'oc']\n", "model = MultinomialLogit()\n", "model.fit(X=df[varnames],\n", " y=df['depvar'],\n", " varnames=varnames,\n", " alts=df['alt'],\n", " ids=df['idcase'])\n", "model.summary()" ], "execution_count": null, "outputs": [ { "output_type": "stream", "text": [ "Estimation time= 0.0 seconds\n", "---------------------------------------------------------------------------\n", "Coefficient Estimate Std.Err. z-val P>|z|\n", "---------------------------------------------------------------------------\n", "ic -0.0062318 0.0003516 -17.7222802 2.16e-59 ***\n", "oc -0.0045800 0.0003208 -14.2767999 9.14e-41 ***\n", "---------------------------------------------------------------------------\n", "Significance: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1\n", "\n", "Log-Likelihood= -1095.237\n", "AIC= 2194.474\n", "BIC= 2204.079\n" ], "name": "stdout" } ] }, { "cell_type": "markdown", "metadata": { "id": "I1kynb9l4KUN" }, "source": [ "Note that these results are identical to the ones estimated by the R mlogit package: https://cran.r-project.org/web/packages/mlogit/vignettes/e1mlogit.html" ] }, { "cell_type": "markdown", "metadata": { "id": "Aw_ONyTnigBI" }, "source": [ "## References" ] }, { "cell_type": "markdown", "metadata": { "id": "KRn6BBFVihVO" }, "source": [ "- Bierlaire, M. (2018). PandasBiogeme: a short introduction. EPFL (Transport and Mobility Laboratory, ENAC).\n", "\n", "- Brathwaite, T., & Walker, J. L. (2018). Asymmetric, closed-form, finite-parameter models of multinomial choice. Journal of Choice Modelling, 29, 78–112.\n", "\n", "- Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: methods and applications. Cambridge University Press.\n", "\n", "- Croissant, Y. (2020). Estimation of Random Utility Models in R: The mlogit Package. Journal of Statistical Software, 95(1), 1-41.\n", "\n", "- Washington, S., Karlaftis, M., Mannering, F., & Anastasopoulos, P. (2020). Statistical and econometric methods for transportation data analysis. Chapman and Hall/CRC." ] } ] }