{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "convert_data_wide_to_long.ipynb",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KGmrniFLZ_rx"
},
"source": [
"# Convert Data from Wide to Long Format"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RIC-iofnaMyn"
},
"source": [
"`xlogit` processes data only in long format. However, converting data from wide to long format is a relatively easy task and `xlogit` provides a convinient tool for this purpose. In this tutorial, we'll explain the reshaping using two examples. Let's start by importing the `pandas` and `numpy` python packages."
]
},
{
"cell_type": "code",
"metadata": {
"id": "fueNeIAbv8sW"
},
"source": [
"import pandas as pd\n",
"import numpy as np"
],
"execution_count": 1,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "w8tpAUVvBajn"
},
"source": [
"In case you haven't installed `xlogit`, you can install it using `pip` as follows:"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "bc5G_n5yBhyL",
"outputId": "3eb7d947-2d8a-4847-e1f8-ee28613b25f8"
},
"source": [
"!pip install xlogit"
],
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"text": [
"Collecting xlogit\n",
" Downloading https://files.pythonhosted.org/packages/60/5f/9bc576d180c366af77bc04e268536e9e34be23c52a520918aa0cb56b438e/xlogit-0.1.3-py3-none-any.whl\n",
"Requirement already satisfied: scipy>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from xlogit) (1.4.1)\n",
"Requirement already satisfied: numpy>=1.13.1 in /usr/local/lib/python3.7/dist-packages (from xlogit) (1.19.5)\n",
"Installing collected packages: xlogit\n",
"Successfully installed xlogit-0.1.3\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "pHFbKUIkafhX"
},
"source": [
"## Basic example"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kpaNiP4VahUp"
},
"source": [
"This example provides a quick intuition about the format conversion using `pandas`."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bLhROdletgFE"
},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-Lbu6mGro4Xl"
},
"source": [
"Let's create some artificial data in wide format for our example. Let's assume that our dataset contains choices among three transportation modes (i.e. car, train, and bus) for two individuals identified by `id`. We assume we collected the income of the individuals, and the time and cost of the transportation modes and saved the data in the `df_wide` dataframe as follows:"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 112
},
"id": "cHVYGTFjZ4km",
"outputId": "33d409ed-77e8-4816-9c4c-e47a2713ab4e"
},
"source": [
"# Create example dataset in wide format\n",
"df_wide = pd.DataFrame({'id': [1, 2],\n",
" 'income': [4, 2],\n",
" 'time_car': [2, 1],\n",
" 'time_train': [2, 4],\n",
" 'time_bus': [4, 5],\n",
" 'cost_car': [3, 7],\n",
" 'cost_train': [6, 8],\n",
" 'cost_bus': [9, 2],\n",
" 'choice': ['bus', 'car']\n",
" })\n",
"df_wide"
],
"execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" income | \n",
" time_car | \n",
" time_train | \n",
" time_bus | \n",
" cost_car | \n",
" cost_train | \n",
" cost_bus | \n",
" choice | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 1 | \n",
" 4 | \n",
" 2 | \n",
" 2 | \n",
" 4 | \n",
" 3 | \n",
" 6 | \n",
" 9 | \n",
" bus | \n",
"
\n",
" \n",
" | 1 | \n",
" 2 | \n",
" 2 | \n",
" 1 | \n",
" 4 | \n",
" 5 | \n",
" 7 | \n",
" 8 | \n",
" 2 | \n",
" car | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id income time_car time_train ... cost_car cost_train cost_bus choice\n",
"0 1 4 2 2 ... 3 6 9 bus\n",
"1 2 2 1 4 ... 7 8 2 car\n",
"\n",
"[2 rows x 9 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 3
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "noXqdOh0crsq"
},
"source": [
"The dataframe `df_wide` is in wide format because every row represent a sample in the dataset. The variables `time` and `cost` have one column for each choice alternative, as these as alternative-specific variables. After parsing these data to long format, we want the alternative-specific variables to be in rows and not in columns; therefore, in long format every row is associated with an alternative."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "F_JmscYExU6j"
},
"source": [
"To convert `df_wide` to long format we use the `wide_to_long` function provided by `xlogit`."
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 237
},
"id": "Pyg0jFaBjw5p",
"outputId": "e29909cd-b410-4167-e07c-440c57e116e4"
},
"source": [
"from xlogit.utils import wide_to_long\n",
"df_long = wide_to_long(df_wide, id_col='id', alt_list=['car', 'train', 'bus'],\n",
" varying=['time', 'cost'], alt_name='alt', sep='_')\n",
"\n",
"df_long"
],
"execution_count": 4,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" alt | \n",
" time | \n",
" cost | \n",
" income | \n",
" choice | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 1 | \n",
" bus | \n",
" 4 | \n",
" 9 | \n",
" 4 | \n",
" bus | \n",
"
\n",
" \n",
" | 1 | \n",
" 1 | \n",
" car | \n",
" 2 | \n",
" 3 | \n",
" 4 | \n",
" bus | \n",
"
\n",
" \n",
" | 2 | \n",
" 1 | \n",
" train | \n",
" 2 | \n",
" 6 | \n",
" 4 | \n",
" bus | \n",
"
\n",
" \n",
" | 3 | \n",
" 2 | \n",
" bus | \n",
" 5 | \n",
" 2 | \n",
" 2 | \n",
" car | \n",
"
\n",
" \n",
" | 4 | \n",
" 2 | \n",
" car | \n",
" 1 | \n",
" 7 | \n",
" 2 | \n",
" car | \n",
"
\n",
" \n",
" | 5 | \n",
" 2 | \n",
" train | \n",
" 4 | \n",
" 8 | \n",
" 2 | \n",
" car | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" id alt time cost income choice\n",
"0 1 bus 4 9 4 bus\n",
"1 1 car 2 3 4 bus\n",
"2 1 train 2 6 4 bus\n",
"3 2 bus 5 2 2 car\n",
"4 2 car 1 7 2 car\n",
"5 2 train 4 8 2 car"
]
},
"metadata": {
"tags": []
},
"execution_count": 4
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6WgA2bnxnTQQ"
},
"source": [
"In the long format shown above, every three rows represent an individual and every row represents an alternative. An additional column called `alt` was created to indicate the alternative associated to a row, and now the variables `time` and `cost` have their values in different rows insted of columns. The rest of the variables (i.e. `id`, `choice`, and `income`) are simply repeated across rows as they are associated to the same individual. For instance, the individual with `id` equal to `1` has an `income` of `4`, which is repeated throughout the rows of this individual. This data in long format can be now processed by `xlogit`.\n",
"\n",
"The `wide_to_long` function takes as input the following parameters:\n",
"- `dataframe`: The wide-format DataFrame.\n",
"- `id_col` : Name of the column that uniquely identifies each sample.\n",
"- `alt_list`: List of alternatives.\n",
"- `alt_name`: Name of the alternatives column in returned dataset.\n",
"- `varying`: List of column names that vary across alternatives.\n",
"- `sep`: Separator of column names that vary across alternatives.\n",
"- `alt_is_prefix`: True if alternative is prefix of the variable name or False if it is suffix.\n",
"- `empty_val`: Value to fill when alternative not available for a certain variable.\n",
"\n",
"The current version of `xlogit` only supports wide to long conversion for `Pandas` dataframes. For additional details about this function, consult the documentation at: https://xlogit.readthedocs.io/en/latest/api/utils.html#xlogit.utils.wide_to_long"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "L_AvJKDSt7f3"
},
"source": [
"## Admissions dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DXCXSB73wmMw"
},
"source": [
"This is a dataset used in a binary logit example in [UCLA IDRE's website](https://stats.idre.ucla.edu/stata/dae/logistic-regression/). The idea is to predict model the admission to a program based on the candidate's gre, gpa and school's rank. The outcome variable is `admit`, which is equal to 1 when the student was admited or zero otherwise. Note that we added an `id` column to uniquely identify each row.\n",
"> Note that this dataset does not contain alternative-specific variables. All the explanatory variables are individual specific."
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 424
},
"id": "NE08CpzBvulM",
"outputId": "8cea8abf-2c8a-4009-e5f2-fa37fea3bfab"
},
"source": [
"import pandas as pd\n",
"\n",
"df_wide = pd.read_stata(\"https://stats.idre.ucla.edu/stat/stata/dae/binary.dta\")\n",
"df_wide['id'] = np.arange(len(df_wide)) # Add unique identifier column\n",
"df_wide"
],
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" admit | \n",
" gre | \n",
" gpa | \n",
" rank | \n",
" id | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 0.0 | \n",
" 380.0 | \n",
" 3.61 | \n",
" 3.0 | \n",
" 0 | \n",
"
\n",
" \n",
" | 1 | \n",
" 1.0 | \n",
" 660.0 | \n",
" 3.67 | \n",
" 3.0 | \n",
" 1 | \n",
"
\n",
" \n",
" | 2 | \n",
" 1.0 | \n",
" 800.0 | \n",
" 4.00 | \n",
" 1.0 | \n",
" 2 | \n",
"
\n",
" \n",
" | 3 | \n",
" 1.0 | \n",
" 640.0 | \n",
" 3.19 | \n",
" 4.0 | \n",
" 3 | \n",
"
\n",
" \n",
" | 4 | \n",
" 0.0 | \n",
" 520.0 | \n",
" 2.93 | \n",
" 4.0 | \n",
" 4 | \n",
"
\n",
" \n",
" | ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" | 395 | \n",
" 0.0 | \n",
" 620.0 | \n",
" 4.00 | \n",
" 2.0 | \n",
" 395 | \n",
"
\n",
" \n",
" | 396 | \n",
" 0.0 | \n",
" 560.0 | \n",
" 3.04 | \n",
" 3.0 | \n",
" 396 | \n",
"
\n",
" \n",
" | 397 | \n",
" 0.0 | \n",
" 460.0 | \n",
" 2.63 | \n",
" 2.0 | \n",
" 397 | \n",
"
\n",
" \n",
" | 398 | \n",
" 0.0 | \n",
" 700.0 | \n",
" 3.65 | \n",
" 2.0 | \n",
" 398 | \n",
"
\n",
" \n",
" | 399 | \n",
" 0.0 | \n",
" 600.0 | \n",
" 3.89 | \n",
" 3.0 | \n",
" 399 | \n",
"
\n",
" \n",
"
\n",
"
400 rows × 5 columns
\n",
"
"
],
"text/plain": [
" admit gre gpa rank id\n",
"0 0.0 380.0 3.61 3.0 0\n",
"1 1.0 660.0 3.67 3.0 1\n",
"2 1.0 800.0 4.00 1.0 2\n",
"3 1.0 640.0 3.19 4.0 3\n",
"4 0.0 520.0 2.93 4.0 4\n",
".. ... ... ... ... ...\n",
"395 0.0 620.0 4.00 2.0 395\n",
"396 0.0 560.0 3.04 3.0 396\n",
"397 0.0 460.0 2.63 2.0 397\n",
"398 0.0 700.0 3.65 2.0 398\n",
"399 0.0 600.0 3.89 3.0 399\n",
"\n",
"[400 rows x 5 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eWJ6qflTyoDL"
},
"source": [
"Now we can proceed to convert the data to long format as follows:"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 424
},
"id": "XxOGIS8pyrg_",
"outputId": "ad16bf39-059b-49d3-9d31-6b77258e2f58"
},
"source": [
"df = wide_to_long(df_wide, id_col='id', alt_list=[0, 1], alt_name='alt')\n",
"df"
],
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" alt | \n",
" admit | \n",
" gre | \n",
" gpa | \n",
" rank | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 0 | \n",
" 0 | \n",
" 0.0 | \n",
" 380.0 | \n",
" 3.61 | \n",
" 3.0 | \n",
"
\n",
" \n",
" | 1 | \n",
" 0 | \n",
" 1 | \n",
" 0.0 | \n",
" 380.0 | \n",
" 3.61 | \n",
" 3.0 | \n",
"
\n",
" \n",
" | 2 | \n",
" 1 | \n",
" 0 | \n",
" 1.0 | \n",
" 660.0 | \n",
" 3.67 | \n",
" 3.0 | \n",
"
\n",
" \n",
" | 3 | \n",
" 1 | \n",
" 1 | \n",
" 1.0 | \n",
" 660.0 | \n",
" 3.67 | \n",
" 3.0 | \n",
"
\n",
" \n",
" | 4 | \n",
" 2 | \n",
" 0 | \n",
" 1.0 | \n",
" 800.0 | \n",
" 4.00 | \n",
" 1.0 | \n",
"
\n",
" \n",
" | ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" | 795 | \n",
" 397 | \n",
" 1 | \n",
" 0.0 | \n",
" 460.0 | \n",
" 2.63 | \n",
" 2.0 | \n",
"
\n",
" \n",
" | 796 | \n",
" 398 | \n",
" 0 | \n",
" 0.0 | \n",
" 700.0 | \n",
" 3.65 | \n",
" 2.0 | \n",
"
\n",
" \n",
" | 797 | \n",
" 398 | \n",
" 1 | \n",
" 0.0 | \n",
" 700.0 | \n",
" 3.65 | \n",
" 2.0 | \n",
"
\n",
" \n",
" | 798 | \n",
" 399 | \n",
" 0 | \n",
" 0.0 | \n",
" 600.0 | \n",
" 3.89 | \n",
" 3.0 | \n",
"
\n",
" \n",
" | 799 | \n",
" 399 | \n",
" 1 | \n",
" 0.0 | \n",
" 600.0 | \n",
" 3.89 | \n",
" 3.0 | \n",
"
\n",
" \n",
"
\n",
"
800 rows × 6 columns
\n",
"
"
],
"text/plain": [
" id alt admit gre gpa rank\n",
"0 0 0 0.0 380.0 3.61 3.0\n",
"1 0 1 0.0 380.0 3.61 3.0\n",
"2 1 0 1.0 660.0 3.67 3.0\n",
"3 1 1 1.0 660.0 3.67 3.0\n",
"4 2 0 1.0 800.0 4.00 1.0\n",
".. ... ... ... ... ... ...\n",
"795 397 1 0.0 460.0 2.63 2.0\n",
"796 398 0 0.0 700.0 3.65 2.0\n",
"797 398 1 0.0 700.0 3.65 2.0\n",
"798 399 0 0.0 600.0 3.89 3.0\n",
"799 399 1 0.0 600.0 3.89 3.0\n",
"\n",
"[800 rows x 6 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 6
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "to9eEKhfFog1"
},
"source": [
"Before fitting the model, let's create dummy representations of the `rank` variable to include them in the model."
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 424
},
"id": "lunx66smGy5d",
"outputId": "f5358725-260d-465f-df7b-dca1cfd5de3d"
},
"source": [
"df['rank'] = df['rank'].astype(int) # Convert to int first for better column naming\n",
"df = df.join(pd.get_dummies(df['rank'], prefix='rank'))\n",
"df"
],
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" id | \n",
" alt | \n",
" admit | \n",
" gre | \n",
" gpa | \n",
" rank | \n",
" rank_1 | \n",
" rank_2 | \n",
" rank_3 | \n",
" rank_4 | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 0 | \n",
" 0 | \n",
" 0.0 | \n",
" 380.0 | \n",
" 3.61 | \n",
" 3 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" | 1 | \n",
" 0 | \n",
" 1 | \n",
" 0.0 | \n",
" 380.0 | \n",
" 3.61 | \n",
" 3 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" | 2 | \n",
" 1 | \n",
" 0 | \n",
" 1.0 | \n",
" 660.0 | \n",
" 3.67 | \n",
" 3 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" | 3 | \n",
" 1 | \n",
" 1 | \n",
" 1.0 | \n",
" 660.0 | \n",
" 3.67 | \n",
" 3 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" | 4 | \n",
" 2 | \n",
" 0 | \n",
" 1.0 | \n",
" 800.0 | \n",
" 4.00 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" | ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" | 795 | \n",
" 397 | \n",
" 1 | \n",
" 0.0 | \n",
" 460.0 | \n",
" 2.63 | \n",
" 2 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" | 796 | \n",
" 398 | \n",
" 0 | \n",
" 0.0 | \n",
" 700.0 | \n",
" 3.65 | \n",
" 2 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" | 797 | \n",
" 398 | \n",
" 1 | \n",
" 0.0 | \n",
" 700.0 | \n",
" 3.65 | \n",
" 2 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" | 798 | \n",
" 399 | \n",
" 0 | \n",
" 0.0 | \n",
" 600.0 | \n",
" 3.89 | \n",
" 3 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" | 799 | \n",
" 399 | \n",
" 1 | \n",
" 0.0 | \n",
" 600.0 | \n",
" 3.89 | \n",
" 3 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
800 rows × 10 columns
\n",
"
"
],
"text/plain": [
" id alt admit gre gpa rank rank_1 rank_2 rank_3 rank_4\n",
"0 0 0 0.0 380.0 3.61 3 0 0 1 0\n",
"1 0 1 0.0 380.0 3.61 3 0 0 1 0\n",
"2 1 0 1.0 660.0 3.67 3 0 0 1 0\n",
"3 1 1 1.0 660.0 3.67 3 0 0 1 0\n",
"4 2 0 1.0 800.0 4.00 1 1 0 0 0\n",
".. ... ... ... ... ... ... ... ... ... ...\n",
"795 397 1 0.0 460.0 2.63 2 0 1 0 0\n",
"796 398 0 0.0 700.0 3.65 2 0 1 0 0\n",
"797 398 1 0.0 700.0 3.65 2 0 1 0 0\n",
"798 399 0 0.0 600.0 3.89 3 0 0 1 0\n",
"799 399 1 0.0 600.0 3.89 3 0 0 1 0\n",
"\n",
"[800 rows x 10 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 7
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UOYVPLTc20up"
},
"source": [
"Now we can use these data in `xlogit`, as shown below. Note that the base category for `rank`, called `rank_1`, was not included in the model."
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "QqKxaMwgz7bq",
"outputId": "25f39957-e9f9-47b6-9ea0-2d37cb619e2f"
},
"source": [
"from xlogit import MultinomialLogit\n",
"\n",
"model = MultinomialLogit()\n",
"varnames = ['gpa', 'gre', 'rank_2', 'rank_3', 'rank_4']\n",
"model.fit(X=df[varnames],\n",
" y=df['admit'],\n",
" varnames=varnames,\n",
" isvars=varnames,\n",
" ids=df['id'],\n",
" alts=df['alt'],\n",
" fit_intercept=True)\n",
"model.summary()"
],
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"text": [
"Estimation time= 0.0 seconds\n",
"---------------------------------------------------------------------------\n",
"Coefficient Estimate Std.Err. z-val P>|z|\n",
"---------------------------------------------------------------------------\n",
"_intercept.1 -3.9901156 1.0668058 -3.7402454 0.00081 ***\n",
"gpa.1 0.8040565 0.2998838 2.6812267 0.0224 * \n",
"gre.1 0.0022645 0.0009793 2.3124578 0.0556 . \n",
"rank_2.1 -0.6754427 0.3038699 -2.2228022 0.068 . \n",
"rank_3.1 -1.3402258 0.3329743 -4.0250129 0.000278 ***\n",
"rank_4.1 -1.5514728 0.4055016 -3.8260580 0.000591 ***\n",
"---------------------------------------------------------------------------\n",
"Significance: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1\n",
"\n",
"Log-Likelihood= -229.259\n",
"AIC= 470.517\n",
"BIC= 494.466\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "oo903wY5JLbX"
},
"source": [
"Note that the estimation results are identical to those displayed in the [original example in UCLA IDRE's website](https://stats.idre.ucla.edu/stata/dae/logistic-regression/).\n",
"The `.1` suffix for the coefficient names represents the alternative that they are associated with. Given that in this case it is a binary logit, the coefficients are interpreted as being relative to the base alternative `0`."
]
}
]
}