Multinomial Logit¶
This is a step-by-step guide to estimating Multinomial Logit models using the xlogit
package. You can execute the code in this guide interactively by opening it in Google Colab.
Install xlogit package¶
Install xlogit using pip as shown below.
[ ]:
!pip install xlogit
Collecting xlogit
Downloading https://files.pythonhosted.org/packages/60/5f/9bc576d180c366af77bc04e268536e9e34be23c52a520918aa0cb56b438e/xlogit-0.1.3-py3-none-any.whl
Requirement already satisfied: numpy>=1.13.1 in /usr/local/lib/python3.7/dist-packages (from xlogit) (1.19.5)
Requirement already satisfied: scipy>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from xlogit) (1.4.1)
Installing collected packages: xlogit
Successfully installed xlogit-0.1.3
Route Choice Dataset¶
This dataset contains the choices of 151 commuters among three home-to-work route alternatives: arterial, rural, and freeway roads. The dataset was taken from Example 13.1 of the book "Statistical and Econometric Methods for Transportation Data Analysis" (Washington et al., 2011).
Read data¶
We start by importing the data using pandas and renaming the columns of interest (choice, distance, male, and vehicle model). In addition, we create a column with the name of the alternatives and a column that uniquely identifies every observation in the dataset. Note that this dataset is in long format.
[ ]:
import pandas as pd
import numpy as np
df = pd.read_csv("https://engineering.purdue.edu/~flm/StatEcon-Files/Ex13-1.txt",
                 sep="\t", header=None, prefix="x")
df.rename(columns={'x0': 'choice', 'x6': 'dist', 'x10': 'male', 'x14': 'vehmodel'},
          inplace=True)  # Rename columns of interest
df['alt'] = np.tile(['arterial', 'rural', 'freeway'], len(df)//3)  # Add column with alternatives
df['ids'] = np.repeat(np.arange(len(df)//3), 3)  # Add column with unique ids
df['vehage'] = 86 - df['vehmodel']  # Vehicle age (model year coded as two digits, e.g., 86 = 1986)
df
choice | x1 | x2 | x3 | x4 | x5 | dist | x7 | x8 | x9 | male | x11 | x12 | x13 | vehmodel | x15 | x16 | alt | ids | vehage | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 0 | 0 | 460 | 14 | 48 | 0 | 0 | 2 | 0 | 1 | 0 | 1 | 86 | 0 | 28 | arterial | 0 | 0 |
1 | 0 | 0 | 1 | 0 | 440 | 7 | 44 | 0 | 0 | 2 | 0 | 1 | 0 | 1 | 86 | 0 | 28 | rural | 0 | 0 |
2 | 0 | 0 | 0 | 1 | 130 | 7 | 61 | 0 | 0 | 2 | 0 | 1 | 0 | 1 | 86 | 0 | 28 | freeway | 0 | 0 |
3 | 1 | 1 | 0 | 0 | 595 | 13 | 59 | 1 | 0 | 2 | 1 | 1 | 0 | 2 | 85 | 0 | 27 | arterial | 1 | 1 |
4 | 0 | 0 | 1 | 0 | 515 | 13 | 70 | 1 | 0 | 2 | 1 | 1 | 0 | 2 | 85 | 0 | 27 | rural | 1 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
448 | 0 | 0 | 1 | 0 | 325 | 10 | 70 | 0 | 1 | 2 | 1 | 1 | 0 | 1 | 84 | 1 | 24 | rural | 149 | 2 |
449 | 1 | 0 | 0 | 1 | 200 | 5 | 74 | 0 | 1 | 2 | 1 | 1 | 0 | 1 | 84 | 1 | 24 | freeway | 149 | 2 |
450 | 0 | 1 | 0 | 0 | 900 | 14 | 51 | 1 | 0 | 2 | 1 | 1 | 0 | 3 | 83 | 1 | 18 | arterial | 150 | 3 |
451 | 0 | 0 | 1 | 0 | 550 | 7 | 47 | 1 | 0 | 2 | 1 | 1 | 0 | 3 | 83 | 1 | 18 | rural | 150 | 3 |
452 | 1 | 0 | 0 | 1 | 220 | 7 | 64 | 1 | 0 | 2 | 1 | 1 | 0 | 3 | 83 | 1 | 18 | freeway | 150 | 3 |
453 rows × 20 columns
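Before building the specification, it can be useful to check the observed choice share of each alternative in a long-format dataset. The sketch below uses a toy frame that mirrors the structure above; the values are hypothetical, not taken from the dataset.

```python
import pandas as pd

# Toy long-format data: 3 choice situations x 3 alternatives (hypothetical values)
toy = pd.DataFrame({
    'ids':    [0, 0, 0, 1, 1, 1, 2, 2, 2],
    'alt':    ['arterial', 'rural', 'freeway'] * 3,
    'choice': [1, 0, 0, 0, 0, 1, 0, 0, 1],
})

# Share of choice situations in which each alternative was chosen
shares = toy.groupby('alt')['choice'].mean()
```

The same one-liner applied to `df` gives the market share of each route.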
Create model specification¶
The following code creates the model specification by adding columns to the dataframe to accommodate the specification needs. The newly added variables are simply products of existing variables with dummy variables for the different alternatives. Note that the specification in the code below corresponds to the following utility formulation:
\begin{equation}
\begin{array}{llllll}
V_{arterial} & = & & \beta_3 DIST_{arterial} & & \\
V_{rural} & = & \beta_1 ASC_{rural} & + \beta_4 DIST_{rural} & + \beta_6 VEHAGE_{rural} & \\
V_{freeway} & = & \beta_2 ASC_{freeway} & + \beta_5 DIST_{freeway} & + \beta_7 VEHAGE_{freeway} & + \beta_8 MALE_{freeway}
\end{array}
\end{equation}
[ ]:
# Alternative specific constants
df['asc_rural'] = np.ones(len(df)) * (df['alt'] == 'rural')
df['asc_freeway'] = np.ones(len(df)) * (df['alt'] == 'freeway')
# Distance
df['dist_arterial'] = df['dist'] * (df['alt'] == 'arterial')
df['dist_rural'] = df['dist'] * (df['alt'] == 'rural')
df['dist_freeway'] = df['dist'] * (df['alt'] == 'freeway')
# Vehicle age
df['vehage_rural'] = df['vehage'] * (df['alt'] == 'rural')
df['vehage_freeway'] = df['vehage'] * (df['alt'] == 'freeway')
# Male driver
df['male_freeway'] = df['male'] * (df['alt'] == 'freeway')
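The repetitive column products above can also be generated with a small loop. The sketch below uses a hypothetical `add_alt_interactions` helper and toy data; it is not part of xlogit.

```python
import pandas as pd

# Toy long-format frame (hypothetical values)
toy = pd.DataFrame({
    'alt':  ['arterial', 'rural', 'freeway'] * 2,
    'dist': [14, 7, 7, 13, 13, 13],
})

def add_alt_interactions(frame, var, alts):
    """Hypothetical helper: interact `var` with a dummy for each alternative in `alts`."""
    for a in alts:
        frame[f'{var}_{a}'] = frame[var] * (frame['alt'] == a)
    return frame

add_alt_interactions(toy, 'dist', ['rural', 'freeway'])
```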
Estimate model¶
After creating the model specification, we can use xlogit to estimate the model as follows:
[ ]:
from xlogit import MultinomialLogit
varnames = ['asc_rural', 'asc_freeway', 'dist_arterial', 'dist_rural',
            'dist_freeway', 'vehage_rural', 'vehage_freeway', 'male_freeway']
model = MultinomialLogit()
model.fit(X=df[varnames], y=df['choice'], varnames=varnames,
          ids=df['ids'], alts=df['alt'])
model.summary()
Estimation time= 0.0 seconds
---------------------------------------------------------------------------
Coefficient Estimate Std.Err. z-val P>|z|
---------------------------------------------------------------------------
asc_rural 2.8131275 1.1504189 2.4453072 0.0416 *
asc_freeway -2.6868817 2.2817329 -1.1775619 0.398
dist_arterial -0.1229102 0.0240053 -5.1201358 4.14e-06 ***
dist_rural -0.1773579 0.0279818 -6.3383224 1.3e-08 ***
dist_freeway -0.0956391 0.0369681 -2.5870738 0.0295 *
vehage_rural 0.1236721 0.0535597 2.3090535 0.057 .
vehage_freeway 0.2268642 0.0755401 3.0032267 0.00969 **
male_freeway 0.5990000 0.6202114 0.9657996 0.499
---------------------------------------------------------------------------
Significance: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log-Likelihood= -94.440
AIC= 204.881
BIC= 229.019
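Under the Multinomial Logit model, the probability of choosing alternative j is P_j = exp(V_j) / Σ_k exp(V_k). A minimal numpy sketch of this formula follows; the utility values are purely illustrative and not computed from the estimates above.

```python
import numpy as np

# MNL choice probabilities: P_j = exp(V_j) / sum_k exp(V_k)
# Utilities for the three routes of one commuter (illustrative values)
V = np.array([-1.7, -0.3, -2.5])   # arterial, rural, freeway
P = np.exp(V) / np.exp(V).sum()
```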
Swissmetro Dataset¶
In this example, we estimate a Multinomial Logit model where each alternative has a different utility specification. The swissmetro dataset is an SP/RP survey dataset widely used in Biogeme and PyLogit examples. The dataset is available at http://transp-or.epfl.ch/data/swissmetro.dat, and Bierlaire et al. (2001) provide a detailed discussion of the data as well as its context and collection process. Note that the dataset is available in wide format; therefore, we need to convert it to long format for xlogit.
Read data¶
The dataset is imported into the Python environment using pandas. Then, records with a trip purpose other than commuting or business, as well as records with an unknown choice, are filtered out. The original dataset contains 10,729 records; after filtering, 6,768 records remain for the following analysis. Finally, a new column that uniquely identifies each sample is added to the dataframe, and the CHOICE column, which originally contains a numerical coding of the choices, is mapped to descriptions consistent with the alternatives in the column names.
[ ]:
import pandas as pd
import numpy as np
df_wide = pd.read_table("http://transp-or.epfl.ch/data/swissmetro.dat", sep='\t')
# Keep only observations for commute and business purposes that contain known choices
df_wide = df_wide[(df_wide['PURPOSE'].isin([1, 3]) & (df_wide['CHOICE'] != 0))]
df_wide['custom_id'] = np.arange(len(df_wide)) # Add unique identifier
df_wide['CHOICE'] = df_wide['CHOICE'].map({1: 'TRAIN', 2:'SM', 3: 'CAR'})
df_wide
GROUP | SURVEY | SP | ID | PURPOSE | FIRST | TICKET | WHO | LUGGAGE | AGE | MALE | INCOME | GA | ORIGIN | DEST | TRAIN_AV | CAR_AV | SM_AV | TRAIN_TT | TRAIN_CO | TRAIN_HE | SM_TT | SM_CO | SM_HE | SM_SEATS | CAR_TT | CAR_CO | CHOICE | custom_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 112 | 48 | 120 | 63 | 52 | 20 | 0 | 117 | 65 | SM | 0 |
1 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 103 | 48 | 30 | 60 | 49 | 10 | 0 | 117 | 84 | SM | 1 |
2 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 130 | 48 | 60 | 67 | 58 | 30 | 0 | 117 | 52 | SM | 2 |
3 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 103 | 40 | 30 | 63 | 52 | 20 | 0 | 72 | 52 | SM | 3 |
4 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | 1 | 1 | 1 | 130 | 36 | 60 | 63 | 42 | 20 | 0 | 90 | 84 | SM | 4 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
8446 | 3 | 1 | 1 | 939 | 3 | 1 | 7 | 3 | 1 | 5 | 1 | 2 | 0 | 1 | 2 | 1 | 1 | 1 | 108 | 13 | 30 | 50 | 17 | 30 | 0 | 130 | 64 | TRAIN | 6763 |
8447 | 3 | 1 | 1 | 939 | 3 | 1 | 7 | 3 | 1 | 5 | 1 | 2 | 0 | 1 | 2 | 1 | 1 | 1 | 108 | 12 | 30 | 53 | 16 | 10 | 0 | 80 | 80 | TRAIN | 6764 |
8448 | 3 | 1 | 1 | 939 | 3 | 1 | 7 | 3 | 1 | 5 | 1 | 2 | 0 | 1 | 2 | 1 | 1 | 1 | 108 | 16 | 60 | 50 | 16 | 20 | 0 | 80 | 64 | TRAIN | 6765 |
8449 | 3 | 1 | 1 | 939 | 3 | 1 | 7 | 3 | 1 | 5 | 1 | 2 | 0 | 1 | 2 | 1 | 1 | 1 | 128 | 16 | 30 | 53 | 17 | 30 | 0 | 80 | 104 | TRAIN | 6766 |
8450 | 3 | 1 | 1 | 939 | 3 | 1 | 7 | 3 | 1 | 5 | 1 | 2 | 0 | 1 | 2 | 1 | 1 | 1 | 108 | 13 | 60 | 53 | 21 | 30 | 0 | 100 | 80 | TRAIN | 6767 |
6768 rows × 29 columns
Reshape data¶
Given that xlogit requires the dataset in long format, we reshape it using the wide_to_long utility provided by xlogit. This function takes as input the column that uniquely identifies each sample (id_col), the name of the column where the alternatives will be saved (alt_name), the list of alternatives (alt_list), the columns that vary across alternatives (varying), and whether the alternative names are a prefix in the column names (alt_is_prefix). The wide_to_long function fills with NaN the columns that do not exist for certain alternatives (e.g., SEATS and HE). Depending on your specification needs, you can keep the NaN values or replace them; in this case we replaced them with zeros using the empty_val parameter. Additional details about the wide_to_long function can be found in xlogit's documentation.
[ ]:
from xlogit.utils import wide_to_long
df = wide_to_long(df_wide, id_col='custom_id', alt_name='alt', sep='_',
                  alt_list=['TRAIN', 'SM', 'CAR'], empty_val=0,
                  varying=['TT', 'CO', 'HE', 'AV', 'SEATS'], alt_is_prefix=True)
df
custom_id | alt | TT | CO | HE | AV | SEATS | GROUP | SURVEY | SP | ID | PURPOSE | FIRST | TICKET | WHO | LUGGAGE | AGE | MALE | INCOME | GA | ORIGIN | DEST | CHOICE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | CAR | 117 | 65 | 0 | 1 | 0 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | SM |
1 | 0 | SM | 63 | 52 | 20 | 1 | 0 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | SM |
2 | 0 | TRAIN | 112 | 48 | 120 | 1 | 0 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | SM |
3 | 1 | CAR | 117 | 84 | 0 | 1 | 0 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | SM |
4 | 1 | SM | 60 | 49 | 10 | 1 | 0 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | SM |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
20299 | 6766 | SM | 53 | 17 | 30 | 1 | 0 | 3 | 1 | 1 | 939 | 3 | 1 | 7 | 3 | 1 | 5 | 1 | 2 | 0 | 1 | 2 | TRAIN |
20300 | 6766 | TRAIN | 128 | 16 | 30 | 1 | 0 | 3 | 1 | 1 | 939 | 3 | 1 | 7 | 3 | 1 | 5 | 1 | 2 | 0 | 1 | 2 | TRAIN |
20301 | 6767 | CAR | 100 | 80 | 0 | 1 | 0 | 3 | 1 | 1 | 939 | 3 | 1 | 7 | 3 | 1 | 5 | 1 | 2 | 0 | 1 | 2 | TRAIN |
20302 | 6767 | SM | 53 | 21 | 30 | 1 | 0 | 3 | 1 | 1 | 939 | 3 | 1 | 7 | 3 | 1 | 5 | 1 | 2 | 0 | 1 | 2 | TRAIN |
20303 | 6767 | TRAIN | 108 | 13 | 60 | 1 | 0 | 3 | 1 | 1 | 939 | 3 | 1 | 7 | 3 | 1 | 5 | 1 | 2 | 0 | 1 | 2 | TRAIN |
20304 rows × 23 columns
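For reference, the reshape that wide_to_long performs can be mimicked with plain pandas on a toy frame. The column names below follow the dataset, but the logic is only a sketch, not xlogit's implementation.

```python
import pandas as pd

# Toy wide frame: the alternative name is a prefix in the varying columns
wide = pd.DataFrame({
    'custom_id': [0, 1],
    'TRAIN_TT': [112, 103], 'SM_TT': [63, 60], 'CAR_TT': [117, 117],
})

# Melt to long, then split the 'ALT_VAR' labels into alternative and variable
long_df = wide.melt(id_vars='custom_id')
long_df[['alt', 'var']] = long_df['variable'].str.split('_', n=1, expand=True)
long_df = (long_df.pivot_table(index=['custom_id', 'alt'], columns='var',
                               values='value').reset_index())
```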
We then scale some variables following the examples in Biogeme and PyLogit. The travel time and headway variables are converted to hours, the cost is scaled and set to zero for travellers holding an annual pass, and the new variables single_luggage, multiple_luggage, regular_class, and train_survey are created to accommodate the model specification as shown below:
[ ]:
# Scale travel time and headway variables to hours
df['time'] = df['TT'] / 60.0
df['headway'] = df['HE'] / 60.0
df['cost'] = df['CO'] / 100.0
# We set the cost as zero for individuals with an annual pass paid by employer
annual_pass = (df['alt'].isin(['TRAIN', 'SM'])) & ((df["GA"] == 1) | (df["WHO"] == 2))
df["cost"] = df["cost"] * (~annual_pass)
# Travellers carrying a single piece of luggage
df["single_luggage"] = (df["LUGGAGE"] == 1).astype(int)
# Travellers carrying multiple pieces of luggage
df["multiple_luggage"] = (df["LUGGAGE"] == 3).astype(int)
# Travellers travelling in classes other than First class
df["regular_class"] = 1 - df["FIRST"]
# Travellers who responded to the survey while on a train
df["train_survey"] = 1 - df["SURVEY"]
df
custom_id | alt | TT | CO | HE | AV | SEATS | GROUP | SURVEY | SP | ID | PURPOSE | FIRST | TICKET | WHO | LUGGAGE | AGE | MALE | INCOME | GA | ORIGIN | DEST | CHOICE | time | headway | cost | single_luggage | multiple_luggage | regular_class | train_survey | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | CAR | 117 | 65 | 0 | 1 | 0 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | SM | 1.950000 | 0.000000 | 0.65 | 0 | 0 | 1 | 1 |
1 | 0 | SM | 63 | 52 | 20 | 1 | 0 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | SM | 1.050000 | 0.333333 | 0.52 | 0 | 0 | 1 | 1 |
2 | 0 | TRAIN | 112 | 48 | 120 | 1 | 0 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | SM | 1.866667 | 2.000000 | 0.48 | 0 | 0 | 1 | 1 |
3 | 1 | CAR | 117 | 84 | 0 | 1 | 0 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | SM | 1.950000 | 0.000000 | 0.84 | 0 | 0 | 1 | 1 |
4 | 1 | SM | 60 | 49 | 10 | 1 | 0 | 2 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 3 | 0 | 2 | 0 | 2 | 1 | SM | 1.000000 | 0.166667 | 0.49 | 0 | 0 | 1 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
20299 | 6766 | SM | 53 | 17 | 30 | 1 | 0 | 3 | 1 | 1 | 939 | 3 | 1 | 7 | 3 | 1 | 5 | 1 | 2 | 0 | 1 | 2 | TRAIN | 0.883333 | 0.500000 | 0.17 | 1 | 0 | 0 | 0 |
20300 | 6766 | TRAIN | 128 | 16 | 30 | 1 | 0 | 3 | 1 | 1 | 939 | 3 | 1 | 7 | 3 | 1 | 5 | 1 | 2 | 0 | 1 | 2 | TRAIN | 2.133333 | 0.500000 | 0.16 | 1 | 0 | 0 | 0 |
20301 | 6767 | CAR | 100 | 80 | 0 | 1 | 0 | 3 | 1 | 1 | 939 | 3 | 1 | 7 | 3 | 1 | 5 | 1 | 2 | 0 | 1 | 2 | TRAIN | 1.666667 | 0.000000 | 0.80 | 1 | 0 | 0 | 0 |
20302 | 6767 | SM | 53 | 21 | 30 | 1 | 0 | 3 | 1 | 1 | 939 | 3 | 1 | 7 | 3 | 1 | 5 | 1 | 2 | 0 | 1 | 2 | TRAIN | 0.883333 | 0.500000 | 0.21 | 1 | 0 | 0 | 0 |
20303 | 6767 | TRAIN | 108 | 13 | 60 | 1 | 0 | 3 | 1 | 1 | 939 | 3 | 1 | 7 | 3 | 1 | 5 | 1 | 2 | 0 | 1 | 2 | TRAIN | 1.800000 | 1.000000 | 0.13 | 1 | 0 | 0 | 0 |
20304 rows × 30 columns
Create model specification¶
By operating on the dataframe columns, highly flexible utility specifications can be modeled in xlogit. As shown below, alternative-specific constants or coefficients can be included in the specification by creating new columns and setting their values depending on the alternative. This flexibility even allows the specification of one coefficient per alternative or per group of alternatives.
[ ]:
# Create model specification
# Alternative Specific Constants
df['asc_train'] = np.ones(len(df))*(df['alt'] == 'TRAIN')
df['asc_sm'] = np.ones(len(df))*(df['alt'] == 'SM')
# Travel cost (One coefficient per alternative)
df['cost_train'] = df['cost']*(df['alt'] == 'TRAIN')
df['cost_sm'] = df['cost']*(df['alt'] == 'SM')
df['cost_car'] = df['cost']*(df['alt'] == 'CAR')
# Travel time (One coefficient for train and sm and other for car)
df['time_train_sm'] = df['time']*((df['alt'] == 'TRAIN') | (df['alt'] == 'SM'))
df['time_car'] = df['time']*(df['alt'] == 'CAR')
# Headway (One coefficient per alternative, except for car)
df['headway_train'] = df['headway']*(df['alt'] == 'TRAIN')
df['headway_sm'] = df['headway']*(df['alt'] == 'SM')
# Seat config (Coefficient only for swissmetro)
df['seatconf_sm'] = df['SEATS']*(df['alt'] == 'SM')
# Train survey (Coefficient for train and Swissmetro)
df['survey_train_sm'] = df['train_survey']*((df['alt'] == 'TRAIN') | (df['alt'] == 'SM'))
# Regular class (Coefficient only for train)
df['regular_class_sm'] = df['regular_class']*(df['alt'] == 'TRAIN')
# Luggage (Coefficient only for car)
df['single_lug_car'] = df['single_luggage']*(df['alt'] == 'CAR')
df['multip_lug_car'] = df['multiple_luggage']*(df['alt'] == 'CAR')
Estimate model¶
The swissmetro dataset contains unbalanced choice situations (i.e., some choice situations do not include all alternatives). The avail argument enables estimation for such datasets by taking values that indicate the availability of each alternative in every choice situation. Once the model specification is complete, the model is estimated as follows:
[ ]:
from xlogit import MultinomialLogit
varnames = ['asc_train', 'asc_sm', 'time_train_sm', 'time_car', 'cost_train',
            'cost_sm', 'cost_car', 'headway_train', 'headway_sm', 'seatconf_sm',
            'survey_train_sm', 'regular_class_sm', 'single_lug_car',
            'multip_lug_car']
model = MultinomialLogit()
model.fit(X=df[varnames],
          y=df['CHOICE'],
          varnames=varnames,
          alts=df['alt'],
          ids=df['custom_id'],
          avail=df['AV'])
model.summary()
Estimation time= 0.1 seconds
---------------------------------------------------------------------------
Coefficient Estimate Std.Err. z-val P>|z|
---------------------------------------------------------------------------
asc_train -1.2929512 0.1237556 -10.4476139 2.43e-24 ***
asc_sm -0.5026152 0.1032927 -4.8659312 5.87e-06 ***
time_train_sm -0.6990098 0.0396510 -17.6290608 8.13e-67 ***
time_car -0.7229887 0.0442625 -16.3340968 1.18e-57 ***
cost_train -0.5618773 0.0807075 -6.9618968 2.59e-11 ***
cost_sm -0.2816843 0.0417373 -6.7489902 1.1e-10 ***
cost_car -0.5139009 0.0970406 -5.2957304 6.66e-07 ***
headway_train -0.3143519 0.0505955 -6.2130370 3.49e-09 ***
headway_sm -0.3773753 0.1652542 -2.2836046 0.0589 .
seatconf_sm -0.7824379 0.0758912 -10.3100010 9.91e-24 ***
survey_train_sm 2.5424946 0.0921336 27.5957408 1.5e-157 ***
regular_class_sm 0.5650259 0.0652226 8.6630441 4.93e-17 ***
single_lug_car 0.4227658 0.0611684 6.9115077 3.66e-11 ***
multip_lug_car 1.4141058 0.2373032 5.9590672 1.62e-08 ***
---------------------------------------------------------------------------
Significance: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log-Likelihood= -5159.258
AIC= 10346.517
BIC= 10441.996
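Conceptually, availability conditions of this kind simply exclude unavailable alternatives from the choice probabilities. A minimal numpy sketch with illustrative utilities (not tied to the estimates above):

```python
import numpy as np

# Availability masking: unavailable alternatives get zero probability
V = np.array([-1.0, -0.5, -2.0])   # utilities for TRAIN, SM, CAR (illustrative)
avail = np.array([1, 1, 0])        # CAR unavailable in this choice situation

expV = np.exp(V) * avail           # zero out unavailable alternatives
P = expV / expV.sum()
```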
The estimates are identical to those provided by PyLogit (Brathwaite and Walker, 2018).
Fishing Dataset¶
The following example illustrates the estimation of a Multinomial Logit model for the sport-fishing mode choices of 1,182 individuals using xlogit. The goal is to analyze the market shares of four alternatives (beach, pier, boat, and charter) based on their cost and fish catch rate. Cameron and Trivedi (2005) provide additional details about this dataset. The following code illustrates how to use xlogit to estimate the model parameters.
Read data¶
The data can be imported into Python using any preferred method. In this example, the data in CSV format was imported using the popular pandas package. However, it is worth highlighting that xlogit does not depend on pandas, as xlogit can take any array-like structure as input. This is an additional advantage, because xlogit can be used with any preferred dataframe library, not only pandas.
[ ]:
import pandas as pd
df = pd.read_csv("https://raw.github.com/arteagac/xlogit/master/examples/data/fishing_long.csv")
df
id | alt | choice | income | price | catch | |
---|---|---|---|---|---|---|
0 | 1 | beach | 0 | 7083.33170 | 157.930 | 0.0678 |
1 | 1 | boat | 0 | 7083.33170 | 157.930 | 0.2601 |
2 | 1 | charter | 1 | 7083.33170 | 182.930 | 0.5391 |
3 | 1 | pier | 0 | 7083.33170 | 157.930 | 0.0503 |
4 | 2 | beach | 0 | 1249.99980 | 15.114 | 0.1049 |
... | ... | ... | ... | ... | ... | ... |
4723 | 1181 | pier | 0 | 416.66668 | 36.636 | 0.4522 |
4724 | 1182 | beach | 0 | 6250.00130 | 339.890 | 0.2537 |
4725 | 1182 | boat | 1 | 6250.00130 | 235.436 | 0.6817 |
4726 | 1182 | charter | 0 | 6250.00130 | 260.436 | 2.3014 |
4727 | 1182 | pier | 0 | 6250.00130 | 339.890 | 0.1498 |
4728 rows × 6 columns
Estimate the model¶
Once the data is in the Python environment, xlogit can be used to fit the model, as shown below. The MultinomialLogit class is imported from xlogit, and its constructor is used to initialize a new model. The fit method estimates the model using the input data and the estimation criteria provided as arguments. The arguments of the fit method are described in xlogit's documentation (https://xlogit.readthedocs.io/en/latest/api/).
[ ]:
from xlogit import MultinomialLogit
varnames = ['income', 'price', 'catch']
model = MultinomialLogit()
model.fit(X=df[varnames],
          y=df['choice'],
          varnames=varnames,
          isvars=['income'],
          ids=df['id'],
          alts=df['alt'],
          fit_intercept=True)
model.summary()
Estimation time= 0.0 seconds
---------------------------------------------------------------------------
Coefficient Estimate Std.Err. z-val P>|z|
---------------------------------------------------------------------------
_intercept.boat 0.5273413 0.2017396 2.6139700 0.0264 *
_intercept.charter 1.6943827 0.2096186 8.0831712 1.2e-14 ***
_intercept.pier 0.7779899 0.2051062 3.7931076 0.000622 ***
income.boat 0.0000894 0.0000473 1.8906643 0.134
income.charter -0.0000333 0.0000480 -0.6936462 0.627
income.pier -0.0001276 0.0000467 -2.7335425 0.0192 *
price -0.0251161 0.0015240 -16.4800784 5.89e-54 ***
catch 0.3578374 0.0985344 3.6315992 0.00113 **
---------------------------------------------------------------------------
Significance: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log-Likelihood= -1215.138
AIC= 2446.275
BIC= 2486.875
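The isvars argument tells xlogit that income is an individual-specific variable (it does not vary across alternatives), so it receives one coefficient per alternative relative to a base alternative. Conceptually this is equivalent to the manual column expansion used in the earlier examples; a sketch on toy data with hypothetical values:

```python
import pandas as pd

# Toy long-format frame for two anglers (income values hypothetical)
toy = pd.DataFrame({
    'alt':    ['beach', 'boat', 'charter', 'pier'] * 2,
    'income': [7083.3] * 4 + [1250.0] * 4,
})

# Expand the individual-specific variable: one column (and hence one
# coefficient) per alternative, with 'beach' as the base alternative
for a in ['boat', 'charter', 'pier']:
    toy[f'income_{a}'] = toy['income'] * (toy['alt'] == a)
```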
Heating Dataset¶
For this example, we use the Heating dataset from R's mlogit package (Croissant 2020), which contains the choices of heating systems of 900 California households along with several explanatory variables. More information about the dataset can be found at https://cran.r-project.org/web/packages/mlogit/vignettes/e1mlogit.html.
Read data¶
[ ]:
import pandas as pd
import numpy as np
df_wide = pd.read_csv("https://raw.github.com/arteagac/xlogit/master/examples/data/heating_wide.csv")
df_wide
idcase | depvar | ic.gc | ic.gr | ic.ec | ic.er | ic.hp | oc.gc | oc.gr | oc.ec | oc.er | oc.hp | income | agehed | rooms | region | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | gc | 866.00 | 962.64 | 859.90 | 995.76 | 1135.50 | 199.69 | 151.72 | 553.34 | 505.60 | 237.88 | 7 | 25 | 6 | ncostl |
1 | 2 | gc | 727.93 | 758.89 | 796.82 | 894.69 | 968.90 | 168.66 | 168.66 | 520.24 | 486.49 | 199.19 | 5 | 60 | 5 | scostl |
2 | 3 | gc | 599.48 | 783.05 | 719.86 | 900.11 | 1048.30 | 165.58 | 137.80 | 439.06 | 404.74 | 171.47 | 4 | 65 | 2 | ncostl |
3 | 4 | er | 835.17 | 793.06 | 761.25 | 831.04 | 1048.70 | 180.88 | 147.14 | 483.00 | 425.22 | 222.95 | 2 | 50 | 4 | scostl |
4 | 5 | er | 755.59 | 846.29 | 858.86 | 985.64 | 883.05 | 174.91 | 138.90 | 404.41 | 389.52 | 178.49 | 2 | 25 | 6 | valley |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
895 | 896 | gc | 766.39 | 877.71 | 751.59 | 869.78 | 942.70 | 142.61 | 136.21 | 474.48 | 420.65 | 203.00 | 6 | 20 | 4 | mountn |
896 | 897 | gc | 1128.50 | 1167.80 | 1047.60 | 1292.60 | 1297.10 | 207.40 | 213.77 | 705.36 | 551.61 | 243.76 | 7 | 45 | 7 | scostl |
897 | 898 | gc | 787.10 | 1055.20 | 842.79 | 1041.30 | 1064.80 | 175.05 | 141.63 | 478.86 | 448.61 | 254.51 | 5 | 60 | 7 | scostl |
898 | 899 | gc | 860.56 | 1081.30 | 799.76 | 1123.20 | 1218.20 | 211.04 | 151.31 | 495.20 | 401.56 | 246.48 | 5 | 50 | 6 | scostl |
899 | 900 | gc | 893.94 | 1119.90 | 967.88 | 1091.70 | 1387.50 | 175.80 | 180.11 | 518.68 | 458.53 | 245.13 | 2 | 65 | 4 | ncostl |
900 rows × 16 columns
Reshape data¶
The dataset is available in wide format. Since xlogit requires the data in long format, we convert it as shown below:
[ ]:
from xlogit.utils import wide_to_long
df = wide_to_long(df_wide, id_col='idcase', alt_name='alt', varying=['ic', 'oc'],
                  alt_list=['ec', 'er', 'gc', 'gr', 'hp'], sep='.', empty_val=0)
df
idcase | alt | ic | oc | depvar | income | agehed | rooms | region | |
---|---|---|---|---|---|---|---|---|---|
0 | 1 | ec | 859.90 | 553.34 | gc | 7 | 25 | 6 | ncostl |
1 | 1 | er | 995.76 | 505.60 | gc | 7 | 25 | 6 | ncostl |
2 | 1 | gc | 866.00 | 199.69 | gc | 7 | 25 | 6 | ncostl |
3 | 1 | gr | 962.64 | 151.72 | gc | 7 | 25 | 6 | ncostl |
4 | 1 | hp | 1135.50 | 237.88 | gc | 7 | 25 | 6 | ncostl |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4495 | 900 | ec | 967.88 | 518.68 | gc | 2 | 65 | 4 | ncostl |
4496 | 900 | er | 1091.70 | 458.53 | gc | 2 | 65 | 4 | ncostl |
4497 | 900 | gc | 893.94 | 175.80 | gc | 2 | 65 | 4 | ncostl |
4498 | 900 | gr | 1119.90 | 180.11 | gc | 2 | 65 | 4 | ncostl |
4499 | 900 | hp | 1387.50 | 245.13 | gc | 2 | 65 | 4 | ncostl |
4500 rows × 9 columns
Estimate the model¶
We now import MultinomialLogit from xlogit and estimate the model as shown below:
[ ]:
from xlogit import MultinomialLogit
varnames = ['ic', 'oc']
model = MultinomialLogit()
model.fit(X=df[varnames],
          y=df['depvar'],
          varnames=varnames,
          alts=df['alt'],
          ids=df['idcase'])
model.summary()
Estimation time= 0.0 seconds
---------------------------------------------------------------------------
Coefficient Estimate Std.Err. z-val P>|z|
---------------------------------------------------------------------------
ic -0.0062318 0.0003516 -17.7222802 2.16e-59 ***
oc -0.0045800 0.0003208 -14.2767999 9.14e-41 ***
---------------------------------------------------------------------------
Significance: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log-Likelihood= -1095.237
AIC= 2194.474
BIC= 2204.079
Note that these results are identical to the ones estimated by the R mlogit package: https://cran.r-project.org/web/packages/mlogit/vignettes/e1mlogit.html
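As a closing check, the reported fit statistics can be reproduced from the log-likelihood using AIC = 2K - 2LL and BIC = K ln(N) - 2LL, where K is the number of estimated parameters and N the number of choice situations:

```python
import numpy as np

# Reported values for the Heating model: K = 2 parameters, N = 900 choice situations
LL, K, N = -1095.237, 2, 900
AIC = 2 * K - 2 * LL
BIC = K * np.log(N) - 2 * LL
print(round(AIC, 3), round(BIC, 3))  # 2194.474 2204.079
```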
References¶
Bierlaire, M. (2018). PandasBiogeme: a short introduction. EPFL (Transport and Mobility Laboratory, ENAC).
Brathwaite, T., & Walker, J. L. (2018). Asymmetric, closed-form, finite-parameter models of multinomial choice. Journal of Choice Modelling, 29, 78–112.
Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: methods and applications. Cambridge university press.
Croissant, Y. (2020). Estimation of Random Utility Models in R: The mlogit Package. Journal of Statistical Software, 95(1), 1-41.
Washington, S., Karlaftis, M., Mannering, F., & Anastasopoulos, P. (2020). Statistical and econometric methods for transportation data analysis. Chapman and Hall/CRC.