Variable selection in DEA deserves careful attention before the results of an analysis are used in a real case, because those results can change significantly depending on the variables included in the model. Variable selection is therefore a key step in every DEA application.
The selection procedure can lead to the removal of a variable that the
decision maker wants to keep in the model for political, tactical, or
other reasons. If nothing is done, however, the contribution of that
variable will be negligible. The cadea function provides a way to force
the contribution of a variable to the model to be at least a given
value.
For more information about loads, see the package help for adea, or see
(Fernandez-Palacin, Lopez-Sanchez, and Munoz-Marquez 2018) and
(Villanueva-Cantillo and Munoz-Marquez 2021).
Let’s load the package and have a look at the tokyo_libraries dataset with
library(adea)
data(tokyo_libraries)
head(tokyo_libraries)
#>   Area.I1 Books.I2 Staff.I3 Populations.I4 Regist.O1 Borrow.O2
#> 1   2.249  163.523       26         49.196     5.561   105.321
#> 2   4.617  338.671       30         78.599    18.106   314.682
#> 3   3.873  281.655       51        176.381    16.498   542.349
#> 4   5.541  400.993       78        189.397    30.810   847.872
#> 5  11.381  363.116       69        192.235    57.279   758.704
#> 6  10.086  541.658      114        194.091    66.137  1438.746
First, let’s fit an ADEA model with the following call:
input <- tokyo_libraries[, 1:4]
output <- tokyo_libraries[, 5:6]
m <- adea(input, output)
summary(m)
#>
#> Model name
#> Orientation input
#> Load orientation inoutput
#> Model load 0.455466997833526
#> Input load.Area.I1 0.455466997833526
#> Input load.Books.I2 1.33716872370689
#> Input load.Staff.I3 0.981885802948442
#> Input load.Populations.I4 1.22547847551114
#> Output load.Regist.O1 0.763942838453517
#> Output load.Borrow.O2 1.23605716154648
#> Inputs Area.I1 Books.I2 Staff.I3 Populations.I4
#> Outputs Regist.O1 Borrow.O2
#> nInputs 4
#> nOutputs 2
#> nVariables 6
#> nEfficients 6
#> Eff. Mean 0.775919227646031
#> Eff. sd 0.174702408743164
#> Eff. Min. 0.350010840234134
#> Eff. 1st Qu. 0.700942885344481
#> Eff. Median 0.784943740381793
#> Eff. 3rd Qu. 0.924285790399849
#> Eff. Max. 1
The summary shows that Area.I1 has a load of about 0.455, which is
under 0.6, so its contribution to the DEA model is negligible.
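As a small illustration of how the loads in the summary can be screened, the following base R sketch copies the six loads printed above into a named vector and lists those below 0.6. The names loads, threshold, low_load and model_load are introduced here only for illustration and are not part of the adea package.

```r
# Variable loads copied from the summary(m) output above
loads <- c(
  Area.I1        = 0.455466997833526,
  Books.I2       = 1.33716872370689,
  Staff.I3       = 0.981885802948442,
  Populations.I4 = 1.22547847551114,
  Regist.O1      = 0.763942838453517,
  Borrow.O2      = 1.23605716154648
)

# Threshold used in this document to flag a negligible contribution
threshold <- 0.6

# Names of the variables whose load falls below the threshold
low_load <- names(loads)[loads < threshold]
print(low_load)

# The smallest load coincides with the "Model load" line of the summary
model_load <- min(loads)
print(model_load)
```

Only Area.I1 is flagged, and the smallest load matches the value reported as Model load.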
With the following call to cadea, the contribution of Area.I1 is forced
to be at least 0.6:
mc <- cadea(input, output, load.min = 0.6, load.max = 4)
summary(mc)
#>
#> Model name
#> Orientation input
#> Load orientation inoutput
#> Model load 0.600000000000042
#> Input load.Area.I1 0.600000000000042
#> Input load.Books.I2 1.16440394470301
#> Input load.Staff.I3 0.932502044865763
#> Input load.Populations.I4 1.30309401043119
#> Output load.Regist.O1 0.912551322626857
#> Output load.Borrow.O2 1.08744867737314
#> Minimum for loads1 0.6
#> Minimum for loads2 0.6
#> Minimum for loads3 0.6
#> Minimum for loads4 0.6
#> Minimum for loads5 0.6
#> Minimum for loads6 0.6
#> Maximum for loads1 4
#> Maximum for loads2 4
#> Maximum for loads3 4
#> Maximum for loads4 4
#> Maximum for loads5 4
#> Maximum for loads6 4
#> Inputs Area.I1 Books.I2 Staff.I3 Populations.I4
#> Outputs Regist.O1 Borrow.O2
#> nInputs 4
#> nOutputs 2
#> nVariables 6
#> nEfficients 6
#> Eff. Mean 0.773704229966596
#> Eff. sd 0.174936730836523
#> Eff. Min. 0.349071771188186
#> Eff. 1st Qu. 0.700942885344227
#> Eff. Median 0.769117261231101
#> Eff. 3rd Qu. 0.924285790399358
#> Eff. Max. 1
Note that the maximum possible value of a variable load is the number
of variables of its type (here, 4 inputs and 2 outputs), so
load.max = 4 has no effect on the results.
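The bound in this note can be checked against the printed output. In the summary of the constrained model, the input loads add up to 4 and the output loads to 2, the number of variables of each type. Assuming loads are non-negative, as the printed values suggest, no single load can exceed that number. The following base R sketch does the arithmetic; the variable names are illustrative only.

```r
# Loads copied from the summary(mc) output above
input_loads <- c(
  Area.I1        = 0.600000000000042,
  Books.I2       = 1.16440394470301,
  Staff.I3       = 0.932502044865763,
  Populations.I4 = 1.30309401043119
)
output_loads <- c(
  Regist.O1 = 0.912551322626857,
  Borrow.O2 = 1.08744867737314
)

# Loads of each type add up to the number of variables of that type
sum_inputs  <- sum(input_loads)   # close to 4
sum_outputs <- sum(output_loads)  # close to 2
print(c(inputs = sum_inputs, outputs = sum_outputs))

# Every load is already at or below load.max = 4
within_max <- all(c(input_loads, output_loads) <= 4)
print(within_max)
```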
The model load now rises to the required value of 0.6, and the mean efficiency decreases slightly, from about 0.7759 to 0.7737.
To compare the two sets of efficiency scores, note that the Spearman
correlation coefficient between them is 0.998. This can also be seen in
the next plot:
All of this means that, in this case, the changes are small. Larger
changes can be expected as load.min grows.
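The comparison of two efficiency sets described above can be sketched in base R as follows. The helper compare_scores is hypothetical, and the two short vectors are made-up values used only to show the call, not results from tokyo_libraries. With the models fitted above, one would pass the score vectors of m and mc instead, assuming they are stored in an eff component of each object.

```r
# Hypothetical helper: Spearman rank correlation between two score vectors
compare_scores <- function(a, b) {
  stopifnot(is.numeric(a), is.numeric(b), length(a) == length(b))
  cor(a, b, method = "spearman")
}

# Made-up illustrative scores (NOT taken from the tokyo_libraries results)
eff_unconstrained <- c(1.00, 0.92, 0.78, 0.70, 0.35)
eff_constrained   <- c(1.00, 0.91, 0.77, 0.71, 0.34)

# Both vectors rank the units identically, so the rank correlation is 1
rho <- compare_scores(eff_unconstrained, eff_constrained)
print(rho)

# With the fitted models, the calls might look like this
# (assumes the scores are available as m$eff and mc$eff):
# compare_scores(m$eff, mc$eff)
# plot(m$eff, mc$eff, xlab = "adea efficiency", ylab = "cadea efficiency")
```

A rank correlation is used because it measures whether the ordering of the units is preserved, which is often what matters when efficiency scores feed a ranking.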
Universidad de Cádiz, fernando.fernandez@uca.es
Universidad de Cádiz, manuel.munoz@uca.es