1.1 Main questions
- What data are available on the topic?
- Is the available data sufficient and of good quality?
- Can we get any meaningful insights based on the available data?
- What data should be available in the future and in what form?
Compiled on 2020-09-21 by Maroun Bou Sleiman.
This project is a data-driven attempt to reconstruct the Lebanese energy sector using openly accessible data. I have never seen a day (for 32 years and counting) where this sector was able to provide consistent electricity 24/7 in the whole country - a primordial feat that the majority of countries achieve. I will not go into details into the economic and political reasons behind this incompetence as there are already many reports about the subject.
The (few) reports I read about the electricity sector in Lebanon explore at the trends in production and fossil fuel consumption from a global perspective. To my knowledge, there is no analysis of the experience from the end-user’s perspective. The EDL (Electricite du Liban) website contains information about hourly feeding information by electrical station and outlet, which I obtained (through automated scraping of their website).
The power grid is essentially split into three main components: the power plant (producer), the power station or electrical substation, and the outlets coming out of each substation. The EDL website has very scarce information about the power plants, and even less about the substations, and almost nothing about the outlets. For example, besides the name of these entities, we have very little information about their geographical positions and technical parameters. That’s really bad for a multi-billion dollar endeavor. So we have to work with what we get, and we’ll use data-science to infer other information.
This notebook was run on Renku. Here’s the link to the project. The project is using the “R basic template 4.0” with a few differences. Extra debian packages installed (see Dockerfile
), extra R packages(see install.R
).
Data from the EDL website was fetched using a custom web crawler/scraper and using this command renku run ./R/edl_feeding_crawler.R
. See README.rmd
for more information.
Then there is a post-processing step renku run ./R/edl_feeding_postprocess.R
. Here, some minor modifications are done to the data and a feeble attempt to geolocate these stations based on a simple search using the tidygeocoder
package. I had to do this since, not surprisingly, I could not find a better resource for the Lebanese power grid.
First load some R packages and the data, then make sure that the date column (dateoffeeding) is recognized as a date field (using the lubridate
package).
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(cowplot)) # for arranging and combining plots
suppressPackageStartupMessages(library(igraph))
suppressPackageStartupMessages(library(visNetwork))
suppressPackageStartupMessages(library(leaflet))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(lubridate))
suppressPackageStartupMessages(library(tidyr))
suppressPackageStartupMessages(library(plotly)) # for interactive plots
suppressPackageStartupMessages(library(readxl)) # for later
load("../data/edl_feeding_crawler_output_postprocessed.RData", verbose = T)
## Loading objects:
## feeding_all
## outlet_meta
## stations_dt
feeding_all <- feeding_all[-which(grepl("^Error", feeding_all[,1])),]
feeding_all$dateoffeeding <- ymd(feeding_all$dateoffeeding)
Some stats: we have 85 main stations, 1008 outlets, and 170554 data points from 3985 days from a temporal range of 4306 days.
Earliest data point is 2008-10-20, latest data point is 2020-08-04.
The following table shows the station metadata. This data was manually obtained by by examining the source code of the EDL website. The station_id
and global_id
were created by me.
DT::datatable(stations_dt, caption = "Station metadata")
And here is the outlet metadata.
DT::datatable(outlet_meta, caption = "Outlet metadata")
Here’s the distrubution of the number of outlets per substation:
outlet_meta_table <- table(outlet_meta$station_id)
g <- outlet_meta %>% group_by(station_id) %>% summarise(n_outlets = length(id)) %>%
ggplot(aes(n_outlets)) +
geom_histogram(binwidth = 1, color = "white", fill = "grey") +
theme_cowplot() +
xlab("Number of outlets per station") +
ylab("Number of stations")
ggplotly(g)
And here’s how the data looks like. 0 = no electricity, 1 = electricity
DT::datatable(head(feeding_all, n=100), caption = "First 100 records of data")
For convenience, one can reshape the dataset to a long format (larger in size). Then look at a subset of the data.
timecols <- paste0("time", 0:23)
feeding_all_reshaped <- feeding_all %>% gather(time, power, paste0("time", 0:23) )
feeding_all_reshaped$time <- as.numeric(gsub("time", "",feeding_all_reshaped$time))
feeding_all_reshaped$dateoffeeding_lub <- as_datetime(ymd(feeding_all_reshaped$dateoffeeding))
hour(feeding_all_reshaped$dateoffeeding_lub) <- feeding_all_reshaped$time
set.seed(123)
random_subsample <-sample(unique(feeding_all_reshaped$feedername), 50)
feeding_all_reshaped[feeding_all_reshaped$feedername %in% random_subsample, ] %>%
ggplot(aes(x = dateoffeeding_lub, y = feedername)) +
geom_tile(aes(fill = factor(power) )) +
theme_cowplot()+
scale_fill_manual(values = c("red", "green")) +
geom_vline(xintercept = as_date("2020-08-04"), color = "black") +
xlab("Date")
The black vertical line is the day the Beirut port explosion occured. As you can see above, the data is sparse (a lot of missing points). There are periods where no data is available at all. Maybe they coincide with vacations of the data-entry guy?
A closer look at more recent times 2020-today:
feeding_all_reshaped[(feeding_all_reshaped$feedername %in% random_subsample), ] %>%
filter(dateoffeeding_lub > "2020-01-01") %>%
ggplot(aes(x = dateoffeeding_lub, y = feedername)) +
geom_tile(aes(fill = factor(power) )) +
theme_cowplot()+
scale_fill_manual(values = c("red", "green")) +
geom_vline(xintercept = as_date("2020-08-04"), color = "black") +
xlab("Date")
Summary view of the global trend (blue):
summ <- feeding_all_reshaped %>% group_by(dateoffeeding, feederid) %>% summarise(hoursOn = sum(power == 1), hoursOff = sum(power == 0))
ggplot(summ, aes(x = dateoffeeding, y = hoursOn)) +
geom_line(aes(group = feederid), alpha = 0.3) +
geom_smooth(fill = "white") +
scale_y_continuous(breaks = seq(0,24,4), limits = c(0,24)) +
geom_vline(xintercept = as_date("2020-08-04"), color = "red") +
theme_cowplot() +
ylab("Hours of power per day") +
xlab("Date")
Zoom in on recent times, and include the real data in the background (in black):
summ %>% filter(dateoffeeding > "2020-01-01") %>%
ggplot(aes(x = dateoffeeding, y = hoursOn)) +
geom_line(aes(group = feederid), alpha = 0.3) +
geom_smooth(fill = "white") +
scale_y_continuous(breaks = seq(0,24,4), limits = c(0,24)) +
geom_vline(xintercept = as_date("2020-08-04"), color = "red") +
theme_cowplot() +
ylab("Hours of power per day") +
xlab("Date")
Get some derived statistics about the substations (ratio of outlets that are ON at any given time). Also, since there is a large variation in the number of outlets per substation, create arbitrary sub-groups of outlets to facilitate comparisons (Those that have 0-2 substations, those that have 3-6 etc…).
First a histogram to examine the grouping of substations. Based on the available data, a substation can have a different number of outlets at any given time (I cannot tell whether this is an artefact of sparsity or evolution in the network). I have therefore selected the maximum number of outlets for each substation in the histogram below:
summ_substation <- feeding_all_reshaped %>% group_by(dateoffeeding_lub, substationid) %>% summarise(ratio_outlets_on = sum(power == 1)/length(power), n_outlets = length(power)) %>% ungroup() %>% mutate(n_outlet_group = cut_width(n_outlets, width = 3, center = 1.5))
g <- summ_substation %>% group_by(substationid) %>% summarize(group = n_outlet_group[which.max(n_outlets)]) %>%
ggplot(aes(group)) +
geom_histogram(stat = "count") +
theme_cowplot() +
xlab("Substation group based on number of outlets") +
ylab("Number of Substations")
ggplotly(g)
General (smoothed) view of the ratios per group:
summ_substation_grouped <- summ_substation %>% group_by(dateoffeeding_lub, n_outlet_group) %>% summarize(mean_ratio_outlets_on = mean(ratio_outlets_on, na.rm = T))
g <- ggplot(summ_substation_grouped, aes(x = dateoffeeding_lub, y = mean_ratio_outlets_on, color = n_outlet_group)) +
# geom_line(aes(group = feederid), alpha = 0.3) +
# geom_line() +
geom_smooth(se = FALSE) +
ylim(0,1) +
geom_vline(xintercept = as_date("2020-08-04"), color = "red") +
theme_cowplot() +
ylab("Ratio of outlets online") +
xlab("Date")
ggplotly(g)
It is clear that starting 2020, there was a drop in feeding. This is no surprise and is known, but there is little talk in the media of how much exactly this is occuring. Also, if we know more about the exact locations that are fed by each outlet, we can integrate that data with many other datasets and ask some interesting questions:
How can we explain this dip? This section explores the EDL (Electricite du Liban) data that is made available by the Lebanese Central Administration of Statistics (CAS) in excel workbooks.
The EDL data sheet is downloaded from http://www.cas.gov.lb/index.php/thematic-time-series. This is the link to the file: http://www.cas.gov.lb/images/Excel/Thematic%20Time%20Series/EDL%201995-2020.xlsx
We do not know how often this sheet is updated, by whom, and what version it is. Also the sheet will be hard to parse since it is prepared in excel and in a non-machine-friendly format.
download.file("http://www.cas.gov.lb/images/Excel/Thematic%20Time%20Series/EDL%201995-2020.xlsx", "../data/EDL 1995-2020.xlsx")
This sheet (sheet 2) shows the monthly production and consumption of the network. The production is split by different EDL and non-EDL sources. Some notes and criticism:
dt_in <- readxl::read_xlsx("../data/EDL 1995-2020.xlsx", sheet = "Mensuel - Monthly - شهري", skip = 5)
# deal with tri-lingual column names
colnames(dt_in) <- sapply(strsplit(colnames(dt_in), "/"), function(x){
if(length(x) ==2){
gsub(" +", "", x[1])
}else{
gsub(" +", "", x[2])
}
})
# deal with duplicate column names. First instance is production, second is consumption
for (i in 1:ncol(dt_in)){
m <- which(colnames(dt_in) %in% colnames(dt_in)[i])
if(length(m) == 3){
colnames(dt_in)[m] <- paste0(c("production_", "purchases_", "consumption_"),colnames(dt_in)[m] )
}else if(length(m) == 2){
wpc <- which(colnames(dt_in) == "Purchases")
if(min(m) > wpc){
colnames(dt_in)[m] <- paste0(c("purchases_", "consumption_"),colnames(dt_in)[m] )
}else{
colnames(dt_in)[m] <- paste0(c("production_", "consumption_"),colnames(dt_in)[m] )
}
}
}
colnames(dt_in) <- gsub("^Safa$", "production_Safa", colnames(dt_in))
#deal with merged year cells
for(i in 1:nrow(dt_in)){
if(is.na(dt_in$Year[i])) {
dt_in$Year[i] <- curYear
}else{
curYear <- dt_in$Year[i]
}
}
# parse the month cell and derive the date
# remove all month rows with "Total"
rowRemove <- which(grepl("Total", dt_in$Month))
dt_in <- dt_in[-rowRemove,]
dt_in$Month <- sapply(strsplit(dt_in$Month, "/"), function(x){
gsub(" +", "", x[2])
})
dt_in$Date <- lubridate::dmy(paste0("01 ", substr(dt_in$Month,1,3), " ", dt_in$Year))
# remove production and consumption values of 0 and put NA in year 2020
first_zero_date <- dt_in$Date[min(which(dt_in$EDLfactoriesProduction == 0))]
dt_in <- dt_in[dt_in$Date < first_zero_date,]
Here’s the production and consumption data as a function of time. It seems there are a few missing or wrong data points (for example production 2000-12-01).
g <- ggplot(dt_in, aes(x = Date)) +
geom_line(aes(y = EDLfactoriesProduction + Purchases)) +
geom_line(aes(y = Networkconsumption), color = "red") +
theme_cowplot() +
scale_x_date(date_breaks = "5 years", date_labels = "%m-%Y")
# scale_x_date(date_breaks = "1 year")
ggplotly(g)
There are three different “levels”:
# cols <- grep("production_|purchases_", colnames(dt_in), value = T)
cols_level1 <- c("EDLfactoriesProduction", "Purchases")
cols_level2 <- c("production_HydraulicEnergy", "production_ThermalEnergy", "purchases_HydraulicEnergy", "purchases_ThermalEnergy")
cols_level3 <- c("production_Safa", "production_Qadisha","production_Steam", "production_Gas", "production_Mixed",
"production_Reciprocatingengines-Zouk+Jieh", "production_Naameh", "purchases_FromSyria", "purchases_FromKarpoweships", "purchases_FromEgypt")
dir_component <- function(x){
x$direction <- gsub("_(.)+", "", x$element)
x$component <- gsub("(.)+_", "", x$element)
return(x)
}
dt_breakdown_l1 <- dt_in[,c("Date", cols_level1)] %>% pivot_longer(cols = cols_level1, names_to = "element", values_to = "KWh") %>% dir_component()
dt_breakdown_l2 <- dt_in[,c("Date", cols_level2)] %>% pivot_longer(cols = cols_level2, names_to = "element", values_to = "KWh") %>% dir_component()
dt_breakdown_l3 <- dt_in[,c("Date", cols_level3)] %>% pivot_longer(cols = cols_level3, names_to = "element", values_to = "KWh") %>% dir_component()
g <- ggplot(dt_breakdown_l1, aes(x=Date, y=KWh, fill = element)) +
geom_bar(stat = "identity", position = "stack", color = "black", lwd = 0.1) +
# facet_wrap(~direction, ncol = 1) +
theme_cowplot() +
theme(legend.position = "none") +
scale_x_date(date_breaks = "5 years", date_labels = "%m-%Y") +
ggtitle("Level 1 - Breakdown")
ggplotly(g)
g <- ggplot(dt_breakdown_l2, aes(x=Date, y=KWh, fill = element)) +
geom_bar(stat = "identity", position = "stack", color = "black", lwd = 0.1) +
# facet_wrap(~direction, ncol = 1) +
theme_cowplot() +
theme(legend.position = "none") +
scale_x_date(date_breaks = "5 years", date_labels = "%m-%Y") +
ggtitle("Level 2 - Breakdown")
ggplotly(g)
g <- ggplot(dt_breakdown_l3, aes(x=Date, y=KWh, fill = element)) +
geom_bar(stat = "identity", position = "stack", color = "black", lwd = 0.1) +
# facet_wrap(~direction, ncol = 1) +
theme_cowplot() +
theme(legend.position = "none") +
scale_x_date(date_breaks = "5 years", date_labels = "%m-%Y") +
ggtitle("Level 3 - Breakdown")
ggplotly(g)
Level 3 plot as before
g <- ggplot(dt_breakdown_l3[year(dt_breakdown_l3$Date) > 2015,], aes(x=Date, y=KWh, fill = element)) +
geom_bar(stat = "identity", position = "stack", color = "black", lwd = 0.1) +
# facet_wrap(~direction, ncol = 1) +
theme_cowplot() +
theme(legend.position = "none")
ggplotly(g)
Breakdown by whether power is purchased or produced.
g <- ggplot(dt_breakdown_l3[year(dt_breakdown_l3$Date) > 2015,], aes(x=Date, y=KWh, fill = element)) +
geom_bar(stat = "identity", position = "stack", color = "black", lwd = 0.1) +
facet_wrap(~direction, ncol = 1) +
theme_cowplot() +
theme(legend.position = "none")
ggplotly(g)
It seems that the most affected element is the Karpowerships. Here’s the percentage of total power from the Karpowerships:
g <- dt_breakdown_l3 %>% group_by(Date) %>% dplyr::summarize(proportion_kpc = 100*(KWh/sum(KWh,na.rm = T))[which(element == "purchases_FromKarpoweships")]) %>%
filter(year(Date) > 2012) %>%
ggplot(aes(x = Date, y = proportion_kpc)) +
geom_bar(stat="identity") +
theme_cowplot() +
ylab("Percentage of power from KPC")
ggplotly(g)
This table is in the same excel file, sheet name is Transferts - Transfers - تحوي
.It contains information from the Ministry of Finance concerning the transfer of money (in Billion Lebanese Pounds) from the treasury (I presume).
dt_exp <- readxl::read_xlsx("../data/EDL 1995-2020.xlsx", sheet = "Transferts - Transfers - تحويل",
skip = 4)
# deal with tri-lingual column names
colnames(dt_exp) <- sapply(strsplit(colnames(dt_exp), "/"), function(x){
if(length(x) ==2){
gsub(" +", "", x[1])
}else{
gsub(" +", "", x[2])
}
})
# deal with tri-lingual first row
dt_exp$BillionLBP <- sapply(strsplit(dt_exp$BillionLBP, "/"), function(x){
if(length(x) ==2){
gsub(" +", "", x[1])
}else if(length(x) == 3){
gsub(" +", "", x[2])
}else{
x[1]
}
})
# remove last row
dt_exp <- dt_exp[-nrow(dt_exp),]
# remove the total row, and the detailed ones to keep the four constituents that make the total.
toremove <- c("EDLofwhich:", "Principalpayments","Interestpayments", "BDLGuaranteedLoanpayments", "KPC & SPC", "EGAS", "ReimbursementofKPCandSonatrach", "ReimbursementofKPCandSonatrach")
# remove the detailed rows (that are nested within the main ones)
dt_exp <- dt_exp[-which(dt_exp$BillionLBP %in% toremove),]
tmp_category <- dt_exp$BillionLBP
dt_exp$BillionLBP <- NULL
which_yrs <- which(grepl("^Total", colnames(dt_exp)))
yrs <- gsub("Total", "", colnames(dt_exp)[which_yrs])
names(which_yrs) <- yrs
which_no_yrs <- which(!grepl("^Total", colnames(dt_exp)))
colnames(dt_exp)[which_no_yrs] <- paste0("01-",
substr(colnames(dt_exp)[which_no_yrs], 1,3),
"-",sapply(which_no_yrs, function(x){
names(which_yrs)[min(which(which_yrs > x))]
}))
dt_exp <- dt_exp[,-which_yrs]
dt_exp$category <- tmp_category
dt_exp <- reshape::melt(as.data.frame(dt_exp),id.vars = "category", variable_name = "Date")
dt_exp$Date <- lubridate::dmy(dt_exp$Date)
DT::datatable(dt_exp)
The fields are very badly explained and there is a hierarchy in them. If I understand correctly:
“EDL of which” : this is basically the total of (Debt Service/Eurobonds, Reimbursement for purchase of gas and fuel, Transfers to Electricity Syria). These constituents are in bold in the sheet, and this is the only way one could figure it out (for a non-expert).
“Debt Service” is the umbrella term for the debt-related terms: C-loans and Eurobonds (of which printipal and interest payments), Banque Du Liban-Guaranteed Loan payments. This may not be very accurate, and I feel that the formula for calculating the total has changed with time due to some restructuring.
“Reimbursement for purchase of gas and fuel” : KPS and SPC (Karpowership and Sonatrach), EGAS
Unit is Billion LBP
Last data point is December 2019
g <- ggplot(dt_exp, aes(x = Date)) +
geom_bar(aes(y = value, fill = category), position = "stack", stat = "identity") +
scale_x_date(date_breaks = "5 years", date_labels = "%m-%Y") +
theme_cowplot() +
theme(legend.position = "none") +
ylab("Billions LBP")
# scale_x_date(date_breaks = "1 year")
ggplotly(g)
Was the KPC supply proportional to money transfers? Doesn’t seem to be the case, but doesn’t have to be the case either. There are so many other factors to take in mind. I am not familiar with the terms of the contracts.
g <- dt_exp %>% filter(category == "Reimbursementforpurchaseofgasandfuel") %>% dplyr::full_join(by = "Date", dt_in %>% select(Date, purchases_FromKarpoweships)) %>%
mutate(Year = factor(year(Date))) %>%
ggplot(aes(x=value, y = purchases_FromKarpoweships, date = Date, color = Year )) +
geom_point() +
theme_cowplot() +
xlab("LBP billions") +
ylab("Purchased KWh from Karpowerships") +
theme(legend.position = "none")
ggplotly(g)
This effort is an attempt to figure things out based on available (governmental) data. The quality, quantity, form, and content of the data available on the CAS website, in addition to the website itself, clearly indicate that the CAS has poor data management practices. The whole service needs to be radically improved as this service is supposed to be the beacon of sound data management and statistical methodology.
Criticism:
Minimal requirements:
Extra requirements:
Here goes my attempt to put things on a map!
First need to get powerplant data. The file ./data/edl_power_plant_2020.xls
was created by me from two major sources:
Substations and outlets were mapped in a very naive way using the tidygeocoder
package. This was done simply through a string search using the name. So this is of extremely low quality.
Here’s how the data look like:
power_plants <- readxl::read_xls("../data/edl_power_plant_2020.xls", skip = 2)
coords <- do.call(rbind, strsplit(gsub("( )+|( )+", "", power_plants$gps_coordinates), ","))
colnames(coords) <- c("latitude", "longitude")
coords <- apply(coords, MAR = c(1,2), as.numeric)
power_plants <- cbind(power_plants, coords)
DT::datatable(power_plants, width = "100%")
Now let’s infer some connectivity and plot the map. At the moment, we can place very few substations and outlets on the map due to lack of data, unfortunately. Also some links are nonsensical. For example, one substation was placed in Iraq because its name is “Tikrit / تكريت” (a city in Iraq).
graph_df <- data.frame(from = outlet_meta$global_station_id,
to = outlet_meta$global_id,
station_id = outlet_meta$station_id,
outlet_id = outlet_meta$id)
graph_df$from_lat <- stations_dt$latitude[match(outlet_meta$global_station_id, stations_dt$global_id)]
graph_df$from_lng <- stations_dt$longitude[match(outlet_meta$global_station_id, stations_dt$global_id)]
graph_df$to_lat <- outlet_meta$latitude
graph_df$to_lng <- outlet_meta$longitude
vertex_meta <- data.frame(name = unique(c(graph_df$from, graph_df$to)))
vertex_meta$type <- gsub("_(.)+", "", vertex_meta$name)
vertex_meta$internal_id <- gsub("(.)+_", "", vertex_meta$name)
vertex_meta <- split(vertex_meta, vertex_meta$type)
vertex_meta <- lapply(vertex_meta, function(x){
if(x$type[1] == "station"){
out <- x
out$label <- stations_dt$station_name[match(x$name, stations_dt$global_id)]
out
}else{
out <- x
out$label <- outlet_meta$text[match(x$name, outlet_meta$global_id)]
out
}
})
vertex_meta <- do.call(rbind, vertex_meta)
vertex_meta$shape <- plyr::mapvalues(vertex_meta$type, c("outlet", "station"), c("circle", "square"))
# g <- igraph::graph_from_data_frame(graph_df, vertices = vertex_meta)
#
# visNetwork::visIgraph(g) %>% visIgraphLayout(layout = "layout_nicely") %>%
# visNodes(size = 10) %>%
# visOptions(highlightNearest = list(enabled = T, hover = T),
# nodesIdSelection = T)
lf <- leaflet(stations_dt) %>%
addTiles() %>%
leaflet::addAwesomeMarkers(lng = stations_dt$longitude,
lat = stations_dt$latitude,
label = stations_dt$station_name,
icon = awesomeIcons(icon = "bolt", library = "fa"),
group = "Substations") %>%
leaflet::addCircles(lng = outlet_meta$longitude, lat = outlet_meta$latitude, label = outlet_meta$text,
group = "Outlets") %>%
leaflet::addAwesomeMarkers(lng = power_plants$longitude,
lat = power_plants$latitude,
label = power_plants$power_plant_group,
icon = awesomeIcons(icon = ifelse(power_plants$type=="Thermal", "industry", "tint"),
library = rep("fa", nrow(power_plants)),
iconColor = rep("black", nrow(power_plants)),
markerColor = rep("orange", nrow(power_plants))),
group = "Power plants") %>%
fitBounds(35, 33, 37, 35)
which_known_path <- which(rowSums(!is.na(graph_df)[,c("from_lat", "from_lng", "to_lat", "to_lng")]) == 4)
for(i in which_known_path){
lf <- lf %>% leaflet::addPolylines(lng = c(graph_df$from_lng[i], graph_df$to_lng[i]), lat = c(graph_df$from_lat[i], graph_df$to_lat[i]), group = "Connections")
}
lf %>%
addLayersControl(overlayGroups = c("Power plants", "Substations", "Outlets", "Connections"),
options = layersControlOptions(collapsed = FALSE))
Orange markers are power plants (whether thermal or hydroelectric). Blue markers are substations. Small red markers are outlets. Note that the substations and outlet locations are guessed programmatically from their names because there is no available data about their locations. Also note that outlets are not supposed to be points, but paths like roads, but this data is not available either.
This exercise led me to two main conclusions:
To do list:
sessionInfo()
## R version 4.0.0 (2020-04-24)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
##
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] readxl_1.3.1 plotly_4.9.2.1 lubridate_1.7.8 leaflet_2.0.3
## [5] visNetwork_2.0.9 igraph_1.2.5 cowplot_1.1.0 forcats_0.5.0
## [9] stringr_1.4.0 dplyr_0.8.5 purrr_0.3.4 readr_1.3.1
## [13] tidyr_1.0.3 tibble_3.0.1 ggplot2_3.3.0 tidyverse_1.3.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.4.6 lattice_0.20-41 assertthat_0.2.1 digest_0.6.25
## [5] plyr_1.8.6 R6_2.4.1 cellranger_1.1.0 backports_1.1.7
## [9] reprex_0.3.0 evaluate_0.14 httr_1.4.1 pillar_1.4.4
## [13] rlang_0.4.6 lazyeval_0.2.2 rstudioapi_0.11 data.table_1.12.8
## [17] Matrix_1.2-18 DT_0.13 rmarkdown_2.1 splines_4.0.0
## [21] labeling_0.3 htmlwidgets_1.5.1 munsell_0.5.0 broom_0.5.6
## [25] compiler_4.0.0 modelr_0.1.7 xfun_0.13 pkgconfig_2.0.3
## [29] mgcv_1.8-31 htmltools_0.4.0 tidyselect_1.1.0 bookdown_0.19
## [33] reshape_0.8.8 fansi_0.4.1 viridisLite_0.3.0 crayon_1.3.4
## [37] dbplyr_1.4.3 withr_2.2.0 grid_4.0.0 nlme_3.1-147
## [41] jsonlite_1.6.1 gtable_0.3.0 lifecycle_0.2.0 DBI_1.1.0
## [45] magrittr_1.5 scales_1.1.1 cli_2.0.2 stringi_1.4.6
## [49] farver_2.0.3 fs_1.4.1 xml2_1.3.2 ellipsis_0.3.1
## [53] generics_0.0.2 vctrs_0.3.0 tools_4.0.0 glue_1.4.1
## [57] hms_0.5.3 crosstalk_1.1.0.1 yaml_2.2.1 colorspace_1.4-1
## [61] rvest_0.3.5 knitr_1.28 haven_2.2.0