Introduction

What is ELVIS?

The ELection Violence Intelligence System (ELVIS) is an early warning system that predicts the risk of election violence for every major national election across the globe. ELVIS provides regularly updated forecasts for each election in the coming month, alongside an annual risk table and an archived risk table that dates back to 2000. We use an ensemble machine learning framework to produce (1) predictions, (2) predictor importance measurements, (3) visualizations of predictor/outcome relationships, and (4) evaluations of predictive accuracy.

ELVIS was developed by the research department of the One Earth Future Foundation to provide substantive and actionable information about the risk of violence in upcoming elections.

How we make predictions.

As in our CoupCast project, we combine original data with ensemble machine learning to forecast the risk of election violence. ELVIS therefore has three major components:

  • Data: We follow the coding rules of the National Elections Across Democracy and Autocracy (NELDA) dataset to code relevant national elections each year. Because NELDA's coverage ends in 2012, we have extended the data to cover elections through the end of 2019. We code general election violence and government harassment/violence using the same rules as NELDA, and continue to code these outcomes for each new election event. Our predictor variables are a combination of environmental, economic, social, and political measures found in our Rulers, Elections, and Irregular Governance (REIGN) dataset.

  • Modeling: We use a combination of methods to produce more accurate predictions of election violence. First, we train our models with a rolling-origin cross-validation procedure that respects the temporal sequence of elections, so the models are never trained on data from the future. Second, we combine three classification algorithms, (1) a random forest, (2) a logistic regression, and (3) a neural network, into an ensemble using a greedy optimization procedure.

  • Insight: Using the above, we provide four distinct insights into election violence.
    • Predictions: We provide an annual list of election violence risk at the beginning of each year. As elections happen, we update our predictions each month with both new outcomes and new social/economic data as it is released.
    • Variable Importance: Based on the most up-to-date training data, we also estimate how much each predictor contributes to the accurate classification of election violence.
    • Variable Relationships: We use partial dependence functions to visualize the relationship between our predictors and the risk of election violence.
    • Accuracy: We provide information on how well our system performs over the year and how it has performed historically.
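The exact greedy optimization used by ELVIS is not described here, so the following Python code is only a minimal sketch of one well-known variant: Caruana-style greedy forward selection with replacement, which repeatedly adds whichever model most improves the AUC of the averaged out-of-sample predictions. The function names (`greedy_ensemble`, `ensemble_predict`), the model keys, and the toy probabilities are hypothetical and are not ELVIS code or output.

```python
from collections import Counter


def auc(y_true, scores):
    """Share of (violent, peaceful) pairs in which the violent election scores higher; ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def greedy_ensemble(preds, y_true, n_iter=10):
    """Greedy forward selection with replacement (Caruana-style).

    preds maps a model name to its out-of-sample predicted probabilities.
    At each step, the model whose addition gives the best AUC for the averaged
    prediction is added (a model may be picked repeatedly). Returns weights
    proportional to how often each model was picked.
    """
    counts = Counter()
    running_sum = [0.0] * len(y_true)  # sum of predictions of the models picked so far
    for step in range(1, n_iter + 1):
        best_name, best_score = None, float("-inf")
        for name, p in preds.items():
            # Candidate ensemble: average of the picked models plus this one.
            candidate = [(s + q) / step for s, q in zip(running_sum, p)]
            score = auc(y_true, candidate)
            if score > best_score:
                best_name, best_score = name, score
        counts[best_name] += 1
        running_sum = [s + q for s, q in zip(running_sum, preds[best_name])]
    return {name: counts[name] / n_iter for name in preds}


def ensemble_predict(weights, preds):
    """Weighted average of the models' predicted probabilities."""
    n = len(next(iter(preds.values())))
    return [sum(weights[m] * preds[m][i] for m in preds) for i in range(n)]


# Hypothetical out-of-sample probabilities for four elections (1 = violent).
y = [0, 0, 1, 1]
preds = {
    "random_forest": [0.1, 0.4, 0.35, 0.8],
    "logistic_regression": [0.2, 0.3, 0.6, 0.7],
    "neural_net": [0.5, 0.5, 0.5, 0.5],
}
weights = greedy_ensemble(preds, y)
ensemble_probs = ensemble_predict(weights, preds)
```

Selecting with replacement lets a strong model receive a larger weight simply by being picked more than once, which keeps the search simple while still producing non-uniform weights.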

Data Source and Citation.

You can download our monthly updated training data for ELVIS here. (UPDATED APRIL 5TH 2019)

If you wish to use the ELVIS data for your own projects, we recommend the following citation:

  • Citation TBD

If you are citing ELVIS, we also encourage you to cite the following data:

  • Hyde, Susan D., and Nikolay Marinov. 2012. "Which Elections Can Be Lost?" Political Analysis 20(2): 191-201.

  • Bell, Curtis. 2016. The Rulers, Elections, and Irregular Governance Dataset (REIGN). Broomfield, CO: OEF Research. Available at oefresearch.org

For any questions concerning ELVIS, our modeling, or data, please contact Clayton Besaw ()

2019 Elections

Data Table Information

For each year, we provide an annual risk table produced at the beginning of the year (2019 ANNUAL RISK TABLE) and a monthly updated risk table for the remaining elections (REMAINING 2019 NATIONAL ELECTIONS). Each displayed variable is briefly described below; feel free to contact Clayton Besaw () with any additional questions.

  • Country: Country in which the election event is taking place.

  • Election Date: Scheduled date of the election event, in month/day/year format. Dates are subject to change, since elections are often rescheduled or delayed.

  • Probability of Violence: Predicted probability of general election-related violence before/during/after an election event.
    • For the annual risk table, this is the baseline prediction based only on historical training data. For the monthly risk table, this is the updated prediction for each election event based on historical data and new events.
  • Percentile: The percentile ranking for each election event based on the complete distribution of election violence forecasts.

  • Risk Change since January: Change in the predicted probability of election-related violence between the latest monthly update and the baseline forecast made at the beginning of the year (monthly risk table only).

  • Outcome: Dichotomous classification of whether an election event was peaceful or experienced election-related violence before/during/after the event. Updated as election events are completed (annual risk table only).
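The Percentile and Risk Change since January columns are simple transformations of the forecast probabilities. The Python sketch below assumes that a percentile is the share of all forecasts at or below a given forecast (ELVIS may use a different percentile convention) and that risk change is the monthly probability minus the January baseline, in probability units. The function names and numbers are hypothetical.

```python
def percentile_rank(all_forecasts, forecast):
    """Percent of forecasts at or below `forecast` (one common convention, assumed here)."""
    at_or_below = sum(1 for f in all_forecasts if f <= forecast)
    return 100.0 * at_or_below / len(all_forecasts)


def risk_change(baseline, monthly):
    """Change since January: latest monthly probability minus the January baseline."""
    return monthly - baseline


# Hypothetical forecasts for four elections (illustrative numbers, not ELVIS output).
forecasts = [0.10, 0.25, 0.40, 0.80]
print(percentile_rank(forecasts, 0.40))   # 75.0
print(round(risk_change(0.30, 0.45), 2))  # 0.15
```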

2019 Elections Completed (As of April 2019)

2019 Predictive Accuracy (As of March 2019)

Proportion of Violent Elections in 2018 (As of March 2019)

Remaining 2019 National Elections (UPDATED April 2019)

2019 ANNUAL RISK TABLE (UPDATED January 2019)

Historical Data Archive

Data Table Information

This archive provides information on over two thousand unique national election events, including their dates and whether election-related violence occurred before/during/after the event. You can use the search box to look up specific countries, and click on any column to sort the table by that variable. The full training data, which include this information, can be downloaded here.

  • Country: Country in which the election event is taking place.

  • Election Date: Scheduled date of the election event, in month/day/year format. Dates are subject to change, since elections are often rescheduled or delayed.

  • Outcome: Dichotomous classification of whether an election event was peaceful or experienced election-related violence before/during/after the event. Updated as election events are completed.

Proportion of Violent Elections (1975 - 2019)

Election Information and Violence Outcomes (1975 - 2019)

Accuracy

How do we measure accuracy?

Because our system tries to classify the likelihood of election violence occurring, we measure accuracy with the area under the ROC curve (AUC) score. For a useful classifier, AUC scores range from .5 (random guessing) to 1 (perfect classification), and can be interpreted using the following heuristic:

  • .91 - 1 (Excellent Accuracy)
  • .81 - .90 (Good Accuracy)
  • .71 - .80 (Fair/Acceptable Accuracy)
  • .61 - .70 (Poor Accuracy)
  • .50 - .60 (Almost Randomly Guessing)
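As a rough illustration of how an AUC score is computed and read against these bands, the Python sketch below uses the standard Mann-Whitney rank formulation of AUC. The function names and toy data are hypothetical, and the band function assumes that values falling in the gaps between bands (for example between .90 and .91) belong to the higher band.

```python
def auc(y_true, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties get average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = average_rank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def accuracy_band(score):
    """Map an AUC score onto the heuristic bands listed above."""
    if score > 0.90:
        return "Excellent Accuracy"
    if score > 0.80:
        return "Good Accuracy"
    if score > 0.70:
        return "Fair/Acceptable Accuracy"
    if score > 0.60:
        return "Poor Accuracy"
    return "Almost Randomly Guessing"


# Hypothetical outcomes (1 = violent) and predicted probabilities for four elections.
score = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(score, accuracy_band(score))  # 0.75 Fair/Acceptable Accuracy
```

Read this way, an AUC of 0.75 means that a randomly chosen violent election receives a higher predicted risk than a randomly chosen peaceful one 75 percent of the time.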

Using rolling-origin cross-validation on historical data (1975 - 2017), we obtained an average AUC score of 0.88, with an average of 77 percent of elections correctly classified. The figures to the right show out-of-sample AUC scores for each year through 2017. As expected, the inclusion of more training data over time has produced high AUC scores for recent election years (0.91 in both 2016 and 2017). Overall, our model has a good track record on historical data, which suggests an actionable level of accuracy for each new slate of national elections.

As of October 26th, 2018, our model had achieved an AUC score of 0.83, with 80 percent of elections correctly classified.
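The rolling-origin evaluation behind these historical scores can be sketched in Python as follows. This is a hypothetical outline rather than ELVIS code: `fit_predict` stands in for training the ensemble on earlier years and predicting the held-out year, and `score` stands in for a metric such as AUC. Each test year is predicted only from elections in earlier years, and the per-year scores are then averaged.

```python
def rolling_origin_splits(years, first_test_year):
    """Yield (test_year, train_idx, test_idx) so each year is predicted only from earlier years."""
    for test_year in sorted({y for y in years if y >= first_test_year}):
        train_idx = [i for i, y in enumerate(years) if y < test_year]
        test_idx = [i for i, y in enumerate(years) if y == test_year]
        yield test_year, train_idx, test_idx


def rolling_origin_evaluate(years, outcomes, fit_predict, score, first_test_year):
    """Score each out-of-sample year and return (per-year scores, average score).

    fit_predict(train_idx, test_idx) -> predicted probabilities for test_idx (hypothetical hook).
    score(y_true, y_pred) -> a number such as AUC; note that AUC is undefined for a
    year in which every election has the same outcome.
    """
    per_year = {}
    for test_year, train_idx, test_idx in rolling_origin_splits(years, first_test_year):
        predictions = fit_predict(train_idx, test_idx)
        per_year[test_year] = score([outcomes[i] for i in test_idx], predictions)
    return per_year, sum(per_year.values()) / len(per_year)


# Hypothetical election years for five elections.
years = [2000, 2000, 2001, 2002, 2002]
for test_year, train_idx, test_idx in rolling_origin_splits(years, 2001):
    print(test_year, train_idx, test_idx)
# 2001 [0, 1] [2]
# 2002 [0, 1, 2] [3, 4]
```

Because the training window only grows forward in time, later test years are predicted from more data, which is consistent with the higher scores reported for recent years.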

Model Accuracy on Historical Out-of-Sample Years (1975 - 2017)

Variable Importance

Variable Descriptions

The bar plots to the right show the variable importance of each predictor included in our model. Variable importance is an estimate of how much each individual variable contributes to predictive accuracy. It is estimated by re-evaluating our forecasting models many times, each time replacing one variable with random noise while leaving the others unchanged. The more accuracy decreases when a variable is replaced, the more important that variable is for predictive accuracy. Our greedy ensemble classifier lets us combine the idiosyncratic importance estimates of the individual classifiers into an overall measure of predictor importance. Each predictor is briefly described below:

  • History of Election Violence: This predictor is created by calculating the first principal component (linear combination) of three factors:
    • (a): Did election-related violence take place in the previous election event?
    • (b): How many election events since the last instance of election-related violence?
    • (c): How much election-related violence has been experienced previously by a country?
  • GDP per Capita: Measurement of GDP per Capita for the country-year of the election event.

  • Population: Logged measure of population for the country-year of the election event.

  • Infant Mortality Rate: Logged measure of infant mortality rate for the country-year of the election event.

  • Coup Risk: Estimated risk of a military coup in the month of the election. Taken from REIGN.

  • Quality of Democracy: Polity score for the country-year of the election event, ranging from -10 (low-quality democracy/authoritarian) to 10 (high-quality democracy). Taken from Polity IV.

  • Economic Growth: Percent growth/decline in GDP for the country-year of the election event.

  • Relative Precipitation: Estimated relative level of rainfall in the month of the election, measured with the Standardized Precipitation Index (SPI). Base data taken from NOAA’s PRECipitation REConstruction over Land (PREC/L) data.

  • Regime Tenure: Number of months the regime has been in power as of the month of the election. Taken from REIGN.

  • Political Competition: Level of political competition for the country-year of the election event. Taken from Polity IV.

  • Executive Constraints: Level of constraints on the political executive for the country-year of the election event. Taken from Polity IV.

  • Time Since Last Election: Number of months since the last successful election, as of the month of the election. Taken from REIGN.

  • Election in Next Six Months: Dichotomous measure of whether another election is expected within six months of an election event. Taken from REIGN.
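The History of Election Violence predictor is described as the first principal component of three factors. The Python sketch below shows one way to extract such a component using only the standard library. It assumes the factors are standardized first (the description does not say) and uses power iteration on their correlation matrix; the function names and the factor values are hypothetical. A principal component's sign is arbitrary, so in practice it may need to be flipped so that higher values mean a more violent history.

```python
import math


def standardize(column):
    """Rescale a column to mean 0 and sample standard deviation 1 (column must not be constant)."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (len(column) - 1))
    return [(x - mean) / sd for x in column]


def first_principal_component(columns, n_iter=200):
    """Return (loadings, scores) for the first principal component of the given columns."""
    z = [standardize(c) for c in columns]
    k, n = len(z), len(z[0])
    # Correlation matrix of the standardized factors.
    corr = [[sum(z[a][i] * z[b][i] for i in range(n)) / (n - 1) for b in range(k)]
            for a in range(k)]
    # Power iteration: repeated multiplication converges to the eigenvector with the
    # largest eigenvalue, i.e. the direction of maximum variance.
    v = [1.0] * k
    for _ in range(n_iter):
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Each election's index value is the loading-weighted sum of its standardized factors.
    scores = [sum(v[a] * z[a][i] for a in range(k)) for i in range(n)]
    return v, scores


# Hypothetical factors for six elections: (a) violence at the previous election (0/1),
# (b) elections since the last violent one, (c) count of past violent elections.
a = [1, 0, 1, 0, 1, 0]
b = [0, 3, 0, 2, 0, 4]
c = [3, 1, 4, 0, 5, 1]
loadings, history_index = first_principal_component([a, b, c])
```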

Greedy Ensemble Classifier (Overall Importance)

Random Forest Classifier

Neural Net Classifier

Logistic Regression Classifier
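The importance procedure described at the start of this section can be illustrated with permutation importance, a common way to replace a variable with random noise: shuffle one predictor's values across elections, re-score the model, and record the drop in AUC. The Python sketch below is a minimal version for a single model; `predict`, the function names, and the toy data are hypothetical, and the sketch does not reproduce how ELVIS combines importances across the ensemble.

```python
import random


def pairwise_auc(y_true, scores):
    """AUC as the share of (violent, peaceful) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, y_true, n_repeats=20, seed=0):
    """Mean drop in AUC when each predictor column is shuffled, one column at a time."""
    rng = random.Random(seed)
    baseline = pairwise_auc(y_true, predict(rows))
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between predictor j and the outcome
            permuted = [list(row) for row in rows]
            for row, value in zip(permuted, column):
                row[j] = value
            drops.append(baseline - pairwise_auc(y_true, predict(permuted)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical model that only looks at the first predictor.
def predict(rows):
    return [row[0] for row in rows]


rows = [[0.1, 5], [0.2, 3], [0.8, 1], [0.9, 7]]
y = [0, 0, 1, 1]
importances = permutation_importance(predict, rows, y)
```

In this toy case, shuffling the second predictor cannot change the predictions, so its importance is exactly zero, while shuffling the first predictor lowers the AUC.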

Variable Relationships

Variable Relationships

To further provide actionable information about ELVIS, we use partial dependence plots to visualize the relationship between the values of our predictors and the expected risk of election-related violence. Partial dependence plots are similar in concept to marginal effects: they show the expected relationship between any one predictor and the expected classification of election-related violence after averaging over the values of all other predictors. For brief descriptions of each predictor, please see our variable importance page. For a brief guide on interpretation, see the description below.

  • Partial Dependence Plots (Single Variable):
    • X-Axis: Name corresponds to the predictor variable of interest. Values along the x-axis correspond to real values of the predictor variable.
    • Y-Axis: Values correspond to the expected classification of election-related violence based on a real value of the predictor variable. Higher values on the y-axis indicate greater expected classification of violence, while lower values indicate greater expected classification of peace.

 

  • Partial Dependence Plots (Multi-Variable):
    • X-Axis: Name corresponds to the first predictor variable of interest. Values along the x-axis correspond to real values of the predictor variable.
    • Y-Axis: Name corresponds to the second predictor variable of interest. Values along the y-axis correspond to real values of the predictor variable.
    • Z-Axis: Values correspond to the expected classification of election-related violence based on real values of the two predictor variables. Higher values on the z-axis indicate greater expected classification of violence, while lower values indicate greater expected classification of peace.
    • Note on Color: The multi-variable plot displays a 3D representation of the expected classification based on the values of two interacting predictors. The more blue/purple the color, the more likely our model is to classify peace. In contrast, the more green/yellow the color, the more likely our model is to classify election-related violence.
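The single- and multi-variable plots described above follow the standard partial dependence calculation: fix one (or two) predictors at a grid value in every observation, keep the other predictors at their observed values, and average the model's predictions. A minimal Python sketch, in which `predict`, the function names, and the stand-in linear model are hypothetical rather than part of ELVIS:

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each grid value in every row."""
    curve = []
    for value in grid:
        modified = [list(row) for row in rows]
        for row in modified:
            row[feature] = value  # all other predictors keep their observed values
        predictions = predict(modified)
        curve.append(sum(predictions) / len(predictions))
    return curve


def partial_dependence_2d(predict, rows, feat_x, feat_y, grid_x, grid_y):
    """Surface of average predictions over a grid of two interacting predictors."""
    surface = []
    for x in grid_x:
        row_of_values = []
        for y in grid_y:
            modified = [list(row) for row in rows]
            for row in modified:
                row[feat_x] = x
                row[feat_y] = y
            predictions = predict(modified)
            row_of_values.append(sum(predictions) / len(predictions))
        surface.append(row_of_values)
    return surface


# Hypothetical stand-in model: a linear score in two predictors.
def predict(rows):
    return [0.5 * r[0] + 0.1 * r[1] for r in rows]


rows = [[0.0, 1.0], [0.0, 3.0]]
curve = partial_dependence(predict, rows, 0, [0.0, 1.0])
surface = partial_dependence_2d(predict, rows, 0, 1, [0.0, 1.0], [0.0, 2.0])
```

The one-dimensional `curve` corresponds to the y-axis of a single-variable plot, and each entry of `surface` corresponds to the z-axis (color) value of the multi-variable plot.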

Partial Dependence Plots (Single Variable)

Partial Dependence Plots (Multi Variable)