Predicting Future Daily Covid-19 Cases “Tunisia” Coronavirus

Wajdi HAJJI
4 min read · May 31, 2020

Get the latest Coronavirus data with a unique Python package
A tiny Python package for easy access to up-to-date Coronavirus (COVID-19, SARS-CoV-2) cases data.

To install this package, simply run:
!pip install COVID19Py
## Collecting COVID19Py
## Downloading COVID19Py-0.3.0.tar.gz (4.9 kB)
## Successfully installed COVID19Py-0.3.0
# loading packages and libraries
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from sklearn.preprocessing import MinMaxScaler
from datetime import datetime
import COVID19Py
# Create a New Instance to Access Data Source
covid19 = COVID19Py.COVID19()
# Choose a location by country code (https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2) or by specific location id; here we select Tunisia, id 212
timeline = covid19.getLocationById(212)['timelines']['confirmed']['timeline']

dic = { 'Date' : list(timeline.keys()),
'Cases': list(timeline.values())}

df = pd.DataFrame.from_dict(dic)
df
fig = go.Figure(data=go.Scatter(x=df.Date, y=df.Cases, mode='lines+markers'))
fig.show()
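The dates in the timeline arrive as strings, so it can help to parse them before plotting. The sketch below is only an illustration: `sample_timeline` is a hypothetical stand-in for the dictionary returned above, and it assumes the keys are ISO 8601 timestamps such as `2020-03-02T00:00:00Z`, which may differ depending on the data source.

```python
import pandas as pd

# Hypothetical sample that mimics the assumed shape of the timeline dict
# (ISO 8601 timestamp keys, cumulative case counts as values). Not real data.
sample_timeline = {
    "2020-03-02T00:00:00Z": 0,
    "2020-03-03T00:00:00Z": 0,
    "2020-03-04T00:00:00Z": 1,
}

sample_df = pd.DataFrame({
    "Date": list(sample_timeline.keys()),
    "Cases": list(sample_timeline.values()),
})

# Parse the timestamp strings and keep only the calendar date.
sample_df["Date"] = pd.to_datetime(sample_df["Date"]).dt.date
```

With parsed dates, Plotly treats the x-axis as a time axis rather than a list of labels.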

1- For the first 41 days of the timeline, Tunisia had 0 cases, so we will delete these rows.
2- The number of cases is cumulative, so we will difference the series to get the daily counts.

corona = df.copy()
# delete the first 41 lines
corona = corona[41:]
corona = corona.reset_index(drop=True)
# cancel the accumulation
corona = corona.set_index('Date')
corona = corona.diff().fillna(0).astype(np.int64)
corona
fig = go.Figure(data=go.Scatter(x=corona.index, y=corona.Cases, mode='lines+markers'))
fig.show()
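As a small illustration of the two steps above, here is a sketch on made-up numbers. The frame `toy` and its values are hypothetical, not Tunisian data. It finds the first non-zero row instead of hard-coding an offset, and it differences before trimming so that the first reported day keeps its count; slicing first and then calling `diff()` would turn that first value into 0.

```python
import pandas as pd

# Hypothetical cumulative totals (not real data), indexed by placeholder dates.
toy = pd.DataFrame(
    {"Cases": [0, 0, 0, 2, 5, 5, 9]},
    index=pd.Index(["d1", "d2", "d3", "d4", "d5", "d6", "d7"], name="Date"),
)

# Position of the first row with a non-zero cumulative total.
first_nonzero = int((toy["Cases"].to_numpy() > 0).argmax())

# Difference the whole series, then drop the leading all-zero rows.
toy_daily = toy.diff().fillna(0).astype("int64").iloc[first_nonzero:]
```

A quick sanity check is that the daily counts add up to the last cumulative total.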

Preprocessing

# Splitting Data into a Training set and a Test set
test_data_size = 20
train_data = corona[:-test_data_size]
test_data = corona[-test_data_size:]
train_set = train_data.values
test_set = test_data.values
# To improve training speed and model performance, we'll use the MinMaxScaler from scikit-learn (it scales the data values to between 0 and 1)
# Initialising the MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
# Transforming training and test values: fit on the training set only, then reuse that scaling for the test set
train_set = scaler.fit_transform(train_set)
test_set = scaler.transform(test_set)
# Currently, we have one long sequence of daily cases. We'll convert it into smaller ones:
def sequences(data, seq_length):
    X_values = []
    Y_label = []
    for i in range(seq_length, len(data)):
        X_values.append(data[i-seq_length:i, 0])
        Y_label.append(data[i, 0])
    return np.array(X_values), np.array(Y_label)
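# Create sequences
train_X, train_Y = sequences(train_set, seq_length=7)
test_X, test_Y = sequences(test_set, seq_length=7)
# Make the (samples, time steps, features) shape expected by the LSTM layers explicit
train_X = np.reshape(train_X, (train_X.shape[0], train_X.shape[1], 1))
test_X = np.reshape(test_X, (test_X.shape[0], test_X.shape[1], 1))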
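To make the preprocessing easier to follow, here is a minimal sketch on a toy series. The helper name `make_windows`, the toy values, and the window length of 3 are assumptions chosen for illustration. It shows a scaler fitted on the training part only and the array shapes produced by the sliding window.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def make_windows(data, seq_length):
    """Turn a (n, 1) array into (window, next value) pairs."""
    X, y = [], []
    for i in range(seq_length, len(data)):
        X.append(data[i - seq_length:i, 0])  # the previous seq_length values
        y.append(data[i, 0])                 # the value to predict
    return np.array(X), np.array(y)

# Toy series 0..9 as a column vector (hypothetical values).
series = np.arange(10, dtype=float).reshape(-1, 1)
train, test = series[:7], series[7:]

# Fit the scaler on the training part only, then reuse it for the test part.
toy_scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = toy_scaler.fit_transform(train)
test_scaled = toy_scaler.transform(test)  # can exceed 1 if test values are larger

X, y = make_windows(train_scaled, seq_length=3)
X = X.reshape(X.shape[0], X.shape[1], 1)  # (samples, time steps, features)
```

Seven training points with a window of 3 give four samples, so `X` has shape `(4, 3, 1)`. Test values larger than anything in the training set scale to more than 1, which is expected when the scaler has not seen them.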

Building the model

model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(train_X.shape[1], 1)))
model.add(Dropout(0.3))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(units=50))
model.add(Dropout(0.3))
model.add(Dense(units = 1))
model.compile(optimizer = 'adam', loss = 'mean_squared_error')
history = model.fit(train_X, train_Y, epochs=100, validation_data=(test_X, test_Y))
# Let's have a look at the train and test loss:
epochs_range = list(range(1, len(history.history['loss']) + 1))
fig = go.Figure()
fig.add_trace(go.Scatter(x=epochs_range, y=history.history['loss'], mode='lines+markers', name='Training loss'))
fig.add_trace(go.Scatter(x=epochs_range, y=history.history['val_loss'], mode='lines+markers', name='Validation loss'))
fig.show()
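When reading the loss curves, two things are worth checking: that the x-axis covers exactly the epochs that were recorded, and where the validation loss is lowest. The sketch below uses a hypothetical `losses` dictionary with made-up numbers in place of `history.history`, so it runs without training a model.

```python
import numpy as np

# Made-up loss values standing in for history.history (not real results).
losses = {
    "loss": [0.09, 0.05, 0.03, 0.02, 0.02],
    "val_loss": [0.10, 0.07, 0.06, 0.08, 0.09],
}

# One x value per recorded epoch, starting at 1.
epochs_axis = list(range(1, len(losses["loss"]) + 1))

# Epoch with the lowest validation loss (1-based).
best_epoch = int(np.argmin(losses["val_loss"])) + 1
```

If the validation loss starts rising after `best_epoch` while the training loss keeps falling, the model is probably starting to overfit.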

Predicting daily cases

test_inputs = corona[-test_data_size-7:].values
test_inputs = test_inputs.reshape(-1,1)
test_inputs = scaler.transform(test_inputs)
features_X, features_Y = sequences(test_inputs, seq_length=7)
features_X = np.array(features_X)
features_X = np.reshape(features_X, (features_X.shape[0], features_X.shape[1], 1))

features_Y = np.array(features_Y)
features_Y = np.reshape(features_Y, (features_X.shape[0], 1))
predictions = model.predict(features_X)
# We have to reverse the scaling of the test data and the model predictions:
features_Y = scaler.inverse_transform(features_Y)
predictions = scaler.inverse_transform(predictions)
features = [list(i)[0] for i in list(features_Y )]
predict = [list(i)[0] for i in list(predictions)]
fig = go.Figure()
fig.add_trace(go.Scatter(x=corona.index[:len(train_data)], y= corona.Cases[:len(train_data)], mode='lines+markers',name='Historical Daily Cases'))
fig.add_trace(go.Scatter(x=corona.index[-len(test_data):],y=features , mode='lines+markers',name='Real Daily Cases'))
fig.add_trace(go.Scatter(x=corona.index[-len(test_data):], y=predict, mode='lines+markers',name='Predicted Daily Cases'))
fig.show()
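Besides comparing the curves by eye, the gap between real and predicted daily cases can be summarised with an error metric. The sketch below computes the mean absolute error (MAE) and root mean squared error (RMSE) with NumPy. The arrays `actual` and `predicted` hold made-up numbers for illustration; in the notebook they would presumably be replaced by the `features` and `predict` lists.

```python
import numpy as np

# Made-up values for illustration only (not model output).
actual = np.array([10.0, 20.0, 30.0, 40.0])
predicted = np.array([12.0, 18.0, 33.0, 35.0])

errors = predicted - actual

# Mean absolute error: average size of the miss, in cases per day.
mae = float(np.mean(np.abs(errors)))

# Root mean squared error: penalises large misses more heavily.
rmse = float(np.sqrt(np.mean(errors ** 2)))
```

Both metrics are in the same unit as the data (cases per day), which makes them easy to compare against the typical daily count.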

Conclusion

The model's performance is not great, but this is expected given the small amount of data. Predicting daily Covid-19 cases is a hard problem. We’re amidst an outbreak, and there’s more to be done. Hopefully, everything will be back to normal after some time.
