Datenanalyse und Stochastische Modellierung - Dr. Philipp Meyer
Exercise 8

Sunspots

We want to predict future sunspot numbers using neural networks. Monthly data can be found at https://www.sidc.be/SILSO/datafiles.

  • Load the data into a numpy array and plot the time series in a figure. Normalize the data so that it lies in the range [0, 1].
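
    For example, a minimal sketch (assuming the monthly file SN_m_tot_V2.0.csv from SILSO, which is semicolon-separated with the monthly mean total sunspot number in the fourth column; adjust the file name and column index to your download):

          import numpy as np
          import matplotlib.pyplot as plt

          data = np.genfromtxt("SN_m_tot_V2.0.csv", delimiter=";")
          x = data[:, 3]                            # monthly mean total sunspot number
          x = (x - x.min()) / (x.max() - x.min())   # rescale to the range [0, 1]

          plt.plot(x)
          plt.xlabel("month")
          plt.ylabel("normalized sunspot number")
          plt.show()
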
  • Plot one measure of autocorrelation (e.g. the TAMSD, the power spectrum, ...). Which features can you see that contribute to predictability in a positive or negative way?
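
    One possibility is the power spectrum (a sketch; the sample autocorrelation function would work just as well):

          freq = np.fft.rfftfreq(len(x), d=1.0)          # frequencies in units of 1/month
          power = np.abs(np.fft.rfft(x - x.mean()))**2   # power spectrum of the centred series

          plt.loglog(freq[1:], power[1:])                # drop the zero-frequency bin
          plt.xlabel("frequency [1/month]")
          plt.ylabel("power")
          plt.show()
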
  • Divide the time series into a training set (3/4 of the data) and a validation set (1/4 of the data). Then create an array Y_train with the values to be fitted and an array X_train with the corresponding inputs. A reasonable number of inputs is 60 (the 60 values preceding Y), but you can experiment with different numbers. You can use the following code:
    
          linp = 60                      # number of past values used as input
          trainL = len(x) // 4 * 3       # first 3/4 of the normalized series x

          X_train = np.zeros((trainL - linp - 1, linp))
          Y_train = np.zeros((trainL - linp - 1, 1))
          for i in range(trainL - linp - 1):
              X_train[i] = x[i:i + linp]     # the linp values preceding position i + linp
              Y_train[i] = x[i + linp]       # the value to be predicted
          
    Create the arrays for testing on the remaining 1/4 of the data in the same way.
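
    For example, mirroring the code above (the test inputs start where the training data ends):

          testL = len(x) - trainL
          X_test = np.zeros((testL - linp - 1, linp))
          Y_test = np.zeros((testL - linp - 1, 1))
          for i in range(testL - linp - 1):
              X_test[i] = x[trainL + i:trainL + i + linp]
              Y_test[i] = x[trainL + i + linp]
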
  • Create a neural network using the class sklearn.neural_network.MLPRegressor. Set early_stopping to False and validation_fraction to 0. You can try different numbers of layers, different layer sizes, and activation functions and compare their results - experiment with these parameters!
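
    A possible starting point (a sketch; the layer sizes and activation are just one choice to experiment with):

          from sklearn.neural_network import MLPRegressor

          mlp = MLPRegressor(hidden_layer_sizes=(50, 20),   # two hidden layers as a starting point
                             activation="relu",
                             early_stopping=False,
                             validation_fraction=0)
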
  • Create a for loop with at least 50 iterations (you might need more). At each iteration, call the partial_fit(X_train, Y_train) function of your MLPRegressor and then calculate the mean squared error on your training set and your validation set using the predict(X) function of your MLPRegressor. At the end, plot the time evolution of the mean squared error of both the training and the validation set in one figure (it might be helpful to use a log scale). This way you can see whether the number of iterations was adequate, too few, or too many, depending on whether the error on the validation set reached its minimum at the last iteration or earlier.
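
    A sketch of such a loop (mean_squared_error comes from sklearn.metrics; .ravel() avoids shape warnings when passing the column vector Y_train):

          from sklearn.metrics import mean_squared_error

          mse_train, mse_test = [], []
          for it in range(200):                          # at least 50 iterations, often more
              mlp.partial_fit(X_train, Y_train.ravel())  # one pass over the training data
              mse_train.append(mean_squared_error(Y_train, mlp.predict(X_train)))
              mse_test.append(mean_squared_error(Y_test, mlp.predict(X_test)))

          plt.semilogy(mse_train, label="training set")
          plt.semilogy(mse_test, label="validation set")
          plt.xlabel("iteration")
          plt.ylabel("mean squared error")
          plt.legend()
          plt.show()
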
  • Plot the last 100 datapoints of the sunspot time series together with the corresponding prediction of your model. How good do the predictions look?
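
    For instance (a sketch, assuming X_test and Y_test were built as above):

          pred = mlp.predict(X_test)
          plt.plot(Y_test[-100:], label="data")
          plt.plot(pred[-100:], label="prediction")
          plt.xlabel("month (last 100 points of the test set)")
          plt.ylabel("normalized sunspot number")
          plt.legend()
          plt.show()
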
  • Calculate the permutation importance of the input features (60 if you used linp = 60 above) using the function sklearn.inspection.permutation_importance on both the training set and the test set. Plot the result. Was the model trained well?
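
    A sketch (permutation_importance shuffles one input column at a time and records how much the score deteriorates):

          from sklearn.inspection import permutation_importance

          imp_train = permutation_importance(mlp, X_train, Y_train.ravel(), n_repeats=10)
          imp_test = permutation_importance(mlp, X_test, Y_test.ravel(), n_repeats=10)

          plt.plot(imp_train.importances_mean, label="training set")
          plt.plot(imp_test.importances_mean, label="test set")
          plt.xlabel("input (lag) index")
          plt.ylabel("permutation importance")
          plt.legend()
          plt.show()
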
  • Download the dataset of temperatures in Potsdam from exercise 4 with a monthly resolution.
  • Plot both datasets - temperatures and sunspots - together in one figure
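
    For example with two y-axes, since the two quantities have different scales (a sketch, assuming temp and sun hold the monthly temperature and sunspot series aligned in time):

          fig, ax1 = plt.subplots()
          ax1.plot(temp, color="C0")
          ax1.set_ylabel("temperature", color="C0")
          ax2 = ax1.twinx()                 # second y-axis for the sunspot numbers
          ax2.plot(sun, color="C1")
          ax2.set_ylabel("sunspot number", color="C1")
          ax1.set_xlabel("month")
          plt.show()
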
  • Divide the dataset into a training set and a test set
  • Fit a neural network to predict the next month's temperature from the past 12 temperature values, using the fit(X_train,Y_train) function with validation_fraction=0.1 and early_stopping=True. Print the error on the training set and the test set.
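
    A sketch, assuming the monthly Potsdam temperatures are stored (and, if you like, normalized) in an array temp and split as before into the first 3/4 for training and the last 1/4 for testing:

          lagT = 12                                   # use the past 12 monthly values
          trainT = len(temp) // 4 * 3
          XT_train = np.array([temp[i:i + lagT] for i in range(trainT - lagT - 1)])
          YT_train = np.array([temp[i + lagT] for i in range(trainT - lagT - 1)])
          XT_test = np.array([temp[trainT + i:trainT + i + lagT]
                              for i in range(len(temp) - trainT - lagT - 1)])
          YT_test = np.array([temp[trainT + i + lagT]
                              for i in range(len(temp) - trainT - lagT - 1)])

          mlp_T = MLPRegressor(hidden_layer_sizes=(50,),
                               validation_fraction=0.1, early_stopping=True)
          mlp_T.fit(XT_train, YT_train)
          print("train MSE:", mean_squared_error(YT_train, mlp_T.predict(XT_train)))
          print("test  MSE:", mean_squared_error(YT_test, mlp_T.predict(XT_test)))
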
  • Fit a second neural network to predict the next month's temperature from the past 12 temperature values and the past 12 sunspot numbers. Did the model improve by using the sunspot numbers?
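
    One way to build the 24-dimensional inputs (a sketch; sun is assumed to be the sunspot series aligned month by month with temp, and YT_train / YT_test are reused from the sketch above):

          XT2_train = np.array([np.concatenate((temp[i:i + lagT], sun[i:i + lagT]))
                                for i in range(trainT - lagT - 1)])
          XT2_test = np.array([np.concatenate((temp[trainT + i:trainT + i + lagT],
                                               sun[trainT + i:trainT + i + lagT]))
                               for i in range(len(temp) - trainT - lagT - 1)])

          mlp_T2 = MLPRegressor(hidden_layer_sizes=(50,),
                                validation_fraction=0.1, early_stopping=True)
          mlp_T2.fit(XT2_train, YT_train)
          print("test MSE with sunspots:",
                mean_squared_error(YT_test, mlp_T2.predict(XT2_test)))
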
  • Calculate the permutation importance of the 24 features using sklearn.inspection.permutation_importance on both the training set and the test set. Plot the result.