Uses machine-learning (ML) models to predict returns, which can later be used in a trading strategy.

backtesting_returns(
  data_ml,
  return_prediction_object = NULL,
  return_label,
  features,
  rolling = TRUE,
  window_size,
  step_size = 1,
  offset = 0,
  in_sample = TRUE,
  ml_config,
  append = FALSE,
  num_cores = NULL
)

Arguments

return_prediction_object

an object of class returnPrediction in which the predictions are stored. The default (NULL) creates a new object. If an existing object is passed, new predictions are added to it when append=TRUE; when append=FALSE, the object is overwritten.

return_label

the name of the column used as the prediction label (target) of the ML model. It must already be shifted appropriately, i.e. the label stored at date t should be the return realized at date t+1.
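If the raw data only contain same-period returns, the label can be built by leading the return by one period within each stock. A minimal base-R sketch, assuming a long-format data frame with the hypothetical columns stock_id, date and ret and no gaps in the date sequence of any stock:

```r
# Toy long-format panel (hypothetical column names stock_id, date, ret)
df <- data.frame(
  stock_id = c(1, 1, 1, 2, 2, 2),
  date     = as.Date(c("2020-01-31", "2020-02-29", "2020-03-31",
                       "2020-01-31", "2020-02-29", "2020-03-31")),
  ret      = c(0.01, 0.02, 0.03, -0.01, 0.00, 0.05)
)

# Sort by stock, then date, so that "next row" means "next date" within a stock
df <- df[order(df$stock_id, df$date), ]

# Lead the return by one period within each stock: the label at date t is the
# return realized at t+1; the last date of each stock gets NA
df$ret_fwd <- ave(df$ret, df$stock_id, FUN = function(x) c(x[-1], NA))
```

Rows whose label is NA (the last date of each stock) would then need to be dropped, since most ML algorithms cannot handle missing values.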

features

a character vector with the names of the feature columns that should be used by the ML model.

rolling

if TRUE, the function will use a rolling window approach to predict the returns. If FALSE, the function will use an expanding window approach.
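The difference between the two schemes can be illustrated with date indices. The helper train_dates() below is hypothetical (not part of the package); it ignores offset and measures the window in time steps:

```r
# Hypothetical helper: indices of the training dates used to predict date
# index t, given a window length w in time steps
train_dates <- function(t, w, rolling = TRUE) {
  stopifnot(t > w)                     # need at least w dates before t
  first <- if (rolling) t - w else 1   # rolling: fixed length; expanding: anchored at 1
  seq(first, t - 1)
}

train_dates(t = 8, w = 5, rolling = TRUE)   # 3 4 5 6 7
train_dates(t = 8, w = 5, rolling = FALSE)  # 1 2 3 4 5 6 7
```

With the rolling window the training set always has w dates; with the expanding window it starts at w dates and grows by one date for every step forward.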

window_size

(either in number of time steps or as a period such as "1 year" or "1 month") the size of the training window used in the rolling-window approach. If rolling=FALSE, this is the size of the initial window of the expanding-window approach.
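When window_size is given as a period string, one way to turn it into a start date is base R's seq() for Date objects, whose by argument accepts strings such as "5 years" or "-1 month". The helper window_start() is a hypothetical illustration, not necessarily how backtesting_returns() parses the string:

```r
# Hypothetical helper: start date of a training window that spans `period`
# (seq.Date "by" syntax, e.g. "5 years") and ends at `end_date`
window_start <- function(end_date, period) {
  # Step back once by the period: seq() returns c(end_date, end_date - period)
  seq(end_date, by = paste0("-", period), length.out = 2)[2]
}

window_start(as.Date("2020-12-31"), "5 years")  # "2015-12-31"
window_start(as.Date("2020-06-15"), "1 month")  # "2020-05-15"
```

Month steps that start on the 29th to 31st can roll over into the following month, so month-end dates may need special handling in a real implementation.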

step_size

(either in number of time steps or as a period such as "3 months") the amount by which the prediction window is moved forward before the ML model is retrained. Default is 1. If, e.g., set to 3, the model predicts returns for dates t, t+1 and t+2 (corresponding to the realized returns at t+1, t+2 and t+3) in the original dataset; only then is the ML model retrained.
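The retraining schedule implied by step_size can be sketched on date indices. prediction_blocks() is a hypothetical helper, assuming one refit at the start of each block:

```r
# Hypothetical helper: group prediction dates (here: date indices) into
# blocks of step_size; the model would be refit once per block and then
# used to predict every date in that block
prediction_blocks <- function(pred_dates, step_size = 1) {
  block_id <- (seq_along(pred_dates) - 1) %/% step_size
  split(pred_dates, block_id)
}

blocks <- prediction_blocks(pred_dates = 1:7, step_size = 3)
length(blocks)  # 3 refits: dates 1-3, 4-6 and 7
```

A larger step_size therefore means fewer refits and a shorter run time, at the cost of models that are, on average, older when they are used.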

offset

(either in number of time steps or as a period such as "1 year" or "1 month") the amount of data left unused between the end of the training window and the first prediction date, to avoid look-ahead bias. Default is 0.
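A minimal sketch of how an offset pushes the training window back from the first prediction date, with all quantities in time steps. training_indices() is hypothetical and assumes a rolling window:

```r
# Hypothetical helper: training indices for a prediction starting at
# pred_start; the `offset` dates right before pred_start are skipped
training_indices <- function(pred_start, window_size, offset = 0) {
  train_end   <- pred_start - offset - 1
  train_start <- train_end - window_size + 1
  stopifnot(train_start >= 1)
  seq(train_start, train_end)
}

training_indices(pred_start = 20, window_size = 10, offset = 0)  # 10 ... 19
training_indices(pred_start = 20, window_size = 10, offset = 3)  # 7 ... 16
```

With offset = 3, dates 17 to 19 are used neither for training nor for this prediction.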

in_sample

if TRUE, the function also provides (in-sample) predictions for the training period (plus the offset period).

ml_config

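a named list with one element per ML model. As in the example below, each element contains pred_func (the name of the prediction function, e.g. "ols_pred" or "xgb_pred") and one or more lists with the model's settings (config, or config1, config2, ...).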

append

if TRUE, the function will append the predicted returns to the original dataset. If FALSE, the function will return a new dataset that contains the predicted returns.

data_ml

ML dataset (tibble/data.frame) in long format containing the features and the return_label, with the stock IDs in the first column and the dates in the second column. For most ML algorithms to work, the dataset should not contain missing values. Some algorithms also require a balanced panel, i.e. the same number of stocks at each point in time.
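Both requirements can be checked before calling the function. check_panel() is a hypothetical helper that relies on the dates being in the second column and on one row per stock and date:

```r
# Hypothetical helper: does a long-format panel contain missing values, and
# is it balanced (same number of rows, i.e. stocks, at every date)?
check_panel <- function(df, date_col = 2) {
  stocks_per_date <- table(df[[date_col]])  # dates are in the second column
  list(
    has_missing = anyNA(df),
    balanced    = length(unique(as.vector(stocks_per_date))) == 1
  )
}

panel <- data.frame(
  stock_id = c(1, 2, 1, 2, 1),
  date     = as.Date(c("2020-01-31", "2020-01-31", "2020-02-29",
                       "2020-02-29", "2020-03-31")),
  Pb       = c(1.2, 0.8, 1.1, NA, 1.0)
)
check_panel(panel)  # has_missing = TRUE, balanced = FALSE
```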

num_cores

the number of cores to use for parallel processing. If NULL, the ML iterations are run sequentially.

Value

a tibble with the stock_id, date and the predicted returns

Examples

if (FALSE) {
data(data_ml)
return_label <- "R1M_Usd"
features <- c("Div_Yld", "Eps", "Mkt_Cap_12M_Usd", "Mom_11M_Usd", "Ocf", "Pb", "Vol1Y_Usd")
# Two models: OLS with default settings and XGBoost with two hyperparameter sets
ml_config <- list(
  ols_pred = list(pred_func = "ols_pred", config = list()),
  xgb_pred = list(pred_func = "xgb_pred",
                  config1 = list(nrounds = 100, max_depth = 3, eta = 0.3, objective = "reg:squarederror"),
                  config2 = list(nrounds = 100, max_depth = 4, eta = 0.1, objective = "reg:squarederror"))
)
# Rolling 5-year training window, retrained every 3 months, 1-year gap before predictions
rp <- backtesting_returns(
  data_ml = data_ml, return_prediction_object = NULL,
  return_label = return_label, features = features,
  rolling = TRUE, window_size = "5 years", step_size = "3 months", offset = "1 year",
  in_sample = TRUE, ml_config = ml_config, append = FALSE, num_cores = NULL
)
}