R/backtesting_returns.R
backtesting_returns.Rd
Uses ML models to predict returns that can later be used in a trading strategy.
backtesting_returns(
data_ml,
return_prediction_object = NULL,
return_label,
features,
rolling = TRUE,
window_size,
step_size = 1,
offset = 0,
in_sample = TRUE,
ml_config,
append = FALSE,
num_cores = NULL
)
an object of class returnPrediction used to store the predictions. The default (NULL) creates a new one. If an existing object is passed and append=TRUE, new predictions are added to it; if append=FALSE, the object is overwritten.
the prediction label to use for the ML model. It should already be shifted appropriately (at date t, the label should be the value from date t+1).
a vector of features that should be used for the ML model.
if TRUE, the function will use a rolling window approach to predict the returns. If FALSE, the function will use an expanding window approach.
the size of the window used for the rolling window approach, given either as a number of time steps or as a period in years or months (e.g. "1 year" or "1 month"). If rolling=FALSE, this is the starting window for the expanding window approach.
the number of days by which the prediction window is moved forward. Default is 1. If set to 3, for example, returns are predicted for t, t+1 and t+2 (corresponding to t+1, t+2 and t+3 in the original dataset); only then is the ML model retrained.
the amount of data left unused between the training data and the prediction (to avoid look-ahead bias), given either as a number of time steps or as a period in years or months (e.g. "1 year" or "1 month"). Default is 0.
if TRUE, the function also provides in-sample predictions for the training period (+ offset).
a list that contains the configuration for the ML model. It should contain the following elements:
if TRUE, the function will append the predicted returns to the original dataset. If FALSE, the function will return a new dataset that contains the predicted returns.
ML dataset (tibble/data.frame) in long format containing the features and the return_label, with the stock_ids in the first column and the dates in the second column. For most ML algorithms to work, this dataset should not contain missing values. In some cases it also needs to be balanced in terms of the number of stocks available at each point in time.
num_cores the number of cores that should be used for parallel processing. If set to NULL the ML iterations will be done sequentially.
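The interaction of rolling, window_size, step_size and offset can be sketched with integer time steps. The helper make_windows below is hypothetical and not part of the package; it assumes that predictions start right after the offset gap and that the training window advances by step_size at each refit, which may differ from how backtesting_returns handles these details internally.

```r
# Hypothetical helper (an assumption, not the package's implementation):
# lists training and prediction indices per refit, integer time steps only.
make_windows <- function(n_dates, window_size, step_size = 1, offset = 0,
                         rolling = TRUE) {
  windows <- list()
  train_end <- window_size
  # Stop once no date is left to predict after the offset gap
  while (train_end + offset + 1 <= n_dates) {
    # Rolling: fixed-length window; expanding: always start at the first date
    train_start <- if (rolling) train_end - window_size + 1 else 1
    pred_start <- train_end + offset + 1
    pred_end <- min(pred_start + step_size - 1, n_dates)
    windows[[length(windows) + 1]] <- list(
      train = train_start:train_end,
      predict = pred_start:pred_end
    )
    # The model is refit only after step_size dates have been predicted
    train_end <- train_end + step_size
  }
  windows
}

w <- make_windows(n_dates = 12, window_size = 5, step_size = 3, offset = 1)
# w[[1]]: train 1:5, predict 7:9
# w[[2]]: train 4:8, predict 10:12
```

With rolling = FALSE, the same call would keep the training start at 1, so the second window would train on 1:8 instead of 4:8.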
a tibble with the stock_id, date and the predicted returns
if (FALSE) {
# Load the example data; choose the label and features
data(data_ml)
data <- data_ml
return_label <- "R1M_Usd"
features <- c("Div_Yld", "Eps", "Mkt_Cap_12M_Usd", "Mom_11M_Usd", "Ocf", "Pb", "Vol1Y_Usd")
# Window settings
rolling <- TRUE; window_size <- "5 years"; step_size <- "3 months"; offset <- "1 year"; in_sample <- TRUE
# ols_pred with an empty config and xgb_pred with two config lists
ml_config <- list(
  ols_pred = list(pred_func = "ols_pred", config = list()),
  xgb_pred = list(pred_func = "xgb_pred",
    config1 = list(nrounds = 100, max_depth = 3, eta = 0.3, objective = "reg:squarederror"),
    config2 = list(nrounds = 100, max_depth = 4, eta = 0.1, objective = "reg:squarederror")))
# Pass the dataset under its full argument name, data_ml
rp <- backtesting_returns(data_ml = data, return_prediction_object = NULL,
  return_label = return_label, features = features, rolling = rolling,
  window_size = window_size, step_size = step_size, offset = offset,
  in_sample = in_sample, ml_config = ml_config, append = FALSE, num_cores = NULL)
}
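The data requirements described for data_ml and return_label (a label shifted by one period, no missing values, a balanced panel) can be prepared along these lines. This is a minimal base R sketch on made-up toy data; the column names ret and label and all values are illustrative assumptions, not taken from the package's dataset.

```r
# Toy long-format data (hypothetical values): stock_id first, date second
toy <- data.frame(
  stock_id = rep(c("A", "B"), each = 4),
  date = rep(as.Date(c("2020-01-31", "2020-02-29", "2020-03-31", "2020-04-30")),
             times = 2),
  ret = c(0.01, 0.02, -0.01, 0.03, 0.00, 0.01, 0.02, -0.02)
)

# Sort so that the shift below runs in date order within each stock
toy <- toy[order(toy$stock_id, toy$date), ]

# Lead the return by one period per stock: label at date t = return at t+1
toy$label <- ave(toy$ret, toy$stock_id, FUN = function(x) c(x[-1], NA))

# The last date of each stock has no label; drop rows with missing values
toy_clean <- toy[complete.cases(toy), ]

# Balance check: number of stocks available at each date
stocks_per_date <- table(toy_clean$date)
```

If stocks_per_date shows the same count for every date, the panel is balanced in the sense described for data_ml.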