There are already many open-source tools for analyzing and creating factor-based trading strategies. Perhaps the most popular is Alphalens by the Quantopian team, which has played a pivotal role in the careers of thousands of aspiring quants like myself looking to learn industry-standard techniques for factor analysis.
As a personal project, and with the objective of better understanding these techniques, I wrote my own implementation (and sometimes interpretation) of some of the features of Alphalens using my favorite engines, LEAN and QuantConnect. I am now sharing this product to help QuantConnect users perform factor research and strategy backtesting in their favorite platform. You will find the full product at the end of this article, which can be directly cloned into your QuantConnect account.
I would like to add here that even though the features in this product look very similar to those in Alphalens, I did not use (or look at) any of their open-source code to build it.
I believe factor investing is a very interesting area to explore within financial trading and this is my contribution to the community. I hope this product will improve over time and strongly encourage questions and suggestions!
This article consists of the following parts:
- Brief introduction to factor investing.
- Case Study - Factor Research: How to use the research tools to analyse a long-short equity strategy based on the combination of two factors (momentum and volatility)
- Case Study - Risk Research: Explore the influence that some external risk factors might have in our strategy.
- Case Study - Backtesting Algorithm: After completing the research phase, we will show how to seamlessly move the strategy to the algorithm to test it against historical data, including slippage and commissions modelling.
About Factor Investing
Factor investing is a popular investment approach consisting of finding quantifiable attributes of companies that are associated with their future performance for a given period of time.
A Matter Of Relative Performance
The general hypothesis behind this type of strategy is that it is possible to rank stocks based on certain factors that successfully separate winners from losers, and then construct a large portfolio that goes long the top stocks and short the bottom ones profiting from the spread between their future returns.
The concept of winners and losers here refers to the performance of the stocks relative to each other, not in absolute terms. This means that even if all stocks go up or down at the same time, we can still profit if we are able to segregate the ones that do better than others.
This concept makes factor investing very attractive since it is not about predicting individual stocks or specific moves up or down, but rather finding groups of assets that on average perform consistently above others.
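As a quick arithmetic illustration of this point, the sketch below uses made-up returns for six stocks in a falling market and assumes the ranking separates them perfectly. All numbers are hypothetical, and each leg is assumed to be equal-weighted with the same amount of capital.

```python
import numpy as np

# Hypothetical one-period returns for six stocks: every stock loses money
returns = np.array([-0.02, -0.03, -0.04, -0.06, -0.07, -0.08])

# Assume the factor ranks the stocks perfectly: the first three are the relative "winners"
long_leg = returns[:3].mean()    # average return of the stocks we buy
short_leg = returns[3:].mean()   # average return of the stocks we short

# Long-short return per unit of capital on each leg:
# what we lose on the longs is more than offset by what we gain on the shorts
long_short_return = long_leg - short_leg
print(round(long_short_return, 4))
```

Even though every stock falls, the long leg loses 3% while the short leg gains 7%, so the spread between the two groups is a positive 4%.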
What Makes A Factor
In principle, a factor can be anything that we can quantify for a company using the data available to us at that moment in time. We can calculate factors based on fundamentals (revenues, earnings, future growth, return on equity, profit margins, etc.), historical pricing and volume data (momentum, volatility, liquidity, technical indicators, statistical factors, etc.) or alternative data (news, market sentiment, etc.).
Perhaps a more interesting question is what makes a "good" factor. The short answer is a factor that proves to be a consistent driver of future returns in either direction across many stocks. This article will present a series of standard statistical techniques commonly used to assess the quality of factors.
In any case, most factor-based strategies will include multiple factors that are uncorrelated with each other and when combined into one model help explain the performance of the stocks better than any individual factor alone. One way of doing this, as we will show later, is by creating a factor from some linear combination of other factors. Ultimately, Machine Learning can be really helpful to identify the relative importance of each factor and find a meaningful, low-bias combination of them.
Cyclical Performance
Finding factors that always work through any economic period and market context is not realistic. Instead, factors tend to move between cycles of overperformance and underperformance over time. Factor timing is incredibly hard, but at least we can try to better understand the drivers behind these periods. Risk analysis will help us with this task which essentially consists of finding external factors that are somehow correlated with our factors in order to explain, and perhaps even anticipate, these performance cycles.
Market Neutral
These strategies are commonly executed using a large number of stocks on each side (long-short), aiming for a market-neutral (close to zero beta) portfolio whose performance depends mainly on the quality of the ranking system.
Case Study - Factor Research
This section corresponds to the FactorAnalysis class, whose purpose is to build a long-short portfolio based on statistically significant factors. The Notebook also contains detailed step-by-step instructions.
For this first version we have focused on factors created using historical price and volume data, simply because we found that requesting historical fundamental data for many tickers and years is still too computationally expensive in QuantConnect. Once this is improved we will work to add fundamental factors to the product.
Initialize Data
The first thing we need to do is to add our start and end dates and initialize the FactorAnalysis class by passing a list of tickers. We need to provide a manual list of tickers because at the time of this article QuantConnect does not offer dynamic universes in the research environment (looking forward to that!). In our example, we're using a list with all the S&P 500 constituents as of Q4 2020.
# select start and end date for analysis
startDate = datetime(2017, 1, 1)
endDate = datetime(2020, 10, 1)
# initialize factor analysis
factorAnalysis = FactorAnalysis(qb, tickers, startDate, endDate, Resolution.Daily)
This creates a MultiIndex DataFrame with the historical daily OHLCV data needed for the analysis.
factorAnalysis.ohlcvDf
Create Factors
Now we are going to create two very simple factors, Momentum and Volatility, using the CustomFactor function as follows.
# example of calculating multiple factors using the CustomFactor function
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis

def CustomFactor(x):
    '''
    Description:
        Applies factor calculations to a SingleIndex DataFrame of historical OHLCV data by symbol
    Args:
        x: SingleIndex DataFrame of historical OHLCV data for each symbol
    Returns:
        The factor values for each day
    '''
    try:
        # momentum factor --------------------------------------------------------------------------
        closePricesTimeseries = x['close'].rolling(252) # create a 252 day rolling window of close prices
        returns = x['close'].pct_change().dropna() # create a returns series
        # raw = True passes each window as a NumPy array so negative positional indexing works
        momentum = closePricesTimeseries.apply(lambda w: (w[-1] / w[-252]) - 1, raw = True)
        # volatility factor ------------------------------------------------------------------------
        volatility = returns.rolling(252).apply(lambda w: np.nanstd(w, axis = 0), raw = True)
        # get a dataframe with all factors as columns --------------------------------------------
        factors = pd.concat([momentum, volatility], axis = 1)
    except Exception:
        # fall back to NaN if the calculation fails for this symbol
        factors = np.nan
    return factors
What's going on there?
- Under the hood, this function gets applied to the OHLCV DataFrame grouped by symbol. That means we can perform calculations for each symbol using any of the OHLCV columns in the grouped 'x' DataFrame.
- We want to calculate factors in a rolling fashion so we get a value for each day that is calculated using data up until that day and including that day. By doing this we assume that in backtesting (and live trading) the calculations and trading decisions happen after the market close and before the next open.
- Finally, we concatenate the factors so we get the resulting DataFrame with one column per factor.
# example of multiple factors
factorsDf = factorAnalysis.GetFactorsDf(CustomFactor)
factorsDf
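To make the "applied per symbol" idea more concrete, here is a minimal sketch of how a factor function could be run on each symbol of a MultiIndex price DataFrame. This is not the product's GetFactorsDf implementation: the helper apply_by_symbol, the stand-in factor toy_momentum, the (symbol, time) index names and the toy prices are all assumptions made for illustration.

```python
import pandas as pd

def toy_momentum(x, window=3):
    # Hypothetical stand-in for a custom factor: return over a short rolling window
    return x["close"].rolling(window).apply(lambda w: w[-1] / w[0] - 1, raw=True)

def apply_by_symbol(ohlcv_df, factor_func):
    # Split the (symbol, time) frame into one SingleIndex frame per symbol,
    # run the factor function on each piece and stitch the results back together
    pieces = {}
    for symbol, group in ohlcv_df.groupby(level="symbol"):
        pieces[symbol] = factor_func(group.droplevel("symbol"))
    out = pd.concat(pieces)
    out.index.names = ["symbol", "time"]
    return out

# Toy close prices for two made-up symbols
dates = pd.date_range("2020-01-01", periods=4, freq="D")
index = pd.MultiIndex.from_product([["AAA", "BBB"], dates], names=["symbol", "time"])
toy_ohlcv = pd.DataFrame(
    {"close": [10.0, 11.0, 12.0, 13.2, 20.0, 19.0, 18.0, 18.0]}, index=index)

toy_factors = apply_by_symbol(toy_ohlcv, toy_momentum)
print(toy_factors)
```

The first rows of each symbol are NaN because the rolling window is not yet full, which is the same behaviour we get from the 252-day windows in the real factors.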
In order to standardize the data, we apply winsorization and zscore normalization. We won't go over that here so please refer to the Notebook for more information on this.
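For readers who want a rough idea of what these two steps do, the sketch below winsorizes and z-scores a single cross-section of factor values. The function name, the 2.5%/97.5% clipping percentiles and the toy values are assumptions for illustration and may differ from what the Notebook does.

```python
import pandas as pd

def winsorize_and_zscore(factor_values, lower=0.025, upper=0.975):
    # Winsorization: clip extreme values to the chosen percentiles
    lo, hi = factor_values.quantile([lower, upper])
    clipped = factor_values.clip(lower=lo, upper=hi)
    # Z-score: subtract the mean and divide by the standard deviation
    return (clipped - clipped.mean()) / clipped.std(ddof=0)

# Toy cross-section of raw factor values with one obvious outlier (symbol E)
raw = pd.Series([0.01, 0.02, 0.03, 0.04, 5.0], index=list("ABCDE"))
standardized = winsorize_and_zscore(raw)
print(standardized)
```

After this step every factor has zero mean and unit standard deviation, which makes factors measured in different units comparable before combining them.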
We have two single factors and we want to combine them into one factor that is some linear combination of the two. We do this using the combinedFactorWeightsDict, which takes the factor names and the weights. Note how we could reverse the effect of a factor by assigning a negative weight here. In this example, we will just sum the two.
# dictionary containing the factor name and weights for each factor
combinedFactorWeightsDict = {'Factor_1': 1, 'Factor_2': 1}
#combinedFactorWeightsDict = None # None to not add a combined factor when using single factors
finalFactorsDf = factorAnalysis.GetCombinedFactorsDf(standardizedFactorsDf, combinedFactorWeightsDict)
finalFactorsDf
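Conceptually, the combined factor is just a weighted sum of the standardized factor columns. The sketch below illustrates that idea with a hypothetical combine_factors helper and toy values; it is not the code behind GetCombinedFactorsDf.

```python
import pandas as pd

def combine_factors(standardized_df, weights):
    # Weighted sum of the standardized factor columns
    combined = sum(standardized_df[name] * w for name, w in weights.items())
    return combined.rename("Combined_Factor")

# Toy standardized factor values for three made-up symbols
toy_df = pd.DataFrame(
    {"Factor_1": [1.0, -0.5, 0.25], "Factor_2": [0.5, 0.5, -1.0]},
    index=["A", "B", "C"])

combined_sum = combine_factors(toy_df, {"Factor_1": 1, "Factor_2": 1})
# A negative weight reverses the effect of Factor_2
combined_flip = combine_factors(toy_df, {"Factor_1": 1, "Factor_2": -1})
print(combined_sum)
print(combined_flip)
```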
Create Quantiles And Add Forward Returns
It is time to create our factor quantiles and calculate forward returns in order to assess the relationship between the two.
# inputs for forward returns calculations
field = 'open' # choose between open, high, low, close prices to calculate returns
forwardPeriods = [1, 5, 21] # choose periods for forward return calculations
# inputs for quantile calculations
factor = 'Combined_Factor' # choose a factor to create quantiles
q = 5 # choose the number of quantile groups to create
factorQuantilesForwardReturnsDf = factorAnalysis.GetFactorQuantilesForwardReturnsDf(
    finalFactorsDf, field, forwardPeriods, factor, q)
factorQuantilesForwardReturnsDf
- In order to calculate forward returns, we need to choose the price field to use and the periods we want. In this example, we are calculating the 1-, 5- and 21-day forward returns based on Open prices. We use Open prices in order to replicate how the event-driven backtesting will work: we make all calculations after the market close (with data up to and including that day) and rebalance positions at the next market open.
- We select the factor we want to use for the quantiles and how many quantile groups we want to create. We are using the Combined_Factor and 5 groups (quintiles) here.
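The sketch below shows one possible way to compute forward returns from Open prices and to bucket factor values into quantile groups on each date. The helper names, the dates-by-symbols layout, the random toy data and the exact timing convention (enter at the next open, exit n days later) are assumptions made for illustration and not necessarily what GetFactorQuantilesForwardReturnsDf does.

```python
import numpy as np
import pandas as pd

def forward_returns(open_prices, periods):
    # open_prices: dates x symbols. The factor is known after the close of day t,
    # so we assume entry at the open of t+1 and exit at the open of t+1+n
    out = {}
    for n in periods:
        entry = open_prices.shift(-1)
        exit_ = open_prices.shift(-1 - n)
        out[n] = exit_ / entry - 1
    return out

def assign_quantiles(factor, q):
    # Rank each date's cross-section, then cut it into q equal-sized groups (1 = lowest)
    ranks = factor.rank(axis=1, method="first")
    return ranks.apply(lambda row: pd.qcut(row, q, labels=False) + 1, axis=1)

# Random toy data: 30 business days, 10 made-up symbols
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=30, freq="B")
symbols = [f"S{i}" for i in range(10)]
opens = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, (30, 10)), axis=0)),
    index=dates, columns=symbols)
factor = pd.DataFrame(rng.normal(size=(30, 10)), index=dates, columns=symbols)

fwd = forward_returns(opens, [1, 5, 21])
quantiles = assign_quantiles(factor, q=5)
print(quantiles.head())
```

Note that the last rows of each forward-return table are NaN because the future prices needed to compute them are not available yet.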
Let's have a look at the mean returns.
The next step is to visualize the cumulative returns from each quintile over time. In order to do that, we are going to group by quintile every day and calculate the return for each quintile/day by either using equal-weighting (mean) or factor-based weighting (weight the return of each stock in the quintile by its factor value).
forwardPeriod = 1 # choose the forward period to use for returns
weighting = 'mean' # mean/factor
returnsByQuantileDf = factorAnalysis.GetReturnsByQuantileDf(
    factorQuantilesForwardReturnsDf, forwardPeriod, weighting)
returnsByQuantileDf
Here we are ideally looking for returns series that deviate from each other in the direction of the quintiles order (top quintile going up while bottom quintile going down).
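To illustrate the difference between the two weighting schemes, here is a small sketch on a hand-made table. The function name, the column names and the choice of weighting by the absolute factor value are assumptions for illustration; the product's GetReturnsByQuantileDf may handle the details differently.

```python
import pandas as pd

def returns_by_quantile(df, weighting="mean"):
    # df has one row per date/symbol with columns: date, quantile, factor, fwd_return
    keys = [df["date"], df["quantile"]]
    if weighting == "mean":
        # equal weighting: simple average of the returns in each date/quantile group
        out = df["fwd_return"].groupby(keys).mean()
    else:
        # factor weighting (assumed here to use the absolute factor value as the weight)
        w = df["factor"].abs()
        out = (df["fwd_return"] * w).groupby(keys).sum() / w.groupby(keys).sum()
    return out.unstack("quantile")

# Hand-made example: one date, two stocks in the bottom group and two in the top group
toy = pd.DataFrame({
    "date": ["d1", "d1", "d1", "d1"],
    "quantile": [1, 1, 5, 5],
    "factor": [-2.0, -1.0, 1.0, 3.0],
    "fwd_return": [-0.02, 0.01, 0.00, 0.04],
})

equal_weighted = returns_by_quantile(toy, "mean")
factor_weighted = returns_by_quantile(toy, "factor")
print(equal_weighted)
print(factor_weighted)
```

With factor weighting, the stocks with the most extreme factor values dominate the return of their group, so the spread between the top and bottom groups widens in this toy example.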
Create a Long-Short Portfolio
We are finally in a position to construct a portfolio that exploits the spread between the top and bottom quintiles. In order to do this, we need to select two quintiles and provide some weighting that we want to apply to each of them using the portfolioWeightsDict. This allows for some flexibility in the way we create the portfolio as we can give more or less weight to one of the quintiles.
# dictionary containing the quintile group names and portfolio weights for each
portfolioWeightsDict = {'Group_5': 1, 'Group_1': -1}
portfolioLongShortReturnsDf = factorAnalysis.GetPortfolioLongShortReturnsDf(returnsByQuantileDf, portfolioWeightsDict)
portfolioLongShortReturnsDf
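As a rough sketch of the idea, the long-short return on each day can be seen as a weighted sum of the quantile returns. The helper below is hypothetical, and dividing by the gross weight (so that half of the capital sits on each side when the weights are 1 and -1) is an assumption about the scaling, not necessarily what GetPortfolioLongShortReturnsDf does.

```python
import pandas as pd

def long_short_returns(returns_by_quantile_df, portfolio_weights):
    # Weighted sum of the selected quantile returns, scaled by the gross weight
    gross = sum(abs(w) for w in portfolio_weights.values())
    weighted = sum(returns_by_quantile_df[g] * w for g, w in portfolio_weights.items())
    return weighted / gross

# Toy daily returns for the top and bottom groups
toy_quantile_returns = pd.DataFrame({
    "Group_5": [0.01, 0.02, -0.01],
    "Group_1": [-0.01, 0.00, 0.01],
})

ls_returns = long_short_returns(toy_quantile_returns, {"Group_5": 1, "Group_1": -1})
cumulative = (1 + ls_returns).cumprod() - 1
print(cumulative)
```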
And the plot!
Spearman Rank Correlation Coefficient
A standard way of assessing the degree of correlation between our factor and forward returns is the Spearman Rank Correlation (Information Coefficient).
The Spearman Rank Correlation measures the strength and direction of association between two ranked variables. It is the non-parametric version of the Pearson correlation and focuses on the monotonic relationship between two variables rather than their linear relationship. Below we plot the daily IC between the factor values and each forward period return, along with a 21-day moving average.
factorAnalysis.PlotIC(factorQuantilesForwardReturnsDf)
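For intuition, the daily IC can be computed by ranking factor values and forward returns across stocks on each date and then taking the ordinary (Pearson) correlation of those ranks, which is exactly the Spearman coefficient. The sketch below is a minimal illustration with a hypothetical helper and toy data, not the code behind PlotIC.

```python
import pandas as pd

def daily_information_coefficient(factor, fwd_returns):
    # Both inputs are dates x symbols. Keep only pairs where both values exist
    valid = factor.notna() & fwd_returns.notna()
    factor_ranks = factor.where(valid).rank(axis=1)
    return_ranks = fwd_returns.where(valid).rank(axis=1)
    # Row-wise Pearson correlation of the ranks = Spearman rank correlation per date
    return factor_ranks.corrwith(return_ranks, axis=1)

dates = pd.date_range("2020-01-01", periods=2, freq="D")
toy_factor = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]],
    index=dates, columns=list("ABCD"))
toy_fwd = pd.DataFrame(
    [[0.01, 0.02, 0.03, 0.10], [0.04, 0.03, 0.02, 0.01]],
    index=dates, columns=list("ABCD"))

ic = daily_information_coefficient(toy_factor, toy_fwd)
print(ic)
```

On the first toy date the returns increase with the factor, so the IC is 1 even though the relationship is not linear; on the second date the order is fully reversed and the IC is -1. A moving average such as ic.rolling(21).mean() can then be used to smooth the daily series.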
Case Study - Risk Research
This section corresponds to the RiskAnalysis class, whose purpose is to discover what risk factors our strategy is exposed to and to what degree. As we will see below in more detail, these external factors can be any time series of returns that our portfolio could have some exposure to. Some popular risk factors are provided here (Fama-French Five Factors, Industry Factors), but the user can easily test any other by passing its time series of returns. The Notebook also contains detailed step-by-step instructions.
Initialize Data
Let's initialize the RiskAnalysis class.
# initialize risk analysis
riskAnalysis = RiskAnalysis(qb)
After initializing the RiskAnalysis class, we get two datasets with classic risk factors:
- Fama-French 5 Factors: Historical daily returns of Market Excess Return (Mkt-RF), Small Minus Big (SMB), High Minus Low (HML), Robust Minus Weak (RMW) and Conservative Minus Aggressive (CMA).
- 12 Industry Factors: Consumer Nondurables (NoDur), Consumer durables (Durbl), Manufacturing (Manuf), Energy (Enrgy), Chemicals (Chems), Business Equipment (BusEq), Telecommunications (Telcm), Utilities (Utils), Wholesale and Retail (Shops), Healthcare (Hlth), Finance (Money), Other (Other)
Visit this site for more factor datasets to add to this analysis.
# fama-french 5 factors
riskAnalysis.ffFiveFactorsDf.head()
# 12 industry factors
riskAnalysis.industryFactorsDf.head()
Let's take a look at the cumulative returns of our long-short strategy together with the returns of the Fama-French 5 Factors.
We can visualize the correlations between the risk factors and our strategy returns.
# plot correlation matrix
factorAnalysis.PlotFactorsCorrMatrix(combinedReturnsDf)
Run Regression Analysis
- Fit a Regression Model to the data to analyse linear relationships between our strategy returns and the external risk factors.
- Partial Regression plots. When performing multiple linear regression, these plots are useful in analysing the relationship between each independent variable and the response variable while accounting for the effect of all the other independent variables present in the model. Calculations are as follows (Wikipedia):
  1. Compute the residuals of regressing the response variable against the independent variables but omitting Xi.
  2. Compute the residuals from regressing Xi against the remaining independent variables.
  3. Plot the residuals from (1) against the residuals from (2).
riskAnalysis.PlotRegressionModel(combinedReturnsDf, dependentColumn = 'Strategy')
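The partial regression calculation described above can be sketched with plain NumPy least squares. The helper names and the simulated data are assumptions for illustration and this is not the code behind PlotRegressionModel. A useful property to check is that the slope of the residual-on-residual fit equals the coefficient of Xi in the full multiple regression.

```python
import numpy as np

def ols_residuals(y, X):
    # Residuals of a least-squares fit of y on X (an intercept column is added)
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return y - X1 @ beta

def partial_regression(y, X, i):
    others = np.delete(X, i, axis=1)
    res_y = ols_residuals(y, others)        # step 1: response without Xi
    res_x = ols_residuals(X[:, i], others)  # step 2: Xi against the other variables
    slope = (res_x @ res_y) / (res_x @ res_x)
    return res_x, res_y, slope              # step 3 would plot res_y against res_x

# Simulated data: three risk factors and a strategy with known exposures
rng = np.random.default_rng(42)
X = rng.normal(size=(250, 3))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(scale=0.1, size=250)

res_x, res_y, slope = partial_regression(y, X, i=1)
full_beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(250), X]), y, rcond=None)
print(slope, full_beta[2])
```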
Plot Rolling Regression Coefficients
The above relationships are not static through time, therefore it is useful to visualize how these coefficients behave over time by running a Rolling Regression Model (with a given lookback period).
riskAnalysis.PlotRollingRegressionCoefficients(combinedReturnsDf, dependentColumn = 'Strategy', lookback = 126)
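A rolling regression simply refits the same linear model on a moving window of the most recent observations. The sketch below illustrates the mechanics with a hypothetical helper and simulated data in which the true exposures are constant; it is not the implementation behind PlotRollingRegressionCoefficients.

```python
import numpy as np
import pandas as pd

def rolling_regression_coefficients(df, dependent_column, lookback):
    factor_columns = [c for c in df.columns if c != dependent_column]
    rows = {}
    for end in range(lookback, len(df) + 1):
        window = df.iloc[end - lookback:end]
        X = np.column_stack([np.ones(lookback), window[factor_columns].to_numpy()])
        y = window[dependent_column].to_numpy()
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rows[window.index[-1]] = beta[1:]  # keep the factor exposures, drop the intercept
    return pd.DataFrame.from_dict(rows, orient="index", columns=factor_columns)

# Simulated daily returns: the strategy has constant exposures of 0.5 and -0.2
rng = np.random.default_rng(1)
toy = pd.DataFrame(
    rng.normal(0, 0.01, (300, 2)), columns=["Mkt-RF", "SMB"],
    index=pd.date_range("2019-01-01", periods=300, freq="B"))
toy["Strategy"] = 0.5 * toy["Mkt-RF"] - 0.2 * toy["SMB"]

rolling_betas = rolling_regression_coefficients(toy, "Strategy", lookback=126)
print(rolling_betas.tail())
```

In this noiseless toy example every window recovers the same coefficients. With real strategy returns the lines move over time, which is the behaviour the rolling plot is meant to expose.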
Plot Distribution Of Rolling Exposures
We can now visualize the historical distributions of the rolling regression coefficients in order to get a better idea of the variability of the data.
riskAnalysis.PlotBoxPlotRollingFactorExposure(combinedReturnsDf, dependentColumn = 'Strategy', lookback = 126)
Case Study - Backtesting Algorithm
The purpose of the research process illustrated above is purely to determine if there is a significant relationship between our factors and the future returns of those stocks. That means the cumulative returns we saw are not realistic and they assumed daily rebalancing of hundreds of stocks without accounting for any slippage or commissions.
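A back-of-the-envelope calculation shows why this matters. All the numbers below are hypothetical and only illustrate how trading costs scale with turnover; they are not results from this strategy.

```python
# Hypothetical inputs, chosen only to illustrate the arithmetic
gross_daily_return = 0.0005      # 5 bps average daily long-short return before costs
daily_turnover = 0.40            # fraction of the portfolio traded each day
cost_per_dollar_traded = 0.0010  # 10 bps of slippage plus commissions

# Each day we pay the cost rate on the fraction of the portfolio we trade
net_daily_return = gross_daily_return - daily_turnover * cost_per_dollar_traded

gross_annual = (1 + gross_daily_return) ** 252 - 1
net_annual = (1 + net_daily_return) ** 252 - 1
print(round(gross_annual, 4), round(net_annual, 4))
```

With these made-up inputs, costs absorb four fifths of the daily gross return, which is why rebalancing frequency is one of the first choices to revisit when moving to a backtest.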
In order to test how this strategy would have performed historically we need to run a proper backtest, and for that we have to move from the Research Notebook to the Algorithm Framework.
Below I will explain the most important features and scripts of this part of the product.
Algorithm Framework - main.py
The main.py script includes the following user-defined inputs worth mentioning.
# date rule for rebalancing our portfolio by updating long-short positions based on factor values
rebalancingFunc = Expiry.EndOfMonth
# number of stocks to keep for factor modelling calculations
nStocks = 100
# number of positions to hold on each side (long/short)
positionsOnEachSide = 20
# lookback for historical data to calculate factors
lookback = 252
# select the leverage factor
leverageFactor = 1
- We first need to select how often we want to rebalance the portfolio (i.e. recalculate all factors and portfolio weights). Here you can choose among a number of date rules such as Expiry.EndOfMonth.
- At every rebalancing, the algorithm will create a dynamic universe of stocks based on DollarVolume and MarketCap that will be used to calculate the factors on (nStocks) and ultimately select the top and bottom stocks to trade (positionsOnEachSide).
- Finally, we need to decide how much historical data we want for our factors calculations using lookback.
- There is also a leverageFactor parameter that can be used to modify the account leverage.
Algorithm Framework - classSymbolData.py
The Backtesting Algorithm has been designed to make it easy to quickly add and remove the factors previously analysed in the notebook. We only need to add the function that calculates the factor to the SymbolData class. For example, have a look at how we add the Momentum and Volatility factors from before.
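def CalculateMomentum(self, history):
    # close prices for this symbol only
    closePrices = history.loc[self.Symbol]['close']
    # use .iloc for positional access (latest close vs the close 252 rows back)
    momentum = (closePrices.iloc[-1] / closePrices.iloc[-252]) - 1
    return momentum

def CalculateVolatility(self, history):
    closePrices = history.loc[self.Symbol]['close']
    returns = closePrices.pct_change().dropna()
    # standard deviation of daily returns over the whole lookback window
    volatility = np.nanstd(returns, axis = 0)
    return volatility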
You can add as many functions as you want and then include or exclude them from the strategy by commenting out the corresponding function call here. Note how a few functions for other factors are commented out to leave them out of the strategy.
def CalculateFactors(self, history, fundamentalDataBySymbolDict):
    self.fundamentalDataDict = fundamentalDataBySymbolDict[self.Symbol]
    self.momentum = self.CalculateMomentum(history)
    self.volatility = self.CalculateVolatility(history)
    #self.skewness = self.CalculateSkewness(history)
    #self.kurt = self.CalculateKurtosis(history)
    #self.distanceVsHL = self.CalculateDistanceVsHL(history)
    #self.meanOvernightReturns = self.CalculateMeanOvernightReturns(history)
And at the end we just need to add the chosen factors to the property factorsList.
@property
def factorsList(self):
    technicalFactors = [self.momentum, self.volatility]
Algorithm Framework - HelperFunctions.py
In the Research Notebook we could only add factors created using OHLCV data for the reasons already stated above. However, the Backtesting Algorithm allows us to add fundamental factors very easily by using the GetFundamentalDataDict function. As you can see below, we create a dictionary containing the fundamental ratio and its desired direction in the model (1 for a positive effect or -1 for a negative one). For a list of all the fundamental data available in QuantConnect please refer to this page: https://www.quantconnect.com/docs/data-library/fundamentals
# dictionary of symbols containing factors and the direction of the factor (1 for sorting descending and -1 for sorting ascending)
fundamentalDataBySymbolDict[x.Symbol] = {
    #fundamental.ValuationRatios.BookValuePerShare: 1,
    #fundamental.FinancialStatements.BalanceSheet.TotalEquity.Value: -1,
    #fundamental.OperationRatios.OperationMargin.Value: 1,
    #fundamental.OperationRatios.ROE.Value: 1,
    #fundamental.OperationRatios.TotalAssetsGrowth.Value: 1,
    #fundamental.ValuationRatios.PERatio: 1
}
Finally, very much like we did in the Research Notebook, in the Backtesting Algorithm we can also give different weights to each factor to create a combined factor. We do that using the GetLongShortLists function in the HelperFunctions.py script as shown below.
normFactorsDf['combinedFactor'] = normFactorsDf['Factor_1'] * 1 + normFactorsDf['Factor_2'] * 1
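Once the combined factor is available, selecting the positions comes down to sorting the universe and taking the two ends. The sketch below illustrates that step with a hypothetical get_long_short_lists helper and toy values; it is not the product's GetLongShortLists function, whose details may differ.

```python
import pandas as pd

def get_long_short_lists(norm_factors_df, weights, positions_on_each_side):
    # Weighted sum of the normalized factors, sorted from best to worst
    combined = sum(norm_factors_df[name] * w for name, w in weights.items())
    ranked = combined.sort_values(ascending=False)
    longs = ranked.head(positions_on_each_side).index.tolist()
    shorts = ranked.tail(positions_on_each_side).index.tolist()
    return longs, shorts

# Toy normalized factors for six made-up symbols
toy_norm_factors = pd.DataFrame(
    {"Factor_1": [1.0, 0.5, 0.0, -0.5, -1.0, 0.25],
     "Factor_2": [0.5, 0.75, 0.0, -1.0, -0.25, -0.25]},
    index=["A", "B", "C", "D", "E", "F"])

longs, shorts = get_long_short_lists(
    toy_norm_factors, {"Factor_1": 1, "Factor_2": 1}, positions_on_each_side=2)
print(longs, shorts)
```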
Clone The Algorithm
And that was it! Now you can clone the algorithm below into your QuantConnect account and start playing with the different features yourself. The algorithm also has a number of interesting backtesting charts such as Drawdown and Total Portfolio Exposure %, so remember to activate those in the Select Chart box (top right corner of the backtesting page).
FACTOR INVESTING SYSTEM (Research & Algorithm)
We're always here to answer your questions so please ask away!
Do you have a strategy of your own that you would like to backtest and automate? Learn about our consulting services and get in touch at admin@innoquantivity.com