The next Lancet retraction? [“Subcortical brain volume differences in participants with attention deficit hyperactivity disorder in children and adults”]


Someone who prefers to remain anonymous asks for my thoughts on this post by Michael Corrigan and Robert Whitaker, “Lancet Psychiatry Needs to Retract the ADHD-Enigma Study: Authors’ conclusion that individuals with ADHD have smaller brains is belied by their own data,” which begins:

Lancet Psychiatry, a UK-based medical journal, recently published a study titled Subcortical brain volume differences in participants with attention deficit hyperactivity disorder in children and adults: A cross-sectional mega-analysis. According to the paper’s 82 authors, the study provides definitive evidence that individuals with ADHD have altered, smaller brains. But as the following detailed review reveals, the study does not come close to supporting such claims.

Below are tons of detail, so let me lead with my conclusion, which is that the criticisms coming from Corrigan and Whitaker seem reasonable to me. That is, based on my quick read, the 82 authors of that published paper seem to have made a big mistake in what they wrote.

I’d be interested to see if the authors have offered any reply to these criticisms. The article has just recently come out—the journal publication is dated April 2017—and I’d like to see what the authors have to say.

OK, on to the details. Here are Corrigan and Whitaker:

The study is beset by serious methodological shortcomings, missing data issues, and statistical reporting errors and omissions. The conclusion that individuals with ADHD have smaller brains is contradicted by the “effect-size” calculations that show individual brain volumes in the ADHD and control cohorts largely overlapped. . . .

Their results, the authors concluded, contained important messages for clinicians: “The data from our highly powered analysis confirm that patients with ADHD do have altered brains and therefore that ADHD is a disorder of the brain.” . . .

The press releases sent to the media reflected the conclusions in the paper, and the headlines reported by the media, in turn, accurately summed up the press releases. Here is a sampling of headlines:

Given the implications of this study’s claims, it deserves to be closely analyzed. Does the study support the conclusion that children and adults with ADHD have “altered brains,” as evidenced by smaller volumes in different regions of the brain? . . .

Alternative Headline: Large Study Finds Children with ADHD Have Higher IQs!

To discover this finding, you need to spend $31.50 to purchase the article, and then make a special request to Lancet Psychiatry to send you the appendix. Then you will discover, on pages 7 to 9 in the appendix, a “Table 2” that provides IQ scores for both the ADHD cohort and the controls.

Although there were 23 clinical sites in the study, only 20 reported comparative IQ data. In 16 of the 20, the ADHD cohort had higher IQs on average than the control group. In the other four clinics, the ADHD and control groups had the same average IQ (with the mean IQ scores for both groups within two points of each other.) Thus, at all 20 sites, the ADHD group had a mean IQ score that was equal to, or higher than, the mean IQ score for the control group. . . .

And why didn’t the authors discuss the IQ data in their paper, or utilize it in their analyses? . . . Indeed, if the IQ data had been promoted in the study’s abstract and to the media, the public would now be having a new discussion: Is it possible that children diagnosed with ADHD are more intelligent than average? . . .

They Did Not Find That Children Diagnosed with ADHD Have Smaller Brain Volumes . . .

For instance, the authors reported a Cohen’s d effect size of .19 for differences in the mean volume of the accumbens in children under 15. . . in this study, for youth under 15, it was the largest effect size of all the brain volume comparisons that were made. . . . Approximately 58% of the ADHD youth in this convenience sample had an accumbens volume below the average in the control group, while 42% of the ADHD youth had an accumbens volume above the average in the control group. Also, if you knew the accumbens volume of a child picked at random, you would have a 54% chance that you could correctly guess which of the two cohorts—ADHD or healthy control—the child belonged to. . . . The diagnostic value of an MRI brain scan, based on the findings in this study, would be of little more predictive value than the toss of a coin. . . .

The authors reported that the “volumes of the accumbens, amygdala, caudate, hippocampus, putamen, and intracranial volume were smaller in individuals with ADHD compared with controls in the mega-analysis” (p. 1). If this is true, then smaller brain volumes should show up in the data from most, if not all, of the 21 sites that had a control group. But that was not the case. . . . The problem here is obvious. If authors are claiming that smaller brain regions are a defining “abnormality” of ADHD, then such differences should be consistently found in mean volumes of ADHD cohorts at all sites. The fact that there was such variation in mean volume data is one more reason to see the authors’ conclusions—that smaller brain volumes are a defining characteristic of ADHD—as unsupported by the data. . . .
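(A quick check on the effect-size arithmetic quoted above: assuming two normal distributions with equal variance separated by Cohen's d = 0.19, the overlap numbers come out about where Corrigan and Whitaker put them; the exact figures depend on how you compute them. A minimal sketch in R:)

d <- 0.19   # reported effect size for the accumbens in children
pnorm(d)            # share of the ADHD group below the control-group mean: about 0.58
pnorm(d / sqrt(2))  # chance that a randomly chosen control exceeds a randomly
                    # chosen case (the AUC): about 0.55, near the quoted 54%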

And now here’s what the original paper said:

We aimed to investigate whether there are structural differences in children and adults with ADHD compared with those without this diagnosis. In this cross-sectional mega-analysis [sic; see P.P.S. below], we used the data from the international ENIGMA Working Group collaboration, which in the present analysis was frozen at Feb 8, 2015. Individual sites analysed structural T1-weighted MRI brain scans with harmonised protocols of individuals with ADHD compared with those who do not have this diagnosis. . . .

Our sample comprised 1713 participants with ADHD and 1529 controls from 23 sites . . . The volumes of the accumbens (Cohen’s d=–0·15), amygdala (d=–0·19), caudate (d=–0·11), hippocampus (d=–0·11), putamen (d=–0·14), and intracranial volume (d=–0·10) were smaller in individuals with ADHD compared with controls in the mega-analysis. There was no difference in volume size in the pallidum (p=0·95) and thalamus (p=0·39) between people with ADHD and controls.

The above demonstrates some forking paths, and there are a bunch more in the published paper, for example:

Exploratory lifespan modelling suggested a delay of maturation and a delay of degeneration, as effect sizes were highest in most subgroups of children (<15 years) versus adults (>21 years): in the accumbens (Cohen’s d=–0·19 vs –0·10), amygdala (d=–0·18 vs –0·14), caudate (d=–0·13 vs –0·07), hippocampus (d=–0·12 vs –0·06), putamen (d=–0·18 vs –0·08), and intracranial volume (d=–0·14 vs 0·01). There was no difference between children and adults for the pallidum (p=0·79) or thalamus (p=0·89). Case-control differences in adults were non-significant (all p>0·03). Psychostimulant medication use (all p>0·15) or symptom scores (all p>0·02) did not influence results, nor did the presence of comorbid psychiatric disorders (all p>0·5). . . .

Outliers were identified at above and below one and a half times the interquartile range per cohort and group (case and control) and were excluded . . . excluding collinearity of age, sex, and intracranial volume (variance inflation factor <1·2) . . . The model included diagnosis (case=1 and control=0) as a factor of interest, age, sex, and intracranial volume as fixed factors, and site as a random factor. In the analysis of intracranial volume, this variable was omitted as a covariate from the model. Handedness was added to the model to correct for possible effects of lateralisation, but was excluded from the model when there was no significant contribution of this factor. . . . stratified by age: in children aged 14 years or younger, adolescents aged 15–21 years, and adults aged 22 years and older. We removed samples that were left with ten patients or fewer because of the stratification. . . .

Forking paths are fine; I have forking paths in every analysis I’ve ever done. But forking paths render published p-values close to meaningless; in particular I have no reason to take seriously a statement such as, “p values were significant at the false discovery rate corrected threshold of p=0·0156,” from the summary of the paper.

So let’s forget about p-values and just look at the data graphs, which appear in the published paper:



Unfortunately these are not raw data or even raw averages for each age; instead they are “moving averages, corrected for age, sex, intracranial volume, and site for the subcortical volumes.” But we’ll take what we’ve got.

From the above graphs, it doesn’t seem like much of anything is going on: the blue and red lines cross all over the place! So now I don’t understand this summary graph from the paper:

I mean, sure, I see it for Accumbens, I guess, if you ignore the older people. But, for the others, the lines in the displayed age curves cross all over the place.

The article in question has the following list of authors: Martine Hoogman, Janita Bralten, Derrek P Hibar, Maarten Mennes, Marcel P Zwiers, Lizanne S J Schweren, Kimm J E van Hulzen, Sarah E Medland, Elena Shumskaya, Neda Jahanshad, Patrick de Zeeuw, Eszter Szekely, Gustavo Sudre, Thomas Wolfers, Alberdingk M H Onnink, Janneke T Dammers, Jeanette C Mostert, Yolanda Vives-Gilabert, Gregor Kohls, Eileen Oberwelland, Jochen Seitz, Martin Schulte-Rüther, Sara Ambrosino, Alysa E Doyle, Marie F Høvik, Margaretha Dramsdahl, Leanne Tamm, Theo G M van Erp, Anders Dale, Andrew Schork, Annette Conzelmann, Kathrin Zierhut, Ramona Baur, Hazel McCarthy, Yuliya N Yoncheva, Ana Cubillo, Kaylita Chantiluke, Mitul A Mehta, Yannis Paloyelis, Sarah Hohmann, Sarah Baumeister, Ivanei Bramati, Paulo Mattos, Fernanda Tovar-Moll, Pamela Douglas, Tobias Banaschewski, Daniel Brandeis, Jonna Kuntsi, Philip Asherson, Katya Rubia, Clare Kelly, Adriana Di Martino, Michael P Milham, Francisco X Castellanos, Thomas Frodl, Mariam Zentis, Klaus-Peter Lesch, Andreas Reif, Paul Pauli, Terry L Jernigan, Jan Haavik, Kerstin J Plessen, Astri J Lundervold, Kenneth Hugdahl, Larry J Seidman, Joseph Biederman, Nanda Rommelse, Dirk J Heslenfeld, Catharina A Hartman, Pieter J Hoekstra, Jaap Oosterlaan, Georg von Polier, Kerstin Konrad, Oscar Vilarroya, Josep Antoni Ramos-Quiroga, Joan Carles Soliva, Sarah Durston, Jan K Buitelaar, Stephen V Faraone, Philip Shaw, Paul M Thompson, Barbara Franke.

I also found a webpage for their research group, featuring this wonderful map:

The number of sites looks particularly impressive when you include each continent twice like that. But they should really do some studies in Antarctica, given how huge it appears to be!

P.S. Following the links, I see that Corrigan and Whitaker come into this with a particular view:

Mad in America’s mission is to serve as a catalyst for rethinking psychiatric care in the United States (and abroad). We believe that the current drug-based paradigm of care has failed our society, and that scientific research, as well as the lived experience of those who have been diagnosed with a psychiatric disorder, calls for profound change.

This does not mean that the critics are wrong—presumably the authors of the original paper came into their research with their own strong views—it can just be helpful to know where they’re coming from.

P.P.S. The paper discussed above uses the term “mega-analysis.” At first I thought this might be some sort of typo, but apparently the expression does exist and has been around for a while. From my quick search, it appears that the term was first used by James Dillon in a 1982 article, “Superanalysis,” in Evaluation News, where he defined mega-analysis as “a method for synthesizing the results of a series of meta-analyses.”

But in the current literature, “mega-analysis” seems to simply refer to a meta-analysis that uses the raw data from the original studies.

If so, I’m unhappy with the term “mega-analysis” because: (a) the “mega” seems a bit hypey; (b) what if the original studies are small? Then even all the data combined might not be so “mega”; and (c) I don’t like the implication that plain old “meta-analysis” doesn’t use the raw data. I’m pretty sure that the vast majority of meta-analyses use only published summaries, but I’ve always thought of using the original data as the preferred form of meta-analysis.

I bring up this mega-analysis thing not as a criticism of the Hoogman et al. paper—they’re just using what appears to be a standard term in their field—but just as an interesting side-note.

P.P.P.S. The above post represents my current impression. As I wrote, I’d be interested to see the original authors’ reply to the criticism. Lancet does have a pretty bad reputation—it’s known for publishing flawed, sensationalist work—but I’m sure they run the occasional good article too. So I wouldn’t want to make any strong judgments in this case before hearing more.

P.P.P.P.S. Regarding the title of this post: No, I don’t think Lancet would ever retract this paper, even if all the above criticisms are correct. It seems that retraction is used only in response to scientific misconduct, not in response to mere error. So when I say “retraction,” I mean what one might call “conceptual retraction.” The real question is: Will this new paper join the list of past Lancet papers which we would not want to take seriously, and which we regret were ever published?

The post The next Lancet retraction? [“Subcortical brain volume differences in participants with attention deficit hyperactivity disorder in children and adults”] appeared first on Statistical Modeling, Causal Inference, and Social Science.



Order Type and Parameter Optimization in quantstrat


(This article was first published on R – Curtis Miller's Personal Website, and kindly contributed to R-bloggers)

DISCLAIMER: Any losses incurred based on the content of this post are the responsibility of the trader, not the author. The author takes no responsibility for the conduct of others nor offers any guarantees.

Introduction

You may have noticed I’ve been writing a lot about quantstrat, an R package for developing and backtesting trading strategies. The package strikes me as being so flexible that there’s still more to write about. So far I’ve introduced the package here and here, and recently discussed the importance of accounting for transaction costs (and how to do so).

Next up, I want to look at modelling different kinds of orders. The strategy I’ve been looking at so far simply places market orders when its signals fire. However, traders may desire to place orders such as a stop-loss (to cut off further losses, on the belief that if a stock’s price moves down it will continue to move down), a target price (an order that, when the target is reached, closes the trade and locks in gains), and sometimes a trailing stop (if a stock has made gains, better to lock them in at a slightly lower price should a slight reversal be seen than to lose all gains in a major reversal). When I took a class on trading, a recommended rule was to always have both an order at the target price and a stop-loss order. This was mostly to enforce structure and trading discipline, helping take emotions out of the equation. Furthermore, the loss that would result from the stop-loss’s trigger should be half of the potential gain, the justification being that if half the time you “win” and half the time you “lose”, your gains would be twice as large as your losses.

I don’t have nearly enough experience in trading to have a respectable opinion, but what opinion I do have doesn’t look favorably on stop-loss orders as I learned them. I would implement the so-called 2-to-1 rule (that is, target profits are twice maximum losses) in simulations and watch as my orders would be wiped out by large, spurious, immediately corrected movements. It seemed as if reversals would inevitably follow the trigger of the stop-loss that closed my position, locking in losses in addition to transaction costs. I also closely tied, in my mind, the 2-to-1 rule with stop-loss orders in general, and it doesn’t take long to realize the logic of that rule is total bunk. If price movements are a random walk, having a shorter stop-loss means that there’s a good chance the stop-loss would be triggered before the profitable exit order. You might be making twice as much for profitable orders, but you’re triggering the stop-loss twice as often, nullifying the effect (while driving up transaction costs). If you were setting your profit target shorter than the stop-loss, at least more trades would be profits instead of most being losses. (In the random-walk situation it makes no difference; expected profits are zero in either case. But at least it feels better.) I never tried a trailing stop, but I’m sure my opinion would be about the same; “noise” is more likely to close a position than an actual reversal in trend, in my opinion.
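To illustrate the random-walk point with a quick standalone simulation (a sketch, separate from the quantstrat code below; it assumes a driftless walk, no transaction costs, and a profit target twice as far away as the stop):

set.seed(2017)  # any seed; chosen here only for reproducibility
sim_trade <- function(stop = 1, target = 2, step_sd = 0.1, n = 1e4) {
  # One trade on a driftless random walk; exit at whichever barrier
  # (stop-loss at -stop, profit target at +target) is hit first
  path <- cumsum(rnorm(n, sd = step_sd))
  hit <- which(path <= -stop | path >= target)[1]
  if (is.na(hit)) return(path[n])          # neither barrier hit; mark to market
  if (path[hit] <= -stop) -stop else target
}
pl <- replicate(2000, sim_trade())
mean(pl <= -1)  # fraction of trades stopped out: roughly 2/3
mean(pl)        # average profit per trade: roughly 0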

I appreciate the call for combating personal psychology, enforcing discipline, having an exit plan and a guard against unacceptably high losses. I’m aware that risk management is an important part of any trading system, and as I write this I am sure that there are better approaches than what I had learned (I hold out hope for VaR assessment, though I have yet to try it out). But I have little love for the approach to stop-losses as I learned it, which is partly why the strategies I’ve looked at so far have closed positions only when the indicators pointed to a change in regime.

That being said, I’m still interested in modelling them. I could be wrong. And more intelligent approaches to risk management likely resemble a simple stop-loss.

I also want to look at parameter optimization with quantstrat. We’ve been using 20-day and 50-day moving averages without questioning what makes those numbers special. quantstrat provides functions that allow for backtesting a strategy while trying out multiple parameters so one can hopefully find a more profitable combination. Now, that said, what I will likely have done is overfit my strategy; the optimal combination looks good in backtesting thanks to torturing the data, but when deployed it will likely perform miserably (or at least not nearly as well as the backtest would suggest). A correction for this should be applied. In the future I may write about how to handle overfitting, but for now, I’m just getting started with the tools.

Adding Stop-Loss Rules

If you’ve been following along with my articles, the next few lines of code should look very familiar, so I won’t repeat the explanation. Let’s just say we’re going to pick up where the article on accounting for transaction costs left off. We’ll work with a $100,000 trading portfolio that faces $5 in total for transaction costs per trade. We’ll even tack on an additional 0.1% loss per trade to simulate slippage. We’ll be requiring that trades be done in batches of 100. All told, we end up with the following code (read earlier articles for better explanations):

if (!require("quantstrat")) {
  install.packages("quantstrat", repos="http://R-Forge.R-project.org")
  library(quantstrat)
}

sigCrossover2 <- function(label, data = mktdata, columns,
                          relationship = c("gt", "lt", "eq", "gte", "lte"),
                          offset1 = 0, offset2 = 0) {
  # A wrapper for sigCrossover, exhibiting the same behavior except returning
  # an object containing TRUE/FALSE instead of TRUE/NA
  res <- sigCrossover(label = label, data = data, columns = columns,
                      relationship = relationship, offset1 = offset1,
                      offset2 = offset2)
  res[is.na(res)] <- FALSE
  return(res)
}

# Based on Ilya Kipnis's osMaxDollar(); lots of recycled code
osMaxDollarBatch = function(data, timestamp, orderqty, ordertype, orderside,
                               portfolio, symbol, prefer = "Open", tradeSize,
                               maxSize, batchSize = 100, integerQty = TRUE,
                              ...) {
  # An order sizing function that limits position size based on dollar value of
  # the position, optionally controlling for number of batches to purchase
  #
  # Args:
  #   data: ??? (held over from Mr. Kipnis's original osMaxDollar function)
  #   timestamp: The current date being evaluated (some object, like a string,
  #              from which time can be inferred)
  #   orderqty: ??? (held over from Mr. Kipnis's original osMaxDollar function)
  #   ordertype: ??? (held over from Mr. Kipnis's original osMaxDollar
  #              function)
  #   orderside: ??? (held over from Mr. Kipnis's original osMaxDollar
  #              function)
  #   portfolio: A string representing the portfolio being treated; will be
  #              passed to getPosQty
  #   symbol: A string representing the symbol being traded
  #   prefer: A string that indicates whether the Open or Closing price is
  #           used for determining the price of the asset in backtesting (set
  #           to "Close" to use the closing price)
  #   tradeSize: Numeric, indicating the dollar value to transact (using
  #              negative numbers for selling short)
  #   maxSize: Numeric, indicating the dollar limit to the position (use
  #            negative numbers for the short side)
  #   batchSize: The number of stocks purchased per batch (only applies if
  #              integerQty is TRUE); default value is 100, but setting to 1
  #              effectively nullifies the batchSize
  #   integerQty: A boolean indicating whether or not to truncate to the
  #               nearest integer of contracts/shares/etc.
  #   ...: ??? (held over from Mr. Kipnis's original osMaxDollar function)
  #
  # Returns:
  #   A numeric quantity representing the number of shares to purchase

  pos = getPosQty(portfolio, symbol, timestamp)
  if (prefer == "Close") {
    price = as.numeric(Cl(mktdata[timestamp, ]))
  } else {
    price = as.numeric(Op(mktdata[timestamp, ]))
  }
  posVal = pos * price
  if (orderside == "short") {
    dollarsToTransact = max(tradeSize, maxSize - posVal)
    if (dollarsToTransact > 0) {
      dollarsToTransact = 0
    }
  } else {
    dollarsToTransact = min(tradeSize, maxSize - posVal)
    if (dollarsToTransact < 0) {
      dollarsToTransact = 0
    }
  }
  qty = dollarsToTransact/price
  if (integerQty) {
    # Controlling for batch size only makes sense if we were working with
    # integer quantities anyway; if we didn't care about being integers,
    # why bother?
    qty = trunc(qty / batchSize) * batchSize
  }
  return(qty)
}

fee <- function(TxnQty, TxnPrice, Symbol) {
  # A function for computing a transaction fee equal to 0.1% of the total
  # dollar value of the transaction plus a flat $5 per trade
  #
  # Args:
  #   TxnQty: Numeric for number of shares being traded
  #   TxnPrice: Numeric for price per share
  #   Symbol: The symbol being traded (not used here, but will be passed)
  #
  # Returns:
  #   The fee to be applied

  return(-0.001 * abs(TxnQty * TxnPrice) - 5)
}
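
# Quick illustrative check of fee() (not part of the strategy): buying 100
# shares at $50 is a $5,000 transaction, so the fee should be
# 0.1% * $5,000 + $5 = $10, returned as a negative cash flow
fee(TxnQty = 100, TxnPrice = 50, Symbol = "AAPL")
## [1] -10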

start <- as.Date("2010-01-01")
end <- as.Date("2016-10-01")

rm(list = ls(.blotter), envir = .blotter)  # Clear blotter environment
currency("USD")  # Currency being used
Sys.setenv(TZ = "MDT")  # Allows quantstrat to use timestamps
initDate <- "2009-12-31"  # A date prior to first close price; needed (why?)

# Get new symbols
symbols <- c("AAPL", "MSFT", "GOOG", "FB", "TWTR", "NFLX", "AMZN", "YHOO",
             "SNY", "NTDOY", "IBM", "HPQ")
getSymbols(Symbols = symbols, src = "yahoo", from = start, to = end,
           adjust = TRUE)  # The last argument tells getSymbols to use adjusted
                           # prices

stock(symbols, currency = "USD", multiplier = 1)

strategy_st <- "SMAC-20-50_STRAT"
portfolio_st <- "SMAC-20-50_PORTF"
account_st <- "SMAC-20-50_ACCT"
rm.strat(portfolio_st)
rm.strat(strategy_st)
initPortf(portfolio_st, symbols = symbols,
          initDate = initDate, currency = "USD")
initAcct(account_st, portfolios = portfolio_st,
         initDate = initDate, currency = "USD",
         initEq = 100000)
initOrders(portfolio_st, store = TRUE)

strategy(strategy_st, store = TRUE)

add.indicator(strategy = strategy_st, name = "SMA",
              arguments = list(x = quote(Cl(mktdata)),
                               n = 20),
              label = "fastMA")
add.indicator(strategy = strategy_st, name = "SMA",
              arguments = list(x = quote(Cl(mktdata)),
                               n = 50),
              label = "slowMA")

add.signal(strategy = strategy_st, name = "sigCrossover2",  # Remember me?
           arguments = list(columns = c("fastMA", "slowMA"),
                            relationship = "gt"),
           label = "bull")
add.signal(strategy = strategy_st, name = "sigCrossover2",
           arguments = list(columns = c("fastMA", "slowMA"),
                            relationship = "lt"),
           label = "bear")

Now I add the stop-loss order. This is done by adding a new rule. First, note that I start by adding the same entry and exit rules I had before. Then, I add a rule labeled stop_loss (the name of the label has no special meaning; it’s useful to the programmer, not to blotter) with order type "stoplimit" that closes the position if the price falls a set threshold (here, 1%) below the price at which the position was opened.

add.rule(strategy = strategy_st, name = "ruleSignal",
         arguments = list(sigcol = "bull",
                          sigval = TRUE,
                          ordertype = "market",
                          orderside = "long",
                          replace = FALSE,
                          TxnFees = "fee",
                          prefer = "Open",
                          osFUN = osMaxDollarBatch,
                          maxSize = quote(floor(getEndEq(account_st,
                                                   Date = timestamp) * .1)),
                          tradeSize = quote(floor(getEndEq(account_st,
                                                   Date = timestamp) * .1)),
                          batchSize = 100),
         type = "enter", path.dep = TRUE, label = "buy")
add.rule(strategy = strategy_st, name = "ruleSignal",
         arguments = list(sigcol = "bear",
                          sigval = TRUE,
                          orderqty = "all",
                          ordertype = "market",
                          orderside = "long",
                          replace = FALSE,
                          TxnFees = "fee",
                          prefer = "Open"),
         type = "exit", path.dep = TRUE, label = "sell")

# Now for setting a stop-loss
add.rule(strategy = strategy_st, name = "ruleSignal",
         arguments = list(sigcol = "bull",
                          sigval = TRUE,
                          replace = FALSE,
                          orderside = "long",
                          TxnFees = "fee",
                          # Now we set up a stop-limit order
                          ordertype = "stoplimit",  # Order type is stop-limit
                          orderqty = "all",  # Clear all stocks
                          #osFUN = osMaxPos,  # Clear ALL stocks
                          tmult = TRUE,  # Forces threshold to be a percent
                                         # multiplier of the price
                          threshold = 0.01),  # With tmult set to TRUE, we are
                                              # saying, in effect, that if the
                                              # price drops below 1% of what it
                                              # was when we opened the position
                                              # then sell all stocks to close
                                              # the position
         type = "chain", parent = "buy", label = "stop_loss")

# Having set up the strategy, we now backtest
applyStrategy(strategy_st, portfolios = portfolio_st)

Let’s now look at the results:

updatePortf(portfolio_st)
dateRange <- time(getPortfolio(portfolio_st)$summary)[-1]
updateAcct(account_st, dateRange)
updateEndEq(account_st)
tStats <- tradeStats(Portfolios = portfolio_st, use="trades",
                     inclZeroDays = FALSE)
tStats[, 4:ncol(tStats)] <- round(tStats[, 4:ncol(tStats)], 2)
print(data.frame(t(tStats[, -c(1,2)])))
##                        AAPL       FB      HPQ     MSFT     NFLX    NTDOY
## Num.Txns              29.00    20.00    30.00    40.00    30.00    38.00
## Num.Trades            15.00    10.00    15.00    20.00    15.00    19.00
## Net.Trading.PL      4298.51 -2422.29 -1588.25   646.16  1049.23 -2548.65
## Avg.Trade.PL         311.17  -216.36   -77.40    59.31    96.16  -106.86
## Med.Trade.PL         -73.68  -155.00   -93.94   -89.85   -70.24  -100.31
## Largest.Winner      2226.34     0.00   457.03  1730.06  2227.61   975.16
## Largest.Loser       -182.85  -506.09  -366.77  -236.43  -411.43  -581.40
## Gross.Profits       5676.98     0.00   471.52  2827.93  2814.00  1067.00
## Gross.Losses       -1009.48 -2163.57 -1632.49 -1641.79 -1371.53 -3097.30
## Std.Dev.Trade.PL     813.66   170.03   166.41   470.98   620.71   318.32
## Percent.Positive      26.67     0.00     6.67    15.00    20.00    10.53
## Percent.Negative      73.33   100.00    93.33    85.00    80.00    89.47
## Profit.Factor          5.62     0.00     0.29     1.72     2.05     0.34
## Avg.Win.Trade       1419.24      NaN   471.52   942.64   938.00   533.50
## Med.Win.Trade       1672.59       NA   471.52  1025.28   486.86   533.50
## Avg.Losing.Trade     -91.77  -216.36  -116.61   -96.58  -114.29  -182.19
## Med.Losing.Trade     -74.72  -155.00   -94.03   -92.06   -86.53  -105.00
## Avg.Daily.PL         200.98  -229.18   -91.60    45.78    83.01  -120.45
## Med.Daily.PL         -86.29  -168.91  -108.24  -103.75   -82.19  -115.24
## Std.Dev.Daily.PL     747.88   170.59   166.30   470.24   619.83   317.84
## Ann.Sharpe             4.27   -21.33    -8.74     1.55     2.13    -6.02
## Max.Drawdown       -1213.74 -2438.98 -2089.63 -1262.90 -2406.00 -5441.27
## Profit.To.Max.Draw     3.54    -0.99    -0.76     0.51     0.44    -0.47
## Avg.WinLoss.Ratio     15.47      NaN     4.04     9.76     8.21     2.93
## Med.WinLoss.Ratio     22.38       NA     5.01    11.14     5.63     5.08
## Max.Equity          4550.51     0.00   501.38  1828.40  3091.81  2892.62
## Min.Equity          -588.23 -2438.98 -1588.25  -892.32  -672.04 -2548.65
## End.Equity          4298.51 -2422.29 -1588.25   646.16  1049.23 -2548.65
##                         SNY     TWTR     YHOO
## Num.Txns              44.00    10.00    44.00
## Num.Trades            22.00     5.00    22.00
## Net.Trading.PL     -2580.59  -812.73 -1687.53
## Avg.Trade.PL         -89.88  -135.57   -49.90
## Med.Trade.PL        -108.91   -88.38   -93.06
## Largest.Winner       771.70     0.00  1119.31
## Largest.Loser       -622.70  -348.17  -242.21
## Gross.Profits       1016.64     0.00  1134.00
## Gross.Losses       -2993.91  -677.84 -2231.74
## Std.Dev.Trade.PL     240.15   111.79   267.78
## Percent.Positive      13.64     0.00     4.55
## Percent.Negative      86.36   100.00    95.45
## Profit.Factor          0.34     0.00     0.51
## Avg.Win.Trade        338.88      NaN  1134.00
## Med.Win.Trade        145.86       NA  1134.00
## Avg.Losing.Trade    -157.57  -135.57  -106.27
## Med.Losing.Trade    -137.44   -88.38   -93.48
## Avg.Daily.PL        -103.54  -148.99   -63.28
## Med.Daily.PL        -123.11  -102.13  -107.27
## Std.Dev.Daily.PL     240.26   111.71   267.52
## Ann.Sharpe            -6.84   -21.17    -3.75
## Max.Drawdown       -3061.63 -3726.33 -3225.56
## Profit.To.Max.Draw    -0.84    -0.22    -0.52
## Avg.WinLoss.Ratio      2.15      NaN    10.67
## Med.WinLoss.Ratio      1.06       NA    12.13
## Max.Equity           472.26  2913.60  1538.03
## Min.Equity         -2589.37  -812.73 -1687.53
## End.Equity         -2580.59  -812.73 -1687.53
final_acct <- getAccount(account_st)
plot(final_acct$summary$End.Eq["2010/2016"], main = "Portfolio Equity")

Eesh. Not good. The fees were not helping, but they’re relatively small in this context, so the effect we see is likely thanks to the stop-loss orders. Positions are getting closed out before being given a chance to be profitable, locking in losses.

As mentioned before, there may be a better way to pick the stop-loss than simply saying, “if the value drops below 1%, exit the position.” I won’t discuss this now.

But let’s turn this awful rule off and see what happens when we don’t use this stop-loss (you’ve now seen a new tool in the toolbox, enable.rule()):

# Disable the stop-loss rule
enable.rule(strategy_st, type = "chain", label = "stop_loss", enabled = FALSE)
# Clear the portfolio and account, and start over
rm.strat(portfolio_st)
rm.strat(account_st)
initPortf(portfolio_st, symbols = symbols,
          initDate = initDate, currency = "USD")
initAcct(account_st, portfolios = portfolio_st,
         initDate = initDate, currency = "USD",
         initEq = 100000)
initOrders(portfolio_st, store = TRUE)

# Retry the strategy
applyStrategy(strategy_st, portfolios = portfolio_st)

updatePortf(portfolio_st)
dateRange <- time(getPortfolio(portfolio_st)$summary)[-1]
updateAcct(account_st, dateRange)
updateEndEq(account_st)
tStats <- tradeStats(Portfolios = portfolio_st, use="trades",
                     inclZeroDays = FALSE)
tStats[, 4:ncol(tStats)] <- round(tStats[, 4:ncol(tStats)], 2)
print(data.frame(t(tStats[, -c(1,2)])))
##                        AAPL       FB      HPQ     MSFT     NFLX    NTDOY
## Num.Txns              29.00    20.00    29.00    39.00    29.00    37.00
## Num.Trades            15.00    10.00    15.00    20.00    15.00    19.00
## Net.Trading.PL      7696.42  8232.62  4731.16  1954.50 21583.36  1836.17
## Avg.Trade.PL         537.92   850.20   343.26   124.25  1465.49   123.37
## Med.Trade.PL         167.09   509.00   -82.40  -203.68   250.00  -140.00
## Largest.Winner      2226.34  6052.90  4974.94  1730.06 14571.00  4264.69
## Largest.Loser       -550.58 -1485.24 -1433.67 -1046.06 -2746.44  -785.14
## Gross.Profits       9450.71 10682.00 11623.43  7785.50 26293.29  7485.00
## Gross.Losses       -1381.90 -2180.00 -6474.56 -5300.49 -4311.00 -5141.00
## Std.Dev.Trade.PL     895.23  2025.78  1917.71   792.77  3965.07  1157.29
## Percent.Positive      60.00    70.00    40.00    45.00    66.67    42.11
## Percent.Negative      40.00    30.00    60.00    55.00    33.33    57.89
## Profit.Factor          6.84     4.90     1.80     1.47     6.10     1.46
## Avg.Win.Trade       1050.08  1526.00  1937.24   865.06  2629.33   935.62
## Med.Win.Trade        928.68   881.00   965.61   772.12  1088.36   240.00
## Avg.Losing.Trade    -230.32  -726.67  -719.40  -481.86  -862.20  -467.36
## Med.Losing.Trade    -144.28  -432.00  -822.36  -426.43  -355.71  -540.00
## Avg.Daily.PL         443.68   836.31   248.98    96.08  1537.80    40.63
## Med.Daily.PL         116.79   496.63  -291.88  -244.54   223.41  -161.50
## Std.Dev.Daily.PL     869.23  2023.81  1962.03   810.87  4095.34  1148.46
## Ann.Sharpe             8.10     6.56     2.01     1.88     5.96     0.56
## Max.Drawdown       -1380.74 -3840.91 -5105.22 -2847.65 -5136.97 -4923.73
## Profit.To.Max.Draw     5.57     2.14     0.93     0.69     4.20     0.37
## Avg.WinLoss.Ratio      4.56     2.10     2.69     1.80     3.05     2.00
## Med.WinLoss.Ratio      6.44     2.04     1.17     1.81     3.06     0.44
## Max.Equity          7948.42  8859.91  4990.56  3363.71 24293.63  4104.90
## Min.Equity          -852.19 -1349.93 -5105.22 -1915.50  -321.11 -3042.10
## End.Equity          7696.42  8232.62  4731.16  1954.50 21583.36  1836.17
##                         SNY     TWTR     YHOO
## Num.Txns              44.00     9.00    43.00
## Num.Trades            22.00     5.00    22.00
## Net.Trading.PL     -2804.64  2672.47  8541.60
## Avg.Trade.PL        -100.07   558.40   414.91
## Med.Trade.PL          32.83    -2.00  -111.00
## Largest.Winner      1423.13   136.69  5278.10
## Largest.Loser      -1638.89  -954.90  -623.00
## Gross.Profits       5610.84  4236.00 13265.00
## Gross.Losses       -7812.38 -1444.00 -4137.00
## Std.Dev.Trade.PL     781.97  2018.43  1370.80
## Percent.Positive      54.55    40.00    31.82
## Percent.Negative      45.45    60.00    68.18
## Profit.Factor          0.72     2.93     3.21
## Avg.Win.Trade        467.57  2118.00  1895.00
## Med.Win.Trade        293.30  2118.00  1134.00
## Avg.Losing.Trade    -781.24  -481.33  -275.80
## Med.Losing.Trade    -669.52  -500.00  -258.00
## Avg.Daily.PL        -113.73  -336.44   369.19
## Med.Daily.PL          19.11  -263.77  -128.76
## Std.Dev.Daily.PL     781.34   497.04  1394.80
## Ann.Sharpe            -2.31   -10.75     4.20
## Max.Drawdown       -3471.19 -4840.52 -2825.76
## Profit.To.Max.Draw    -0.81     0.55     3.02
## Avg.WinLoss.Ratio      0.60     4.40     6.87
## Med.WinLoss.Ratio      0.44     4.24     4.40
## Max.Equity           666.55  3144.99  8863.60
## Min.Equity         -3249.87 -1695.53 -1302.96
## End.Equity         -2804.64  2672.47  8541.60
final_acct <- getAccount(account_st)
plot(final_acct$summary$End.Eq["2010/2016"], main = "Portfolio Equity")

That’s much better.

Parameter Optimization

What makes a 20-day moving average special? What about a 50-day moving average? These numbers were chosen arbitrarily, so there’s no reason to believe that they will produce the best results. So how about we try different combinations of windows for the moving averages used in the moving average crossover strategy, and pick the combination that obtains the best results?

quantstrat easily implements this behavior. We can keep the strategy we’ve already defined, but we will be varying the parameters used in the moving averages, ‘n’ in particular. This can be accomplished by adding a distribution to the parameters of interest in the strategy, via the function add.distribution().

# Possible windows for the fast moving average
add.distribution(strategy_st,  # The strategy being optimized
                 paramset.label = "MA",  # Some label to identify the parameter
                                         # set being optimized
                 component.type = "indicator",  # The component type we're
                                                # optimizing, in this case an
                                                # indicator
                 component.label = "fastMA",  # The name of the indicator we
                                              # are optimizing
                 variable = list(n = 5 * 1:10),  # A list with the name of the
                                                 # variable we are optimizing
                                                 # along with a vector of
                                                 # allowed values
                 label = "nFAST")  # The label for the distribution

# Possible windows for the slow moving average
add.distribution(strategy_st, paramset.label = "MA",
                 component.type = "indicator", component.label = "slowMA",
                 variable = list(n = 25 * 1:10), label = "nSLOW")

We need to add a constraint to the parameter values since we do not want the window size of the fast moving average to exceed the window size of the slow moving average; this would not make sense. This constraint can be added via add.distribution.constraint():

add.distribution.constraint(strategy_st,  # The name of strategy to apply the
                                          # constraint to
                            paramset.label = "MA",  # The name of the parameter
                                                    # set (which was defined
                                                    # when we created the
                                                    # distributions) containing
                                                    # the parameters being
                                                    # optimized
                            distribution.label.1 = "nFAST",  # First parameter
                                                             # involved in the
                                                             # constraint
                            distribution.label.2 = "nSLOW",  # Second parameter
                            operator = "<",  # The operator for the relation
                                             # that defines the constraint;
                                             # translate this to mean,
                                             # nFAST < nSLOW
                            label = "MA.Constraint")  # The label for the
                                                      # constraint

Having added the distributions, we now optimize. The function apply.paramset() repeatedly applies the strategy with different combinations of the parameters in the parameter set. The parameter nsamples controls how many combinations to try, and the combinations actually chosen are random (and presumably don’t repeat). If nsamples is zero, all combinations will be tried. This is fine if there are few parameters to try, but otherwise beware; the time spent trying out different combinations could explode.

This step is naturally computationally intense. A single backtest can take some time, so just imagine how much time it would take to complete 40 of them! One way to speed up the process is to parallelize it. quantstrat supports parallelization, and implementing it is as simple as loading the package doParallel and registering cores to be used in the parallelization; apply.paramset() already uses %dopar% and foreach() from the foreach package. I don’t run the parallel version here, just because there would be no advantage to it on the system I’m currently using.
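For reference, registering a parallel backend would look roughly like the sketch below (not evaluated in this post; it assumes the doParallel package is installed):

library(doParallel)
# Register a parallel backend so that apply.paramset()'s foreach() loop can
# dispatch its %dopar% iterations across multiple cores
registerDoParallel(cores = parallel::detectCores() - 1)  # leave one core free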

We will try 40 different combinations here.

set.seed(2017052319)  # Set seed for reproducibility
results <- apply.paramset(strategy_st,  # Strategy to optimize
                          paramset.label = "MA",  # The parameter set to
                                                  # optimize
                          portfolio.st = portfolio_st,  # The portfolio to use;
                                                        # copies will be made
                                                        # in the blotter
                                                        # environment with
                                                        # their own names so
                                                        # their statistics can
                                                        # be analyzed
                          account.st = account_st,  # The name of the account
                                                    # to initialize with
                          nsamples = 40)  # The number of combinations of
                                          # parameters to try

results is a list containing some summary statistics for each backtest performed. Below I show its structure:

# Global view of the list
names(results)
##  [1] "SMAC-20-50_PORTF.5"  "tradeStats"          "SMAC-20-50_PORTF.14"
##  [4] "SMAC-20-50_PORTF.34" "SMAC-20-50_PORTF.35" "SMAC-20-50_PORTF.55"
##  [7] "SMAC-20-50_PORTF.85" "SMAC-20-50_PORTF.3"  "SMAC-20-50_PORTF.26"
## [10] "SMAC-20-50_PORTF.56" "SMAC-20-50_PORTF.66" "SMAC-20-50_PORTF.76"
## [13] "SMAC-20-50_PORTF.8"  "SMAC-20-50_PORTF.37" "SMAC-20-50_PORTF.67"
## [16] "SMAC-20-50_PORTF.87" "SMAC-20-50_PORTF.28" "SMAC-20-50_PORTF.58"
## [19] "SMAC-20-50_PORTF.78" "SMAC-20-50_PORTF.88" "SMAC-20-50_PORTF.10"
## [22] "SMAC-20-50_PORTF.19" "SMAC-20-50_PORTF.29" "SMAC-20-50_PORTF.59"
## [25] "SMAC-20-50_PORTF.79" "SMAC-20-50_PORTF.60" "SMAC-20-50_PORTF.80"
## [28] "SMAC-20-50_PORTF.90" "SMAC-20-50_PORTF.21" "SMAC-20-50_PORTF.31"
## [31] "SMAC-20-50_PORTF.41" "SMAC-20-50_PORTF.61" "SMAC-20-50_PORTF.71"
## [34] "SMAC-20-50_PORTF.13" "SMAC-20-50_PORTF.32" "SMAC-20-50_PORTF.92"
## [37] "SMAC-20-50_PORTF.33" "SMAC-20-50_PORTF.43" "SMAC-20-50_PORTF.73"
## [40] "SMAC-20-50_PORTF.83" "SMAC-20-50_PORTF.93"
# Seeing the structure of one element
str(results[[10]])
## List of 3
##  $ param.combo :'data.frame':  1 obs. of  2 variables:
##   ..$ nFAST: num 15
##   ..$ nSLOW: num 175
##   ..- attr(*, "out.attrs")=List of 2
##   .. ..$ dim     : Named int [1:2] 10 10
##   .. .. ..- attr(*, "names")= chr [1:2] "nFAST" "nSLOW"
##   .. ..$ dimnames:List of 2
##   .. .. ..$ nFAST: chr [1:10] "nFAST= 5" "nFAST=10" "nFAST=15" "nFAST=20" ...
##   .. .. ..$ nSLOW: chr [1:10] "nSLOW= 25" "nSLOW= 50" "nSLOW= 75" "nSLOW=100" ...
##  $ portfolio.st: chr "SMAC-20-50_PORTF.56"
##  $ tradeStats  :'data.frame':  8 obs. of  30 variables:
##   ..$ Portfolio         : Factor w/ 1 level "SMAC-20-50_PORTF.56": 1 1 1 1 1 1 1 1
##   ..$ Symbol            : Factor w/ 8 levels "AAPL","HPQ","MSFT",..: 1 2 3 4 5 6 7 8
##   ..$ Num.Txns          : num [1:8] 7 13 19 7 17 26 5 15
##   ..$ Num.Trades        : int [1:8] 3 6 9 3 8 13 2 7
##   ..$ Net.Trading.PL    : num [1:8] 13829 7186 5425 33271 -1341 ...
##   ..$ Avg.Trade.PL      : num [1:8] 4382 715 577 11134 -433 ...
##   ..$ Med.Trade.PL      : num [1:8] 4921 -970 -436 5894 -905 ...
##   ..$ Largest.Winner    : num [1:8] 9399 5370 5663 28419 3965 ...
##   ..$ Largest.Loser     : num [1:8] -1174 -1848 -912 -910 -2037 ...
##   ..$ Gross.Profits     : num [1:8] 14321 9250 8748 34313 3988 ...
##   ..$ Gross.Losses      : num [1:8] -1174 -4960 -3559 -910 -7453 ...
##   ..$ Std.Dev.Trade.PL  : num [1:8] 5307 3082 2128 15350 1922 ...
##   ..$ Percent.Positive  : num [1:8] 66.7 33.3 33.3 66.7 25 ...
##   ..$ Percent.Negative  : num [1:8] 33.3 66.7 66.7 33.3 75 ...
##   ..$ Profit.Factor     : num [1:8] 12.195 1.865 2.458 37.72 0.535 ...
##   ..$ Avg.Win.Trade     : num [1:8] 7160 4625 2916 17157 1994 ...
##   ..$ Med.Win.Trade     : num [1:8] 7160 4625 1611 17157 1994 ...
##   ..$ Avg.Losing.Trade  : num [1:8] -1174 -1240 -593 -910 -1242 ...
##   ..$ Med.Losing.Trade  : num [1:8] -1174 -1079 -591 -910 -1192 ...
##   ..$ Avg.Daily.PL      : num [1:8] 4382 715 577 11134 -433 ...
##   ..$ Med.Daily.PL      : num [1:8] 4921 -970 -436 5894 -905 ...
##   ..$ Std.Dev.Daily.PL  : num [1:8] 5307 3082 2128 15350 1922 ...
##   ..$ Ann.Sharpe        : num [1:8] 13.11 3.68 4.3 11.51 -3.58 ...
##   ..$ Max.Drawdown      : num [1:8] -5505 -8591 -5372 -16197 -8023 ...
##   ..$ Profit.To.Max.Draw: num [1:8] 2.512 0.836 1.01 2.054 -0.167 ...
##   ..$ Avg.WinLoss.Ratio : num [1:8] 6.1 3.73 4.92 18.86 1.61 ...
##   ..$ Med.WinLoss.Ratio : num [1:8] 6.1 4.29 2.73 18.86 1.67 ...
##   ..$ Max.Equity        : num [1:8] 17334 11288 7986 42611 2058 ...
##   ..$ Min.Equity        : num [1:8] -96.2 -2155.7 -2455.7 -701.2 -7623.2 ...
##   ..$ End.Equity        : num [1:8] 13829 7186 5425 33271 -1341 ...

There is an entry in results called tradeStats. This is a combined data frame of the trade statistics from all of the portfolios. I preview it below:

head(results$tradeStats)
##   nFAST nSLOW          Portfolio Symbol Num.Txns Num.Trades Net.Trading.PL
## 1     5    50 SMAC-20-50_PORTF.5   AAPL       61         30      2855.0673
## 2     5    50 SMAC-20-50_PORTF.5     FB       39         19     11345.9006
## 3     5    50 SMAC-20-50_PORTF.5    HPQ       37         18      7894.2676
## 4     5    50 SMAC-20-50_PORTF.5    IBM        4          2       708.6119
## 5     5    50 SMAC-20-50_PORTF.5   MSFT       65         32      2823.2581
## 6     5    50 SMAC-20-50_PORTF.5   NFLX       45         22     28273.6792
##   Avg.Trade.PL Med.Trade.PL Largest.Winner Largest.Loser Gross.Profits
## 1     53.47184    -227.9242       2059.360     -968.2612      9048.215
## 2    560.41601     -84.9241       8080.600    -1534.9605     17062.586
## 3    273.62865    -210.1213       7138.418    -1353.8012     12826.443
## 4    372.62668     372.6267       1268.772     -523.5185      1268.772
## 5     63.63691    -184.3744       1846.308    -1215.2486     10221.590
## 6   1286.11249    -139.8422      15127.305    -3202.4340     39031.541
##   Gross.Losses Std.Dev.Trade.PL Percent.Positive Percent.Negative
## 1   -7444.0601         747.8161         33.33333         66.66667
## 2   -6414.6823        2139.0063         47.36842         52.63158
## 3   -7901.1269        2080.1697         33.33333         66.66667
## 4    -523.5185        1267.3407         50.00000         50.00000
## 5   -8185.2090         777.3225         34.37500         65.62500
## 6  -10737.0660        4016.0092         45.45455         54.54545
##   Profit.Factor Avg.Win.Trade Med.Win.Trade Avg.Losing.Trade
## 1      1.215495      904.8215      752.8397        -372.2030
## 2      2.659927     1895.8429      908.8868        -641.4682
## 3      1.623369     2137.7404      857.1655        -658.4272
## 4      2.423547     1268.7719     1268.7719        -523.5185
## 5      1.248788      929.2355     1059.7691        -389.7719
## 6      3.635215     3903.1541     1987.2268        -894.7555
##   Med.Losing.Trade Avg.Daily.PL Med.Daily.PL Std.Dev.Daily.PL Ann.Sharpe
## 1        -310.2304     53.47184    -227.9242         747.8161   1.135091
## 2        -493.6332    560.41601     -84.9241        2139.0063   4.159094
## 3        -600.3886    273.62865    -210.1213        2080.1697   2.088157
## 4        -523.5185    372.62668     372.6267        1267.3407   4.667462
## 5        -366.3162     63.63691    -184.3744         777.3225   1.299595
## 6        -694.1988   1286.11249    -139.8422        4016.0092   5.083754
##   Max.Drawdown Profit.To.Max.Draw Avg.WinLoss.Ratio Med.WinLoss.Ratio
## 1   -2921.3607          0.9773074          2.430989          2.426712
## 2   -3311.4004          3.4263149          2.955474          1.841219
## 3   -4740.7357          1.6651988          3.246738          1.427685
## 4    -837.6125          0.8459901          2.423547          2.423547
## 5   -2964.6203          0.9523169          2.384050          2.893045
## 6   -7533.7804          3.7529205          4.362258          2.862619
##   Max.Equity  Min.Equity End.Equity
## 1   4086.341 -1292.90199  2855.0673
## 2  12361.105 -2716.54800 11345.9006
## 3   7894.268 -4336.38823  7894.2676
## 4   1343.862  -767.35374   708.6119
## 5   2978.699 -2371.47934  2823.2581
## 6  34903.375   -66.57307 28273.6792

Let’s see which combination did best. The last column of each tradeStats data frame in the list is End.Equity. It contains the total profit/loss for each stock tried in the portfolio. If we sum these we get the portfolio’s total profit/loss. Below I create a data frame containing the combination of parameters and their portfolio’s final profit.

library(dplyr)

(profit_dat <- results$tradeStats %>%
  select(nFAST, nSLOW, Portfolio, End.Equity) %>%
  group_by(Portfolio) %>%
  summarize(Fast = mean(nFAST),
            Delta = mean(nSLOW - nFAST),
            Profit = sum(End.Equity)) %>%
  select(Fast, Delta, Profit) %>%
  arrange(desc(Profit)))
## # A tibble: 40 × 3
##     Fast Delta   Profit
##    <dbl> <dbl>    <dbl>
## 1     20   105 99179.34
## 2      5   120 93345.49
## 3      5    70 88892.14
## 4     15    85 86325.87
## 5     10   115 83710.54
## 6     50    75 82715.10
## 7     40    85 81516.30
## 8     30    70 80884.53
## 9     30    20 80319.89
## 10    40    35 76679.24
## # ... with 30 more rows
plot(Profit ~ Fast, data = profit_dat, main = "Profit vs. Fast MA Window")

plot(Profit ~ I(Fast + Delta), data = profit_dat,
     main = "Profit vs. Slow MA Window", xlab = "Slow")

Profit appears to be maximized when the fast moving average’s window is 20 days (4 weeks) and the slow moving average’s window is 125 days (25 weeks), which can be nicely interpreted as a monthly and a semiannual period, respectively. With these periods, we manage to roughly double the profit we had.

We have not searched the entire parameter space. One idea I’ve had to address this without actually trying every combination (a computationally intense task) is to fit a linear model to the profit, depending on the parameters of interest. I want to fit a model of the form:

\text{profit} = \alpha_1 (\text{fast} - \beta_1)^2 + \alpha_2 (\text{delta} - \beta_2)^2 + \gamma

Why this model? Convenience; if both alpha_1 and alpha_2 are negative, there will be a unique global maximum and, thus, a combination of fast and slow moving averages that will maximize profit. This is the simplest model with that guarantee. It translates to a linear function of the form:

\text{profit}_i = \beta_0 + \beta_1 \text{fast}_i + \beta_2 \text{fast}_i^2 + \beta_3 \text{delta}_i + \beta_4 \text{delta}_i^2 + \epsilon_i
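To see why this is linear in its coefficients, expand one of the squares (note that the beta_1 and beta_2 appearing in the quadratic form above are centers of the parabolas, not the regression coefficients):

\alpha_1 (\text{fast} - \beta_1)^2 = \alpha_1 \text{fast}^2 - 2 \alpha_1 \beta_1 \text{fast} + \alpha_1 \beta_1^2

Each squared term thus contributes a quadratic term, a linear term, and a constant, all of which are absorbed into the regression coefficients.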

We will compute the least-squares fit with lm():

fit <- lm(Profit ~ Fast + I(Fast^2) + Delta + I(Delta^2), data = profit_dat)
summary(fit)
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 75734.7811  7909.1511   9.576  2.6e-11 ***
## Fast         -277.3674   493.5298  -0.562   0.5777    
## I(Fast^2)      -0.8506     8.4920  -0.100   0.9208    
## Delta         162.8430    95.5322   1.705   0.0971 .  
## I(Delta^2)     -0.9096     0.3820  -2.381   0.0228 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9548 on 35 degrees of freedom
## Multiple R-squared:  0.3694, Adjusted R-squared:  0.2974 
## F-statistic: 5.127 on 4 and 35 DF,  p-value: 0.002334

Take partial derivatives of the above fit with respect to both fast and delta and set them equal to zero to conclude that the critical point is at:

(\text{fast}^*, \text{delta}^*) = \left(-\frac{\beta_1}{2 \beta_2}, -\frac{\beta_3}{2 \beta_4}\right)

Whether this is a minimum or maximum depends on beta_2 and beta_4. If these both are positive, the critical point is a global minimum, and if both are negative, the critical point is a global maximum. Unfortunately, while we have a global maximum, it is around (-163, 90), which is nonsense, so we’ll replace -163 with the smallest possible value for the fast moving average considered: 5. This can be interpreted as a moving average computed over a week, and the slow moving average is 95 days, which roughly corresponds to a quarterly cycle.
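As a quick check of that arithmetic (a sketch that assumes the model was fit with the formula shown above, so that the coefficients are named Fast, I(Fast^2), Delta, and I(Delta^2)):

b <- coef(fit)
-b["Fast"]  / (2 * b["I(Fast^2)"])   # fast*: roughly -163
-b["Delta"] / (2 * b["I(Delta^2)"])  # delta*: roughly 90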

Here we try out the optimized strategy.

strategy_st_opt <- "SMAC-1-90_Strat"
rm.strat(strategy_st_opt)

strategy(strategy_st_opt, store = TRUE)

add.indicator(strategy = strategy_st_opt, name = "SMA",
              arguments = list(x = quote(Cl(mktdata)),
                               n = 5),
              label = "fastMA")
add.indicator(strategy = strategy_st_opt, name = "SMA",
              arguments = list(x = quote(Cl(mktdata)),
                               n = 95),
              label = "slowMA")

add.signal(strategy = strategy_st_opt, name = "sigCrossover2",  # Remember me?
           arguments = list(columns = c("fastMA", "slowMA"),
                            relationship = "gt"),
           label = "bull")
add.signal(strategy = strategy_st_opt, name = "sigCrossover2",
           arguments = list(columns = c("fastMA", "slowMA"),
                            relationship = "lt"),
           label = "bear")
add.rule(strategy = strategy_st_opt, name = "ruleSignal",
         arguments = list(sigcol = "bull",
                          sigval = TRUE,
                          ordertype = "market",
                          orderside = "long",
                          replace = FALSE,
                          TxnFees = "fee",
                          prefer = "Open",
                          osFUN = osMaxDollarBatch,
                          maxSize = quote(floor(getEndEq(account_st,
                                                   Date = timestamp) * .1)),
                          tradeSize = quote(floor(getEndEq(account_st,
                                                   Date = timestamp) * .1)),
                          batchSize = 100),
         type = "enter", path.dep = TRUE, label = "buy")
add.rule(strategy = strategy_st_opt, name = "ruleSignal",
         arguments = list(sigcol = "bear",
                          sigval = TRUE,
                          orderqty = "all",
                          ordertype = "market",
                          orderside = "long",
                          replace = FALSE,
                          TxnFees = "fee",
                          prefer = "Open"),
         type = "exit", path.dep = TRUE, label = "sell")

rm.strat(portfolio_st)
rm.strat(account_st)
initPortf(portfolio_st, symbols = symbols,
          initDate = initDate, currency = "USD")
initAcct(account_st, portfolios = portfolio_st,
         initDate = initDate, currency = "USD",
         initEq = 100000)
initOrders(portfolio_st, store = TRUE)

# Retry the strategy
applyStrategy(strategy_st_opt, portfolios = portfolio_st)

updatePortf(portfolio_st)
dateRange <- time(getPortfolio(portfolio_st)$summary)[-1]
updateAcct(account_st, dateRange)
updateEndEq(account_st)
tStats <- tradeStats(Portfolios = portfolio_st, use="trades",
                     inclZeroDays = FALSE)
tStats[, 4:ncol(tStats)] <- round(tStats[, 4:ncol(tStats)], 2)
print(data.frame(t(tStats[, -c(1,2)])))
##                        AAPL       FB      HPQ     MSFT     NFLX    NTDOY
## Num.Txns              26.00    22.00    29.00    35.00    17.00    33.00
## Num.Trades            13.00    11.00    15.00    18.00     9.00    17.00
## Net.Trading.PL      8302.25 12501.64  7699.08  2741.82 51860.00 -1497.25
## Avg.Trade.PL         663.78  1163.36   540.86   178.38  5792.79   -61.12
## Med.Trade.PL        -122.34  -108.00   -53.17  -107.60   124.00  -615.00
## Largest.Winner      4539.65  9949.11  5148.97  2901.77 41689.39  5538.01
## Largest.Loser       -579.22  -448.06 -1547.47  -566.57 -2344.88 -1045.17
## Gross.Profits      10613.98 14085.00 13338.63  6965.03 56350.29  7150.00
## Gross.Losses       -1984.83 -1288.00 -5225.69 -3754.14 -4215.14 -8189.00
## Std.Dev.Trade.PL    1501.47  2988.05  2012.72   898.07 14014.85  1563.62
## Percent.Positive      46.15    45.45    33.33    33.33    55.56    11.76
## Percent.Negative      53.85    54.55    66.67    66.67    44.44    88.24
## Profit.Factor          5.35    10.94     2.55     1.86    13.37     0.87
## Avg.Win.Trade       1769.00  2817.00  2667.73  1160.84 11270.06  3575.00
## Med.Win.Trade       1260.55  1006.00  3443.84   818.41  3599.86  3575.00
## Avg.Losing.Trade    -283.55  -214.67  -522.57  -312.84 -1053.79  -545.93
## Med.Losing.Trade    -267.20  -175.00  -368.29  -340.92  -773.86  -660.00
## Avg.Daily.PL         650.88  1149.36   319.06   149.59  6452.24  -178.16
## Med.Daily.PL        -135.14  -120.62  -108.87  -121.90  -117.48  -650.87
## Std.Dev.Daily.PL    1500.18  2984.97  1913.27   922.20 14807.61  1552.16
## Ann.Sharpe             6.89     6.11     2.65     2.57     6.92    -1.82
## Max.Drawdown       -1847.42 -4584.00 -3892.16 -3093.52 -9422.66 -7200.49
## Profit.To.Max.Draw     4.49     2.73     1.98     0.89     5.50    -0.21
## Avg.WinLoss.Ratio      6.24    13.12     5.11     3.71    10.69     6.55
## Med.WinLoss.Ratio      4.72     5.75     9.35     2.40     4.65     5.42
## Max.Equity          9150.73 14511.62  8123.84  5297.22 56502.52  2515.24
## Min.Equity           -63.46     0.00 -1656.13  -701.82     0.00 -6039.29
## End.Equity          8302.25 12501.64  7699.08  2741.82 51860.00 -1497.25
##                         SNY     TWTR     YHOO
## Num.Txns              44.00     7.00    41.00
## Num.Trades            22.00     4.00    21.00
## Net.Trading.PL      2613.26  1340.73 11957.32
## Avg.Trade.PL         146.45   359.50   596.48
## Med.Trade.PL        -111.49  -499.00  -198.00
## Largest.Winner      4123.32     0.00 11649.93
## Largest.Loser       -672.60 -1644.84  -935.19
## Gross.Profits       6897.70  4068.00 18011.00
## Gross.Losses       -3675.80 -2630.00 -5485.00
## Std.Dev.Trade.PL     961.31  2531.05  2697.18
## Percent.Positive      36.36    25.00    38.10
## Percent.Negative      63.64    75.00    61.90
## Profit.Factor          1.88     1.55     3.28
## Avg.Win.Trade        862.21  4068.00  2251.38
## Med.Win.Trade        387.24  4068.00   603.00
## Avg.Losing.Trade    -262.56  -876.67  -421.92
## Med.Losing.Trade    -246.41  -612.00  -442.00
## Avg.Daily.PL         132.54  -889.98   456.37
## Med.Daily.PL        -125.51  -625.56  -214.82
## Std.Dev.Daily.PL     960.19   663.42  2700.19
## Ann.Sharpe             2.19   -21.30     2.68
## Max.Drawdown       -3238.67 -5173.34 -4777.13
## Profit.To.Max.Draw     0.81     0.26     2.50
## Avg.WinLoss.Ratio      3.28     4.64     5.34
## Med.WinLoss.Ratio      1.57     6.65     1.36
## Max.Equity          5489.53  2146.08 13225.13
## Min.Equity          -442.19 -3027.27 -2146.87
## End.Equity          2613.26  1340.73 11957.32
final_acct <- getAccount(account_st)
plot(final_acct$summary$End.Eq["2010/2016"], main = "Portfolio Equity")

The results look promising. Let’s compare to the performance of SPY:

getSymbols("SPY", from = start, to = end, adjust = TRUE)

plot(final_acct$summary$End.Eq["2010/2016"] / 100000,
     main = "Portfolio Equity", ylim = c(0.8, 2.5))
lines(SPY$SPY.Adjusted / SPY$SPY.Adjusted[[1]], col = "blue")

SPY no longer beats the strategy as badly as it did before, which is good news, and that’s after accounting for transaction costs (notice that we’ve kept those costs low; presumably we’ve found a very cheap brokerage service) and even a little slippage (that’s the 0.1% fee applied per transaction). So not bad. We can deploy this strategy and expect decent results.

Right?

Not so fast. We optimized the strategy using a data set, then evaluated its performance on that same data set. Any expert in statistics, econometrics, or machine learning will protest this. Our approach may be leading to overfitting, a phenomenon where a model describes the data it was trained on very well but does not generalize well to other data. I believe that my tactic of picking the parameters by maximizing a fitted quadratic function may help combat this, but that’s just a hunch; we need to check the performance of the strategy on out-of-sample data (that is, data the strategy has never seen) in order to get a sense of how it would actually perform if deployed.

Notice that I never looked at stock data after October 2016, so why not see how the strategy performs on more recent data?

start2 <- as.Date("2016-06-01")
end2 <- as.Date("2017-04-24")

getSymbols(Symbols = symbols, src = "yahoo", from = start2, to = end2,
           adjust = TRUE)

stock(symbols, currency = "USD", multiplier = 1)

rm.strat(portfolio_st)
rm.strat(account_st)
initPortf(portfolio_st, symbols = symbols,
          initDate = initDate, currency = "USD")
initAcct(account_st, portfolios = portfolio_st,
         initDate = initDate, currency = "USD",
         initEq = 100000)
initOrders(portfolio_st, store = TRUE)

# Retry the strategy on the out-of-sample data
applyStrategy(strategy_st_opt, portfolios = portfolio_st)

updatePortf(portfolio_st)
dateRange <- time(getPortfolio(portfolio_st)$summary)[-1]
updateAcct(account_st, dateRange)
updateEndEq(account_st)
tStats <- tradeStats(Portfolios = portfolio_st, use="trades",
                     inclZeroDays = FALSE)
tStats[, 4:ncol(tStats)] <- round(tStats[, 4:ncol(tStats)], 2)
print(data.frame(t(tStats[, -c(1,2)])))
##                        HPQ    NTDOY     SNY    TWTR
## Num.Txns              3.00     3.00    3.00    6.00
## Num.Trades            2.00     2.00    2.00    3.00
## Net.Trading.PL     2404.89 -1070.06  687.21 -701.57
## Avg.Trade.PL       1224.55  -514.50  363.00 -205.00
## Med.Trade.PL       1224.55  -514.50  363.00 -140.00
## Largest.Winner      491.99     0.00    0.00    0.00
## Largest.Loser       -14.84 -1599.88 -194.85 -454.30
## Gross.Profits      2449.10   558.00  908.00    0.00
## Gross.Losses          0.00 -1587.00 -182.00 -615.00
## Std.Dev.Trade.PL   1014.30  1516.74  770.75  210.18
## Percent.Positive    100.00    50.00   50.00    0.00
## Percent.Negative      0.00    50.00   50.00  100.00
## Profit.Factor           NA     0.35    4.99    0.00
## Avg.Win.Trade      1224.55   558.00  908.00     NaN
## Med.Win.Trade      1224.55   558.00  908.00      NA
## Avg.Losing.Trade       NaN -1587.00 -182.00 -205.00
## Med.Losing.Trade        NA -1587.00 -182.00 -140.00
## Avg.Daily.PL        491.99 -1599.88 -194.85 -219.33
## Med.Daily.PL        491.99 -1599.88 -194.85 -154.35
## Std.Dev.Daily.PL        NA       NA      NA  210.16
## Ann.Sharpe              NA       NA      NA  -16.57
## Max.Drawdown       -933.45 -1802.59 -694.00 -701.57
## Profit.To.Max.Draw    2.58    -0.59    0.99   -1.00
## Avg.WinLoss.Ratio       NA     0.35    4.99     NaN
## Med.WinLoss.Ratio       NA     0.35    4.99      NA
## Max.Equity         2542.89   153.54  999.21    0.00
## Min.Equity            0.00 -1649.06 -337.03 -701.57
## End.Equity         2404.89 -1070.06  687.21 -701.57
final_acct <- getAccount(account_st)
plot(final_acct$summary$End.Eq["2016/2017"], main = "Portfolio Equity")

getSymbols("SPY", from = as.Date("2016-10-01"), to = end2, adjust = TRUE)

# SPY is fetched from October 2016 onward because the trading strategy needs a
# warm-up period before it starts trading
plot(final_acct$summary$End.Eq["2016/2017"] / 100000,
     main = "Portfolio Equity", ylim = c(0.95, 1.15))
lines(SPY$SPY.Adjusted / SPY$SPY.Adjusted[[1]], col = "blue")

Our strategy’s performance does not at all rival that of SPY out of sample. Admittedly, SPY’s behavior over this period is spectacular even by its own standards, with an annualized rate of return of about 21% (~10% is about normal), but the strategy has an annualized rate of return of about 5%, far below SPY’s “typical” performance.
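
As a rough sanity check, here is approximately how those annualized figures can be computed from the equity curves above, assuming about 252 trading days per year (the object names are those used in the code above):

# Rough sketch: annualize the growth of the portfolio and of SPY over the
# out-of-sample window; assumes roughly 252 trading days per year
eq <- final_acct$summary$End.Eq["2016/2017"]
ann_portfolio <- (as.numeric(last(eq)) / as.numeric(first(eq)))^(252 / nrow(eq)) - 1

spy <- SPY$SPY.Adjusted
ann_spy <- (as.numeric(last(spy)) / as.numeric(first(spy)))^(252 / nrow(spy)) - 1

c(portfolio = ann_portfolio, SPY = ann_spy)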

Conclusion

If you were thinking about quitting your day job to day-trade full time, I hope I have curbed your expectations. Beating the market should look rather difficult by now.

I recently read an article on Bloomberg about why financial products always seem to burn investors. The article pointed the finger at p-hacking, backtesting without sufficiently checking out-of-sample performance, and overall sloppiness. This paragraph in particular jumped out at me:

The old adage applies: If asset managers and finance professors are super-smart, why ain’t they super-rich? The big money is being made by firms that ignore finance theory. Renaissance Technologies on Long Island is dripping with mathematicians and physicists but will not hire a finance Ph.D. Two Sigma Investments is run by computer scientists and mathematicians. D.E. Shaw was founded by a computational biologist. And so on. Reflecting mathematicians’ disdain for sloppiness in finance, a 2014 essay in the Notices of the American Mathematical Society referred to backtest overfitting as “pseudo-mathematics and financial charlatanism.”

quantstrat is a backtesting package. You need to backtest. Backtesting, though, is not nearly enough. You also need to check out-of-sample performance, and even this must be done carefully, lest you fall into the same trap as backtesting alone (that is, torturing the data until you get a confession).

In other words, we will need to look at some machine-learning techniques such as cross-validation to figure out what choice of parameters would be best. Packages such as caret may facilitate this, and I may look into them in the future. With that said, I doubt that the simple moving average crossover strategy will be very profitable, but that’s okay; I’ve been using it for pedagogical reasons only. With it, we’ve been able to see basically all of quantstrat‘s major features. We’ll keep using it as long as it’s useful.
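
As a preview, here is a heavily simplified sketch of what an out-of-sample parameter search could look like. backtest_smac() is a hypothetical helper, not a quantstrat function; it would wrap the machinery above, run the SMAC strategy for a given parameter pair over a given date range, and return a single performance number:

# Heavily simplified sketch of out-of-sample parameter selection.
# backtest_smac(fast, slow, from, to) is a hypothetical wrapper around the
# quantstrat code above that returns, say, end-of-period equity.
param_grid <- expand.grid(fast = seq(5, 50, by = 5),
                          slow = seq(60, 120, by = 10))

# Score every parameter pair on a training window only
train_scores <- apply(param_grid, 1, function(p) {
  backtest_smac(fast = p["fast"], slow = p["slow"],
                from = "2010-01-01", to = "2014-12-31")
})

# Pick the best pair in training, then evaluate it exactly once on a held-out
# validation window that the search never touched
best <- param_grid[which.max(train_scores), ]
backtest_smac(fast = best$fast, slow = best$slow,
              from = "2015-01-01", to = "2016-10-01")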

To leave a comment for the author, please follow the link and comment on their blog: R – Curtis Miller's Personal Website.





Source link

Intel Edison Compute Module IoT (Internet of Things) On-Board Antenna Single Components EDI1.SPON.AL.S

By | iot, machinelearning

Intel Edison

Lowering the barriers to entry to create wearables and other products for the Internet of Things

The Intel Edison development platform is the first in a series of low-cost, product-ready, general purpose compute platforms that help lower the barriers to entry for entrepreneurs of all sizes—from pro-makers to consumer electronics and companies working in the Internet of Things (IoT).

Intel Edison packs a robust set of features into its small size, delivering great performance, durability, and a broad spectrum of I/O and software support. Those versatile features help meet the needs of a wide range of customers.

The unique combination of small size, power, rich capabilities, and ecosystem support inspires creativity and enables rapid innovation from prototype to production for entrepreneurs of all sizes.

  • Intel Atom system-on-a-chip (SoC) based on leading-edge 22 nm Silvermont microarchitecture including a dual-core CPU and single core microcontroller (MCU)
  • Integrated Wi-Fi, Bluetooth* LE, memory, and storage
  • Support for more than 30 industry-standard I/O interfaces via a 70-pin connector
  • A series of Intel & partner created expansion boards to support the long tail of innovation
  • Support for Yocto Linux*, Arduino*, Python*, and Node.js*
  • Open source community software tools enabling ease of adoption that will inspire developers
  • Integrates with cloud platforms providing visualization of streamed data, dynamic rule evaluation/alerting and analytics.


$59.99



#ICLR2017 Tuesday Afternoon Program

By | machinelearning

ICLR 2017 continues this afternoon in Toulon; there will be a blog post for each half day featuring direct links to papers in the OpenReview section. The meeting will be featured live on Facebook at https://www.facebook.com/iclr.cc/. If you want to say hi, I am around, and we’re hiring.
14.00 – 16.00 Poster Session 2 (Conference Papers, Workshop Papers)
16.00 – 16.15 Coffee Break
16.15 – 17.00 Invited talk 2: Riccardo Zecchina
17.00 – 17.20 Contributed Talk 3: Learning to Act by Predicting the Future
17.20 – 17.40 Contributed Talk 4: Reinforcement Learning with Unsupervised Auxiliary Tasks
17.40 – 18.00 Contributed Talk 5: Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
18.00 – 18.10 Group photo at the Stade Félix Mayol
19.00 – 24.00 Gala dinner offered by ICLR

C1: Sigma Delta Quantized Networks (code)
C2: Paleo: A Performance Model for Deep Neural Networks
C3: DeepCoder: Learning to Write Programs
C4: Topology and Geometry of Deep Rectified Network Optimization Landscapes
C5: Incremental Network Quantization: Towards Lossless CNNs with Low-precision Weights
C6: Learning to Perform Physics Experiments via Deep Reinforcement Learning
C7: Decomposing Motion and Content for Natural Video Sequence Prediction
C8: Calibrating Energy-based Generative Adversarial Networks
C9: Pruning Convolutional Neural Networks for Resource Efficient Inference
C10: Incorporating long-range consistency in CNN-based texture generation (code)
C11: Lossy Image Compression with Compressive Autoencoders
C12: LR-GAN: Layered Recursive Generative Adversarial Networks for Image Generation
C13: Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data
C14: Deep Variational Bayes Filters: Unsupervised Learning of State Space Models from Raw Data
C15: Mollifying Networks
C16: beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
C17: Categorical Reparameterization with Gumbel-Softmax
C18: Online Bayesian Transfer Learning for Sequential Data Modeling
C19: Latent Sequence Decompositions
C20: Density estimation using Real NVP
C21: Recurrent Batch Normalization
C22: SGDR: Stochastic Gradient Descent with Restarts
C23: Variable Computation in Recurrent Neural Networks
C24: Deep Variational Information Bottleneck
C25: SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
C26: TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency
C27: Frustratingly Short Attention Spans in Neural Language Modeling
C28: Offline Bilingual Word Vectors, Orthogonal Transformations and the Inverted Softmax
C29: Learning a Natural Language Interface with Neural Programmer
C30: Designing Neural Network Architectures using Reinforcement Learning
C31: Metacontrol for Adaptive Imagination-Based Optimization (spaceship dataset)
C32: Recurrent Environment Simulators
C33: EPOpt: Learning Robust Neural Network Policies Using Model Ensembles

W1: Lifelong Perceptual Programming By Example
W2: Neu0
W3: Dance Dance Convolution
W4: Bit-Pragmatic Deep Neural Network Computing
W5: On Improving the Numerical Stability of Winograd Convolutions
W6: Fast Generation for Convolutional Autoregressive Models
W7: The Preimage of Rectifier Network Activities
W8: Training Triplet Networks with GAN
W9: On Robust Concepts and Small Neural Nets
W10: Pl@ntNet app in the era of deep learning
W11: Exponential Machines
W12: Online Multi-Task Learning Using Biased Sampling
W13: Online Structure Learning for Sum-Product Networks with Gaussian Leaves
W14: A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Samples
W15: Compositional Kernel Machines
W16: Loss is its own Reward: Self-Supervision for Reinforcement Learning
W17: REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
W18: Precise Recovery of Latent Vectors from Generative Adversarial Networks
W19: Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization (code)






Source link