One of my favorite financial graphics was included in the NY Times in 2011. The chart was produced by Crestmont Research and it showcases the returns of an investment in the S&P 500 starting in any given year since 1920 and ending in any other year (until 2011).
The reason I like the chart so much is that it addresses some concepts that I find difficult to conceptualize: historic long-term market returns, short-term volatility, and what lies in between.
I decided to reproduce a few different versions of this chart myself using Robert Shiller's publicly available S&P 500 data. This gives me control over both the data used and the assumptions. Here is the first one:
This chart includes reinvested dividends and is adjusted for inflation via CPI but does not account for any taxes or fees.
I chose to do this because the original graphic sparked some curiosities:
 The Times/Crestmont chart suggests a median return of 4.1%, which seems low to me based on other estimates. The chart accounts for inflation and dividends. It also includes "taxes and fees", but the article doesn't specify what those are (though Crestmont's website does have an assumptions page that was likely used).
 More data could be included. Almost a decade of data has accumulated since the article was published, and S&P 500 data exists dating all the way back to the 1870s^{1}.
 The analysis shows how a lump sum investment has fared over the last century or so, but it would also be interesting to see how periodic investments and dollar cost averaging affect the results.
How do the charts compare?
According to the Times, the original chart was made by Ed Easterling, a financial researcher at Crestmont Research, after a debate with a client regarding expected market returns. If you are interested, Crestmont Research has updated matrices on their website, which, though styled differently, contain additional data such as P/E ratios, significant events of the year, and more.
My chart has the same general shape as the Crestmont/Times graph. You can see various lengthy bull markets in the 1920s, 1950s, 1980s, and 1990s. There is the Great Depression, the high inflation of the 1970s, and the 2008 financial meltdown.
However, you'll also notice that the returns in my version of the graphic are significantly higher. This is most likely due to my exclusion of taxes and fees. But which is more relevant for your situation? Personally, I really just wanted to know what expected return to plug into my retirement calculator. If you don't care for all the assumptions talk, just skip to the answer.
Assumptions Affect Historic Long-Term Market Returns
I've seen so many different estimates for long-term expected market returns that are supposedly based on historical performance of the S&P 500. As stated earlier, the Times chart suggests a median annual return of 4.1%, while I've seen other estimates as high as 12%. Why is such a concrete question so hard to get a consistent answer to?
One reason there are many different estimates is that there are many variables.
 How is the average return calculated?
 Which funds are considered?
 How long is the time period?
 Does it account for inflation?
 Does it include dividends?
 What about taxes? In which capital gains bracket? Are taxes applied to reinvested dividends?
 What kind of fees are applied?
Luckily for us, we can narrow this down because there are reasonable answers to many of these questions. Here are the parameters I'm using for this post:

How is the average return calculated?
Using CAGR
Grey Box Tangent: CAGR Calculation
Say you invest $100 in a stock. It goes up 200% the next day (cha ching), to $300. The day after that, it goes down 100%, to $0. You now have no money, but you can brag to your friends that your investment averaged a 50% daily return: (200% + (-100%)) / 2 = 50%!
This kind of average is useless for our purposes, for a couple of reasons:
 The 200% and the -100% in the example are not percentages of the same number, so it doesn't make sense to average them.

Even if you were a bit smarter and calculated the total gain from the start price to the end price (instead of averaging percentages), and divided that by the number of periods, your answer would still be wrong because the growth isn't linear, it's compounded.
THIS IS WRONG: AVG RETURN = ((endValue - startValue) / startValue) / numberOfPeriods
Calculating the average return this way makes it look higher than it really is.
What we really want when we talk about average yearly returns: What would the yearly return be if it was evenly spread out over each year of the investment period? This is the kind of number we can plug into a financial function/calculator as 'expected returns' or 'discount rate'.
To calculate this, we need to use a type of average that accounts for compounded growth: the geometric mean. In finance, it's known as the Compound Annual Growth Rate, or CAGR.
CAGR = (endValue / startValue) ^ (1 / numberOfPeriods) - 1
CAGR provides the equivalent constant rate of return over the given time interval, which is much more useful.
For example, from 1999 to 2004:
Adjusted S&P 500 price in Jan 1999: $829,362.65
Adjusted S&P 500 price in Jan 2004: $716,325.33
CAGR = ($716,325.33 / $829,362.65) ^ (1 / (2004 - 1999)) - 1
CAGR = -0.0289 = -2.89%
Not a great stretch for the S&P 500.
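As a minimal sketch of the two formulas above (the function names are my own, not from any library), here is the CAGR calculation next to the "wrong" linear average, using the 1999-2004 figures from the example:

```python
def cagr(start_value: float, end_value: float, periods: float) -> float:
    """Constant per-period rate that compounds start_value into end_value."""
    return (end_value / start_value) ** (1 / periods) - 1


def naive_average(start_value: float, end_value: float, periods: float) -> float:
    """Linear average that ignores compounding; shown only for comparison."""
    return ((end_value - start_value) / start_value) / periods


# Adjusted S&P 500 values from the 1999-2004 example in the text.
start, end, years = 829_362.65, 716_325.33, 2004 - 1999

print(f"CAGR:          {cagr(start, end, years):.2%}")
print(f"Naive average: {naive_average(start, end, years):.2%}")
```

The CAGR line reproduces the -2.89% figure, while the naive average comes out less negative, which is the "looks higher than it really is" effect described above.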

Which funds?
We'll use the underlying S&P 500 index. This is widely considered the "market". There are lots of different funds that track this index. You can purchase them as an ETF (Exchange Traded Fund) or a mutual fund. It doesn't matter much for our purposes, but you can read about the differences here.

Time period?
The chart shows returns for every yearly period between 1871 and 2020 (~mid Jan to ~mid Jan of the following year). For the final CAGR number, we'll use the longest period available (1871-2020).

Inflation?
Included, using CPI provided in Shiller's data.

Reinvested dividends?
Included, using values from Shiller's data.

Taxes?
Determining the capital gains tax rate to apply gets complicated for several reasons:
 Different accounts are subject to different tax rates (e.g., pre-tax, Roth, not tax-sheltered)
 Different income levels are subject to different tax rates
 Different capital gains tax rates through history
 Different tax rates depending on how long assets are held
 Different tax rates on capital gains in different states/countries
 Different funds produce different amounts of taxable gains when selling underlying assets
So, in order to keep the charts on this page broadly applicable, taxes have been excluded. However, to get a feel for how taxes affect long-term returns, I've included a table with long-term CAGR calculations for all the current federal U.S. capital gains tax rates.
In the table we'll account for capital gains that show up in two different places:

Each time an investor is paid a dividend in a non-tax-sheltered account, taxes usually must be paid on that dividend in that year. This tax reduces the amount of the dividend that gets reinvested.
I'm assuming all dividends are qualified dividends, which makes them subject to long-term capital gains tax rates instead of the higher short-term capital gains rates. The percentage of dividends paid out that are qualified is called Qualified Dividend Income, or QDI. QDI can vary from year to year, but for a fund tracking the S&P 500, the vast majority of dividends are qualified. For example, Vanguard's QDI for its S&P 500 ETF (VOO) was 100% for 2019 and, so far in 2020, it's 99.80%.
I'm also taking out the capital gains tax as soon as the dividends are paid, not waiting until the end of the year.
 Capital gains taxes are applied in the final year when all the money is withdrawn. Because of reinvested dividends, the basis varies. For these calculations I'm using the average cost basis method to account for this variation.
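To make those two tax events concrete, here is a rough sketch under simplifying assumptions of my own: one dividend per year, paid and reinvested at the next year's price, and a single flat tax rate. The function name and inputs are hypothetical; this is not the spreadsheet logic behind the table. Because everything is sold at once, the average cost basis times the number of shares is simply the total amount paid in.

```python
def after_tax_final_value(prices, dividends_per_share, tax_rate):
    """After-tax value of 1 share bought at prices[0] and sold at prices[-1].

    Assumption: dividends_per_share[i] is paid when the price is prices[i + 1],
    so len(prices) must equal len(dividends_per_share) + 1.
    """
    shares = 1.0
    cost_basis = prices[0]  # total dollars paid for all shares held so far
    for year, dividend in enumerate(dividends_per_share):
        gross = shares * dividend
        net = gross * (1 - tax_rate)      # dividend is taxed as soon as it is paid
        shares += net / prices[year + 1]  # reinvest whatever is left
        cost_basis += net                 # reinvested dividends raise the basis
    sale_value = shares * prices[-1]
    taxable_gain = max(sale_value - cost_basis, 0.0)
    return sale_value - taxable_gain * tax_rate  # tax on the gain at withdrawal


# Made-up prices and dividends, purely for illustration.
print(after_tax_final_value([100, 110, 121], [2, 2], tax_rate=0.15))
```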
And we will not account for capital gains that can show up in one other place:

Capital gains also occur when a fund sells underlying appreciated securities. Funds don't pay capital gains taxes; their shareholders do. Funds are required by the IRS to distribute almost all of their capital gains to investors; the investors must pay the capital gains on those distributions if the funds are held in a taxable account.
The words used to describe this distribution are "capital gains disbursements". The word "disbursement" makes it sound exciting, like you are getting a dividend, but really it's just a forced liquidation of a portion of the fund you own. The value of the assets within the fund drops by the amount of the disbursement. So all you're really getting is a tax bill, which increases the ongoing cost of holding the investment.
The good news is that funds that track indices (like the S&P 500) are less susceptible to tax-inducing churn than actively managed funds that might buy and sell more frequently because of management transitions or their investing philosophies.
ETFs are particularly tax efficient in this regard because funds can interact with brokers via special IRS rules, allowing ETFs to defer gains. How exactly this works is a topic for another post.

Fees?
Just like taxes, fees can vary significantly from situation to situation. So, in order to keep the charts on this page broadly applicable, fees have been excluded as well. However, like with capital gains, I've included long-term CAGR calculations for a handful of different fee levels.
There are a few kinds of fees I'm aware of when dealing with S&P 500 ETFs or mutual funds. Many fees are reasonable to exclude because they are zero or are practically zero for several popular large S&P 500 ETFs/Mutual Funds. Therefore, the only fee we take into consideration is the expense ratio.
 Expense Ratio (included) - This fee is applied, over a year-long period, on the whole value of the investment. This covers the operating expenses for the fund.

Advisory Fee (excluded) - If you hold the S&P 500 ETF/Mutual Fund indirectly through a financial advisor/wealth manager/robo-advisor, you're likely paying this fee. Many (but not all) advisors charge clients a percentage fee based on the client's total assets under their management. This fee is typically in the neighborhood of ~1% for human advisors and ~0.25% for robo-advisors.
These percentages may sound small, but you'll see that they have a significant negative effect on returns. If this applies to you, you can add your advisory fee percentage to the expense ratio when looking up your CAGR in the table below. That said, you can hold the fund directly in a brokerage account (e.g., Vanguard, Fidelity) to avoid paying this fee.
 Commissions (excluded) - Any kinds of sales fees, including front-end or back-end load fees on mutual funds. You can easily find commission-free trades for S&P 500 ETFs or no-load S&P 500 mutual funds.

Ask-Bid Spread and Net Asset Value (NAV) vs Market Price Differences (excluded) - These fees are a side effect of ETF intraday trading. Neither of these applies to mutual funds, which are priced daily. Unlike the expense ratio, these fees apply only on buy/sell transactions.
For large S&P 500 ETFs, the ask/bid spread is typically small, e.g., ~0.01%. NAV differences can end up effectively being a fee or a credit. On average, it's a wash. You can read more details on these ETF fees here.
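To get a feel for how an expense ratio drags on a return, here is a small sketch. It assumes the fee is deducted once a year from the whole balance, which is my simplification; the calculations from the full data set may apply the fee differently, so results can differ slightly in the last digit. The function name is hypothetical.

```python
def net_of_fee(gross_return: float, expense_ratio: float) -> float:
    """Annual return left after a fee charged once a year on the whole balance."""
    return (1 + gross_return) * (1 - expense_ratio) - 1


gross = 0.0695  # long-term real return before taxes and fees, from this post
for fee in (0.0003, 0.0025, 0.01):
    net = net_of_fee(gross, fee)
    # Growth of $100 over 30 years at the net rate.
    print(f"fee {fee:.2%}: net return {net:.2%}, $100 after 30 years = ${100 * (1 + net) ** 30:,.0f}")
```

An advisory fee can be handled the same way by adding it to the expense ratio before calling the function.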
So What's the Answer?
For your reference, based on the Shiller data:
Between January 1871 and January 2020, the S&P 500's annualized real return, measured using CAGR, including dividends, excluding any taxes and fees is 6.95%.
How do taxes and fees affect this value? Here is a table.
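Rows: expense ratio. Columns: long-term capital gains tax rate.

| Expense Ratio | 0% | 15% | 18.8% | 20% | 23.8% |
|---|---|---|---|---|---|
| 0% | 6.95% | 6.14% | 5.94% | 5.87% | 5.66% |
| 0.015% | 6.93% | 6.13% | 5.92% | 5.86% | 5.65% |
| 0.03% | 6.92% | 6.11% | 5.91% | 5.84% | 5.63% |
| 0.25% | 6.68% | 5.88% | 5.67% | 5.61% | 5.40% |
| 0.5% | 6.42% | 5.62% | 5.41% | 5.34% | 5.14% |
| 0.75% | 6.15% | 5.35% | 5.15% | 5.08% | 4.88% |
| 1% | 5.89% | 5.09% | 4.88% | 4.82% | 4.61% |
| 2% | 4.83% | 4.04% | 3.84% | 3.78% | 3.58% |
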
Note: This table assumes a constant capital gains rate and expense ratio for the entire life of the investment. There are at least two major reasons this might not be the case:
 You are in a different capital gains bracket due to income differences when reinvesting dividends vs. withdrawing money.
 Capital gains tax rates have varied over the ~150 years, and will likely continue to do so.
Overall, my assumptions about taxes and fees represent a departure from the philosophy of the original Crestmont/Times chart. The intention of that chart seems to be tracking the returns of an actual investment from one point in history to another (including historic fees and tax rates).
In contrast, the analysis in this table only takes the raw growth and dividend numbers from history and applies modern taxes and fees. Essentially, I assume that modern taxes and fees are more indicative of future taxes and fees than historical taxes and fees. This has the added benefit of being easier to calculate.
How much does a percentage point matter? Quite a bit. Here are the returns of hypothetical $100 investment at various rates and periods:
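Rows: CAGR. Columns: investment period in years.

| CAGR | 10 | 20 | 30 | 40 | 100 |
|---|---|---|---|---|---|
| 7.00% | $197 | $387 | $761 | $1,497 | $86,772 |
| 6.00% | $179 | $321 | $574 | $1,029 | $33,930 |
| 5.00% | $163 | $265 | $432 | $704 | $13,150 |
| 4.00% | $148 | $219 | $324 | $480 | $5,050 |
| 3.00% | $134 | $181 | $243 | $326 | $1,922 |
| 2.00% | $122 | $149 | $181 | $221 | $724 |
| 1.00% | $110 | $122 | $135 | $149 | $270 |
| -2.02% | $82 | $67 | $54 | $44 | $13 |
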
The last row shows the effect of holding cash, which loses purchasing power due to an inflation CAGR of about 2% since 1871 based on CPI data.
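The table above is just the time value of money formula applied to $100. A short sketch, with a helper name of my own choosing, that regenerates it:

```python
def future_value(initial: float, rate: float, years: int) -> float:
    """Value of `initial` after compounding at `rate` for `years` periods."""
    return initial * (1 + rate) ** years


rates = [0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, -0.0202]
periods = [10, 20, 30, 40, 100]

print("   CAGR  " + "  ".join(f"{n:>10}" for n in periods))
for rate in rates:
    row = "  ".join(f"${future_value(100, rate, n):>9,.0f}" for n in periods)
    print(f"{rate:>7.2%}  {row}")
```

The negative rate in the last entry stands in for cash losing about 2% of its purchasing power per year.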
You can see all my data and calculations here.
How Short Is Short-Term Volatility?
150 years is a long time. How long does it take for this 6.95% rate to come to fruition?
Over small time intervals, the market is unpredictable. Day traders make investments that can last just hours, minutes, or even seconds. Besides the whiplash at this timescale, these kinds of investments are also subject to short-term capital gains tax rates and have the potential to accumulate lots of transaction fees.
Zoom out a bit and it's still pretty wild; let's take a look at what you might expect historically for a one year investment period.
You shouldn't be surprised if you lose money in a 1 year period. 45 of 148 years in the data above (~30%) had negative real returns. But how long is the "short term"? It might be longer than you think.
If your timing is really bad, e.g., buying in the midst of the tech bubble in 1999 and selling right after the Great Recession in 2009, you'd have lost money even after 10 years (which doesn't seem like a short amount of time). But if this theoretical bad-luck investor hadn't sold in 2009, he or she would still be up over 4% annually as of Jan 2020 thanks to recent returns.
The following table shows the best/worst/median return for investment periods of various lengths. Keep in mind that these figures are measured from January of the start year to January of the end year, so it's possible there are some inter-year periods with bigger or smaller returns.
| Period in Years | Worst Return | Worst Return Years | Best Return | Best Return Years | Median Return |
|---|---|---|---|---|---|
| 5 | -10.13% | 1916-1921 | 29.60% | 1924-1929 | 7.56% |
| 10 | -4.41% | 1999-2009 | 18.33% | 1919-1929 | 6.88% |
| 20 | 0.69% | 1962-1982 | 12.97% | 1980-2000 | 6.68% |
| 30 | 3.24% | 1892-1922 | 10.24% | 1932-1962 | 6.66% |
| 40 | 3.62% | 1881-1921 | 9.94% | 1921-1961 | 6.43% |
| 50 | 4.72% | 1929-1979 | 9.18% | 1949-1999 | 6.35% |
| 60 | 5.25% | 1882-1942 | 8.23% | 1942-2002 | 6.72% |
| 70 | 5.38% | 1912-1982 | 7.85% | 1932-2002 | 6.81% |
| 80 | 5.26% | 1902-1982 | 8.44% | 1921-2001 | 6.73% |

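A table like this can be built by sliding a window of the chosen length over a yearly series and computing the CAGR of each window. The sketch below assumes a list of inflation-adjusted, dividend-reinvested index values, one per January; the sample values are made up, and the function name is hypothetical.

```python
from statistics import median


def rolling_cagr_stats(index_values, period):
    """Return (worst, best, median) CAGR over every window of `period` years."""
    cagrs = [
        (index_values[i + period] / index_values[i]) ** (1 / period) - 1
        for i in range(len(index_values) - period)
    ]
    if not cagrs:
        raise ValueError("series is too short for the requested period")
    return min(cagrs), max(cagrs), median(cagrs)


# Made-up yearly index levels, purely for illustration.
sample = [100, 110, 99, 118.8, 125, 120, 140]
for years in (1, 3, 5):
    worst, best, mid = rolling_cagr_stats(sample, years)
    print(f"{years}-year windows: worst {worst:.2%}, best {best:.2%}, median {mid:.2%}")
```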
How Does Spreading Out Investments Affect the Return?
While interesting, this isn't always how investments work. Most people don't just have a stockpile of money they invest in totality in a single year. Many people invest money as they can, every paycheck, with consistent contributions to a retirement plan or taxable investment account.
Say an investor decided to invest a fixed (but inflation-adjusted) dollar amount each year. How does this affect risk and average return? CAGR won't be sufficient here since we have multiple investment start times. Instead, we can use Internal Rate of Return, or IRR.
Grey Box Tangent: IRR Calculation
For this example, we'll discuss a 5-year period from 1999 to 2004. We'll treat each time we buy and sell a unit of S&P 500 as a separate investment. So over 5 years, we have 5 investments, with their respective CAGRs:
1999-2004: -2.89% per year over 5 years
2000-2004: -6.39% per year over 4 years
2001-2004: -5.64% per year over 3 years
2002-2004: -0.90% per year over 2 years
2003-2004: +26.15% per year over 1 year
My first inclination to determine the effective rate of all these investments was to take their arithmetic average, weighted by their hold period (in years). This is wrong.
THIS IS WRONG:
(-0.0289 * 5 +
-0.0639 * 4 +
-0.0564 * 3 +
-0.0090 * 2 +
+0.2615 * 1)
/ (5 + 4 + 3 + 2 + 1)
= -0.0217 = -2.17%
Why is this wrong? Once again, we are comparing rates of different values. The arithmetic average method treats all years of a given investment equally. But we know that's not true because these rates are compounded each year. For positive rates, we are giving too much credence to the earlier years and not enough to the later years. For negative rates, it's the opposite.
So what's the right way? We can account for compounding by using the time value of money equation. Time value of money allows us to see the final value of each investment if the initialValue was $1. This $1 figure was chosen to make the math easy; you can use any value here and the resulting rate will come out the same:
Time value of money: finalValue = initialValue * (1 + rate)^(numberOfTimePeriods)
1999-2004: -2.89%, 5 years: $1(1 - 0.0289)^5 = $0.8636
2000-2004: -6.39%, 4 years: $1(1 - 0.0639)^4 = $0.7679
2001-2004: -5.64%, 3 years: $1(1 - 0.0564)^3 = $0.8402
2002-2004: -0.90%, 2 years: $1(1 - 0.0090)^2 = $0.9821
2003-2004: +26.15%, 1 year: $1(1 + 0.2615)^1 = $1.2615
Total = $4.7152
Now, the tricky part. What single rate of return for the five different $1 investments would have given us an ending value of $4.7152?
The same question, but in mathematical terms:
Solve for 'r' (rate):
$1(1+r)^5 + $1(1+r)^4 + $1(1+r)^3 + $1(1+r)^2 + $1(1+r)^1 = $4.7152
Often, this type of equation can't be solved algebraically, so we usually resort to numerical methods. Type that equation into Wolfram Alpha and it will calculate the value of 'r' for you.
... [Magic] ...
r = IRR = -0.0195 = -1.95%
Most people resort to Excel or Google Sheets, which in turn resort to numerical methods. The IRR function in Sheets fits our use case, as it accepts cash flows as an argument. The cash flows in this example are -$1 each year (money invested), then a +$4.7152 payout at the end:
=IRR({-1, -1, -1, -1, -1, 4.7152})
... [Magic] ...
r = -1.95%
This results in a better annualized rate of return than the -2.89% CAGR from 1999-2004 (single lump-sum investment in 1999). So, in this particular case, periodic investments did reduce losses.
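For the curious, the "magic" can be sketched with a simple bisection search for the rate at which the net present value of the cash flows is zero. This is my own minimal illustration, not the calculator used for the charts, and it assumes there is exactly one such rate between the search bounds.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[t] occurs t periods from now."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, low=-0.99, high=1.0, tolerance=1e-10):
    """Rate where NPV crosses zero, found by bisection on [low, high]."""
    npv_low = npv(low, cash_flows)
    if npv_low * npv(high, cash_flows) > 0:
        raise ValueError("NPV does not change sign between low and high")
    while high - low > tolerance:
        mid = (low + high) / 2
        npv_mid = npv(mid, cash_flows)
        if npv_low * npv_mid <= 0:
            high = mid  # the root lies in the lower half
        else:
            low, npv_low = mid, npv_mid  # the root lies in the upper half
    return (low + high) / 2


# Five yearly $1 purchases (negative flows), then the $4.7152 payout.
print(f"{irr([-1, -1, -1, -1, -1, 4.7152]):.2%}")
```

Run on the example cash flows, this lands on the same roughly -1.95% that the spreadsheet function reports.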
In the following charts and tables, I use the IRR calculator supplied here. I believe this function was ported from Apache OpenOffice, an open source spreadsheet program.
I regenerated the earlier chart with yearly investments instead of a single lumpsum investment in the first year:
A few observations about this chart:
 Each square represents a combination of all squares below its position on the original chart. So it's sort of like smearing the chart upward.
 The further in the past an investment is, the more likely it will stabilize to roughly the long-term median value (somewhere around 6% to 7%). One might think that periodic purchases reduce volatility, but in reality, they expose you to volatility for longer because recent investments are a bigger part of your portfolio.
The following table summarizes the worst/best/median return for investment periods of various lengths, if investing an equal amount every year.
| Period in Years | Worst Return | Worst Return Years | Best Return | Best Return Years | Median Return |
|---|---|---|---|---|---|
| 5 | -14.20% | 1970-1975 | 32.17% | 1924-1929 | 7.28% |
| 10 | -7.97% | 1965-1975 | 23.32% | 1919-1929 | 7.16% |
| 20 | -1.85% | 1901-1921 | 14.31% | 1980-2000 | 6.94% |
| 30 | 1.18% | 1891-1921 | 10.90% | 1970-2000 | 6.68% |
| 40 | 2.64% | 1881-1921 | 9.49% | 1919-1959 | 6.47% |
| 50 | 4.24% | 1871-1921 | 9.25% | 1916-1966 | 6.45% |
| 60 | 4.93% | 1882-1942 | 8.63% | 1940-2000 | 6.66% |
| 70 | 5.31% | 1879-1949 | 8.25% | 1930-2000 | 6.85% |
| 80 | 5.63% | 1902-1982 | 8.22% | 1920-2000 | 6.95% |

Over shorter periods, you can see the increased volatility in the magnitudes of the best returns and the worst returns. In contrast, the median returns are largely the same at all time periods measured. It takes about 40 or 50 years, but eventually all the numbers start to look similar to the lump sum case.
Dollar Cost Averaging (DCA)
But what if you do have a giant pile of money? Is it better to...
 Invest it all right away? or
 Embrace the concept of dollar cost averaging and spread it out over time?
Grey Box Tangent: IRR and CAGR with Dollar Cost Averaging
We're back to having a pile of cash at the beginning of our investment period; we're not earning money a little bit at a time. We will reserve all our cash at the beginning of the investment period. Our cash will be in two different buckets:

The first bucket does no work, earning a 0% return.
Note that this is a 0% inflation-adjusted return, so it would need to be invested via some "safe" mechanism that keeps up with inflation, e.g., TIPS, maybe a money market investment fund or a high-yield savings account.
 The second bucket is invested in the market.
The share of the initial cash allocated to each bucket looks like this each year:
Once again, we'll calculate the ending value by summing the final values of the individual investments:
S&P 500 Bucket
1999-2004: -2.89%, 5 years: $1(1 - 0.0289)^5 = $0.8636
2000-2004: -6.39%, 4 years: $1(1 - 0.0639)^4 = $0.7679
2001-2004: -5.64%, 3 years: $1(1 - 0.0564)^3 = $0.8402
2002-2004: -0.90%, 2 years: $1(1 - 0.0090)^2 = $0.9821
2003-2004: +26.15%, 1 year: $1(1 + 0.2615)^1 = $1.2615
Subtotal = $4.7152
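0% Bucket
Each dollar here earns nothing and is then moved into the S&P 500 bucket, so it is already counted above and nothing is left in this bucket at the end:
1999-2000: 0%, 1 year: $1(1 + 0)^1 = $1, moved to the S&P 500 bucket, $0 left
1999-2001: 0%, 2 years: $1(1 + 0)^2 = $1, moved to the S&P 500 bucket, $0 left
1999-2002: 0%, 3 years: $1(1 + 0)^3 = $1, moved to the S&P 500 bucket, $0 left
1999-2003: 0%, 4 years: $1(1 + 0)^4 = $1, moved to the S&P 500 bucket, $0 left
Subtotal = $0
Total = $4.7152 + $0.00 = $4.7152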
The good news is we once again have a single startValue and a single endValue, which means we can use CAGR again!
CAGR = (endValue / startValue) ^ (1 / numberOfPeriods) - 1
CAGR = ($4.7152 / $5) ^ (1 / (2004 - 1999)) - 1
CAGR = -0.01166 ~= -1.17%
With some changes to the "yearly investment" IRR approach, we can also use IRR to calculate the annualized return. Instead of a series of $1 cash flows followed by a withdrawal windfall, we put all $5 into our investments in year one. It doesn't matter at this point that they are in two separate investment buckets.
Each subsequent year we take out a dollar from our 0% investment bucket (+$1) and reinvest it into the S&P 500 bucket (-$1), resulting in intermediate net cash flows of zero.
At the end of the period, we get the same return value we just calculated. This last part is where we've accounted for the different investment buckets. So, the IRR calculation for yearly DCA from 1999-2004 would look like this:
=IRR({-5, 0, 0, 0, 0, 4.7152}) = -1.17%
It's the same!
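Here is a small sketch of the DCA arithmetic above, using the per-year CAGRs from the example. The variable names are mine, and it assumes (as the text does) that uninvested cash earns exactly 0% in real terms. It also checks that the resulting rate zeroes out the net present value of the {-5, 0, 0, 0, 0, 4.7152}-style cash flows, which is why the CAGR and IRR agree.

```python
# (annual rate, years held) for each $1 moved into the S&P 500 bucket.
tranches = [(-0.0289, 5), (-0.0639, 4), (-0.0564, 3), (-0.0090, 2), (0.2615, 1)]

# Ending value of each $1 tranche, summed across all tranches.
end_value = sum((1 + rate) ** years for rate, years in tranches)

start_value = len(tranches)  # $5 set aside up front, $1 per tranche
periods = 5                  # Jan 1999 to Jan 2004
dca_cagr = (end_value / start_value) ** (1 / periods) - 1

# IRR view: -$5 at the start, nothing in between, end_value after 5 years.
npv_at_cagr = -start_value + end_value / (1 + dca_cagr) ** periods

print(f"End value: ${end_value:.4f}")
print(f"DCA CAGR:  {dca_cagr:.2%}")
print(f"NPV of the cash flows at that rate: {npv_at_cagr:.10f}")
```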
Here is the same table as above, but with DCA instead of a lump sum or yearly investment. I've bolded the returns that are higher than the corresponding returns in the lump sum investment table.
| Period in Years | Worst Return | Worst Return Years | Best Return | Best Return Years | Median Return |
|---|---|---|---|---|---|
| 5 | **-8.35%** | 1970-1975 | 20.04% | 1924-1929 | 4.41% |
| 10 | **-4.20%** | 1965-1975 | 14.20% | 1919-1929 | 4.08% |
| 20 | -0.95% | 1901-1921 | 8.79% | 1980-2000 | 3.97% |
| 30 | 0.62% | 1891-1921 | 6.81% | 1970-2000 | 3.92% |
| 40 | 1.46% | 1881-1921 | 6.07% | 1919-1959 | 3.91% |
| 50 | 2.49% | 1871-1921 | 6.12% | 1916-1966 | 4.02% |
| 60 | 3.03% | 1882-1942 | 5.82% | 1940-2000 | 4.30% |
| 70 | 3.39% | 1879-1949 | 5.69% | 1930-2000 | 4.57% |
| 80 | 3.71% | 1902-1982 | 5.81% | 1920-2000 | 4.76% |

Comparing to the lump sum investment returns reveals that any money that is earning nothing when it could be invested in the market takes a significant chunk out of the median long-term earnings (about 2-3%, which out of 7% is about 29-43% of your total returns!). DCA also reduces the best case returns in all measured time intervals.
Yearly DCA has a small beneficial effect on the worst case returns, but that effect tails off somewhere between a 10 and 20 year investment period. In fact, long-term worst case returns are lower for DCA than they are for lump sum investments!
It's potentially worse though. Remember that these numbers are adjusted for inflation. Therefore, the values above only apply if the dollars that aren't in the market are in some sort of investment vehicle that keeps up with inflation. If that money is all in cash, the cash drag will have an even larger negative effect.
In summary, DCA is sometimes portrayed as a good long-term strategy, but this analysis shows just the opposite: it reduces short-term downside risk at the expense of:
 Reduced best and median returns for all time periods included in the table
 Reduced long-term returns in the worst case
So unless you have an important reason to minimize downside risk in the short term, it makes sense to invest the entire giant pile of money right away, even if your market timing is really bad.
Here is the chart for DCA:
And I'll leave you with a table of all three investment strategies (lump sum, yearly, yearly DCA) so you can compare them side by side.
| Period in Years | Worst: Lump Sum | Worst: Yearly | Worst: Yearly DCA | Median: Lump Sum | Median: Yearly | Median: Yearly DCA | Best: Lump Sum | Best: Yearly | Best: Yearly DCA |
|---|---|---|---|---|---|---|---|---|---|
| 5 | -10.13% | -14.20% | -8.35% | 7.56% | 7.28% | 4.41% | 29.60% | 32.17% | 20.04% |
| 10 | -4.41% | -7.97% | -4.20% | 6.88% | 7.15% | 4.08% | 18.33% | 23.32% | 14.20% |
| 20 | 0.69% | -1.85% | -0.95% | 6.68% | 6.94% | 3.97% | 12.97% | 14.31% | 8.79% |
| 30 | 3.24% | 1.18% | 0.62% | 6.67% | 6.68% | 3.92% | 10.24% | 10.90% | 6.81% |
| 40 | 3.62% | 2.64% | 1.46% | 6.43% | 6.47% | 3.91% | 9.94% | 9.49% | 6.07% |
| 50 | 4.72% | 4.24% | 2.49% | 6.35% | 6.45% | 4.02% | 9.18% | 9.25% | 6.12% |
| 60 | 5.25% | 4.93% | 3.03% | 6.72% | 6.66% | 4.30% | 8.23% | 8.63% | 5.82% |
| 70 | 5.38% | 5.31% | 3.39% | 6.81% | 6.85% | 4.57% | 7.85% | 8.26% | 5.69% |
| 80 | 5.26% | 5.63% | 3.71% | 6.74% | 6.95% | 4.76% | 8.44% | 8.22% | 5.81% |

Disclaimer
I, the author of this post, have no formal tax, accounting, or financial background. I've done my best to ensure the information is accurate, but it's possible that I've missed important information, miscalculated something, or made some other errors/omissions. If you see something that's incorrect, please contact me. As always, the site disclaimer applies.

^{1} Even though the S&P 500 index was established in 1926, work has been done to project the index backward into the early 1800s. This projection is known as the Cowles extension and it's included as part of Shiller's data set.