The National Weather Service is transitioning its primary computer model, the Global Forecast System (GFS), from an old supercomputer to a brand new one. However, before the switch can be approved, the GFS model on the new computer must generate forecasts indistinguishable from those on the old one.

One would expect that not to be a problem, and in my 30+ years of personal experience at the NWS, it has not been. But now chaos has unexpectedly become a factor, and differences have emerged in forecasts produced by the identical computer model run on different computers.

This experience closely parallels Ed Lorenz’s experiments in the 1960s, which led serendipitously to the development of chaos theory (aka the “butterfly effect”). What Lorenz found, to his complete surprise, was that forecasts run with exactly the same (simplistic) weather forecast model diverged from one another as forecast length increased, solely because of minute differences inadvertently introduced into the starting analyses (“initial conditions”).
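For readers who want to see this sensitivity first-hand, here is a minimal sketch. It assumes the well-known three-variable Lorenz (1963) system with its standard parameter values as a stand-in; it is not the model Lorenz was running when he made his discovery, and it is certainly not the GFS. The function names, the starting point, the time step and the size of the perturbation are all illustrative choices. Two runs start from states that differ by one part in ten billion in a single variable, and the code records how far apart they drift.

```python
import math


def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Time derivatives of the three-variable Lorenz (1963) system."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def nudge(slope, h):
        # State moved a distance h along the given slope.
        return tuple(s + h * k for s, k in zip(state, slope))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(nudge(k1, dt / 2))
    k3 = lorenz_rhs(nudge(k2, dt / 2))
    k4 = lorenz_rhs(nudge(k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_history(perturbation=1e-10, dt=0.01, steps=4000):
    """Distance between two runs whose starting x values differ slightly."""
    run_a = (1.0, 1.0, 1.0)                 # illustrative starting state
    run_b = (1.0 + perturbation, 1.0, 1.0)  # same state, minutely perturbed
    history = []
    for _ in range(steps):
        run_a = rk4_step(run_a, dt)
        run_b = rk4_step(run_b, dt)
        history.append(
            math.sqrt(sum((p - q) ** 2 for p, q in zip(run_a, run_b)))
        )
    return history


if __name__ == "__main__":
    history = separation_history()
    for step in (1, 1000, 2000, 3000, 4000):
        print(f"step {step:4d}: separation = {history[step - 1]:.3e}")
```

With a perturbation of exactly zero the two runs stay identical step for step, which is what one normally expects from the same model on the same machine. With the tiny perturbation, the separation starts out negligible and eventually becomes as large as the variations in the solution itself.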

An example is shown here, followed by an explanation of the likely cause. Displayed are standard 7-day GFS forecasts of winds and pressure at high altitudes (500 mb, or about 18,000 feet), initialized at 8:00 a.m. (12 UTC) on June 23 and run on both the current operational computer system and the new supercomputer, known as the Weather and Climate Operational Supercomputing System (WCOSS).

Two forecasts of high-altitude winds and pressure run on the same computer model (the GFS) but on different computers (left: old computer; right: new computer). (NOAA)

Significant differences in the location and intensity of weather-relevant features are readily apparent. For example, the trough of low pressure over the eastern northern tier of the U.S. is stronger and farther east in the operational run than in the corresponding WCOSS-based forecast.

On the other hand, the associated circulation in WCOSS (stronger southwesterly winds ahead of the trough and northerly winds behind it) extends considerably farther south through the eastern half of the country. Significant differences are also apparent in the trough/ridge patterns upstream along the U.S.-Canadian border and westward into the eastern Pacific.

Shown below are the corresponding (for the same time interval) precipitation forecasts focusing on the eastern half of the U.S.

Two forecasts of precipitation run on the same computer model (the GFS) but on different computers (left: old computer; right: new computer). (NOAA)

A quick glance might leave only the impression that it will be very rainy, especially along the Eastern Seaboard. But closer examination leaves no doubt that there are significant differences in the devil’s details. Note, for example, the differences in the area of maximum precipitation just to the east and southeast of D.C. Note, too, the extension of the heavy rainfall in WCOSS southwest into Louisiana (consistent with the southward extension of the upper-level trough in WCOSS in the high-altitude plot).

So what lies behind the chaos-like divergence of solutions between the very same GFS run on different computer systems? Simply put, the model’s sequence of short-range (3-hour) forecasts, which provide the “first guess” in the assimilation of the latest observations, differs slightly between the two systems, so the two runs do not end up with precisely the same initial conditions for the next pair of GFS extended-range forecasts (see schematic illustration below).

The differences in the simulations arise solely from exceedingly small but apparently consequential differences in numerical calculations. These are associated with differences in the computer systems’ structure and logical organization (architecture) and in the compilers that translate program code (e.g., versions of Fortran) into machine language, and probably with other factors well beyond my understanding.
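One generic way such tiny numerical differences can arise is that floating-point addition is not associative: adding the same numbers in a different order can round differently. The sketch below illustrates only that general property of floating-point arithmetic. It is an assumption that anything like it is at work here, not a diagnosis of what actually differs between the two NWS systems, and the helper name `sum_in_order` and the sample values are purely illustrative.

```python
def sum_in_order(values):
    """Add the values strictly left to right, rounding after every addition."""
    total = 0.0
    for value in values:
        total += value
    return total


# The same three numbers, added in two different orders.
forward = sum_in_order([1e16, 1.0, -1e16])    # the 1.0 is rounded away first
reordered = sum_in_order([1e16, -1e16, 1.0])  # the big terms cancel first

# The same effect at an everyday scale: the grouping changes the last bit.
grouped_left = (0.1 + 0.2) + 0.3
grouped_right = 0.1 + (0.2 + 0.3)

if __name__ == "__main__":
    print("forward   :", forward)
    print("reordered :", reordered)
    print("groupings agree:", grouped_left == grouped_right)
```

A different compiler or processor is free to arrange long chains of arithmetic differently (for example, to use the hardware more efficiently), so two correct builds of the same program need not agree to the last digit. In a chaotic system, a discrepancy in the last digit is all it takes.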

The example shown was selected arbitrarily as the latest run at the time of this writing. In another case, the one brought to my attention by NCEP’s Bill Bua that led me to this issue, differences in extended-range forecasts were smaller but still quite noticeable. These two cases may or may not be representative.

For the last few months, NCEP has been running the GFS on WCOSS in parallel with the operational GFS on the current computers. An in-depth evaluation by NCEP and other NOAA personnel is not yet complete, but it must soon reckon with the advertised expectation that WCOSS will be accepted as the operational computer system in mid-July.

[Note: you can be your own judge (with no vote) by comparing the operational GFS and the parallel GFS run on WCOSS. The same is true of NCEP’s regional models. For example, in the NMM model, I’ve found the parallel runs to be all but indistinguishable. This is not surprising, since the forecast range (84 hours at most) is too short for initial condition differences to grow to meaningful levels.]

Before drawing any conclusions, however, consider this important caveat: what is shown above are differences between GFS forecasts, not whether one verifies better or worse than the other (i.e., is more accurate). I suspect that the differences are random, so that on average the two are performing statistically about the same. On that basis, it’s reasonable to view case-by-case differences as an unavoidable fact of life and to conclude there is little, if any, reason not to accept WCOSS becoming the operational computer system in July.

Stay tuned.