GEFS#
This notebook demonstrates using the Global Ensemble Forecast System. It is available on AWS for 2017 to present.
[1]:
from herbie import Herbie
from herbie.toolbox import EasyMap, pc
from herbie import paint
import matplotlib.pyplot as plt
The GEFS model product could be any of the following:
'atmos.5'
- Half degree atmos PRIMARY fields (pgrb2ap5); ~83 most common variables.'atmos.5b'
- Half degree atmos SECONDARY fields (pgrb2bp5); ~500 least common variables'atmos.25'
- Quarter degree atmos PRIMARY fields (pgrb2sp25); ~35 most common variables'wave'
- Global wave products.'chem.5'
- Chemistry fields on 0.5 degree grid'chem.25'
- Chemistry fields on 0.25 degree grid
You also must specify a member
. For the atmos output, this should be something like 0
or "c00"
for control member, 1
-30
or "p01"
-"p30"
for members 1-30, or 'avg'
or 'mean'
for the ensemble mean, 'spr'
for ensemble spread.
[2]:
H = Herbie(
"2023-01-04 12:00",
model="gefs",
product="atmos.5",
member="mean",
)
H
β
Found β model=gefs β product=atmos.5 β 2023-Jan-04 12:00 UTC F00 β GRIB2 @ aws β IDX @ aws
[2]:
ββHerbie GEFS model atmos.5 product initialized 2023-Jan-04 12:00 UTC F00 β source=aws
[3]:
# Show all the available products (from the model template file)
H.PRODUCTS
[3]:
{'atmos.5': 'Half degree atmos PRIMARY fields (pgrb2ap5); ~83 most common variables.',
'atmos.5b': 'Half degree atmos SECONDARY fields (pgrb2bp5); ~500 least common variables',
'atmos.25': 'Quarter degree atmos PRIMARY fields (pgrb2sp25); ~35 most common variables',
'wave': 'Global wave products.',
'chem.5': 'Chemistry fields on 0.5 degree grid',
'chem.25': 'Chemistry fields on 0.25 degree grid'}
[4]:
df = H.inventory()
df
[4]:
grib_message | start_byte | end_byte | range | reference_time | valid_time | variable | level | forecast_time | ? | search_this | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 224943.0 | 0-224943 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | HGT | 10 mb | anl | ens mean | :HGT:10 mb:anl:ens mean |
1 | 2 | 224944 | 353985.0 | 224944-353985 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | TMP | 10 mb | anl | ens mean | :TMP:10 mb:anl:ens mean |
2 | 3 | 353986 | 387270.0 | 353986-387270 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | RH | 10 mb | anl | ens mean | :RH:10 mb:anl:ens mean |
3 | 4 | 387271 | 641626.0 | 387271-641626 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | UGRD | 10 mb | anl | ens mean | :UGRD:10 mb:anl:ens mean |
4 | 5 | 641627 | 881485.0 | 641627-881485 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | VGRD | 10 mb | anl | ens mean | :VGRD:10 mb:anl:ens mean |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
66 | 67 | 13100932 | 13361478.0 | 13100932-13361478 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | VGRD | 10 m above ground | anl | ens mean | :VGRD:10 m above ground:anl:ens mean |
67 | 68 | 13361479 | 13520799.0 | 13361479-13520799 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | PWAT | entire atmosphere (considered as a single layer) | anl | ens mean | :PWAT:entire atmosphere (considered as a singl... |
68 | 69 | 13520800 | 13665976.0 | 13520800-13665976 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | CAPE | 180-0 mb above ground | anl | ens mean | :CAPE:180-0 mb above ground:anl:ens mean |
69 | 70 | 13665977 | 13762088.0 | 13665977-13762088 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | CIN | 180-0 mb above ground | anl | ens mean | :CIN:180-0 mb above ground:anl:ens mean |
70 | 71 | 13762089 | NaN | 13762089- | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | PRMSL | mean sea level | anl | ens mean | :PRMSL:mean sea level:anl:ens mean |
71 rows Γ 11 columns
[5]:
df.variable.unique()
[5]:
array(['HGT', 'TMP', 'RH', 'UGRD', 'VGRD', 'VVEL', 'PRES', 'TSOIL',
'SOILW', 'WEASD', 'SNOD', 'ICETK', 'PWAT', 'CAPE', 'CIN', 'PRMSL'],
dtype=object)
[6]:
H.inventory("HGT")
[6]:
grib_message | start_byte | end_byte | range | reference_time | valid_time | variable | level | forecast_time | ? | search_this | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 224943.0 | 0-224943 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | HGT | 10 mb | anl | ens mean | :HGT:10 mb:anl:ens mean |
5 | 6 | 881486 | 1113633.0 | 881486-1113633 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | HGT | 50 mb | anl | ens mean | :HGT:50 mb:anl:ens mean |
10 | 11 | 1829912 | 2058353.0 | 1829912-2058353 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | HGT | 100 mb | anl | ens mean | :HGT:100 mb:anl:ens mean |
15 | 16 | 2853920 | 3078453.0 | 2853920-3078453 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | HGT | 200 mb | anl | ens mean | :HGT:200 mb:anl:ens mean |
20 | 21 | 3750171 | 3972302.0 | 3750171-3972302 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | HGT | 250 mb | anl | ens mean | :HGT:250 mb:anl:ens mean |
25 | 26 | 4665076 | 4885222.0 | 4665076-4885222 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | HGT | 300 mb | anl | ens mean | :HGT:300 mb:anl:ens mean |
30 | 31 | 5575673 | 5819612.0 | 5575673-5819612 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | HGT | 500 mb | anl | ens mean | :HGT:500 mb:anl:ens mean |
35 | 36 | 6505417 | 6760482.0 | 6505417-6760482 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | HGT | 700 mb | anl | ens mean | :HGT:700 mb:anl:ens mean |
40 | 41 | 7642597 | 7908461.0 | 7642597-7908461 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | HGT | 850 mb | anl | ens mean | :HGT:850 mb:anl:ens mean |
46 | 47 | 9144964 | 9419014.0 | 9144964-9419014 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | HGT | 925 mb | anl | ens mean | :HGT:925 mb:anl:ens mean |
55 | 56 | 11237567 | 11523718.0 | 11237567-11523718 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | HGT | 1000 mb | anl | ens mean | :HGT:1000 mb:anl:ens mean |
57 | 58 | 11777787 | 11930299.0 | 11777787-11930299 | 2023-01-04 12:00:00 | 2023-01-04 12:00:00 | HGT | surface | anl | ens mean | :HGT:surface:anl:ens mean |
[7]:
ds = H.xarray("PWAT")
ds
π¨π»βπ Created directory: [/home/blaylock/data/gefs/20230104]
[7]:
<xarray.Dataset> Size: 1MB Dimensions: (latitude: 361, longitude: 720) Coordinates: time datetime64[ns] 8B 2023-01-04T12:00:00 step timedelta64[ns] 8B 00:00:00 atmosphereSingleLayer float64 8B 0.0 * latitude (latitude) float64 3kB 90.0 89.5 89.0 ... -89.5 -90.0 * longitude (longitude) float64 6kB 0.0 0.5 1.0 ... 359.0 359.5 valid_time datetime64[ns] 8B 2023-01-04T12:00:00 gribfile_projection object 8B None Data variables: pwat (latitude, longitude) float32 1MB 4.3 4.3 ... 1.0 1.0 Attributes: GRIB_edition: 2 GRIB_centre: kwbc GRIB_centreDescription: US National Weather Service - NCEP GRIB_subCentre: 2 Conventions: CF-1.7 institution: US National Weather Service - NCEP model: gefs product: atmos.5 description: Global Ensemble Forecast System (GEFS) remote_grib: https://noaa-gefs-pds.s3.amazonaws.com/gefs.2023... local_grib: /home/blaylock/data/gefs/20230104/subset_f4ef1a3... search: PWAT
[8]:
ds2 = H.xarray("HGT:500 mb")
ds2
[8]:
<xarray.Dataset> Size: 1MB Dimensions: (latitude: 361, longitude: 720) Coordinates: time datetime64[ns] 8B 2023-01-04T12:00:00 step timedelta64[ns] 8B 00:00:00 isobaricInhPa float64 8B 500.0 * latitude (latitude) float64 3kB 90.0 89.5 89.0 ... -89.5 -90.0 * longitude (longitude) float64 6kB 0.0 0.5 1.0 ... 359.0 359.5 valid_time datetime64[ns] 8B 2023-01-04T12:00:00 gribfile_projection object 8B None Data variables: gh (latitude, longitude) float32 1MB 5.248e+03 ... 5.02... Attributes: GRIB_edition: 2 GRIB_centre: kwbc GRIB_centreDescription: US National Weather Service - NCEP GRIB_subCentre: 2 Conventions: CF-1.7 institution: US National Weather Service - NCEP model: gefs product: atmos.5 description: Global Ensemble Forecast System (GEFS) remote_grib: https://noaa-gefs-pds.s3.amazonaws.com/gefs.2023... local_grib: /home/blaylock/data/gefs/20230104/subset_f4efbd3... search: HGT:500 mb
[17]:
ax = EasyMap("50m", crs=ds.herbie.crs, figsize=[10, 10]).STATES().BORDERS().ax
p = ax.pcolormesh(
ds.longitude,
ds.latitude,
ds.pwat,
transform=pc,
cmap=paint.NWSPrecipitation.cmap,
vmin=0,
vmax=80,
)
plt.colorbar(
p,
ax=ax,
orientation="horizontal",
pad=0.01,
shrink=0.8,
label=f"{ds.pwat.GRIB_name}({ds.pwat.GRIB_units})",
)
# Add 500 hPa Geopotential height
ax.contour(
ds2.longitude,
ds2.latitude,
ds2.gh,
colors="k",
linewidths=0.5,
levels=range(0, 10000, 120),
)
ax.set_title(
f"{ds.model.upper()} ensemble mean\nValid: {ds.valid_time.dt.strftime('%H:%M UTC %d %b %Y').item()}",
loc="left",
)
ax.set_title(ds.pwat.GRIB_name, loc="right")
[17]:
Text(1.0, 1.0, 'Precipitable water')

[18]:
# Get a specific member
H5 = Herbie(
"2022-01-01",
model="gefs",
product="atmos.5",
member=5,
)
H5.xarray("TMP:2 m")
## If a user wants all the members in
## a single Dataframe, I'll let the user
## concat it themselves.
β
Found β model=gefs β product=atmos.5 β 2022-Jan-01 00:00 UTC F00 β GRIB2 @ aws β IDX @ aws
π¨π»βπ Created directory: [/home/blaylock/data/gefs/20220101]
[18]:
<xarray.Dataset> Size: 1MB Dimensions: (latitude: 361, longitude: 720) Coordinates: number int64 8B 5 time datetime64[ns] 8B 2022-01-01 step timedelta64[ns] 8B 00:00:00 heightAboveGround float64 8B 2.0 * latitude (latitude) float64 3kB 90.0 89.5 89.0 ... -89.5 -90.0 * longitude (longitude) float64 6kB 0.0 0.5 1.0 ... 359.0 359.5 valid_time datetime64[ns] 8B 2022-01-01 gribfile_projection object 8B None Data variables: t2m (latitude, longitude) float32 1MB 244.5 244.5 ... 252.3 Attributes: GRIB_edition: 2 GRIB_centre: kwbc GRIB_centreDescription: US National Weather Service - NCEP GRIB_subCentre: 2 Conventions: CF-1.7 institution: US National Weather Service - NCEP model: gefs product: atmos.5 description: Global Ensemble Forecast System (GEFS) remote_grib: https://noaa-gefs-pds.s3.amazonaws.com/gefs.2022... local_grib: /home/blaylock/data/gefs/20220101/subset_04ef8ac... search: TMP:2 m
GEFS-Reanalysis Data#
This section demonstrates using the Global Ensemble Forecast System (GEFS) reanalysis: 2000-2019 dataset.
The GEFS version 12 reanalysis is available on Amazon Web Services and can be retrieved with Herbie.
Note: The GEFS directory structure is different than other models that Herbie can access, and that makes Herbieβs access to these files little awkward. Instead of grouping the GRIB fields by forecast hour where there are many different variables for the same lead time in the same file, the GEFS files are grouped into the same variable per file with each GRIB message being a different lead time. This changes the way a user would use Herbie to access GEFS dataβa user will need to supply a βvariable_levelβ argument to access a full file. For subsetting by specific grib messages, you will use the βsearchβ argument to key in on the message of interest. You will still need to give a value for βfxxβ to tell Herbie which directory to look for.
Yeah, itβs a little different paradigm for Herbie, but we can work with it. It may be nice to write a herbie.tool
to help make calls to Herbie a little more simple (like a custom bulk_download script)
Retrieve a full file#
Download a full GRIB2 file to your local system.
Remember to specify the following:
fxx
is the lead time lead time, a number between 0 and 384. If you are getting the full file, this only tells Herbie what folder to look in (Days:0-10 or Days:10-16).member
is the ensemble member. 0 is the control and a value between 1, 2, 3, or 4 is a perturbation member.variable_level
is the name of the file to obtain.
[19]:
H = Herbie(
"2017-03-14", model="gefs_reforecast", fxx=12, member=0, variable_level="tmp_2m"
)
H.download(verbose=True)
β
Found β model=gefs_reforecast β product=GEFSv12/reforecast β 2017-Mar-14 00:00 UTC F12 β GRIB2 @ aws β IDX @ aws
π¨π»βπ Created directory: [/home/blaylock/data/gefs_reforecast/20170314]
β
Success! Downloaded GEFS_REFORECAST from aws
src: https://noaa-gefs-retrospective.s3.amazonaws.com/GEFSv12/reforecast/2017/2017031400/c00/Days:1-10/tmp_2m_2017031400_c00.grib2
dst: /home/blaylock/data/gefs_reforecast/20170314/tmp_2m_2017031400_c00.grib2
[19]:
PosixPath('/home/blaylock/data/gefs_reforecast/20170314/tmp_2m_2017031400_c00.grib2')
Download/Retrieve a subset#
Subsetting uses the search
argument to parse out information from the GRIB2βs index file. Since the variables of each message in a file are all the same, we need to set the search
to key in on the lead time we are interested in.
Look at the index file to see how to key in on a specific GRIB message.
[20]:
# Look at the search_this column of the index DataFrame
H.inventory().search_this
[20]:
0 :TMP:2 m above ground:3 hour fcst:ENS=low-res ctl
1 :TMP:2 m above ground:6 hour fcst:ENS=low-res ctl
2 :TMP:2 m above ground:9 hour fcst:ENS=low-res ctl
3 :TMP:2 m above ground:12 hour fcst:ENS=low-res...
4 :TMP:2 m above ground:15 hour fcst:ENS=low-res...
...
75 :TMP:2 m above ground:228 hour fcst:ENS=low-re...
76 :TMP:2 m above ground:231 hour fcst:ENS=low-re...
77 :TMP:2 m above ground:234 hour fcst:ENS=low-re...
78 :TMP:2 m above ground:237 hour fcst:ENS=low-re...
79 :TMP:2 m above ground:240 hour fcst:ENS=low-re...
Name: search_this, Length: 80, dtype: object
[21]:
# Get the 15-h forecast
ds = H.xarray(":15 hour fcst:")
ds
[21]:
<xarray.Dataset> Size: 4MB Dimensions: (latitude: 721, longitude: 1440) Coordinates: number int64 8B 0 time datetime64[ns] 8B 2017-03-14 step timedelta64[ns] 8B 15:00:00 heightAboveGround float64 8B 2.0 * latitude (latitude) float64 6kB 90.0 89.75 89.5 ... -89.75 -90.0 * longitude (longitude) float64 12kB 0.0 0.25 0.5 ... 359.5 359.8 valid_time datetime64[ns] 8B 2017-03-14T15:00:00 gribfile_projection object 8B None Data variables: t2m (latitude, longitude) float32 4MB 242.8 242.8 ... 225.7 Attributes: GRIB_edition: 2 GRIB_centre: kwbc GRIB_centreDescription: US National Weather Service - NCEP GRIB_subCentre: 2 Conventions: CF-1.7 institution: US National Weather Service - NCEP model: gefs_reforecast product: GEFSv12/reforecast description: Global Ensemble Forecast System (GEFS) remote_grib: /home/blaylock/data/gefs_reforecast/20170314/tmp... local_grib: /home/blaylock/data/gefs_reforecast/20170314/sub... search: :15 hour fcst:
[22]:
ax = EasyMap("50m", crs=ds.herbie.crs, figsize=[10, 10]).STATES().BORDERS().ax
p = ax.pcolormesh(
ds.longitude,
ds.latitude,
ds.t2m - 273.15,
transform=pc,
**paint.NWSTemperature.kwargs2,
)
plt.colorbar(
p, ax=ax, orientation="horizontal", pad=0.05, **paint.NWSTemperature.cbar_kwargs2
)
ax.set_title(
f"{ds.model.upper()}: {H.product_description}\nValid: {ds.valid_time.dt.strftime('%H:%M UTC %d %b %Y').item()}",
loc="left",
)
ax.set_title(ds.t2m.GRIB_name, loc="right")
[22]:
Text(1.0, 1.0, '2 metre temperature')

What are valid values for variable_level
?#
You need to look at the file structure within the GEFS bucket to know what is avaialble. We can use s3fs to tell us what we can use for our variable_level
argument.
[23]:
import s3fs
import pandas as pd
[24]:
# List files in the GEFS bucket for a day
fs = s3fs.S3FileSystem(anon=True)
files = fs.ls(
path="noaa-gefs-retrospective/GEFSv12/reforecast/2015/2015010100/c00/Days:1-10"
)
[25]:
# var_lev prefix
var_lev = [i.split("/")[-1].split("_") for i in files if i.endswith(".grib2")]
[26]:
variable_levels_df = pd.DataFrame(var_lev, columns=["variable", "level", "a", "b", "c"])
variable_levels_df
[26]:
variable | level | a | b | c | |
---|---|---|---|---|---|
0 | acpcp | sfc | 2015010100 | c00.grib2 | None |
1 | apcp | sfc | 2015010100 | c00.grib2 | None |
2 | cape | sfc | 2015010100 | c00.grib2 | None |
3 | cin | sfc | 2015010100 | c00.grib2 | None |
4 | dlwrf | sfc | 2015010100 | c00.grib2 | None |
... | ... | ... | ... | ... | ... |
56 | vgrd | pvor | 2015010100 | c00.grib2 | None |
57 | vvel | pres | 2015010100 | c00.grib2 | None |
58 | vvel | pres | abv700mb | 2015010100 | c00.grib2 |
59 | watr | sfc | 2015010100 | c00.grib2 | None |
60 | weasd | sfc | 2015010100 | c00.grib2 | None |
61 rows Γ 5 columns
[28]:
# These are the available variables
variable_levels_df.variable.unique()
[28]:
array(['acpcp', 'apcp', 'cape', 'cin', 'dlwrf', 'dswrf', 'gflux', 'gust',
'hgt', 'hlcy', 'lhtfl', 'ncpcp', 'pbl', 'pres', 'pvort', 'pwat',
'rh', 'sfcr', 'shtfl', 'soilw', 'spfh', 'tcdc', 'tmax', 'tmin',
'tmp', 'tozne', 'tsoil', 'uflx', 'ugrd', 'ulwrf', 'uswrf', 'vflx',
'vgrd', 'vvel', 'watr', 'weasd'], dtype=object)
[29]:
# These are the available levels
variable_levels_df.level.unique()
[29]:
array(['sfc', 'ceiling', 'hybr', 'pres', 'hgt', 'msl', 'mslet', 'pvor',
'isen', 'eatm', 'bgrnd', '2m', 'tatm'], dtype=object)