Inventory
Ideally, each GRIB2 file has a companion "index" file: a plain ASCII text file that describes the contents of the GRIB2 file (i.e., what each GRIB message contains). The index file tells you the variable represented in each GRIB message, its level, its forecast lead time, and the byte at which the message starts in the file.
There are two "flavors" of index files: wgrib-style and eccodes-style.
NCEP models provide wgrib-style index files, while ECMWF models provide eccodes-style index files.
Herbie provides a parser that reads the index file into a Pandas DataFrame and calls the result the file's inventory.
Let's start by looking at the inventory for a HRRR file.
[1]:
from herbie import Herbie
[2]:
H = Herbie("2024-01-01", model="hrrr")
H
✅ Found ┊ model=hrrr ┊ product=sfc ┊ 2024-Jan-01 00:00 UTC F00 ┊ GRIB2 @ aws ┊ IDX @ aws
[2]:
▌▌Herbie HRRR model sfc product initialized 2024-Jan-01 00:00 UTC F00 ┊ source=aws
The path of the relevant index file is given by H.idx. You can go to that URL to see what the raw index file looks like.
[4]:
H.idx
[4]:
'https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20240101/conus/hrrr.t00z.wrfsfcf00.grib2.idx'
Herbie parses the raw index file into a Pandas DataFrame when you call H.inventory().
[3]:
H.inventory()
[3]:
| | grib_message | start_byte | end_byte | range | reference_time | valid_time | variable | level | forecast_time | search_this |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 202809.0 | 0-202809 | 2024-01-01 | 2024-01-01 | REFC | entire atmosphere | anl | :REFC:entire atmosphere:anl |
1 | 2 | 202810 | 246792.0 | 202810-246792 | 2024-01-01 | 2024-01-01 | RETOP | cloud top | anl | :RETOP:cloud top:anl |
2 | 3 | 246793 | 496145.0 | 246793-496145 | 2024-01-01 | 2024-01-01 | var discipline=0 center=7 local_table=1 parmca... | entire atmosphere | anl | :var discipline=0 center=7 local_table=1 parmc... |
3 | 4 | 496146 | 649032.0 | 496146-649032 | 2024-01-01 | 2024-01-01 | VIL | entire atmosphere | anl | :VIL:entire atmosphere:anl |
4 | 5 | 649033 | 2038336.0 | 649033-2038336 | 2024-01-01 | 2024-01-01 | VIS | surface | anl | :VIS:surface:anl |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
165 | 166 | 126776108 | 126785469.0 | 126776108-126785469 | 2024-01-01 | 2024-01-01 | ICEC | surface | anl | :ICEC:surface:anl |
166 | 167 | 126785470 | 128189723.0 | 126785470-128189723 | 2024-01-01 | 2024-01-01 | SBT123 | top of atmosphere | anl | :SBT123:top of atmosphere:anl |
167 | 168 | 128189724 | 130514441.0 | 128189724-130514441 | 2024-01-01 | 2024-01-01 | SBT124 | top of atmosphere | anl | :SBT124:top of atmosphere:anl |
168 | 169 | 130514442 | 131785130.0 | 130514442-131785130 | 2024-01-01 | 2024-01-01 | SBT113 | top of atmosphere | anl | :SBT113:top of atmosphere:anl |
169 | 170 | 131785131 | NaN | 131785131- | 2024-01-01 | 2024-01-01 | SBT114 | top of atmosphere | anl | :SBT114:top of atmosphere:anl |
170 rows × 10 columns
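To make the parsing step more concrete, here is a minimal sketch of how a wgrib-style index could be turned into a table like the one above. It is not Herbie's actual implementation. It assumes each index line has the layout `message:start_byte:d=YYYYMMDDHH:variable:level:forecast:`, and the function name `parse_wgrib2_index` is hypothetical. The sample lines are rebuilt from the last four rows of the inventory shown above.

```python
import pandas as pd

# Sample lines in the assumed wgrib-style layout, rebuilt from the last four
# rows of the HRRR inventory above (messages 167-170).
RAW_INDEX = """\
167:126785470:d=2024010100:SBT123:top of atmosphere:anl:
168:128189724:d=2024010100:SBT124:top of atmosphere:anl:
169:130514442:d=2024010100:SBT113:top of atmosphere:anl:
170:131785131:d=2024010100:SBT114:top of atmosphere:anl:
"""


def parse_wgrib2_index(text):
    """Hypothetical helper: parse wgrib-style index text into a DataFrame."""
    rows = []
    for line in text.strip().splitlines():
        # Only the first six colon-separated fields are used in this sketch.
        msg, start, date, variable, level, forecast = line.split(":")[:6]
        rows.append(
            {
                "grib_message": int(msg),
                "start_byte": int(start),
                "reference_time": pd.to_datetime(
                    date.replace("d=", ""), format="%Y%m%d%H"
                ),
                "variable": variable,
                "level": level,
                "forecast_time": forecast,
            }
        )
    df = pd.DataFrame(rows)

    # The index only lists start bytes. A message ends one byte before the
    # next one starts, so the last message has no known end (NaN).
    df["end_byte"] = df["start_byte"].shift(-1) - 1
    df["range"] = [
        f"{start}-" if pd.isna(end) else f"{start}-{int(end)}"
        for start, end in zip(df["start_byte"], df["end_byte"])
    ]

    # Searchable string built from the descriptive fields.
    df["search_this"] = (
        ":" + df["variable"] + ":" + df["level"] + ":" + df["forecast_time"]
    )
    return df


inventory = parse_wgrib2_index(RAW_INDEX)
print(inventory[["grib_message", "range", "search_this"]])
```

The open-ended range for the final message (`131785131-`) mirrors the `NaN` end byte in the last row of the inventory above.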
Notice the search_this column. Herbie runs regular expression searches against this column to select the GRIB messages you want. For example, if you want all the variables at 500 mb…
[9]:
H.inventory(":500 mb")
[9]:
| | grib_message | start_byte | end_byte | range | reference_time | valid_time | variable | level | forecast_time | search_this |
---|---|---|---|---|---|---|---|---|---|---|
13 | 14 | 6299332 | 7003497.0 | 6299332-7003497 | 2024-01-01 | 2024-01-01 | HGT | 500 mb | anl | :HGT:500 mb:anl |
14 | 15 | 7003498 | 7550668.0 | 7003498-7550668 | 2024-01-01 | 2024-01-01 | TMP | 500 mb | anl | :TMP:500 mb:anl |
15 | 16 | 7550669 | 8417238.0 | 7550669-8417238 | 2024-01-01 | 2024-01-01 | DPT | 500 mb | anl | :DPT:500 mb:anl |
16 | 17 | 8417239 | 8997799.0 | 8417239-8997799 | 2024-01-01 | 2024-01-01 | UGRD | 500 mb | anl | :UGRD:500 mb:anl |
17 | 18 | 8997800 | 9584981.0 | 8997800-9584981 | 2024-01-01 | 2024-01-01 | VGRD | 500 mb | anl | :VGRD:500 mb:anl |
Notice that only the rows that contain 500 mb are selected. This is useful when you want to download a subset of variables from the GRIB file. Notice also the range column, which gives the byte range of each message in the file. Herbie uses this byte range when you ask it to download only the selected variables or to open them in xarray.
[11]:
H.download(":500 mb", verbose=True, overwrite=True)
📇 Download subset: ▌▌Herbie HRRR model sfc product initialized 2024-Jan-01 00:00 UTC F00 ┊ source=aws
cURL from https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20240101/conus/hrrr.t00z.wrfsfcf00.grib2
Found 5 grib messages.
Download subset group 1
14 :HGT:500 mb:anl
15 :TMP:500 mb:anl
16 :DPT:500 mb:anl
17 :UGRD:500 mb:anl
18 :VGRD:500 mb:anl
curl -s --range 6299332-9584981 "https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20240101/conus/hrrr.t00z.wrfsfcf00.grib2" > "/home/blaylock/data/hrrr/20240101/subset_6befbe61__hrrr.t00z.wrfsfcf00.grib2"
💾 Saved the subset to /home/blaylock/data/hrrr/20240101/subset_6befbe61__hrrr.t00z.wrfsfcf00.grib2
[11]:
PosixPath('/home/blaylock/data/hrrr/20240101/subset_6befbe61__hrrr.t00z.wrfsfcf00.grib2')
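The download log above shows a single `curl --range 6299332-9584981` request covering five adjacent messages. The sketch below illustrates one way adjacent messages could be merged into a single byte range. It assumes that messages are grouped whenever their message numbers are consecutive, which is consistent with the log but is not confirmed to be Herbie's exact logic. The function name `byte_range_groups` is hypothetical, and the byte values are copied from the 500 mb inventory above.

```python
import pandas as pd

# The five 500 mb rows from the inventory above.
selected = pd.DataFrame(
    {
        "grib_message": [14, 15, 16, 17, 18],
        "start_byte": [6299332, 7003498, 7550669, 8417239, 8997800],
        "end_byte": [7003497, 7550668, 8417238, 8997799, 9584981],
    }
)


def byte_range_groups(df):
    """Hypothetical helper: merge consecutive messages into byte ranges."""
    # Start a new group whenever the message number does not follow the
    # previous one directly (the first row always starts a group).
    group_id = (df["grib_message"].diff() != 1).cumsum()

    ranges = []
    for _, group in df.groupby(group_id):
        start = int(group["start_byte"].iloc[0])
        end = group["end_byte"].iloc[-1]
        # An unknown end byte (last message in a file) gives an open range.
        ranges.append(f"{start}-" if pd.isna(end) else f"{start}-{int(end)}")
    return ranges


print(byte_range_groups(selected))

# Dropping message 15 leaves a gap, so two separate ranges are needed.
with_gap = selected[selected["grib_message"] != 15]
print(byte_range_groups(with_gap))
```

Each resulting string can be passed to an HTTP range request, as the `curl -s --range` line in the log does.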
[13]:
H.xarray(":500 mb")
/home/blaylock/GITHUB/Herbie/herbie/core.py:1088: UserWarning: Will not remove GRIB file because it previously existed.
warnings.warn("Will not remove GRIB file because it previously existed.")
[13]:
<xarray.Dataset>
Dimensions:              (y: 1059, x: 1799)
Coordinates:
    time                 datetime64[ns] 2024-01-01
    step                 timedelta64[ns] 00:00:00
    isobaricInhPa        float64 500.0
    latitude             (y, x) float64 21.14 21.15 21.15 ... 47.86 47.85 47.84
    longitude            (y, x) float64 237.3 237.3 237.3 ... 299.0 299.0 299.1
    valid_time           datetime64[ns] 2024-01-01
Dimensions without coordinates: y, x
Data variables:
    t                    (y, x) float32 ...
    u                    (y, x) float32 ...
    v                    (y, x) float32 ...
    gh                   (y, x) float32 ...
    dpt                  (y, x) float32 ...
    gribfile_projection  object None
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   hrrr
    product:                 sfc
    description:             High-Resolution Rapid Refresh - CONUS
    remote_grib:             https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr....
    local_grib:              /home/blaylock/data/hrrr/20240101/subset_6befbe6...
    search:                  :500 mb
More examples of valid regular expressions are found in the Herbie Docs: Subset with search. Using H.inventory(search) is an effective way to test different regex patterns until you get the variables you are interested in downloading.
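If you want to experiment with patterns without touching a remote file, you can try them against a handful of search_this strings first. The sketch below uses pandas `Series.str.contains` on strings copied from the HRRR inventory above. It assumes the search behaves like an ordinary regular expression match against that column, and the helper name `matches` is hypothetical.

```python
import pandas as pd

# search_this strings copied from the HRRR inventory tables above.
search_this = pd.Series(
    [
        ":REFC:entire atmosphere:anl",
        ":VIS:surface:anl",
        ":HGT:500 mb:anl",
        ":TMP:500 mb:anl",
        ":DPT:500 mb:anl",
        ":UGRD:500 mb:anl",
        ":VGRD:500 mb:anl",
        ":ICEC:surface:anl",
    ]
)


def matches(pattern):
    """Hypothetical helper: return the strings a regex pattern selects."""
    return search_this[search_this.str.contains(pattern, regex=True)].tolist()


patterns = [
    ":500 mb",              # everything at 500 mb
    ":(?:U|V)GRD:",         # both wind components
    ":surface:",            # everything at the surface
    ":(?:TMP|DPT):500 mb",  # temperature and dew point at 500 mb
]
for pattern in patterns:
    print(f"{pattern!r:25} -> {matches(pattern)}")
```

Non-capturing groups `(?:...)` are used here because pandas warns when a pattern passed to `str.contains` has capturing groups.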
The eccodes-style index files work the same way, except that the regex for selecting variable names and levels is different. Here is the inventory for an ECMWF forecast file.
[14]:
H = Herbie("2024-01-01", model="ecmwf")
H.idx
✅ Found ┊ model=ecmwf ┊ product=oper ┊ 2024-Jan-01 00:00 UTC F00 ┊ GRIB2 @ azure ┊ IDX @ azure
[14]:
'https://ai4edataeuwest.blob.core.windows.net/ecmwf/20240101/00z/0p4-beta/oper/20240101000000-0h-oper-fc.index'
[15]:
H.inventory()
[15]:
| | grib_message | start_byte | end_byte | range | reference_time | valid_time | step | param | levelist | levtype | number | domain | expver | class | type | stream | search_this |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 205483 | 0-205483 | 2024-01-01 | 2024-01-01 | 0 days | gh | 250 | pl | NaN | g | 0001 | od | fc | oper | :gh:250:pl:g:0001:od:fc:oper |
1 | 2 | 205483 | 427603 | 205483-427603 | 2024-01-01 | 2024-01-01 | 0 days | gh | 925 | pl | NaN | g | 0001 | od | fc | oper | :gh:925:pl:g:0001:od:fc:oper |
2 | 3 | 427603 | 427827 | 427603-427827 | 2024-01-01 | 2024-01-01 | 0 days | tp | NaN | sfc | NaN | g | 0001 | od | fc | oper | :tp:sfc:g:0001:od:fc:oper |
3 | 4 | 427827 | 640309 | 427827-640309 | 2024-01-01 | 2024-01-01 | 0 days | gh | 700 | pl | NaN | g | 0001 | od | fc | oper | :gh:700:pl:g:0001:od:fc:oper |
4 | 5 | 640309 | 878511 | 640309-878511 | 2024-01-01 | 2024-01-01 | 0 days | r | 850 | pl | NaN | g | 0001 | od | fc | oper | :r:850:pl:g:0001:od:fc:oper |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
78 | 79 | 23834649 | 24380346 | 23834649-24380346 | 2024-01-01 | 2024-01-01 | 0 days | vo | 700 | pl | NaN | g | 0001 | od | fc | oper | :vo:700:pl:g:0001:od:fc:oper |
79 | 80 | 24380346 | 24958955 | 24380346-24958955 | 2024-01-01 | 2024-01-01 | 0 days | vo | 250 | pl | NaN | g | 0001 | od | fc | oper | :vo:250:pl:g:0001:od:fc:oper |
80 | 81 | 24958955 | 25515164 | 24958955-25515164 | 2024-01-01 | 2024-01-01 | 0 days | vo | 200 | pl | NaN | g | 0001 | od | fc | oper | :vo:200:pl:g:0001:od:fc:oper |
81 | 82 | 25515164 | 26090217 | 25515164-26090217 | 2024-01-01 | 2024-01-01 | 0 days | d | 50 | pl | NaN | g | 0001 | od | fc | oper | :d:50:pl:g:0001:od:fc:oper |
82 | 83 | 26090217 | 26639389 | 26090217-26639389 | 2024-01-01 | 2024-01-01 | 0 days | vo | 50 | pl | NaN | g | 0001 | od | fc | oper | :vo:50:pl:g:0001:od:fc:oper |
83 rows × 17 columns
[17]:
H.inventory(":500:pl")
[17]:
| | grib_message | start_byte | end_byte | range | reference_time | valid_time | step | param | levelist | levtype | number | domain | expver | class | type | stream | search_this |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
8 | 9 | 1562823 | 1799100 | 1562823-1799100 | 2024-01-01 | 2024-01-01 | 0 days | r | 500 | pl | NaN | g | 0001 | od | fc | oper | :r:500:pl:g:0001:od:fc:oper |
23 | 24 | 5128477 | 5391755 | 5128477-5391755 | 2024-01-01 | 2024-01-01 | 0 days | t | 500 | pl | NaN | g | 0001 | od | fc | oper | :t:500:pl:g:0001:od:fc:oper |
34 | 35 | 7931538 | 8114292 | 7931538-8114292 | 2024-01-01 | 2024-01-01 | 0 days | gh | 500 | pl | NaN | g | 0001 | od | fc | oper | :gh:500:pl:g:0001:od:fc:oper |
51 | 52 | 12740037 | 13041153 | 12740037-13041153 | 2024-01-01 | 2024-01-01 | 0 days | u | 500 | pl | NaN | g | 0001 | od | fc | oper | :u:500:pl:g:0001:od:fc:oper |
52 | 53 | 13041153 | 13355478 | 13041153-13355478 | 2024-01-01 | 2024-01-01 | 0 days | v | 500 | pl | NaN | g | 0001 | od | fc | oper | :v:500:pl:g:0001:od:fc:oper |
56 | 57 | 14281363 | 14611005 | 14281363-14611005 | 2024-01-01 | 2024-01-01 | 0 days | q | 500 | pl | NaN | g | 0001 | od | fc | oper | :q:500:pl:g:0001:od:fc:oper |
68 | 69 | 18693700 | 19269114 | 18693700-19269114 | 2024-01-01 | 2024-01-01 | 0 days | d | 500 | pl | NaN | g | 0001 | od | fc | oper | :d:500:pl:g:0001:od:fc:oper |
75 | 76 | 22142964 | 22686453 | 22142964-22686453 | 2024-01-01 | 2024-01-01 | 0 days | vo | 500 | pl | NaN | g | 0001 | od | fc | oper | :vo:500:pl:g:0001:od:fc:oper |
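For comparison with the wgrib-style case, here is a minimal sketch of how an eccodes-style index could be parsed and searched. It assumes the raw index is a text file with one JSON object per line, carrying keys such as `param`, `levelist`, `levtype`, `_offset`, and `_length`. That layout is an assumption about the ECMWF index format, not something shown above, and `parse_eccodes_index` is a hypothetical name. The offsets come from the inventory tables above, and each length is computed as end byte minus start byte from the range column.

```python
import json

import pandas as pd

# Fields shared by every sample record (values taken from the tables above).
COMMON = {"domain": "g", "date": "20240101", "time": "0000", "expver": "0001",
          "class": "od", "type": "fc", "stream": "oper", "step": "0"}

# Four sample messages; a surface field such as "tp" has no "levelist" key.
SAMPLE = [
    {**COMMON, "levtype": "pl", "levelist": "250", "param": "gh",
     "_offset": 0, "_length": 205483},
    {**COMMON, "levtype": "sfc", "param": "tp",
     "_offset": 427603, "_length": 224},
    {**COMMON, "levtype": "pl", "levelist": "500", "param": "r",
     "_offset": 1562823, "_length": 236277},
    {**COMMON, "levtype": "pl", "levelist": "500", "param": "gh",
     "_offset": 7931538, "_length": 182754},
]
# Emulate the raw index text: one JSON object per line (assumed layout).
RAW_INDEX = "\n".join(json.dumps(record) for record in SAMPLE)

# Order of the fields in the search_this strings shown in the tables above.
SEARCH_KEYS = ["param", "levelist", "levtype", "domain",
               "expver", "class", "type", "stream"]


def parse_eccodes_index(text):
    """Hypothetical helper: parse JSON-lines index text into a DataFrame."""
    records = [json.loads(line) for line in text.strip().splitlines()]
    df = pd.DataFrame(records).rename(columns={"_offset": "start_byte"})
    # Mirror the tables above, where end_byte equals offset plus length.
    df["end_byte"] = df["start_byte"] + df.pop("_length")
    df["range"] = df["start_byte"].astype(str) + "-" + df["end_byte"].astype(str)
    # Skip missing fields so surface messages read ":tp:sfc:..." with no level.
    df["search_this"] = [
        ":" + ":".join(str(row[key]) for key in SEARCH_KEYS if pd.notna(row[key]))
        for _, row in df.iterrows()
    ]
    return df


inventory = parse_eccodes_index(RAW_INDEX)
subset = inventory[inventory["search_this"].str.contains(":500:pl")]
print(subset[["param", "range", "search_this"]])
```

Because the level and level type are separate fields here, the 500 hPa pattern is `:500:pl` rather than the `:500 mb` used for the wgrib-style index.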