📃 Inventory

Ideally, each GRIB2 file has a companion “index” file: a plain ASCII text file that describes the contents of the GRIB2 file (i.e., what each GRIB message contains). An index file tells you the variable represented in each GRIB message, the level, the forecast lead time, and the starting byte of the message in the file.

There are two “flavors” of index files: wgrib-style and eccodes-style.

NCEP models provide the wgrib-style index files while ECMWF models provide the eccodes-style index file.

Herbie provides a parser to read the index file into a Pandas DataFrame and calls it the file’s inventory.

Let’s start by looking at the inventory for a HRRR file.

[1]:
from herbie import Herbie
[2]:
H = Herbie("2024-01-01", model="hrrr")
H
✅ Found ┊ model=hrrr ┊ product=sfc ┊ 2024-Jan-01 00:00 UTC F00 ┊ GRIB2 @ aws ┊ IDX @ aws
[2]:
▌▌Herbie HRRR model sfc product initialized 2024-Jan-01 00:00 UTC F00 ┊ source=aws

The path of the relevant index file is given by H.idx. You can go to that URL and see what the raw index file looks like.

[4]:
H.idx
[4]:
'https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20240101/conus/hrrr.t00z.wrfsfcf00.grib2.idx'
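For orientation, a wgrib-style index holds one colon-delimited line per GRIB message. The sketch below parses a few such lines with only the Python standard library. It is not Herbie's parser: the sample lines assume the usual `wgrib2 -s` layout, their values are taken from the last three messages of this HRRR file's inventory, and `parse_index` is a hypothetical helper name. Because the index only records where each message starts, the end byte is inferred from the next message's start byte, and the last message is left open-ended.

```python
# Sample lines in the assumed wgrib2-style layout:
# message number : start byte : reference date : variable : level : forecast time :
RAW_INDEX = """\
168:128189724:d=2024010100:SBT124:top of atmosphere:anl:
169:130514442:d=2024010100:SBT113:top of atmosphere:anl:
170:131785131:d=2024010100:SBT114:top of atmosphere:anl:
"""


def parse_index(text):
    """Hypothetical helper: turn raw index lines into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        # The trailing colon leaves an empty seventh field, so keep the first six.
        number, start, date, variable, level, forecast_time = line.split(":")[:6]
        rows.append(
            {
                "grib_message": int(number),
                "start_byte": int(start),
                "reference_time": date.split("=")[1],  # drop the "d=" prefix
                "variable": variable,
                "level": level,
                "forecast_time": forecast_time,
                "search_this": f":{variable}:{level}:{forecast_time}",
            }
        )

    # A message ends one byte before the next message starts.
    for row, following in zip(rows, rows[1:]):
        row["end_byte"] = following["start_byte"] - 1
    if rows:
        rows[-1]["end_byte"] = None  # unknown: runs to the end of the file

    for row in rows:
        end = "" if row["end_byte"] is None else row["end_byte"]
        row["range"] = f"{row['start_byte']}-{end}"
    return rows


for row in parse_index(RAW_INDEX):
    print(row["grib_message"], row["range"], row["search_this"])
```

The open-ended range on the final row corresponds to the missing `end_byte` that the inventory shows for the last GRIB message.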

Herbie parses the raw index file into a Pandas DataFrame when you call H.inventory().

[3]:
H.inventory()
[3]:
grib_message start_byte end_byte range reference_time valid_time variable level forecast_time search_this
0 1 0 202809.0 0-202809 2024-01-01 2024-01-01 REFC entire atmosphere anl :REFC:entire atmosphere:anl
1 2 202810 246792.0 202810-246792 2024-01-01 2024-01-01 RETOP cloud top anl :RETOP:cloud top:anl
2 3 246793 496145.0 246793-496145 2024-01-01 2024-01-01 var discipline=0 center=7 local_table=1 parmca... entire atmosphere anl :var discipline=0 center=7 local_table=1 parmc...
3 4 496146 649032.0 496146-649032 2024-01-01 2024-01-01 VIL entire atmosphere anl :VIL:entire atmosphere:anl
4 5 649033 2038336.0 649033-2038336 2024-01-01 2024-01-01 VIS surface anl :VIS:surface:anl
... ... ... ... ... ... ... ... ... ... ...
165 166 126776108 126785469.0 126776108-126785469 2024-01-01 2024-01-01 ICEC surface anl :ICEC:surface:anl
166 167 126785470 128189723.0 126785470-128189723 2024-01-01 2024-01-01 SBT123 top of atmosphere anl :SBT123:top of atmosphere:anl
167 168 128189724 130514441.0 128189724-130514441 2024-01-01 2024-01-01 SBT124 top of atmosphere anl :SBT124:top of atmosphere:anl
168 169 130514442 131785130.0 130514442-131785130 2024-01-01 2024-01-01 SBT113 top of atmosphere anl :SBT113:top of atmosphere:anl
169 170 131785131 NaN 131785131- 2024-01-01 2024-01-01 SBT114 top of atmosphere anl :SBT114:top of atmosphere:anl

170 rows × 10 columns

Notice the search_this column; Herbie runs regular expression searches against this column to filter for the GRIB messages you want. For example, if you want all the variables at 500 mb…

[9]:
H.inventory(":500 mb")
[9]:
grib_message start_byte end_byte range reference_time valid_time variable level forecast_time search_this
13 14 6299332 7003497.0 6299332-7003497 2024-01-01 2024-01-01 HGT 500 mb anl :HGT:500 mb:anl
14 15 7003498 7550668.0 7003498-7550668 2024-01-01 2024-01-01 TMP 500 mb anl :TMP:500 mb:anl
15 16 7550669 8417238.0 7550669-8417238 2024-01-01 2024-01-01 DPT 500 mb anl :DPT:500 mb:anl
16 17 8417239 8997799.0 8417239-8997799 2024-01-01 2024-01-01 UGRD 500 mb anl :UGRD:500 mb:anl
17 18 8997800 9584981.0 8997800-9584981 2024-01-01 2024-01-01 VGRD 500 mb anl :VGRD:500 mb:anl

Notice that only the rows that contain 500 mb are selected. This is useful when you want to download a subset of variables from the GRIB file. The range column gives the byte range of each GRIB message in the file; Herbie uses these byte ranges when you download only the selected variables or open them in xarray.
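To get a feel for how a pattern selects rows, the following sketch applies Python's `re.search` to a handful of `search_this` strings copied from the HRRR inventory above. It assumes the filtering behaves like an unanchored regular-expression search; `search` is a hypothetical helper, not part of Herbie.

```python
import re

# A few search_this strings copied from the HRRR inventory.
SEARCH_THIS = [
    ":REFC:entire atmosphere:anl",
    ":VIS:surface:anl",
    ":HGT:500 mb:anl",
    ":TMP:500 mb:anl",
    ":DPT:500 mb:anl",
    ":UGRD:500 mb:anl",
    ":VGRD:500 mb:anl",
]


def search(pattern, entries=SEARCH_THIS):
    """Hypothetical helper: keep entries where the regex matches anywhere."""
    return [entry for entry in entries if re.search(pattern, entry)]


print(search(":500 mb"))            # every variable at 500 mb
print(search(r":[UV]GRD:500 mb"))   # only the wind components at 500 mb
print(search(r":(?:TMP|DPT):"))     # temperature or dew point at any level
```

Starting a pattern with the leading colon ties the match to the beginning of a field, so `:TMP:` cannot match a variable name that merely contains "TMP".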

[11]:
H.download(":500 mb", verbose=True, overwrite=True)
📇 Download subset: ▌▌Herbie HRRR model sfc product initialized 2024-Jan-01 00:00 UTC F00 ┊ source=aws
 cURL from https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20240101/conus/hrrr.t00z.wrfsfcf00.grib2
Found 5 grib messages.
Download subset group 1
  14  :HGT:500 mb:anl
  15  :TMP:500 mb:anl
  16  :DPT:500 mb:anl
  17  :UGRD:500 mb:anl
  18  :VGRD:500 mb:anl
curl -s --range 6299332-9584981 "https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20240101/conus/hrrr.t00z.wrfsfcf00.grib2" > "/home/blaylock/data/hrrr/20240101/subset_6befbe61__hrrr.t00z.wrfsfcf00.grib2"
💾 Saved the subset to /home/blaylock/data/hrrr/20240101/subset_6befbe61__hrrr.t00z.wrfsfcf00.grib2
[11]:
PosixPath('/home/blaylock/data/hrrr/20240101/subset_6befbe61__hrrr.t00z.wrfsfcf00.grib2')
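The log above shows the five adjacent 500 mb messages being fetched with a single `--range` request. One way to arrive at such a request is to merge consecutive GRIB messages into one byte range, as in this minimal sketch. It is an illustration, not Herbie's implementation: `group_ranges` is a hypothetical helper, and message 1 is added to the selection only to show what a non-adjacent message would do.

```python
# (grib_message, start_byte, end_byte) for the 500 mb rows of the inventory,
# plus message 1 to illustrate a selection that is not contiguous.
SELECTED = [
    (1, 0, 202809),
    (14, 6299332, 7003497),
    (15, 7003498, 7550668),
    (16, 7550669, 8417238),
    (17, 8417239, 8997799),
    (18, 8997800, 9584981),
]


def group_ranges(messages):
    """Hypothetical helper: merge consecutive messages into (start, end) byte ranges."""
    groups = []
    for number, start, end in sorted(messages):
        if groups and number == groups[-1]["last"] + 1:
            # Adjacent to the previous message: extend the current group.
            groups[-1]["last"] = number
            groups[-1]["end"] = end
        else:
            # Gap in message numbers: start a new group.
            groups.append({"last": number, "start": start, "end": end})
    return [(group["start"], group["end"]) for group in groups]


for start, end in group_ranges(SELECTED):
    print(f"--range {start}-{end}")
```

Merging adjacent messages keeps the number of requests small: the five 500 mb messages collapse into the single range `6299332-9584981` seen in the download log.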
[13]:
H.xarray(":500 mb")
/home/blaylock/GITHUB/Herbie/herbie/core.py:1088: UserWarning: Will not remove GRIB file because it previously existed.
  warnings.warn("Will not remove GRIB file because it previously existed.")
[13]:
<xarray.Dataset>
Dimensions:              (y: 1059, x: 1799)
Coordinates:
    time                 datetime64[ns] 2024-01-01
    step                 timedelta64[ns] 00:00:00
    isobaricInhPa        float64 500.0
    latitude             (y, x) float64 21.14 21.15 21.15 ... 47.86 47.85 47.84
    longitude            (y, x) float64 237.3 237.3 237.3 ... 299.0 299.0 299.1
    valid_time           datetime64[ns] 2024-01-01
Dimensions without coordinates: y, x
Data variables:
    t                    (y, x) float32 ...
    u                    (y, x) float32 ...
    v                    (y, x) float32 ...
    gh                   (y, x) float32 ...
    dpt                  (y, x) float32 ...
    gribfile_projection  object None
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   hrrr
    product:                 sfc
    description:             High-Resolution Rapid Refresh - CONUS
    remote_grib:             https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr....
    local_grib:              /home/blaylock/data/hrrr/20240101/subset_6befbe6...
    search:            :500 mb

More examples of valid regular expressions are found in the Herbie Docs: Subset with search. Using H.inventory(search) is an effective way to test different regex patterns to get the variables you are interested in downloading.

The eccodes-style index files work the same way, except the regex for selecting variable names and levels will be different. Here is the ECMWF forecast inventory file.

[14]:
H = Herbie("2024-01-01", model="ecmwf")
H.idx
✅ Found ┊ model=ecmwf ┊ product=oper ┊ 2024-Jan-01 00:00 UTC F00 ┊ GRIB2 @ azure ┊ IDX @ azure
[14]:
'https://ai4edataeuwest.blob.core.windows.net/ecmwf/20240101/00z/0p4-beta/oper/20240101000000-0h-oper-fc.index'
[15]:
H.inventory()
[15]:
grib_message start_byte end_byte range reference_time valid_time step param levelist levtype number domain expver class type stream search_this
0 1 0 205483 0-205483 2024-01-01 2024-01-01 0 days gh 250 pl NaN g 0001 od fc oper :gh:250:pl:g:0001:od:fc:oper
1 2 205483 427603 205483-427603 2024-01-01 2024-01-01 0 days gh 925 pl NaN g 0001 od fc oper :gh:925:pl:g:0001:od:fc:oper
2 3 427603 427827 427603-427827 2024-01-01 2024-01-01 0 days tp NaN sfc NaN g 0001 od fc oper :tp:sfc:g:0001:od:fc:oper
3 4 427827 640309 427827-640309 2024-01-01 2024-01-01 0 days gh 700 pl NaN g 0001 od fc oper :gh:700:pl:g:0001:od:fc:oper
4 5 640309 878511 640309-878511 2024-01-01 2024-01-01 0 days r 850 pl NaN g 0001 od fc oper :r:850:pl:g:0001:od:fc:oper
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
78 79 23834649 24380346 23834649-24380346 2024-01-01 2024-01-01 0 days vo 700 pl NaN g 0001 od fc oper :vo:700:pl:g:0001:od:fc:oper
79 80 24380346 24958955 24380346-24958955 2024-01-01 2024-01-01 0 days vo 250 pl NaN g 0001 od fc oper :vo:250:pl:g:0001:od:fc:oper
80 81 24958955 25515164 24958955-25515164 2024-01-01 2024-01-01 0 days vo 200 pl NaN g 0001 od fc oper :vo:200:pl:g:0001:od:fc:oper
81 82 25515164 26090217 25515164-26090217 2024-01-01 2024-01-01 0 days d 50 pl NaN g 0001 od fc oper :d:50:pl:g:0001:od:fc:oper
82 83 26090217 26639389 26090217-26639389 2024-01-01 2024-01-01 0 days vo 50 pl NaN g 0001 od fc oper :vo:50:pl:g:0001:od:fc:oper

83 rows × 17 columns

[17]:
H.inventory(":500:pl")
[17]:
grib_message start_byte end_byte range reference_time valid_time step param levelist levtype number domain expver class type stream search_this
8 9 1562823 1799100 1562823-1799100 2024-01-01 2024-01-01 0 days r 500 pl NaN g 0001 od fc oper :r:500:pl:g:0001:od:fc:oper
23 24 5128477 5391755 5128477-5391755 2024-01-01 2024-01-01 0 days t 500 pl NaN g 0001 od fc oper :t:500:pl:g:0001:od:fc:oper
34 35 7931538 8114292 7931538-8114292 2024-01-01 2024-01-01 0 days gh 500 pl NaN g 0001 od fc oper :gh:500:pl:g:0001:od:fc:oper
51 52 12740037 13041153 12740037-13041153 2024-01-01 2024-01-01 0 days u 500 pl NaN g 0001 od fc oper :u:500:pl:g:0001:od:fc:oper
52 53 13041153 13355478 13041153-13355478 2024-01-01 2024-01-01 0 days v 500 pl NaN g 0001 od fc oper :v:500:pl:g:0001:od:fc:oper
56 57 14281363 14611005 14281363-14611005 2024-01-01 2024-01-01 0 days q 500 pl NaN g 0001 od fc oper :q:500:pl:g:0001:od:fc:oper
68 69 18693700 19269114 18693700-19269114 2024-01-01 2024-01-01 0 days d 500 pl NaN g 0001 od fc oper :d:500:pl:g:0001:od:fc:oper
75 76 22142964 22686453 22142964-22686453 2024-01-01 2024-01-01 0 days vo 500 pl NaN g 0001 od fc oper :vo:500:pl:g:0001:od:fc:oper
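For comparison with the wgrib-style index, here is a rough sketch of how an eccodes-style index could be turned into the same kind of searchable strings. It assumes the raw index is a JSON-lines file in which each entry carries `_offset` and `_length` keys alongside the metadata keys; the three sample entries are reconstructed from the ECMWF inventory above, and `parse_index` and `KEYS` are hypothetical names rather than Herbie's code.

```python
import json
import re

# Sample entries in the assumed JSON-lines layout (one GRIB message per line).
RAW_INDEX = """\
{"domain": "g", "date": "20240101", "time": "0000", "expver": "0001", "class": "od", "type": "fc", "stream": "oper", "step": "0", "levelist": "250", "levtype": "pl", "param": "gh", "_offset": 0, "_length": 205483}
{"domain": "g", "date": "20240101", "time": "0000", "expver": "0001", "class": "od", "type": "fc", "stream": "oper", "step": "0", "levtype": "sfc", "param": "tp", "_offset": 427603, "_length": 224}
{"domain": "g", "date": "20240101", "time": "0000", "expver": "0001", "class": "od", "type": "fc", "stream": "oper", "step": "0", "levelist": "500", "levtype": "pl", "param": "gh", "_offset": 7931538, "_length": 182754}
"""

# Field order that reproduces the search_this strings shown in the inventory.
KEYS = ["param", "levelist", "levtype", "number", "domain", "expver", "class", "type", "stream"]


def parse_index(text):
    """Hypothetical helper: build byte ranges and search strings from JSON lines."""
    rows = []
    for line in text.strip().splitlines():
        entry = json.loads(line)
        start = entry["_offset"]
        end = start + entry["_length"]
        # Keys missing from an entry (e.g. levelist for surface fields) are skipped.
        search_this = ":" + ":".join(entry[key] for key in KEYS if key in entry)
        rows.append({"start_byte": start, "end_byte": end, "search_this": search_this})
    return rows


rows = parse_index(RAW_INDEX)
selected = [row for row in rows if re.search(":500:pl", row["search_this"])]
for row in selected:
    print(row["start_byte"], row["end_byte"], row["search_this"])
```

Unlike the wgrib-style index, each entry here states its own length, so the end byte does not have to be inferred from the following message.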