๐Ÿ“ Pick Points#

Herbieโ€™s pick_points xarray accessor picks data from a model grid nearest locations of interest. Think of it like picking apples from a tree. Herbieโ€™s implementation uses the scikit-learn BallTree algorithm with the haversine formula to get nearest-neighbor values. The haversine formula is important when working with latitude and longitude coordinates.

image.png

For regular latitude-longitude grids (e.g., GFS, IFS), you could use xarrayโ€™s advanced indexing instead to easily select your points of interest, but that solution doesnโ€™t work in the following cases:

  • Models with curvilinear grids (i.e. not a regular latitude-longitude grid);

  • Longitude units, [0, 360) vs [-180, 180), differ between model and requested point.

  • The distance to the nearest neighbor needs to be known.

  • Picking more than one nearest neighbor points.

Picking values at nearest-neighbor points in a curvilinear grid has been one of my longstanding questions. In November 2019, I asked How to select the nearest lat/lon location with multi-dimension coordinates) on Stack Overflow, which has had over 28k views in four years. I suggested a rudimentary solution that determined the nearest neighbor point by finding the absolute minimum value between the point and grid difference, but this solution didnโ€™t scale well for picking many points.

I iterated over different solutions since then.

In the deprecated Herbie accessor ds.herbie.nearest_points, I used MetPyโ€™s assign_y_x, transformed the latitude and longitude points to the coordinate reference system coordinates (e.g. lambert conformal for HRRR), and then picked the nearest neighbor points. That worked fine, but it required a coordinate transformation by Cartopy, and not all xarray Datasets have sufficient information to determine the appropriate coordinate transformation.

This new approach using BallTree, we donโ€™t need to do a coordinate transformation, we can pick the \(k\)-th nearest neighbors, and the distance to the nearest neighbors (haversine formula) is returned. I have also implemented the capability to compute the inverse-distance weighted mean of the four (or more) grid points nearest your points of interest.

As I write this, I learned about the xoak package, which also uses BallTree to query nearest neighbors.


Letโ€™s get startedโ€ฆ

To use Herbieโ€™s xarray accessors, you just need to import anything from Herbie.

[1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

import warnings

import herbie
from herbie import Herbie, FastHerbie

A Very Simple Demonstration#

First, Iโ€™ll show how Herbie extracts nearest points from a simple grid. Letโ€™s make a 3x3 xarray Dataset with latitude/longitude coordinates and then extract data nearest two points of interest.

[2]:
# Create a 3x3 xarray dataset
ds = xr.Dataset(
    {"a": (["latitude", "longitude"], [[0, 1, 2], [0, 1, 0], [0, 0, 0]])},
    coords={
        "latitude": (["latitude"], [44, 45, 46]),
        "longitude": (["longitude"], [-99, -100, -101]),
    },
)
ds
[2]:
<xarray.Dataset> Size: 120B
Dimensions:    (latitude: 3, longitude: 3)
Coordinates:
  * latitude   (latitude) int64 24B 44 45 46
  * longitude  (longitude) int64 24B -99 -100 -101
Data variables:
    a          (latitude, longitude) int64 72B 0 1 2 0 1 0 0 0 0

The points you want to extract must be given as a Pandas DataFrame with columns latitude and longitude given in degrees.

[3]:
# We want to pick data closest to two points
points = pd.DataFrame(
    {
        "longitude": [-100.25, -99.4],
        "latitude": [44.25, 45.4],
    }
)
points
[3]:
longitude latitude
0 -100.25 44.25
1 -99.40 45.40

Now we can pick the nearest grid points with the custom Herbie xarray accessor.

Note method='nearest' is the default behavior.

[4]:
# Pick the value nearest the requested points
matched = ds.herbie.pick_points(points, method="nearest")
matched
WARNING: Herbie won't cache the BallTree because it
         doesn't know what to name it. Please specify
         `tree_name` to cache the tree for use later.
INFO: ๐ŸŒฑ Growing new BallTree...๐ŸŒณ BallTree grew in 0.007 seconds.
[4]:
<xarray.Dataset> Size: 96B
Dimensions:              (point: 2)
Coordinates:
    latitude             (point) int64 16B 44 45
    longitude            (point) int64 16B -100 -99
    point_grid_distance  (point) float64 16B 34.22 54.41
    point_longitude      (point) float64 16B -100.2 -99.4
    point_latitude       (point) float64 16B 44.25 45.4
Dimensions without coordinates: point
Data variables:
    a                    (point) int64 16B 1 0

Alternatively, we can get the inverse-distance weighted mean of the 4 nearest points.

[5]:
# Distance weighted mean
matched_w = ds.herbie.pick_points(points, method="weighted")
matched_w
WARNING: Herbie won't cache the BallTree because it
         doesn't know what to name it. Please specify
         `tree_name` to cache the tree for use later.
INFO: ๐ŸŒฑ Growing new BallTree...๐ŸŒณ BallTree grew in 0.0065 seconds.
[5]:
<xarray.Dataset> Size: 240B
Dimensions:              (point: 2, k: 4)
Coordinates:
    point_longitude      (point) float64 16B -100.2 -99.4
    point_latitude       (point) float64 16B 44.25 45.4
    latitude             (k, point) int64 64B 44 45 44 45 45 46 45 46
    longitude            (k, point) int64 64B -100 -99 -101 ... -99 -101 -100
    point_grid_distance  (k, point) float64 64B 34.22 54.41 66.0 ... 102.4 81.38
Dimensions without coordinates: point, k
Data variables:
    a                    (point) float64 16B 1.082 0.2588

Thatโ€™s a lot of info to digest. Iโ€™ll help you visualize what we have done with the following figure.

[6]:
fig, axes = plt.subplots(1, 2, figsize=[13, 5])
for p, ax in zip(matched_w.point, axes):
    zz = matched.sel(point=p)

    # Plot grid
    ax.pcolormesh(
        ds.longitude, ds.latitude, ds.a, edgecolor="1", lw=1, cmap="Pastel2_r"
    )
    x, y = np.meshgrid(ds.longitude, ds.latitude)
    x = x.flatten()
    y = y.flatten()
    ax.scatter(x, y, facecolor="tab:red", edgecolor="tab:red", s=100, zorder=100)

    # Plot requested point
    ax.scatter(
        zz.point_longitude,
        zz.point_latitude,
        color="yellow",
        ec="k",
        marker="p",
        s=300,
        zorder=100,
    )
    for i in ds.latitude:
        for j in ds.longitude:
            z = ds.sel(latitude=i, longitude=j)
            ax.text(j, i, f"\na={z.a.item()}", ha="center", va="top")

    # Plot path to nearest point and distance
    # for i, j in zip(x, y):
    #    ax.plot([i, point.longitude.values[0]], [j, point.latitude.values[0]])
    for i in matched_w.k:
        if i == 0:
            kwargs = dict(lw=2, color="k", ls="--")
        else:
            kwargs = dict(color=".5", ls="--")
        z = matched_w.sel(k=i, point=p)
        ax.plot(
            [z.longitude, z.point_longitude], [z.latitude, z.point_latitude], **kwargs
        )
        ax.text(
            z.longitude,
            z.latitude,
            f"  {z.point_grid_distance.item():.1f} km",
            va="center",
            fontweight="bold",
        )
    ax.set_xticks(ds.longitude)
    ax.set_yticks(ds.latitude)
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_title(f"Point={p.item()}", loc="left", fontsize="xx-large")
    ax.set_title(
        f"nearest value:  $a={zz.a.item():.2f}$\nweighted mean:  $a={z.a.item():.2f}$",
        loc="right",
    )
../../../_images/user_guide_tutorial_accessor_notebooks_pick_points_11_0.png

The yellow pentagon is the requested point. The gridโ€™s nearest-neighbor point is connected by a thick dashed line. The three other nearest neighbors used to compute the distance-weighted mean are connected by a thin dashed line.

Summary: From a simple 2D Dataset with latitude and longitude coordinates, we got the nearest value for two points of interest. We also compute the inverse-distance weighted mean from the four grids nearest our point of interest.

Pick points from HRRR data#

The above is nice, but you might say, โ€œYeah, but you can already select data from a grid using xarray advanced selection.โ€ That is true, but it does not work for models with curvilienar grids, like the HRRR model.

Here is a demonstration using real HRRR data.

[7]:
H = Herbie("2024-03-01", model="hrrr")
ds = H.xarray(":(?:TMP|DPT):2 m")
ds
โœ… Found โ”Š model=hrrr โ”Š product=sfc โ”Š 2024-Mar-01 00:00 UTC F00 โ”Š GRIB2 @ aws โ”Š IDX @ aws
[7]:
<xarray.Dataset> Size: 46MB
Dimensions:              (y: 1059, x: 1799)
Coordinates:
    time                 datetime64[ns] 8B 2024-03-01
    step                 timedelta64[ns] 8B 00:00:00
    heightAboveGround    float64 8B 2.0
    latitude             (y, x) float64 15MB 21.14 21.15 21.15 ... 47.85 47.84
    longitude            (y, x) float64 15MB 237.3 237.3 237.3 ... 299.0 299.1
    valid_time           datetime64[ns] 8B 2024-03-01
    gribfile_projection  object 8B None
Dimensions without coordinates: y, x
Data variables:
    t2m                  (y, x) float32 8MB 292.5 292.5 292.4 ... 266.8 266.8
    d2m                  (y, x) float32 8MB 287.3 287.2 287.2 ... 262.2 262.2
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   hrrr
    product:                 sfc
    description:             High-Resolution Rapid Refresh - CONUS
    remote_grib:             https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr....
    local_grib:              /home/blaylock/data/hrrr/20240301/subset_e0ef89f...
    search:            :(?:TMP|DPT):2 m

Letโ€™s extract the data from some points with some fruit-themed station id names.

[8]:
points = pd.DataFrame(
    {
        "latitude": np.linspace(44, 45.5, 5),
        "longitude": np.linspace(-100, -101, 5),
        "stid": ["McIntosh", "Golden", "Fuji", "Gala", "Honeycrisp"],
    }
)
points
[8]:
latitude longitude stid
0 44.000 -100.00 McIntosh
1 44.375 -100.25 Golden
2 44.750 -100.50 Fuji
3 45.125 -100.75 Gala
4 45.500 -101.00 Honeycrisp
[9]:
%%time
matched = ds.herbie.pick_points(points)
matched
INFO: ๐ŸŒฑ Growing new BallTree...๐ŸŒณ BallTree grew in 3.3 seconds.
INFO: Saved BallTree to /home/blaylock/data/BallTree/hrrr_1799-1059.pkl
CPU times: user 3.36 s, sys: 210 ms, total: 3.57 s
Wall time: 3.55 s
[9]:
<xarray.Dataset> Size: 320B
Dimensions:              (point: 5)
Coordinates:
    time                 datetime64[ns] 8B 2024-03-01
    step                 timedelta64[ns] 8B 00:00:00
    heightAboveGround    float64 8B 2.0
    latitude             (point) float64 40B 43.99 44.37 44.76 45.13 45.5
    longitude            (point) float64 40B 260.0 259.8 259.5 259.3 259.0
    valid_time           datetime64[ns] 8B 2024-03-01
    gribfile_projection  object 8B None
    point_grid_distance  (point) float64 40B 0.8515 1.07 1.611 1.41 1.318
    point_latitude       (point) float64 40B 44.0 44.38 44.75 45.12 45.5
    point_longitude      (point) float64 40B -100.0 -100.2 -100.5 -100.8 -101.0
    point_stid           (point) object 40B 'McIntosh' 'Golden' ... 'Honeycrisp'
Dimensions without coordinates: point
Data variables:
    t2m                  (point) float32 20B 288.0 286.7 286.1 282.3 284.8
    d2m                  (point) float32 20B 268.8 271.8 270.5 274.1 270.5
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   hrrr
    product:                 sfc
    description:             High-Resolution Rapid Refresh - CONUS
    remote_grib:             https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr....
    local_grib:              /home/blaylock/data/hrrr/20240301/subset_e0ef89f...
    search:            :(?:TMP|DPT):2 m

Notice that a BallTree object for the HRRR model was saved with the name <model_name>_<x_dim>_<y_dim>.pkl in the directory you save Herbie data. This is done so it can be loaded more quickly next time you extract points from this grid.

In the next cell I run the same command, and it uses the cached tree instead.

[10]:
%%time
matched = ds.herbie.pick_points(points)
matched
CPU times: user 132 ms, sys: 20.7 ms, total: 153 ms
Wall time: 150 ms
[10]:
<xarray.Dataset> Size: 320B
Dimensions:              (point: 5)
Coordinates:
    time                 datetime64[ns] 8B 2024-03-01
    step                 timedelta64[ns] 8B 00:00:00
    heightAboveGround    float64 8B 2.0
    latitude             (point) float64 40B 43.99 44.37 44.76 45.13 45.5
    longitude            (point) float64 40B 260.0 259.8 259.5 259.3 259.0
    valid_time           datetime64[ns] 8B 2024-03-01
    gribfile_projection  object 8B None
    point_grid_distance  (point) float64 40B 0.8515 1.07 1.611 1.41 1.318
    point_latitude       (point) float64 40B 44.0 44.38 44.75 45.12 45.5
    point_longitude      (point) float64 40B -100.0 -100.2 -100.5 -100.8 -101.0
    point_stid           (point) object 40B 'McIntosh' 'Golden' ... 'Honeycrisp'
Dimensions without coordinates: point
Data variables:
    t2m                  (point) float32 20B 288.0 286.7 286.1 282.3 284.8
    d2m                  (point) float32 20B 268.8 271.8 270.5 274.1 270.5
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   hrrr
    product:                 sfc
    description:             High-Resolution Rapid Refresh - CONUS
    remote_grib:             https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr....
    local_grib:              /home/blaylock/data/hrrr/20240301/subset_e0ef89f...
    search:            :(?:TMP|DPT):2 m

You may want to swap the dimensions to use the point_stid coordinate as the dimension instead. Doing this makes it possible to select data by station ID instead of point index.

[11]:
matched = matched.swap_dims({"point": "point_stid"})
matched.sel(point_stid="Honeycrisp")
[11]:
<xarray.Dataset> Size: 128B
Dimensions:              ()
Coordinates:
    time                 datetime64[ns] 8B 2024-03-01
    step                 timedelta64[ns] 8B 00:00:00
    heightAboveGround    float64 8B 2.0
    latitude             float64 8B 45.5
    longitude            float64 8B 259.0
    valid_time           datetime64[ns] 8B 2024-03-01
    gribfile_projection  object 8B None
    point_grid_distance  float64 8B 1.318
    point_latitude       float64 8B 45.5
    point_longitude      float64 8B -101.0
    point_stid           <U10 40B 'Honeycrisp'
Data variables:
    t2m                  float32 4B 284.8
    d2m                  float32 4B 270.5
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   hrrr
    product:                 sfc
    description:             High-Resolution Rapid Refresh - CONUS
    remote_grib:             https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr....
    local_grib:              /home/blaylock/data/hrrr/20240301/subset_e0ef89f...
    search:            :(?:TMP|DPT):2 m

Now letโ€™s plot each point on a map with the value at the nearest grid point.

[12]:
from toolbox import EasyMap, pc, ccrs

ax = EasyMap(crs=ds.herbie.crs).ax
ax.pcolormesh(
    ds.longitude,
    ds.latitude,
    ds.t2m,
    cmap="Spectral_r",
    vmax=290,
    vmin=270,
    transform=pc,
)

for i in matched.point_stid:
    z = matched.sel(point_stid=i)
    ax.scatter(z.longitude, z.latitude, color="k", transform=pc)
    ax.scatter(
        z.point_longitude, z.point_latitude, color="yellow", marker=".", transform=pc
    )
    ax.text(
        z.point_longitude,
        z.point_latitude,
        f" {z.point_stid.item()}\n {z.t2m.item()-273.15:.2f} C",
        transform=pc,
    )
ax.set_extent([-101, -99, 44, 46], crs=pc)
ax.EasyMap.INSET_GLOBE()
ax.adjust_extent()
[12]:
(-289442.889311017, -108684.3334160587, 605320.486730601, 849853.0858732163)
../../../_images/user_guide_tutorial_accessor_notebooks_pick_points_23_1.png

Pick Points: Sounding#

If you want a โ€œsoundingโ€ at a single point, you will need to get the GRIB data for all the model layers of interest (yes, thatโ€™s a lot of data), then pick the point.

[13]:
# TODO: Get read sounding data to compare model data to.
[14]:
H = Herbie("2024-03-28 00:00", model="hrrr", product="prs")
ds = H.xarray("(?:DPT|TMP):[0-9]* mb", remove_grib=False)

# Get HRRR sounding at Salt Lake City
slc = ds.herbie.pick_points(
    pd.DataFrame({"latitude": [40.76], "longitude": [-111.876183]})
)
slc
โœ… Found โ”Š model=hrrr โ”Š product=prs โ”Š 2024-Mar-28 00:00 UTC F00 โ”Š GRIB2 @ aws โ”Š IDX @ aws
[14]:
<xarray.Dataset> Size: 696B
Dimensions:              (isobaricInhPa: 39, point: 1)
Coordinates:
    time                 datetime64[ns] 8B 2024-03-28
    step                 timedelta64[ns] 8B 00:00:00
  * isobaricInhPa        (isobaricInhPa) float64 312B 1e+03 975.0 ... 75.0 50.0
    latitude             (point) float64 8B 40.75
    longitude            (point) float64 8B 248.1
    valid_time           datetime64[ns] 8B 2024-03-28
    gribfile_projection  object 8B None
    point_grid_distance  (point) float64 8B 1.193
    point_latitude       (point) float64 8B 40.76
    point_longitude      (point) float64 8B -111.9
Dimensions without coordinates: point
Data variables:
    t                    (isobaricInhPa, point) float32 156B ...
    dpt                  (isobaricInhPa, point) float32 156B ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   hrrr
    product:                 prs
    description:             High-Resolution Rapid Refresh - CONUS
    remote_grib:             https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr....
    local_grib:              /home/blaylock/data/hrrr/20240328/subset_0befd2f...
    search:            (?:DPT|TMP):[0-9]* mb
[15]:
# TODO: plot data on a sounding plot using MetPy

ax = plt.gca()
ax.plot(slc.t, slc.isobaricInhPa, color="red", marker=".", label="Temperature")
ax.plot(slc.dpt, slc.isobaricInhPa, color="green", marker=".", label="Dew Point")
ax.invert_yaxis()
ax.set_ylabel("Level (hPa)")
ax.set_ylabel("Temperature (K)")
ax.legend()
[15]:
<matplotlib.legend.Legend at 0x7fedbb2ab6e0>
../../../_images/user_guide_tutorial_accessor_notebooks_pick_points_27_1.png

Pick Points: Timeseries#

[16]:
# TODO: Could demonstrate using FastHerbie here, if the kernel wouldn't crash (memory??)

i = []
for date in pd.date_range("2024-01-01", periods=9, freq="3h"):
    ds = Herbie(date).xarray("(?:DPT|TMP):2 m")
    i.append(
        ds.herbie.pick_points(
            pd.DataFrame(
                {
                    "latitude": [40.77069],
                    "longitude": [-111.96503],
                    "stid": ["KSLC"],
                }
            )
        )
    )
slc_ts = xr.concat(i, dim="valid_time")

slc_ts.t2m.plot(x="valid_time", marker=".", color="tab:red")
slc_ts.d2m.plot(x="valid_time", marker=".", color="tab:green")
โœ… Found โ”Š model=hrrr โ”Š product=sfc โ”Š 2024-Jan-01 00:00 UTC F00 โ”Š GRIB2 @ aws โ”Š IDX @ aws
โœ… Found โ”Š model=hrrr โ”Š product=sfc โ”Š 2024-Jan-01 03:00 UTC F00 โ”Š GRIB2 @ aws โ”Š IDX @ aws
โœ… Found โ”Š model=hrrr โ”Š product=sfc โ”Š 2024-Jan-01 06:00 UTC F00 โ”Š GRIB2 @ aws โ”Š IDX @ aws
โœ… Found โ”Š model=hrrr โ”Š product=sfc โ”Š 2024-Jan-01 09:00 UTC F00 โ”Š GRIB2 @ aws โ”Š IDX @ aws
โœ… Found โ”Š model=hrrr โ”Š product=sfc โ”Š 2024-Jan-01 12:00 UTC F00 โ”Š GRIB2 @ aws โ”Š IDX @ aws
โœ… Found โ”Š model=hrrr โ”Š product=sfc โ”Š 2024-Jan-01 15:00 UTC F00 โ”Š GRIB2 @ aws โ”Š IDX @ aws
โœ… Found โ”Š model=hrrr โ”Š product=sfc โ”Š 2024-Jan-01 18:00 UTC F00 โ”Š GRIB2 @ aws โ”Š IDX @ aws
โœ… Found โ”Š model=hrrr โ”Š product=sfc โ”Š 2024-Jan-01 21:00 UTC F00 โ”Š GRIB2 @ aws โ”Š IDX @ aws
โœ… Found โ”Š model=hrrr โ”Š product=sfc โ”Š 2024-Jan-02 00:00 UTC F00 โ”Š GRIB2 @ aws โ”Š IDX @ aws
๐Ÿ‘จ๐Ÿปโ€๐Ÿญ Created directory: [/home/blaylock/data/hrrr/20240102]
[16]:
[<matplotlib.lines.Line2D at 0x7fedbafadb80>]
../../../_images/user_guide_tutorial_accessor_notebooks_pick_points_29_2.png

Benchmark#

Letโ€™s see how fast ds.herbie.pick_points is for harvesting many pointsโ€ฆ

[17]:
H = Herbie("2024-03-28 00:00", model="hrrr")
ds = H.xarray(r"TMP:\d* mb", remove_grib=False)
โœ… Found โ”Š model=hrrr โ”Š product=sfc โ”Š 2024-Mar-28 00:00 UTC F00 โ”Š GRIB2 @ aws โ”Š IDX @ aws

Using the modelโ€™s own grid, I will generate 100 random samples to extract.

[18]:
def generate_random_string(len=8):
    """Generate a random string."""
    import random
    import string

    return "".join(random.choices(string.ascii_letters + string.digits, k=len))


n = 100
points_self = (
    ds[["latitude", "longitude"]]
    .to_dataframe()[["latitude", "longitude"]]
    .sample(n)
    .reset_index(drop=True)
)
points_self["stid"] = [generate_random_string() for _ in range(n)]
points_self
[18]:
latitude longitude stid
0 28.118041 251.508694 A4Bt4ArI
1 45.433523 236.549102 M8TVLmnW
2 29.326794 290.301011 EFnRTuds
3 48.066147 293.088237 LGzLiHaU
4 32.020415 247.694797 tzABimIV
... ... ... ...
95 44.124212 279.341471 zA7MhOEm
96 44.208684 270.780126 OSrB5uVX
97 30.286338 254.814174 e5xYHMth
98 38.231154 256.410611 ZwVijXzk
99 42.481527 268.409357 PsiEP6f0

100 rows ร— 3 columns

[19]:
%%timeit
y1 = ds.herbie.pick_points(points_self)
139 ms ยฑ 43.8 ms per loop (mean ยฑ std. dev. of 7 runs, 1 loop each)
[20]:
%%timeit
y2 = ds.herbie.pick_points(points_self, method="weighted")
1.94 s ยฑ 383 ms per loop (mean ยฑ std. dev. of 7 runs, 1 loop each)
[21]:
%%timeit
# Try the deprecated `nearest_points` method which used MetPy
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    y3 = ds.herbie.nearest_points(points_self)
681 ms ยฑ 47.1 ms per loop (mean ยฑ std. dev. of 7 runs, 1 loop each)
[22]:
# Do I get the same answer for the old and new method?
y1 = ds.herbie.pick_points(points_self)
y2 = ds.herbie.pick_points(points_self, method="weighted")
y3 = ds.herbie.nearest_points(points_self)

all(y1.latitude == y3.latitude), all(y1.longitude == y3.longitude)
/tmp/ipykernel_1317/3015965854.py:4: DeprecationWarning: The accessor `ds.herbie.nearest_points` is deprecated in favor of the `ds.herbie.pick_points` which uses the BallTree algorithm instead.
  y3 = ds.herbie.nearest_points(points_self)
/home/blaylock/GITHUB/Herbie/herbie/accessors.py:524: UserWarning: More than one time coordinate present for variable  "t".
  crs = ds.metpy_crs.item().to_cartopy()
[22]:
(True, True)
[23]:
ax = EasyMap(crs=ds.herbie.crs).ax

ax.pcolormesh(
    ds.longitude,
    ds.latitude,
    ds.isel(isobaricInhPa=0).t,
    cmap="Spectral_r",
    transform=pc,
)

ax.scatter(y1.longitude, y1.latitude, color="k", transform=pc)
ax.scatter(
    y1.point_longitude, y1.point_latitude, color="yellow", marker=".", transform=pc
)

ax.EasyMap.INSET_GLOBE()
[23]:
<GeoAxes: >
../../../_images/user_guide_tutorial_accessor_notebooks_pick_points_38_1.png

Not bad. Less than half a second to extract 100 points from the HRRR grid.

Now letโ€™s try even moreโ€ฆ

[32]:
n_samples = (
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
    + list(range(10, 100, 10))
    + list(range(100, 1_000, 100))
    + list(range(1_000, 10_000, 1_000))
    + [100_000, 1_000_000]
)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")

    # Timers for nearest_points (deprecated)
    times_np = []
    samples_np = []

    # Timers for pick_points method='nearest'
    times_pp = []
    samples_pp = []

    # Timers for pick_points method='weighted'
    times_w = []
    samples_w = []

    for s in n_samples:
        samples_np.append(s)
        timer = pd.Timestamp("now")
        y1 = ds.herbie.nearest_points(points_self)
        times_np.append((pd.Timestamp("now") - timer).total_seconds())

        samples_pp.append(s)
        timer = pd.Timestamp("now")
        y1 = ds.herbie.pick_points(points_self, method="nearest")
        times_pp.append((pd.Timestamp("now") - timer).total_seconds())

        samples_w.append(s)
        timer = pd.Timestamp("now")
        y1 = ds.herbie.pick_points(points_self, method="weighted")
        times_w.append((pd.Timestamp("now") - timer).total_seconds())


plt.figure(figsize=[8, 4])
plt.plot(
    samples_pp,
    times_pp,
    marker=".",
    lw=2,
    color="tab:blue",
    label="pick_points (nearest)",
)
plt.plot(
    samples_w,
    times_w,
    marker=".",
    lw=2,
    color="tab:orange",
    label="pick_points (weighted)",
)
plt.plot(
    samples_np,
    times_np,
    marker=".",
    lw=1,
    color="k",
    ls="--",
    label="nearest_points (deprecated)",
)

plt.title("Herbie methods to get data at a point")
plt.xlabel("Number of points picked")
plt.ylabel("Query Time (s)")
plt.ylim(ymin=0)
plt.xlim(xmin=1)
plt.legend(loc="upper center", bbox_to_anchor=[0.5, -0.2])
plt.gca().set_xscale("log")
../../../_images/user_guide_tutorial_accessor_notebooks_pick_points_40_0.png

Nice! This method scales well for harvesting many points ๐Ÿ˜Ž

Lets see how well this works for the GFS modelโ€ฆ

[25]:
gfs = Herbie("2024-01-01", model="gfs").xarray(":TMP:2 m ")
โœ… Found โ”Š model=gfs โ”Š product=pgrb2.0p25 โ”Š 2024-Jan-01 00:00 UTC F00 โ”Š GRIB2 @ aws โ”Š IDX @ aws
[26]:
%%time
gfs.herbie.pick_points(points_self)
INFO: ๐ŸŒฑ Growing new BallTree...๐ŸŒณ BallTree grew in 1.7 seconds.
INFO: Saved BallTree to /home/blaylock/data/BallTree/gfs_1440-721.pkl
CPU times: user 1.8 s, sys: 100 ms, total: 1.9 s
Wall time: 1.89 s
[26]:
<xarray.Dataset> Size: 5kB
Dimensions:              (point: 100)
Coordinates:
    time                 datetime64[ns] 8B 2024-01-01
    step                 timedelta64[ns] 8B 00:00:00
    heightAboveGround    float64 8B 2.0
    latitude             (point) float64 800B 28.0 45.5 29.25 ... 38.25 42.5
    longitude            (point) float64 800B 251.5 236.5 290.2 ... 256.5 268.5
    valid_time           datetime64[ns] 8B 2024-01-01
    gribfile_projection  object 8B None
    point_grid_distance  (point) float64 800B 13.15 8.325 9.869 ... 8.083 7.711
    point_latitude       (point) float64 800B 28.12 45.43 29.33 ... 38.23 42.48
    point_longitude      (point) float64 800B 251.5 236.5 290.3 ... 256.4 268.4
    point_stid           (point) object 800B 'A4Bt4ArI' ... 'PsiEP6f0'
Dimensions without coordinates: point
Data variables:
    t2m                  (point) float32 400B 287.8 277.5 293.1 ... 276.3 270.4
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   gfs
    product:                 pgrb2.0p25
    description:             Global Forecast System
    remote_grib:             https://noaa-gfs-bdp-pds.s3.amazonaws.com/gfs.20...
    local_grib:              /home/blaylock/data/gfs/20240101/subset_6bef22b0...
    search:            :TMP:2 m 

Advanced Options#

The default behavior for method='nearest' is to return the first nearest neighbor. The default behavior for method='weighted' is to return the inverse-distance weighted mean of the nearest 4 grid points.

You can get more or fewer neighbors by setting the k argument. This might be useful if you want to compute the standard deviation of the k grid points surrounding a point of interest.

[27]:
%%time
# Return the 5 nearest grid points to each request point
ds.herbie.pick_points(points_self, method="nearest", k=5)
CPU times: user 1.46 s, sys: 216 ms, total: 1.68 s
Wall time: 1.67 s
[27]:
<xarray.Dataset> Size: 24kB
Dimensions:              (k: 5, isobaricInhPa: 5, point: 100)
Coordinates:
    time                 datetime64[ns] 8B 2024-03-28
    step                 timedelta64[ns] 8B 00:00:00
  * isobaricInhPa        (isobaricInhPa) float64 40B 1e+03 925.0 ... 700.0 500.0
    latitude             (k, point) float64 4kB 28.12 45.43 ... 38.26 42.45
    longitude            (k, point) float64 4kB 251.5 236.5 ... 256.4 268.4
    valid_time           datetime64[ns] 8B 2024-03-28
    gribfile_projection  object 8B None
    point_grid_distance  (k, point) float64 4kB 0.0 0.0 0.0 ... 2.97 3.0 2.993
    point_latitude       (point) float64 800B 28.12 45.43 29.33 ... 38.23 42.48
    point_longitude      (point) float64 800B 251.5 236.5 290.3 ... 256.4 268.4
    point_stid           (point) object 800B 'A4Bt4ArI' ... 'PsiEP6f0'
Dimensions without coordinates: k, point
Data variables:
    t                    (k, isobaricInhPa, point) float32 10kB 304.9 ... 244.0
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   hrrr
    product:                 sfc
    description:             High-Resolution Rapid Refresh - CONUS
    remote_grib:             https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr....
    local_grib:              /home/blaylock/data/hrrr/20240328/subset_0befe27...
    search:            TMP:\d* mb
[28]:
%%time
# Compute the distance weighted mean for the 9 nearest grid points to
# each request point
ds.herbie.pick_points(points_self, method="weighted", k=9)
CPU times: user 2.78 s, sys: 97.1 ms, total: 2.88 s
Wall time: 2.87 s
[28]:
<xarray.Dataset> Size: 28kB
Dimensions:              (isobaricInhPa: 5, point: 100, k: 9)
Coordinates:
    time                 datetime64[ns] 8B 2024-03-28
    step                 timedelta64[ns] 8B 00:00:00
  * isobaricInhPa        (isobaricInhPa) float64 40B 1e+03 925.0 ... 700.0 500.0
    valid_time           datetime64[ns] 8B 2024-03-28
    gribfile_projection  object 8B None
    point_latitude       (point) float64 800B 28.12 45.43 29.33 ... 38.23 42.48
    point_longitude      (point) float64 800B 251.5 236.5 290.3 ... 256.4 268.4
    point_stid           (point) object 800B 'A4Bt4ArI' ... 'PsiEP6f0'
    latitude             (k, point) float64 7kB 28.12 45.43 ... 38.26 42.45
    longitude            (k, point) float64 7kB 251.5 236.5 ... 256.4 268.4
    point_grid_distance  (k, point) float64 7kB 0.0 0.0 0.0 ... 4.242 4.232
Dimensions without coordinates: point, k
Data variables:
    t                    (isobaricInhPa, point) float64 4kB 304.9 ... 244.0

Suppose I wanted to get the standard deviation of model surface height around each of my points.

[29]:
ds = Herbie("2024-01-01").xarray("HGT:surface")
dsp = ds.herbie.pick_points(points_self, k=100)
dsp
โœ… Found โ”Š model=hrrr โ”Š product=sfc โ”Š 2024-Jan-01 00:00 UTC F00 โ”Š GRIB2 @ aws โ”Š IDX @ aws
[29]:
<xarray.Dataset> Size: 282kB
Dimensions:              (k: 100, point: 100)
Coordinates:
    time                 datetime64[ns] 8B 2024-01-01
    step                 timedelta64[ns] 8B 00:00:00
    surface              float64 8B 0.0
    latitude             (k, point) float64 80kB 28.12 45.43 ... 38.33 42.38
    longitude            (k, point) float64 80kB 251.5 236.5 ... 256.3 268.3
    valid_time           datetime64[ns] 8B 2024-01-01
    gribfile_projection  object 8B None
    point_grid_distance  (k, point) float64 80kB 0.0 0.0 0.0 ... 16.97 16.93
    point_latitude       (point) float64 800B 28.12 45.43 29.33 ... 38.23 42.48
    point_longitude      (point) float64 800B 251.5 236.5 290.3 ... 256.4 268.4
    point_stid           (point) object 800B 'A4Bt4ArI' ... 'PsiEP6f0'
Dimensions without coordinates: k, point
Data variables:
    orog                 (k, point) float32 40kB 955.9 597.6 ... 1.361e+03 282.0
Attributes:
    GRIB_edition:            2
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    model:                   hrrr
    product:                 sfc
    description:             High-Resolution Rapid Refresh - CONUS
    remote_grib:             https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr....
    local_grib:              /home/blaylock/data/hrrr/20240101/subset_6bef832...
    search:            HGT:surface
[30]:
z = dsp.orog.std(dim="k")
z
[30]:
<xarray.DataArray 'orog' (point: 100)> Size: 400B
array([3.57974121e+02, 1.69752808e+02, 0.00000000e+00, 1.00506348e+02,
       6.67194519e+01, 1.44643211e+01, 1.72897491e+01, 4.49295654e+01,
       5.20242023e+00, 1.31510179e-02, 2.03198242e+01, 0.00000000e+00,
       5.40655518e+01, 2.79081287e+01, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 1.36973648e+01, 5.59680462e-02,
       2.10313797e+01, 6.57881317e+01, 0.00000000e+00, 1.84571640e+02,
       1.35059387e+02, 4.05825577e+01, 3.62122345e+01, 7.16302872e+01,
       1.69172256e+02, 0.00000000e+00, 2.00776978e+01, 0.00000000e+00,
       2.38120499e+01, 4.83460712e+00, 4.66342430e+01, 2.26004620e+01,
       6.58044434e+00, 2.43521683e+02, 5.29580927e+00, 9.35856552e+01,
       1.76103989e+02, 2.67501984e+01, 0.00000000e+00, 6.42464294e+01,
       7.21421127e+01, 2.53476772e+01, 1.55120902e-02, 1.96815292e+02,
       0.00000000e+00, 1.62173843e+01, 3.15807373e+02, 1.09912872e+01,
       9.66130905e+01, 1.96222886e-01, 0.00000000e+00, 9.15527344e-05,
       0.00000000e+00, 5.75467110e+01, 1.22795906e+02, 0.00000000e+00,
       1.90995293e+01, 9.26871872e+01, 0.00000000e+00, 4.46593895e+01,
       7.39437256e+01, 0.00000000e+00, 2.01596813e+01, 1.12923744e+02,
       2.73812828e+01, 0.00000000e+00, 4.54793930e+01, 1.94310131e+01,
       1.56630157e+02, 4.82573051e+01, 2.60957432e+01, 1.03776340e+01,
       3.65887360e+02, 0.00000000e+00, 4.98398705e+01, 3.41156349e+01,
       4.36897621e+01, 0.00000000e+00, 4.71860838e+00, 5.85202408e+01,
       0.00000000e+00, 2.07875748e+01, 0.00000000e+00, 0.00000000e+00,
       4.01632398e-01, 4.51303040e+02, 1.41894979e+01, 0.00000000e+00,
       2.05594254e+02, 1.38799706e+01, 0.00000000e+00, 4.10455780e+01,
       3.56380310e+01, 7.39399796e+01, 3.51430511e+01, 1.29698896e+01],
      dtype=float32)
Coordinates:
    time                 datetime64[ns] 8B 2024-01-01
    step                 timedelta64[ns] 8B 00:00:00
    surface              float64 8B 0.0
    valid_time           datetime64[ns] 8B 2024-01-01
    gribfile_projection  object 8B None
    point_latitude       (point) float64 800B 28.12 45.43 29.33 ... 38.23 42.48
    point_longitude      (point) float64 800B 251.5 236.5 290.3 ... 256.4 268.4
    point_stid           (point) object 800B 'A4Bt4ArI' ... 'PsiEP6f0'
Dimensions without coordinates: point
[31]:
ax = EasyMap().ax
art = ax.scatter(z.point_longitude, z.point_latitude, c=z, cmap="winter")
plt.colorbar(
    art,
    ax=ax,
    label="Standard deviation of model elevation surrounding location",
    orientation="horizontal",
    pad=0.01,
)
[31]:
<matplotlib.colorbar.Colorbar at 0x7fedb1142000>
../../../_images/user_guide_tutorial_accessor_notebooks_pick_points_50_1.png