๐ Pick Points#
Herbieโs pick_points
xarray accessor picks data from a model grid nearest locations of interest. Think of it like picking apples from a tree. Herbieโs implementation uses the scikit-learn BallTree algorithm with the haversine formula to get nearest-neighbor values. The haversine formula is important when working with latitude and longitude coordinates.
For regular latitude-longitude grids (e.g., GFS, IFS), you could use xarrayโs advanced indexing instead to easily select your points of interest, but that solution doesnโt work in the following cases:
Models with curvilinear grids (i.e. not a regular latitude-longitude grid);
Longitude units,
[0, 360)
vs[-180, 180)
, differ between model and requested point.The distance to the nearest neighbor needs to be known.
Picking more than one nearest neighbor points.
Picking values at nearest-neighbor points in a curvilinear grid has been one of my longstanding questions. In November 2019, I asked How to select the nearest lat/lon location with multi-dimension coordinates) on Stack Overflow, which has had over 28k views in four years. I suggested a rudimentary solution that determined the nearest neighbor point by finding the absolute minimum value between the point and grid difference, but this solution didnโt scale well for picking many points.
I iterated over different solutions since then.
In the deprecated Herbie accessor ds.herbie.nearest_points
, I used MetPyโs assign_y_x, transformed the latitude and longitude points to the coordinate reference system coordinates (e.g. lambert conformal for HRRR), and then picked the nearest neighbor points. That worked fine, but it required a coordinate transformation by Cartopy, and not all xarray Datasets have sufficient information to
determine the appropriate coordinate transformation.
This new approach using BallTree, we donโt need to do a coordinate transformation, we can pick the \(k\)-th nearest neighbors, and the distance to the nearest neighbors (haversine formula) is returned. I have also implemented the capability to compute the inverse-distance weighted mean of the four (or more) grid points nearest your points of interest.
As I write this, I learned about the xoak package, which also uses BallTree to query nearest neighbors.
Letโs get startedโฆ
To use Herbieโs xarray accessors, you just need to import anything from Herbie.
[1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
import warnings
import herbie
from herbie import Herbie, FastHerbie
A Very Simple Demonstration#
First, Iโll show how Herbie extracts nearest points from a simple grid. Letโs make a 3x3 xarray Dataset with latitude/longitude coordinates and then extract data nearest two points of interest.
[2]:
# Create a 3x3 xarray dataset
ds = xr.Dataset(
{"a": (["latitude", "longitude"], [[0, 1, 2], [0, 1, 0], [0, 0, 0]])},
coords={
"latitude": (["latitude"], [44, 45, 46]),
"longitude": (["longitude"], [-99, -100, -101]),
},
)
ds
[2]:
<xarray.Dataset> Size: 120B Dimensions: (latitude: 3, longitude: 3) Coordinates: * latitude (latitude) int64 24B 44 45 46 * longitude (longitude) int64 24B -99 -100 -101 Data variables: a (latitude, longitude) int64 72B 0 1 2 0 1 0 0 0 0
The points you want to extract must be given as a Pandas DataFrame with columns latitude
and longitude
given in degrees.
[3]:
# We want to pick data closest to two points
points = pd.DataFrame(
{
"longitude": [-100.25, -99.4],
"latitude": [44.25, 45.4],
}
)
points
[3]:
longitude | latitude | |
---|---|---|
0 | -100.25 | 44.25 |
1 | -99.40 | 45.40 |
Now we can pick the nearest grid points with the custom Herbie xarray accessor.
Note
method='nearest'
is the default behavior.
[4]:
# Pick the value nearest the requested points
matched = ds.herbie.pick_points(points, method="nearest")
matched
WARNING: Herbie won't cache the BallTree because it
doesn't know what to name it. Please specify
`tree_name` to cache the tree for use later.
INFO: ๐ฑ Growing new BallTree...๐ณ BallTree grew in 0.007 seconds.
[4]:
<xarray.Dataset> Size: 96B Dimensions: (point: 2) Coordinates: latitude (point) int64 16B 44 45 longitude (point) int64 16B -100 -99 point_grid_distance (point) float64 16B 34.22 54.41 point_longitude (point) float64 16B -100.2 -99.4 point_latitude (point) float64 16B 44.25 45.4 Dimensions without coordinates: point Data variables: a (point) int64 16B 1 0
Alternatively, we can get the inverse-distance weighted mean of the 4 nearest points.
[5]:
# Distance weighted mean
matched_w = ds.herbie.pick_points(points, method="weighted")
matched_w
WARNING: Herbie won't cache the BallTree because it
doesn't know what to name it. Please specify
`tree_name` to cache the tree for use later.
INFO: ๐ฑ Growing new BallTree...๐ณ BallTree grew in 0.0065 seconds.
[5]:
<xarray.Dataset> Size: 240B Dimensions: (point: 2, k: 4) Coordinates: point_longitude (point) float64 16B -100.2 -99.4 point_latitude (point) float64 16B 44.25 45.4 latitude (k, point) int64 64B 44 45 44 45 45 46 45 46 longitude (k, point) int64 64B -100 -99 -101 ... -99 -101 -100 point_grid_distance (k, point) float64 64B 34.22 54.41 66.0 ... 102.4 81.38 Dimensions without coordinates: point, k Data variables: a (point) float64 16B 1.082 0.2588
Thatโs a lot of info to digest. Iโll help you visualize what we have done with the following figure.
[6]:
fig, axes = plt.subplots(1, 2, figsize=[13, 5])
for p, ax in zip(matched_w.point, axes):
zz = matched.sel(point=p)
# Plot grid
ax.pcolormesh(
ds.longitude, ds.latitude, ds.a, edgecolor="1", lw=1, cmap="Pastel2_r"
)
x, y = np.meshgrid(ds.longitude, ds.latitude)
x = x.flatten()
y = y.flatten()
ax.scatter(x, y, facecolor="tab:red", edgecolor="tab:red", s=100, zorder=100)
# Plot requested point
ax.scatter(
zz.point_longitude,
zz.point_latitude,
color="yellow",
ec="k",
marker="p",
s=300,
zorder=100,
)
for i in ds.latitude:
for j in ds.longitude:
z = ds.sel(latitude=i, longitude=j)
ax.text(j, i, f"\na={z.a.item()}", ha="center", va="top")
# Plot path to nearest point and distance
# for i, j in zip(x, y):
# ax.plot([i, point.longitude.values[0]], [j, point.latitude.values[0]])
for i in matched_w.k:
if i == 0:
kwargs = dict(lw=2, color="k", ls="--")
else:
kwargs = dict(color=".5", ls="--")
z = matched_w.sel(k=i, point=p)
ax.plot(
[z.longitude, z.point_longitude], [z.latitude, z.point_latitude], **kwargs
)
ax.text(
z.longitude,
z.latitude,
f" {z.point_grid_distance.item():.1f} km",
va="center",
fontweight="bold",
)
ax.set_xticks(ds.longitude)
ax.set_yticks(ds.latitude)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title(f"Point={p.item()}", loc="left", fontsize="xx-large")
ax.set_title(
f"nearest value: $a={zz.a.item():.2f}$\nweighted mean: $a={z.a.item():.2f}$",
loc="right",
)
The yellow pentagon is the requested point. The gridโs nearest-neighbor point is connected by a thick dashed line. The three other nearest neighbors used to compute the distance-weighted mean are connected by a thin dashed line.
Summary: From a simple 2D Dataset with latitude and longitude coordinates, we got the nearest value for two points of interest. We also compute the inverse-distance weighted mean from the four grids nearest our point of interest.
Pick points from HRRR data#
The above is nice, but you might say, โYeah, but you can already select data from a grid using xarray advanced selection.โ That is true, but it does not work for models with curvilienar grids, like the HRRR model.
Here is a demonstration using real HRRR data.
[7]:
H = Herbie("2024-03-01", model="hrrr")
ds = H.xarray(":(?:TMP|DPT):2 m")
ds
โ
Found โ model=hrrr โ product=sfc โ 2024-Mar-01 00:00 UTC F00 โ GRIB2 @ aws โ IDX @ aws
[7]:
<xarray.Dataset> Size: 46MB Dimensions: (y: 1059, x: 1799) Coordinates: time datetime64[ns] 8B 2024-03-01 step timedelta64[ns] 8B 00:00:00 heightAboveGround float64 8B 2.0 latitude (y, x) float64 15MB 21.14 21.15 21.15 ... 47.85 47.84 longitude (y, x) float64 15MB 237.3 237.3 237.3 ... 299.0 299.1 valid_time datetime64[ns] 8B 2024-03-01 gribfile_projection object 8B None Dimensions without coordinates: y, x Data variables: t2m (y, x) float32 8MB 292.5 292.5 292.4 ... 266.8 266.8 d2m (y, x) float32 8MB 287.3 287.2 287.2 ... 262.2 262.2 Attributes: GRIB_edition: 2 GRIB_centre: kwbc GRIB_centreDescription: US National Weather Service - NCEP GRIB_subCentre: 0 Conventions: CF-1.7 institution: US National Weather Service - NCEP model: hrrr product: sfc description: High-Resolution Rapid Refresh - CONUS remote_grib: https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.... local_grib: /home/blaylock/data/hrrr/20240301/subset_e0ef89f... search: :(?:TMP|DPT):2 m
Letโs extract the data from some points with some fruit-themed station id names.
[8]:
points = pd.DataFrame(
{
"latitude": np.linspace(44, 45.5, 5),
"longitude": np.linspace(-100, -101, 5),
"stid": ["McIntosh", "Golden", "Fuji", "Gala", "Honeycrisp"],
}
)
points
[8]:
latitude | longitude | stid | |
---|---|---|---|
0 | 44.000 | -100.00 | McIntosh |
1 | 44.375 | -100.25 | Golden |
2 | 44.750 | -100.50 | Fuji |
3 | 45.125 | -100.75 | Gala |
4 | 45.500 | -101.00 | Honeycrisp |
[9]:
%%time
matched = ds.herbie.pick_points(points)
matched
INFO: ๐ฑ Growing new BallTree...๐ณ BallTree grew in 3.3 seconds.
INFO: Saved BallTree to /home/blaylock/data/BallTree/hrrr_1799-1059.pkl
CPU times: user 3.36 s, sys: 210 ms, total: 3.57 s
Wall time: 3.55 s
[9]:
<xarray.Dataset> Size: 320B Dimensions: (point: 5) Coordinates: time datetime64[ns] 8B 2024-03-01 step timedelta64[ns] 8B 00:00:00 heightAboveGround float64 8B 2.0 latitude (point) float64 40B 43.99 44.37 44.76 45.13 45.5 longitude (point) float64 40B 260.0 259.8 259.5 259.3 259.0 valid_time datetime64[ns] 8B 2024-03-01 gribfile_projection object 8B None point_grid_distance (point) float64 40B 0.8515 1.07 1.611 1.41 1.318 point_latitude (point) float64 40B 44.0 44.38 44.75 45.12 45.5 point_longitude (point) float64 40B -100.0 -100.2 -100.5 -100.8 -101.0 point_stid (point) object 40B 'McIntosh' 'Golden' ... 'Honeycrisp' Dimensions without coordinates: point Data variables: t2m (point) float32 20B 288.0 286.7 286.1 282.3 284.8 d2m (point) float32 20B 268.8 271.8 270.5 274.1 270.5 Attributes: GRIB_edition: 2 GRIB_centre: kwbc GRIB_centreDescription: US National Weather Service - NCEP GRIB_subCentre: 0 Conventions: CF-1.7 institution: US National Weather Service - NCEP model: hrrr product: sfc description: High-Resolution Rapid Refresh - CONUS remote_grib: https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.... local_grib: /home/blaylock/data/hrrr/20240301/subset_e0ef89f... search: :(?:TMP|DPT):2 m
Notice that a BallTree object for the HRRR model was saved with the name <model_name>_<x_dim>_<y_dim>.pkl
in the directory you save Herbie data. This is done so it can be loaded more quickly next time you extract points from this grid.
In the next cell I run the same command, and it uses the cached tree instead.
[10]:
%%time
matched = ds.herbie.pick_points(points)
matched
CPU times: user 132 ms, sys: 20.7 ms, total: 153 ms
Wall time: 150 ms
[10]:
<xarray.Dataset> Size: 320B Dimensions: (point: 5) Coordinates: time datetime64[ns] 8B 2024-03-01 step timedelta64[ns] 8B 00:00:00 heightAboveGround float64 8B 2.0 latitude (point) float64 40B 43.99 44.37 44.76 45.13 45.5 longitude (point) float64 40B 260.0 259.8 259.5 259.3 259.0 valid_time datetime64[ns] 8B 2024-03-01 gribfile_projection object 8B None point_grid_distance (point) float64 40B 0.8515 1.07 1.611 1.41 1.318 point_latitude (point) float64 40B 44.0 44.38 44.75 45.12 45.5 point_longitude (point) float64 40B -100.0 -100.2 -100.5 -100.8 -101.0 point_stid (point) object 40B 'McIntosh' 'Golden' ... 'Honeycrisp' Dimensions without coordinates: point Data variables: t2m (point) float32 20B 288.0 286.7 286.1 282.3 284.8 d2m (point) float32 20B 268.8 271.8 270.5 274.1 270.5 Attributes: GRIB_edition: 2 GRIB_centre: kwbc GRIB_centreDescription: US National Weather Service - NCEP GRIB_subCentre: 0 Conventions: CF-1.7 institution: US National Weather Service - NCEP model: hrrr product: sfc description: High-Resolution Rapid Refresh - CONUS remote_grib: https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.... local_grib: /home/blaylock/data/hrrr/20240301/subset_e0ef89f... search: :(?:TMP|DPT):2 m
You may want to swap the dimensions to use the point_stid
coordinate as the dimension instead. Doing this makes it possible to select data by station ID instead of point index.
[11]:
matched = matched.swap_dims({"point": "point_stid"})
matched.sel(point_stid="Honeycrisp")
[11]:
<xarray.Dataset> Size: 128B Dimensions: () Coordinates: time datetime64[ns] 8B 2024-03-01 step timedelta64[ns] 8B 00:00:00 heightAboveGround float64 8B 2.0 latitude float64 8B 45.5 longitude float64 8B 259.0 valid_time datetime64[ns] 8B 2024-03-01 gribfile_projection object 8B None point_grid_distance float64 8B 1.318 point_latitude float64 8B 45.5 point_longitude float64 8B -101.0 point_stid <U10 40B 'Honeycrisp' Data variables: t2m float32 4B 284.8 d2m float32 4B 270.5 Attributes: GRIB_edition: 2 GRIB_centre: kwbc GRIB_centreDescription: US National Weather Service - NCEP GRIB_subCentre: 0 Conventions: CF-1.7 institution: US National Weather Service - NCEP model: hrrr product: sfc description: High-Resolution Rapid Refresh - CONUS remote_grib: https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.... local_grib: /home/blaylock/data/hrrr/20240301/subset_e0ef89f... search: :(?:TMP|DPT):2 m
Now letโs plot each point on a map with the value at the nearest grid point.
[12]:
from toolbox import EasyMap, pc, ccrs
ax = EasyMap(crs=ds.herbie.crs).ax
ax.pcolormesh(
ds.longitude,
ds.latitude,
ds.t2m,
cmap="Spectral_r",
vmax=290,
vmin=270,
transform=pc,
)
for i in matched.point_stid:
z = matched.sel(point_stid=i)
ax.scatter(z.longitude, z.latitude, color="k", transform=pc)
ax.scatter(
z.point_longitude, z.point_latitude, color="yellow", marker=".", transform=pc
)
ax.text(
z.point_longitude,
z.point_latitude,
f" {z.point_stid.item()}\n {z.t2m.item()-273.15:.2f} C",
transform=pc,
)
ax.set_extent([-101, -99, 44, 46], crs=pc)
ax.EasyMap.INSET_GLOBE()
ax.adjust_extent()
[12]:
(-289442.889311017, -108684.3334160587, 605320.486730601, 849853.0858732163)
Pick Points: Sounding#
If you want a โsoundingโ at a single point, you will need to get the GRIB data for all the model layers of interest (yes, thatโs a lot of data), then pick the point.
[13]:
# TODO: Get read sounding data to compare model data to.
[14]:
H = Herbie("2024-03-28 00:00", model="hrrr", product="prs")
ds = H.xarray("(?:DPT|TMP):[0-9]* mb", remove_grib=False)
# Get HRRR sounding at Salt Lake City
slc = ds.herbie.pick_points(
pd.DataFrame({"latitude": [40.76], "longitude": [-111.876183]})
)
slc
โ
Found โ model=hrrr โ product=prs โ 2024-Mar-28 00:00 UTC F00 โ GRIB2 @ aws โ IDX @ aws
[14]:
<xarray.Dataset> Size: 696B Dimensions: (isobaricInhPa: 39, point: 1) Coordinates: time datetime64[ns] 8B 2024-03-28 step timedelta64[ns] 8B 00:00:00 * isobaricInhPa (isobaricInhPa) float64 312B 1e+03 975.0 ... 75.0 50.0 latitude (point) float64 8B 40.75 longitude (point) float64 8B 248.1 valid_time datetime64[ns] 8B 2024-03-28 gribfile_projection object 8B None point_grid_distance (point) float64 8B 1.193 point_latitude (point) float64 8B 40.76 point_longitude (point) float64 8B -111.9 Dimensions without coordinates: point Data variables: t (isobaricInhPa, point) float32 156B ... dpt (isobaricInhPa, point) float32 156B ... Attributes: GRIB_edition: 2 GRIB_centre: kwbc GRIB_centreDescription: US National Weather Service - NCEP GRIB_subCentre: 0 Conventions: CF-1.7 institution: US National Weather Service - NCEP model: hrrr product: prs description: High-Resolution Rapid Refresh - CONUS remote_grib: https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.... local_grib: /home/blaylock/data/hrrr/20240328/subset_0befd2f... search: (?:DPT|TMP):[0-9]* mb
[15]:
# TODO: plot data on a sounding plot using MetPy
ax = plt.gca()
ax.plot(slc.t, slc.isobaricInhPa, color="red", marker=".", label="Temperature")
ax.plot(slc.dpt, slc.isobaricInhPa, color="green", marker=".", label="Dew Point")
ax.invert_yaxis()
ax.set_ylabel("Level (hPa)")
ax.set_ylabel("Temperature (K)")
ax.legend()
[15]:
<matplotlib.legend.Legend at 0x7fedbb2ab6e0>
Pick Points: Timeseries#
[16]:
# TODO: Could demonstrate using FastHerbie here, if the kernel wouldn't crash (memory??)
i = []
for date in pd.date_range("2024-01-01", periods=9, freq="3h"):
ds = Herbie(date).xarray("(?:DPT|TMP):2 m")
i.append(
ds.herbie.pick_points(
pd.DataFrame(
{
"latitude": [40.77069],
"longitude": [-111.96503],
"stid": ["KSLC"],
}
)
)
)
slc_ts = xr.concat(i, dim="valid_time")
slc_ts.t2m.plot(x="valid_time", marker=".", color="tab:red")
slc_ts.d2m.plot(x="valid_time", marker=".", color="tab:green")
โ
Found โ model=hrrr โ product=sfc โ 2024-Jan-01 00:00 UTC F00 โ GRIB2 @ aws โ IDX @ aws
โ
Found โ model=hrrr โ product=sfc โ 2024-Jan-01 03:00 UTC F00 โ GRIB2 @ aws โ IDX @ aws
โ
Found โ model=hrrr โ product=sfc โ 2024-Jan-01 06:00 UTC F00 โ GRIB2 @ aws โ IDX @ aws
โ
Found โ model=hrrr โ product=sfc โ 2024-Jan-01 09:00 UTC F00 โ GRIB2 @ aws โ IDX @ aws
โ
Found โ model=hrrr โ product=sfc โ 2024-Jan-01 12:00 UTC F00 โ GRIB2 @ aws โ IDX @ aws
โ
Found โ model=hrrr โ product=sfc โ 2024-Jan-01 15:00 UTC F00 โ GRIB2 @ aws โ IDX @ aws
โ
Found โ model=hrrr โ product=sfc โ 2024-Jan-01 18:00 UTC F00 โ GRIB2 @ aws โ IDX @ aws
โ
Found โ model=hrrr โ product=sfc โ 2024-Jan-01 21:00 UTC F00 โ GRIB2 @ aws โ IDX @ aws
โ
Found โ model=hrrr โ product=sfc โ 2024-Jan-02 00:00 UTC F00 โ GRIB2 @ aws โ IDX @ aws
๐จ๐ปโ๐ญ Created directory: [/home/blaylock/data/hrrr/20240102]
[16]:
[<matplotlib.lines.Line2D at 0x7fedbafadb80>]
Benchmark#
Letโs see how fast ds.herbie.pick_points
is for harvesting many pointsโฆ
[17]:
H = Herbie("2024-03-28 00:00", model="hrrr")
ds = H.xarray(r"TMP:\d* mb", remove_grib=False)
โ
Found โ model=hrrr โ product=sfc โ 2024-Mar-28 00:00 UTC F00 โ GRIB2 @ aws โ IDX @ aws
Using the modelโs own grid, I will generate 100 random samples to extract.
[18]:
def generate_random_string(len=8):
"""Generate a random string."""
import random
import string
return "".join(random.choices(string.ascii_letters + string.digits, k=len))
n = 100
points_self = (
ds[["latitude", "longitude"]]
.to_dataframe()[["latitude", "longitude"]]
.sample(n)
.reset_index(drop=True)
)
points_self["stid"] = [generate_random_string() for _ in range(n)]
points_self
[18]:
latitude | longitude | stid | |
---|---|---|---|
0 | 28.118041 | 251.508694 | A4Bt4ArI |
1 | 45.433523 | 236.549102 | M8TVLmnW |
2 | 29.326794 | 290.301011 | EFnRTuds |
3 | 48.066147 | 293.088237 | LGzLiHaU |
4 | 32.020415 | 247.694797 | tzABimIV |
... | ... | ... | ... |
95 | 44.124212 | 279.341471 | zA7MhOEm |
96 | 44.208684 | 270.780126 | OSrB5uVX |
97 | 30.286338 | 254.814174 | e5xYHMth |
98 | 38.231154 | 256.410611 | ZwVijXzk |
99 | 42.481527 | 268.409357 | PsiEP6f0 |
100 rows ร 3 columns
[19]:
%%timeit
y1 = ds.herbie.pick_points(points_self)
139 ms ยฑ 43.8 ms per loop (mean ยฑ std. dev. of 7 runs, 1 loop each)
[20]:
%%timeit
y2 = ds.herbie.pick_points(points_self, method="weighted")
1.94 s ยฑ 383 ms per loop (mean ยฑ std. dev. of 7 runs, 1 loop each)
[21]:
%%timeit
# Try the deprecated `nearest_points` method which used MetPy
with warnings.catch_warnings():
warnings.simplefilter("ignore")
y3 = ds.herbie.nearest_points(points_self)
681 ms ยฑ 47.1 ms per loop (mean ยฑ std. dev. of 7 runs, 1 loop each)
[22]:
# Do I get the same answer for the old and new method?
y1 = ds.herbie.pick_points(points_self)
y2 = ds.herbie.pick_points(points_self, method="weighted")
y3 = ds.herbie.nearest_points(points_self)
all(y1.latitude == y3.latitude), all(y1.longitude == y3.longitude)
/tmp/ipykernel_1317/3015965854.py:4: DeprecationWarning: The accessor `ds.herbie.nearest_points` is deprecated in favor of the `ds.herbie.pick_points` which uses the BallTree algorithm instead.
y3 = ds.herbie.nearest_points(points_self)
/home/blaylock/GITHUB/Herbie/herbie/accessors.py:524: UserWarning: More than one time coordinate present for variable "t".
crs = ds.metpy_crs.item().to_cartopy()
[22]:
(True, True)
[23]:
ax = EasyMap(crs=ds.herbie.crs).ax
ax.pcolormesh(
ds.longitude,
ds.latitude,
ds.isel(isobaricInhPa=0).t,
cmap="Spectral_r",
transform=pc,
)
ax.scatter(y1.longitude, y1.latitude, color="k", transform=pc)
ax.scatter(
y1.point_longitude, y1.point_latitude, color="yellow", marker=".", transform=pc
)
ax.EasyMap.INSET_GLOBE()
[23]:
<GeoAxes: >
Not bad. Less than half a second to extract 100 points from the HRRR grid.
Now letโs try even moreโฆ
[32]:
n_samples = (
[1, 2, 3, 4, 5, 6, 7, 8, 9]
+ list(range(10, 100, 10))
+ list(range(100, 1_000, 100))
+ list(range(1_000, 10_000, 1_000))
+ [100_000, 1_000_000]
)
with warnings.catch_warnings():
warnings.simplefilter("ignore")
# Timers for nearest_points (deprecated)
times_np = []
samples_np = []
# Timers for pick_points method='nearest'
times_pp = []
samples_pp = []
# Timers for pick_points method='weighted'
times_w = []
samples_w = []
for s in n_samples:
samples_np.append(s)
timer = pd.Timestamp("now")
y1 = ds.herbie.nearest_points(points_self)
times_np.append((pd.Timestamp("now") - timer).total_seconds())
samples_pp.append(s)
timer = pd.Timestamp("now")
y1 = ds.herbie.pick_points(points_self, method="nearest")
times_pp.append((pd.Timestamp("now") - timer).total_seconds())
samples_w.append(s)
timer = pd.Timestamp("now")
y1 = ds.herbie.pick_points(points_self, method="weighted")
times_w.append((pd.Timestamp("now") - timer).total_seconds())
plt.figure(figsize=[8, 4])
plt.plot(
samples_pp,
times_pp,
marker=".",
lw=2,
color="tab:blue",
label="pick_points (nearest)",
)
plt.plot(
samples_w,
times_w,
marker=".",
lw=2,
color="tab:orange",
label="pick_points (weighted)",
)
plt.plot(
samples_np,
times_np,
marker=".",
lw=1,
color="k",
ls="--",
label="nearest_points (deprecated)",
)
plt.title("Herbie methods to get data at a point")
plt.xlabel("Number of points picked")
plt.ylabel("Query Time (s)")
plt.ylim(ymin=0)
plt.xlim(xmin=1)
plt.legend(loc="upper center", bbox_to_anchor=[0.5, -0.2])
plt.gca().set_xscale("log")
Nice! This method scales well for harvesting many points ๐
Lets see how well this works for the GFS modelโฆ
[25]:
gfs = Herbie("2024-01-01", model="gfs").xarray(":TMP:2 m ")
โ
Found โ model=gfs โ product=pgrb2.0p25 โ 2024-Jan-01 00:00 UTC F00 โ GRIB2 @ aws โ IDX @ aws
[26]:
%%time
gfs.herbie.pick_points(points_self)
INFO: ๐ฑ Growing new BallTree...๐ณ BallTree grew in 1.7 seconds.
INFO: Saved BallTree to /home/blaylock/data/BallTree/gfs_1440-721.pkl
CPU times: user 1.8 s, sys: 100 ms, total: 1.9 s
Wall time: 1.89 s
[26]:
<xarray.Dataset> Size: 5kB Dimensions: (point: 100) Coordinates: time datetime64[ns] 8B 2024-01-01 step timedelta64[ns] 8B 00:00:00 heightAboveGround float64 8B 2.0 latitude (point) float64 800B 28.0 45.5 29.25 ... 38.25 42.5 longitude (point) float64 800B 251.5 236.5 290.2 ... 256.5 268.5 valid_time datetime64[ns] 8B 2024-01-01 gribfile_projection object 8B None point_grid_distance (point) float64 800B 13.15 8.325 9.869 ... 8.083 7.711 point_latitude (point) float64 800B 28.12 45.43 29.33 ... 38.23 42.48 point_longitude (point) float64 800B 251.5 236.5 290.3 ... 256.4 268.4 point_stid (point) object 800B 'A4Bt4ArI' ... 'PsiEP6f0' Dimensions without coordinates: point Data variables: t2m (point) float32 400B 287.8 277.5 293.1 ... 276.3 270.4 Attributes: GRIB_edition: 2 GRIB_centre: kwbc GRIB_centreDescription: US National Weather Service - NCEP GRIB_subCentre: 0 Conventions: CF-1.7 institution: US National Weather Service - NCEP model: gfs product: pgrb2.0p25 description: Global Forecast System remote_grib: https://noaa-gfs-bdp-pds.s3.amazonaws.com/gfs.20... local_grib: /home/blaylock/data/gfs/20240101/subset_6bef22b0... search: :TMP:2 m
Advanced Options#
The default behavior for method='nearest'
is to return the first nearest neighbor. The default behavior for method='weighted'
is to return the inverse-distance weighted mean of the nearest 4 grid points.
You can get more or fewer neighbors by setting the k
argument. This might be useful if you want to compute the standard deviation of the k grid points surrounding a point of interest.
[27]:
%%time
# Return the 5 nearest grid points to each request point
ds.herbie.pick_points(points_self, method="nearest", k=5)
CPU times: user 1.46 s, sys: 216 ms, total: 1.68 s
Wall time: 1.67 s
[27]:
<xarray.Dataset> Size: 24kB Dimensions: (k: 5, isobaricInhPa: 5, point: 100) Coordinates: time datetime64[ns] 8B 2024-03-28 step timedelta64[ns] 8B 00:00:00 * isobaricInhPa (isobaricInhPa) float64 40B 1e+03 925.0 ... 700.0 500.0 latitude (k, point) float64 4kB 28.12 45.43 ... 38.26 42.45 longitude (k, point) float64 4kB 251.5 236.5 ... 256.4 268.4 valid_time datetime64[ns] 8B 2024-03-28 gribfile_projection object 8B None point_grid_distance (k, point) float64 4kB 0.0 0.0 0.0 ... 2.97 3.0 2.993 point_latitude (point) float64 800B 28.12 45.43 29.33 ... 38.23 42.48 point_longitude (point) float64 800B 251.5 236.5 290.3 ... 256.4 268.4 point_stid (point) object 800B 'A4Bt4ArI' ... 'PsiEP6f0' Dimensions without coordinates: k, point Data variables: t (k, isobaricInhPa, point) float32 10kB 304.9 ... 244.0 Attributes: GRIB_edition: 2 GRIB_centre: kwbc GRIB_centreDescription: US National Weather Service - NCEP GRIB_subCentre: 0 Conventions: CF-1.7 institution: US National Weather Service - NCEP model: hrrr product: sfc description: High-Resolution Rapid Refresh - CONUS remote_grib: https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.... local_grib: /home/blaylock/data/hrrr/20240328/subset_0befe27... search: TMP:\d* mb
[28]:
%%time
# Compute the distance weighted mean for the 9 nearest grid points to
# each request point
ds.herbie.pick_points(points_self, method="weighted", k=9)
CPU times: user 2.78 s, sys: 97.1 ms, total: 2.88 s
Wall time: 2.87 s
[28]:
<xarray.Dataset> Size: 28kB Dimensions: (isobaricInhPa: 5, point: 100, k: 9) Coordinates: time datetime64[ns] 8B 2024-03-28 step timedelta64[ns] 8B 00:00:00 * isobaricInhPa (isobaricInhPa) float64 40B 1e+03 925.0 ... 700.0 500.0 valid_time datetime64[ns] 8B 2024-03-28 gribfile_projection object 8B None point_latitude (point) float64 800B 28.12 45.43 29.33 ... 38.23 42.48 point_longitude (point) float64 800B 251.5 236.5 290.3 ... 256.4 268.4 point_stid (point) object 800B 'A4Bt4ArI' ... 'PsiEP6f0' latitude (k, point) float64 7kB 28.12 45.43 ... 38.26 42.45 longitude (k, point) float64 7kB 251.5 236.5 ... 256.4 268.4 point_grid_distance (k, point) float64 7kB 0.0 0.0 0.0 ... 4.242 4.232 Dimensions without coordinates: point, k Data variables: t (isobaricInhPa, point) float64 4kB 304.9 ... 244.0
Suppose I wanted to get the standard deviation of model surface height around each of my points.
[29]:
ds = Herbie("2024-01-01").xarray("HGT:surface")
dsp = ds.herbie.pick_points(points_self, k=100)
dsp
โ
Found โ model=hrrr โ product=sfc โ 2024-Jan-01 00:00 UTC F00 โ GRIB2 @ aws โ IDX @ aws
[29]:
<xarray.Dataset> Size: 282kB Dimensions: (k: 100, point: 100) Coordinates: time datetime64[ns] 8B 2024-01-01 step timedelta64[ns] 8B 00:00:00 surface float64 8B 0.0 latitude (k, point) float64 80kB 28.12 45.43 ... 38.33 42.38 longitude (k, point) float64 80kB 251.5 236.5 ... 256.3 268.3 valid_time datetime64[ns] 8B 2024-01-01 gribfile_projection object 8B None point_grid_distance (k, point) float64 80kB 0.0 0.0 0.0 ... 16.97 16.93 point_latitude (point) float64 800B 28.12 45.43 29.33 ... 38.23 42.48 point_longitude (point) float64 800B 251.5 236.5 290.3 ... 256.4 268.4 point_stid (point) object 800B 'A4Bt4ArI' ... 'PsiEP6f0' Dimensions without coordinates: k, point Data variables: orog (k, point) float32 40kB 955.9 597.6 ... 1.361e+03 282.0 Attributes: GRIB_edition: 2 GRIB_centre: kwbc GRIB_centreDescription: US National Weather Service - NCEP GRIB_subCentre: 0 Conventions: CF-1.7 institution: US National Weather Service - NCEP model: hrrr product: sfc description: High-Resolution Rapid Refresh - CONUS remote_grib: https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.... local_grib: /home/blaylock/data/hrrr/20240101/subset_6bef832... search: HGT:surface
[30]:
z = dsp.orog.std(dim="k")
z
[30]:
<xarray.DataArray 'orog' (point: 100)> Size: 400B array([3.57974121e+02, 1.69752808e+02, 0.00000000e+00, 1.00506348e+02, 6.67194519e+01, 1.44643211e+01, 1.72897491e+01, 4.49295654e+01, 5.20242023e+00, 1.31510179e-02, 2.03198242e+01, 0.00000000e+00, 5.40655518e+01, 2.79081287e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.36973648e+01, 5.59680462e-02, 2.10313797e+01, 6.57881317e+01, 0.00000000e+00, 1.84571640e+02, 1.35059387e+02, 4.05825577e+01, 3.62122345e+01, 7.16302872e+01, 1.69172256e+02, 0.00000000e+00, 2.00776978e+01, 0.00000000e+00, 2.38120499e+01, 4.83460712e+00, 4.66342430e+01, 2.26004620e+01, 6.58044434e+00, 2.43521683e+02, 5.29580927e+00, 9.35856552e+01, 1.76103989e+02, 2.67501984e+01, 0.00000000e+00, 6.42464294e+01, 7.21421127e+01, 2.53476772e+01, 1.55120902e-02, 1.96815292e+02, 0.00000000e+00, 1.62173843e+01, 3.15807373e+02, 1.09912872e+01, 9.66130905e+01, 1.96222886e-01, 0.00000000e+00, 9.15527344e-05, 0.00000000e+00, 5.75467110e+01, 1.22795906e+02, 0.00000000e+00, 1.90995293e+01, 9.26871872e+01, 0.00000000e+00, 4.46593895e+01, 7.39437256e+01, 0.00000000e+00, 2.01596813e+01, 1.12923744e+02, 2.73812828e+01, 0.00000000e+00, 4.54793930e+01, 1.94310131e+01, 1.56630157e+02, 4.82573051e+01, 2.60957432e+01, 1.03776340e+01, 3.65887360e+02, 0.00000000e+00, 4.98398705e+01, 3.41156349e+01, 4.36897621e+01, 0.00000000e+00, 4.71860838e+00, 5.85202408e+01, 0.00000000e+00, 2.07875748e+01, 0.00000000e+00, 0.00000000e+00, 4.01632398e-01, 4.51303040e+02, 1.41894979e+01, 0.00000000e+00, 2.05594254e+02, 1.38799706e+01, 0.00000000e+00, 4.10455780e+01, 3.56380310e+01, 7.39399796e+01, 3.51430511e+01, 1.29698896e+01], dtype=float32) Coordinates: time datetime64[ns] 8B 2024-01-01 step timedelta64[ns] 8B 00:00:00 surface float64 8B 0.0 valid_time datetime64[ns] 8B 2024-01-01 gribfile_projection object 8B None point_latitude (point) float64 800B 28.12 45.43 29.33 ... 38.23 42.48 point_longitude (point) float64 800B 251.5 236.5 290.3 ... 256.4 268.4 point_stid (point) object 800B 'A4Bt4ArI' ... 'PsiEP6f0' Dimensions without coordinates: point
[31]:
ax = EasyMap().ax
art = ax.scatter(z.point_longitude, z.point_latitude, c=z, cmap="winter")
plt.colorbar(
art,
ax=ax,
label="Standard deviation of model elevation surrounding location",
orientation="horizontal",
pad=0.01,
)
[31]:
<matplotlib.colorbar.Colorbar at 0x7fedb1142000>