π€ Rclone#
If you need to download many full GRIB2 files from the cloud, I would suggest using rclone instead of Herbie. I love rclone!
Rclone is a command-line program that does multi-threaded downloads between different cloud providers or to local disk. Below is a brief tutorial to help you get started.
As part of NOAAβs Open Data Dissemination program, the NEXRAD radar, GOES satellite, HRRR model, and many other dataset are publicly available via Amazon Web Services (AWS) and other cloud providers. You can use rclone
to download these datasets. You can even use rclone to access personal OneDrive, Google Drive, Box, and other types of cloud storage.
Install and setup#
If you use conda, rclone is simply installed like this:
conda install -c conda-forge rclone
Or, you can download from https://rclone.org/downloads/.
Now configure rclone
to access Amazon S3. You can configure a new remote via the command line walk-through by running the command
rclone config
Select
n
for new remote and name the remote anything you like, but use a name that will remind you it accesses the public Amazon S3 buckets. I named minepublicAWS
.Set Type of Storage as
Amazon S3 Compliant Storage Providers
(5)Set Storage Provider as
Amazon Web Services S3
(1).Leave everything else blank (push enter for the remaining prompts).
If it looks right, accept with
y
and exit the setup.
The configuration file is saved in in this location
~/.config/rclone.config
And this should look like this
[publicAWS]
type = s3
provider = AWS
CLI Usage#
You can now use the remote you just configured to access NOAAβs public buckets on Amazon Web Services S3. Below are the names of some of NOAAβs public buckets.
Data | Bucket Name | |
---|---|---|
HRRR | noaa-hrrr-bdp-pds |
documentation |
GFS | noaa-gfs-bdp-pds |
documentation |
GEFS | noaa-gefs-pds |
documentation |
GOES16 | noaa-goes16 |
documentation |
GOES17 | noaa-goes17 |
documentation |
GOES18 | noaa-goes18 |
documentation |
NEXRAD | noaa-nexrad-level2 |
documentation |
Note: bdp-pds stands for Big Data Program Public Data Set
You access the bucket contents by typing the command
rclone <command> <options> <remoteName>:<bucket>
Documentation for all the commands and options can be found on the rclone website.
List bucket directories#
rclone lsd publicAWS:noaa-goes16/
List bucket directories for specific folders#
rclone lsd publicAWS:noaa-hrrr-bdp-pds/hrrr.20210101
List files in bucket#
rclone ls publicAWS:noaa-hrrr-bdp-pds/hrrr.20210101/conus
Copy file or files to your local machine#
rclone copy publicAWS:noaa-goes16/ABI-L2-MCMIPC/2018/283/00/OR_ABI-L2-MCMIPC-M3_G16_s20182830057203_e20182830059576_c20182830100076.nc ./
Examples#
HRRR#
For accessing HRRR from AWS and/or Azure, this should be in your ~/.config/rclone/rclone.conf
file:
[publicAWS]
type = s3
provider = AWS
[azure-hrrr]
type = azureblob
sas_url = https://noaahrrr.blob.core.windows.net/hrrr
Usage examples
# List HRRR buckets on AWS
rclone lsd publicAWS:noaa-hrrr-bdp-pds
# List HRRR blobs on Azure
rclone lsd azure-hrrr:hrrr
ECMWF#
For accessing ECMWF open data on AWS or Azure, this should be in your ~/.config/rclone/rclone.conf
file:
[publicAWS]
type = s3
provider = AWS
[azure-ecmwf]
type = azureblob
sas_url = https://ai4edataeuwest.blob.core.windows.net/ecmwf
Usage examples
# List ECMWF buckets on AWS
rclone lsd publicAWS:ecmwf-forecasts
# List ECMWF blobs on Azure
rclone lsd azure-ecmwf:ecmwf
Copy all oper
and scda
data from a certain day/hour to your local computer
rclone copy azure-ecmwf:ecmwf/20221213/ . --include "*-{oper,scda}-fc.{grib2,index}"
Command Part | Explanation |
---|---|
rclone |
the rclone command |
copy |
copy data from A to B |
azure-ecmwf:ecmwf/20221213/ |
The data source |
. |
The location to download the dat (preserve directory structure) |
--include "*-{oper,scda}-fc.{grib2,index}" |
filter the files you want downloaded |