Examples of Working with COGs

Environment

The packages needed for this notebook can be installed with conda or mamba. Using the environment.yml from this folder run:

conda create -f environment.yml

or

mamba create -f environment.yml

This notebook has been tested to work with the listed Conda environment.

Setup

For demonstrating some COG concepts, we will download a regular GeoTIFF, create a Cloud-Optimized GeoTIFF and explore how they are different.

First we use the earthaccess library to setup credentials to fetch data from NASA’s EarthData catalog.

import earthaccess
import rasterio
from rasterio.plot import show
from rio_cogeo import cog_validate, cog_info
/Users/kyle/local/micromamba/envs/coguide-cog/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
earthaccess.login()
EARTHDATA_USERNAME and EARTHDATA_PASSWORD are not set in the current environment, try setting them or use a different strategy (netrc, interactive)
You're now authenticated with NASA Earthdata Login
Using token with expiration date: 10/24/2023
Using .netrc file for EDL
<earthaccess.auth.Auth at 0x10427d390>

Download a GeoTIFF from EarthData

Note: The whole point of is that we don’t download data. So in future examples, we will demonstrate how to access just subsets of data using COG and compare that with a GeoTIFF.

# Download data
short_name = 'VCF5KYR'
version = '001'

veg_item_results = earthaccess.search_data(
    short_name=short_name,
    version=version,
    count=1
)
Granules found: 33
test_data_dir = "./test_data"
veg_files = earthaccess.download(veg_item_results, test_data_dir)
veg_gtiff_filename = f"{test_data_dir}/{veg_files[0]}"
 Getting 1 granules, approx download size: 0.07 GB
File VCF5KYR_1982001_001_2018224204211.tif already downloaded
QUEUEING TASKS | : 100%|██████████| 1/1 [00:00<00:00, 900.84it/s]
PROCESSING TASKS | : 100%|██████████| 1/1 [00:00<00:00, 15887.52it/s]
COLLECTING RESULTS | : 100%|██████████| 1/1 [00:00<00:00, 29330.80it/s]

To learn more about the example data see the Vegetation Continuous Fields (VCF) information page.

Is it a valid COG?

We can use rio_cogeo.cog_validate to check. It returns is_valid, errors and warnings.

cog_validate(veg_gtiff_filename)
The following warnings were found:
- The file is greater than 512xH or 512xW, it is recommended to include internal overviews

The following errors were found:
- The file is greater than 512xH or 512xW, but is not tiled
(False,
 ['The file is greater than 512xH or 512xW, but is not tiled'],
 ['The file is greater than 512xH or 512xW, it is recommended to include internal overviews'])

Return values:

  1. is_valid is False: this is not a valid COG.
  2. errors are 'The file is greater than 512xH or 512xW, but is not tiled'. To be a valid COG, the file should be tiled since it has a height and width both greater than 512.
  3. warnings are 'The file is greater than 512xH or 512xW, it is recommended to include internal overviews'. It is recommended to provide overviews.

Converting a GeoTIFF to COG

We can use rio_cogeo.cog_create to convert a GeoTIFF into a Cloud Optimized GeoTIFF

veg_cog_filename = veg_gtiff_filename.replace(".tif", "_cog.tif")

!rio cogeo create {veg_gtiff_filename} {veg_cog_filename}
Reading input: /Users/kyle/ds/cloud-optimized-geospatial-formats-guide/cloud-optimized-geotiffs/test_data/VCF5KYR_1982001_001_2018224204211.tif
  [####################################]  100%
Adding overviews...
Updating dataset tags...
Writing output to: /Users/kyle/ds/cloud-optimized-geospatial-formats-guide/cloud-optimized-geotiffs/test_data/VCF5KYR_1982001_001_2018224204211_cog.tif
cog_validate(veg_cog_filename)
(True, [], [])

This is a valid COG, so we will use it to compare with our GeoTIFF.

Data Structure

Dimensions Dimensions are the number of bands, rows and columns stored in a GeoTIFF. More Info

Internal Blocks (aka chunks or internal tiles) Internal blocks are required if the dimensions of data are over 512x512. More Info

Let’s check out the dimensions and blocks of our GeoTIFF and Cloud-Optimized GeoTIFF.

veg_gtiff_rio = rasterio.open(veg_gtiff_filename)
veg_cog_rio = rasterio.open(veg_cog_filename)
print(veg_gtiff_rio.shape)
veg_cog_rio.shape
(3600, 7200)
(3600, 7200)

They have the same dimensions which is what we expect, so that is good!

We can also print information about the GeoTIFF’s IFD (Internal File Directory). Only one item is returned because the GeoTIFF doesn’t have overviews. When we print the IFD info for the COG, which has overviews, we see more items returned.

cog_info(veg_gtiff_filename).IFD
[IFD(Level=0, Width=7200, Height=3600, Blocksize=(1, 7200), Decimation=0)]
cog_info(veg_cog_filename).IFD
[IFD(Level=0, Width=7200, Height=3600, Blocksize=(512, 512), Decimation=0),
 IFD(Level=1, Width=3600, Height=1800, Blocksize=(128, 128), Decimation=2),
 IFD(Level=2, Width=1800, Height=900, Blocksize=(128, 128), Decimation=4),
 IFD(Level=3, Width=900, Height=450, Blocksize=(128, 128), Decimation=8)]

Note for IFD Level 0, the regular GeoTIFF has a blocksize of (1, 7200) which implies each row of data is a separate block. So whenever reading in data, even if only a few columns are required, the full row must be read.

Overviews

Overviews are downsampled (aggregated) data intended for visualization.

The smallest size overview should match the tiling components’ fetch size, typically 256x256. Due to aspect ratio variation just aim to have at least one dimension at or slightly less than 256. > The COG driver in GDAL, or rio cogeo tools should do this.

There are many resampling algorithms for generating overviews. The best resampling algorithm depends on the range, type, and distribution of the data. When creating overviews several options should be compared before deciding which resampling method to apply.

GDAL >= 3.2 allows for the overview resampling method to be set directly.

veg_gtiff_rio.overviews(1)
[]
veg_cog_rio.overviews(1)
[2, 4, 8]

By displaying each overview, we can see how the dimensions get coarser for each overview level.

def show_overviews(geotiff):  
    for overview in geotiff.overviews(1):
        out_height = int(geotiff.height // overview)
        out_width = int(geotiff.width // overview)
        print(f"out height: {out_height}")
        print(f"out width: {out_width}")    
        # read first band of file and set shape of new output array
        window_size_height = round(out_height/8)
        window_size_width = round(out_width/8)
        image = veg_cog_rio.read(1, out_shape=(1, out_height, out_width))[
            window_size_height:(window_size_height*2),
            window_size_width:(window_size_width*2),
        ]
        show(image)
        
show_overviews(veg_cog_rio)
out height: 1800
out width: 3600
out height: 900
out width: 1800
out height: 450
out width: 900

We can generate more and different overviews, through different tilesizes and resampling.

import gen_overviews
tmp_dst = gen_overviews.create_overviews_from_gtiff(veg_gtiff_rio)
tmp_cog = rasterio.open(tmp_dst)
cog_info(tmp_dst).IFD
[IFD(Level=0, Width=7200, Height=3600, Blocksize=(1, 7200), Decimation=0),
 IFD(Level=1, Width=3600, Height=1800, Blocksize=(128, 128), Decimation=2),
 IFD(Level=2, Width=1800, Height=900, Blocksize=(128, 128), Decimation=4),
 IFD(Level=3, Width=900, Height=450, Blocksize=(128, 128), Decimation=8),
 IFD(Level=4, Width=450, Height=225, Blocksize=(128, 128), Decimation=16)]

Note: Now we have overviews but there are still no tiles on the Level 0 IFD.

overviews = tmp_cog.overviews(1)
overviews
[2, 4, 8, 16]
show_overviews(tmp_cog)
out height: 1800
out width: 3600
out height: 900
out width: 1800
out height: 450
out width: 900
out height: 225
out width: 450