Converting LiDAR LAS Files to Cloud-Optimized Point Clouds (COPCs)

Environment

The packages needed for this notebook can be installed with conda or mamba. Using the environment.yml from this folder run:

conda env create -f environment.yml

or

mamba env create -f environment.yml

Finally, you may activate and select the kernel in the notebook (running in Jupyter)

conda activate coguide-copc

The notebook has been tested to work with the listed Conda environment.

Setup

This tutorial will explore how to-

  1. Read a LiDAR LAS file using PDAL in Python
  2. Convert the LiDAR LAS file to Cloud-Optimized Point Cloud (COPC) format
  3. Validate the generated COPC file

About the Dataset

We will be using the G-LiHT Lidar Point Cloud V001 from the NASA EarthData. To access NASA EarthData in Jupyter you need to register for an Earthdata account.

We will use earthaccess library to set up credentials to fetch data from NASA’s EarthData catalog.

import earthaccess
import os
import pdal
/opt/homebrew/anaconda3/envs/coguide-copc/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
earthaccess.login()
<earthaccess.auth.Auth at 0x10bfc7d90>

Creating a Data Directory for this Tutorial

We are creating a data directory for downloading all the required files locally.

# set data directory path
data_dir = './data'

# check if directory exists -> if directory doesn't exist, directory is created
if not os.path.exists(data_dir):
    os.mkdir(data_dir)

Downloading the Dataset from EarthData

We are using search_data method from the earthaccess module for searching the Granules from the selected collection. The temporal argument defines the temporal range for

# Search Granules

las_item_results = earthaccess.search_data(
    short_name="GLLIDARPC",
    version="001",
    temporal = ("2020"), 
    count=3
)
Granules found: 72
las_item_results
[Collection: {'EntryTitle': 'G-LiHT Lidar Point Cloud V001'}
 Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -81.03452828650298, 'Latitude': 25.50220025425373}, {'Longitude': -81.01391715300757, 'Latitude': 25.50220365895999}, {'Longitude': -81.01391819492625, 'Latitude': 25.5112430715201}, {'Longitude': -81.03453087148995, 'Latitude': 25.511239665437053}, {'Longitude': -81.03452828650298, 'Latitude': 25.50220025425373}]}}]}}}
 Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2020-03-11T04:00:00.000Z', 'EndingDateTime': '2020-03-12T03:59:59.000Z'}}
 Size(MB): 238.623
 Data: ['https://e4ftl01.cr.usgs.gov//GWELD1/COMMUNITY/GLLIDARPC.001/2020.03.11/GLLIDARPC_FL_20200311_FIA8_l0s47.las'],
 Collection: {'EntryTitle': 'G-LiHT Lidar Point Cloud V001'}
 Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -81.02242648723991, 'Latitude': 25.493163090615468}, {'Longitude': -80.99410838333016, 'Latitude': 25.49316468678571}, {'Longitude': -80.99410794242846, 'Latitude': 25.502204110708817}, {'Longitude': -81.02242816553566, 'Latitude': 25.50220251389295}, {'Longitude': -81.02242648723991, 'Latitude': 25.493163090615468}]}}]}}}
 Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2020-03-11T04:00:00.000Z', 'EndingDateTime': '2020-03-12T03:59:59.000Z'}}
 Size(MB): 248.383
 Data: ['https://e4ftl01.cr.usgs.gov//GWELD1/COMMUNITY/GLLIDARPC.001/2020.03.11/GLLIDARPC_FL_20200311_FIA8_l0s46.las'],
 Collection: {'EntryTitle': 'G-LiHT Lidar Point Cloud V001'}
 Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Longitude': -80.94099075054905, 'Latitude': 25.276201329530473}, {'Longitude': -80.9355627247816, 'Latitude': 25.276199059361314}, {'Longitude': -80.9355579494582, 'Latitude': 25.285238744206318}, {'Longitude': -80.94098637748567, 'Latitude': 25.285241015299494}, {'Longitude': -80.94099075054905, 'Latitude': 25.276201329530473}]}}]}}}
 Temporal coverage: {'RangeDateTime': {'BeginningDateTime': '2020-03-11T04:00:00.000Z', 'EndingDateTime': '2020-03-12T03:59:59.000Z'}}
 Size(MB): 91.0422
 Data: ['https://e4ftl01.cr.usgs.gov//GWELD1/COMMUNITY/GLLIDARPC.001/2020.03.11/GLLIDARPC_FL_20200311_FIA8_l0s22.las']]

Let’s use the file with size 91.04 MB and convert it to a COPC format.

# Download Data - Selecting the 3rd file from the `las_item_results` list
gliht_las_file = earthaccess.download(las_item_results[2], data_dir)
las_filename = gliht_las_file[0]
print(las_filename)
 Getting 1 granules, approx download size: 0.09 GB
QUEUEING TASKS | : 100%|██████████| 1/1 [00:00<00:00, 1869.12it/s]
File GLLIDARPC_FL_20200311_FIA8_l0s22.las already downloaded
PROCESSING TASKS | : 100%|██████████| 1/1 [00:00<00:00, 16131.94it/s]
COLLECTING RESULTS | : 100%|██████████| 1/1 [00:00<00:00, 33554.43it/s]
data/GLLIDARPC_FL_20200311_FIA8_l0s22.las

A Brief Introduction to PDAL

PDAL (Point Data Abstraction Library) is a C/C++ based open-source library for processing point cloud data. Additionally, it also has a PDAL-Python wrapper to work in a Pythonic environment.

Accessing and Getting Metadata Information

PDAL CLI provides multiple applications for processing point clouds. Also, it allows chaining of these applications for processing point clouds. Similar to gdal info for TIFFs, we can run pdal info <filename> on the command line for getting metadata from a point cloud file without reading it in memory.

!pdal info {las_filename}
{
  "file_size": 95464691,
  "filename": "data/GLLIDARPC_FL_20200311_FIA8_l0s22.las",
  "now": "2024-03-20T12:30:57-0500",
  "pdal_version": "2.6.3 (git-version: Release)",
  "reader": "readers.las",
  "stats":
  {
    "bbox":
    {
      "EPSG:4326":
      {
        "bbox":
        {
          "maxx": -80.93555795,
          "maxy": 25.28524102,
          "maxz": 69.99,
          "minx": -80.94099075,
          "miny": 25.27619906,
          "minz": -12.54
        },
        "boundary": { "type": "Polygon", "coordinates": [ [ [ -80.940990750549048, 25.276201329530473, -12.54 ], [ -80.940986377485672, 25.285241015299494, -12.54 ], [ -80.9355579494582, 25.285238744206318, 69.99 ], [ -80.935562724781605, 25.276199059361314, 69.99 ], [ -80.940990750549048, 25.276201329530473, -12.54 ] ] ] }
      },
      "native":
      {
        "bbox":
        {
          "maxx": 506487.7363,
          "maxy": 2796533.993,
          "maxz": 69.99,
          "minx": 505941.2263,
          "miny": 2795533.003,
          "minz": -12.54
        },
        "boundary": { "type": "Polygon", "coordinates": [ [ [ 505941.226302567636594, 2795533.003240843303502, -12.54 ], [ 505941.226302567636594, 2796533.993240843061358, -12.54 ], [ 506487.736302567645907, 2796533.993240843061358, 69.99 ], [ 506487.736302567645907, 2795533.003240843303502, 69.99 ], [ 505941.226302567636594, 2795533.003240843303502, -12.54 ] ] ] }
      }
    },
    "statistic":
    [
      {
        "average": 506237.8598,
        "count": 3409439,
        "maximum": 506487.7363,
        "minimum": 505941.2263,
        "name": "X",
        "position": 0,
        "stddev": 101.3857552,
        "variance": 10279.07135
      },
      {
        "average": 2795977.637,
        "count": 3409439,
        "maximum": 2796533.993,
        "minimum": 2795533.003,
        "name": "Y",
        "position": 1,
        "stddev": 274.313888,
        "variance": 75248.10912
      },
      {
        "average": 2.192797205,
        "count": 3409439,
        "maximum": 69.99,
        "minimum": -12.54,
        "name": "Z",
        "position": 2,
        "stddev": 1.788122887,
        "variance": 3.197383461
      },
      {
        "average": 30205.96606,
        "count": 3409439,
        "maximum": 65535,
        "minimum": 14789,
        "name": "Intensity",
        "position": 3,
        "stddev": 5497.346879,
        "variance": 30220822.71
      },
      {
        "average": 1.200336478,
        "count": 3409439,
        "maximum": 5,
        "minimum": 1,
        "name": "ReturnNumber",
        "position": 4,
        "stddev": 0.4267302243,
        "variance": 0.1820986843
      },
      {
        "average": 1.400683808,
        "count": 3409439,
        "maximum": 5,
        "minimum": 1,
        "name": "NumberOfReturns",
        "position": 5,
        "stddev": 0.5529822902,
        "variance": 0.3057894133
      },
      {
        "average": 0.5248757933,
        "count": 3409439,
        "maximum": 1,
        "minimum": 0,
        "name": "ScanDirectionFlag",
        "position": 6,
        "stddev": 0.4993808847,
        "variance": 0.249381268
      },
      {
        "average": 0.002420339534,
        "count": 3409439,
        "maximum": 1,
        "minimum": 0,
        "name": "EdgeOfFlightLine",
        "position": 7,
        "stddev": 0.04913738087,
        "variance": 0.002414482199
      },
      {
        "average": 1.239596602,
        "count": 3409439,
        "maximum": 2,
        "minimum": 1,
        "name": "Classification",
        "position": 8,
        "stddev": 0.4268373506,
        "variance": 0.1821901239
      },
      {
        "average": 2.569738893,
        "count": 3409439,
        "maximum": 33,
        "minimum": -31,
        "name": "ScanAngleRank",
        "position": 9,
        "stddev": 16.33805559,
        "variance": 266.9320603
      },
      {
        "average": 1.475124207,
        "count": 3409439,
        "maximum": 2,
        "minimum": 1,
        "name": "UserData",
        "position": 10,
        "stddev": 0.4993808847,
        "variance": 0.249381268
      },
      {
        "average": 192.5227238,
        "count": 3409439,
        "maximum": 65535,
        "minimum": 0,
        "name": "PointSourceId",
        "position": 11,
        "stddev": 680.5012031,
        "variance": 463081.8874
      },
      {
        "average": 310476.1839,
        "count": 3409439,
        "maximum": 310485.0477,
        "minimum": 310469.0825,
        "name": "GpsTime",
        "position": 12,
        "stddev": 4.240892261,
        "variance": 17.98516717
      },
      {
        "average": 0,
        "count": 3409439,
        "maximum": 0,
        "minimum": 0,
        "name": "Synthetic",
        "position": 13,
        "stddev": 0,
        "variance": 0
      },
      {
        "average": 0,
        "count": 3409439,
        "maximum": 0,
        "minimum": 0,
        "name": "KeyPoint",
        "position": 14,
        "stddev": 0,
        "variance": 0
      },
      {
        "average": 0,
        "count": 3409439,
        "maximum": 0,
        "minimum": 0,
        "name": "Withheld",
        "position": 15,
        "stddev": 0,
        "variance": 0
      },
      {
        "average": 0,
        "count": 3409439,
        "maximum": 0,
        "minimum": 0,
        "name": "Overlap",
        "position": 16,
        "stddev": 0,
        "variance": 0
      }
    ]
  }
}

PDAL Pipelines

For converting the LiDAR LAS file to COPC format, we will define a pdal pipeline. A pipeline defines data processing within pdal for reading (using pdal readers), processing (using pdal filters) and writing operations (using pdal writers). The pipelines can also represent sequential operations and can be executed as stages.

A pdal pipeline is defined in a JSON format either as a JSON object or a JSON array. Below is an example of a pdal pipeline taking a .las file as input, generating stats and writing it to a COPC format.

{
  "pipeline": [
    {
        "filename":las_filename,
        "type":"readers.las"
    },
    {
        "type":"filters.stats",
    },
    {
        "type":"writers.copc",
        "filename":copc_filename
    }
]
}

This pipeline can be executed using the pdal pipeline <path_to_json_file> from the command line for a pipeline saved as a local JSON file.

Programmatic Pipeline Construction

However, here we will explore a comparatively easier and Pythonic approach to define a pipeline and execute it. This is based on the PDAL Python extension which provides a programmatic pipeline construction approach in addition to the simple pipeline construction approach discussed above.

This approach utilizes the | operator to pipe various stages together representing a pipeline. For eg., the above pipeline can be represented as -

pipeline = pdal.Reader.las(filename=las_filename) | pdal.Writer.copc(filename=copc_filename) | pdal.Filter.stats()

This pipeline can be executed using pipeline.execute.

LAS to COPC Conversion

Now, let’s dive into converting the LAS file to a COPC format based on the programmatic pipeline construction.

# Defining output filename. Usually, COPC files are saved as .copc.laz
copc_filename = las_filename.replace('.las', '.copc.laz')
copc_filename
'data/GLLIDARPC_FL_20200311_FIA8_l0s22.copc.laz'
# pipe = stage 1 | stage 2 | stage 3
# Or, pipeline = pipeline 1 | stage 2

# Once the pipeline is executed successfully, it prints the count of number of points
pipe = pdal.Reader.las(filename=las_filename) | pdal.Writer.copc(filename=copc_filename)
pipe.execute()
3409439

Validation

As we can see from output of the below cell, the .copc.laz file is created in the destination directory.

# using -go for removing user details and h for getting memory size in MBs
!ls -goh ./data
total 239888
-rw-r--r--  1     26M Mar 20 11:55 GLLIDARPC_FL_20200311_FIA8_l0s22.copc.laz
-rw-r--r--  1     91M Feb 29 11:27 GLLIDARPC_FL_20200311_FIA8_l0s22.las

Let’s read the created COPC file again and check the value of copc flag from the metadata. If the generated LiDAR file is a valid COPC file, then this flag should be set to True.

valid_pipe = pdal.Reader.copc(filename=copc_filename) | pdal.Filter.stats()
valid_pipe.execute()

# Getting value for the "copc" key under the metadata
# Output is True for a valid COPC
value = valid_pipe.metadata["metadata"]["readers.copc"].get("copc")
print(value)
True

Accessing Data

The data values can be accessed from the executed pipeline using valid_pipe.arrays. The values in the arrays represent the LiDAR point cloud attributes such as X, Y, Z, and Intensity, etc.

arr_values = valid_pipe.arrays

# Print the array values as a dataframe
print(arr_values)
[array([(506245.56, 2796471.44, 0.24, 40740, 1, 1, 1, 0, 2, 0, 0, 0, 0,  16.998, 1,   0, 310483.75227621, 0),
       (506247.16, 2796471.58, 0.27, 35541, 2, 2, 1, 0, 2, 0, 0, 0, 0,  16.998, 1,   0, 310483.75229014, 0),
       (506247.95, 2796471.65, 0.24, 17716, 2, 2, 1, 0, 2, 0, 0, 0, 0,  16.998, 1,   0, 310483.75229699, 0),
       ...,
       (506066.58, 2796032.75, 2.34, 31587, 1, 1, 0, 0, 1, 0, 0, 0, 0, -24.   , 2, 203, 310477.36925451, 0),
       (506067.37, 2796033.29, 2.52, 32876, 1, 1, 0, 0, 1, 0, 0, 0, 0, -22.998, 2, 216, 310477.37590641, 0),
       (506062.6 , 2796033.27, 1.4 , 27393, 1, 1, 0, 0, 1, 0, 0, 0, 0, -24.   , 2, 108, 310477.38259945, 0)],
      dtype=[('X', '<f8'), ('Y', '<f8'), ('Z', '<f8'), ('Intensity', '<u2'), ('ReturnNumber', 'u1'), ('NumberOfReturns', 'u1'), ('ScanDirectionFlag', 'u1'), ('EdgeOfFlightLine', 'u1'), ('Classification', 'u1'), ('Synthetic', 'u1'), ('KeyPoint', 'u1'), ('Withheld', 'u1'), ('Overlap', 'u1'), ('ScanAngleRank', '<f4'), ('UserData', 'u1'), ('PointSourceId', '<u2'), ('GpsTime', '<f8'), ('ScanChannel', 'u1')])]

Similarly, we can get COPC file statistic and log from the executed pipeline using valid_pipe.metadata["metadata"]["filters.stats"]["statistic"] and valid_pipe.log. The readers are encouraged to explore the results of these operations on their own.

# Getting statistic from the metadata
valid_pipe.metadata["metadata"]["filters.stats"]["statistic"]
[{'average': 506237.8635,
  'count': 3409439,
  'maximum': 506487.74,
  'minimum': 505941.23,
  'name': 'X',
  'position': 0,
  'stddev': 101.3857552,
  'variance': 10279.07135},
 {'average': 2795977.634,
  'count': 3409439,
  'maximum': 2796533.99,
  'minimum': 2795533,
  'name': 'Y',
  'position': 1,
  'stddev': 274.313888,
  'variance': 75248.10914},
 {'average': 2.192797205,
  'count': 3409439,
  'maximum': 69.99,
  'minimum': -12.54,
  'name': 'Z',
  'position': 2,
  'stddev': 1.788122887,
  'variance': 3.197383461},
 {'average': 30205.96606,
  'count': 3409439,
  'maximum': 65535,
  'minimum': 14789,
  'name': 'Intensity',
  'position': 3,
  'stddev': 5497.346879,
  'variance': 30220822.71},
 {'average': 1.200336478,
  'count': 3409439,
  'maximum': 5,
  'minimum': 1,
  'name': 'ReturnNumber',
  'position': 4,
  'stddev': 0.4267302243,
  'variance': 0.1820986843},
 {'average': 1.400683808,
  'count': 3409439,
  'maximum': 5,
  'minimum': 1,
  'name': 'NumberOfReturns',
  'position': 5,
  'stddev': 0.5529822902,
  'variance': 0.3057894133},
 {'average': 0.5248757933,
  'count': 3409439,
  'maximum': 1,
  'minimum': 0,
  'name': 'ScanDirectionFlag',
  'position': 6,
  'stddev': 0.4993808847,
  'variance': 0.249381268},
 {'average': 0.002420339534,
  'count': 3409439,
  'maximum': 1,
  'minimum': 0,
  'name': 'EdgeOfFlightLine',
  'position': 7,
  'stddev': 0.04913738087,
  'variance': 0.002414482199},
 {'average': 1.239596602,
  'count': 3409439,
  'maximum': 2,
  'minimum': 1,
  'name': 'Classification',
  'position': 8,
  'stddev': 0.4268373506,
  'variance': 0.1821901239},
 {'average': 2.569735542,
  'count': 3409439,
  'maximum': 33,
  'minimum': -31.00200081,
  'name': 'ScanAngleRank',
  'position': 9,
  'stddev': 16.33805538,
  'variance': 266.9320534},
 {'average': 1.475124207,
  'count': 3409439,
  'maximum': 2,
  'minimum': 1,
  'name': 'UserData',
  'position': 10,
  'stddev': 0.4993808847,
  'variance': 0.249381268},
 {'average': 192.5227238,
  'count': 3409439,
  'maximum': 65535,
  'minimum': 0,
  'name': 'PointSourceId',
  'position': 11,
  'stddev': 680.5012031,
  'variance': 463081.8874},
 {'average': 310476.1839,
  'count': 3409439,
  'maximum': 310485.0477,
  'minimum': 310469.0825,
  'name': 'GpsTime',
  'position': 12,
  'stddev': 4.240892262,
  'variance': 17.98516718},
 {'average': 0,
  'count': 3409439,
  'maximum': 0,
  'minimum': 0,
  'name': 'ScanChannel',
  'position': 13,
  'stddev': 0,
  'variance': 0},
 {'average': 0,
  'count': 3409439,
  'maximum': 0,
  'minimum': 0,
  'name': 'Synthetic',
  'position': 14,
  'stddev': 0,
  'variance': 0},
 {'average': 0,
  'count': 3409439,
  'maximum': 0,
  'minimum': 0,
  'name': 'KeyPoint',
  'position': 15,
  'stddev': 0,
  'variance': 0},
 {'average': 0,
  'count': 3409439,
  'maximum': 0,
  'minimum': 0,
  'name': 'Withheld',
  'position': 16,
  'stddev': 0,
  'variance': 0},
 {'average': 0,
  'count': 3409439,
  'maximum': 0,
  'minimum': 0,
  'name': 'Overlap',
  'position': 17,
  'stddev': 0,
  'variance': 0}]