GIS Data Processing and Automation: A Practical Log of 83 Projects

This log documents a series of 83 geospatial data processing tasks completed between August and November 2025. Each entry outlines the core challenge, technical approach, and key implementation details—focusing on reproducibility and efficiency.

1. Converting SLDPRT to GeoJSON

SLDPRT is a proprietary SolidWorks 3D model format. Direct conversion to GeoJSON isn't feasible due to semantic mismatch (3D CAD vs. 2D geographic features). The solution involved:

  1. Using CrossManager 2024 to export SLDPRT as OBJ.
  2. Processing the OBJ file in FME to extract geometry and convert it into GeoJSON with valid coordinate reference systems.

2. MAPGIS to CAD Conversion

MAPGIS project files (.MPJ) were converted using the official MAPGIS software with the "Section" plugin. Key steps:

  • Load the .MPJ file in MAPGIS.
  • Use the plugin to export layers as DXF.
  • Verify output in AutoCAD; use zoom-extents if geometry appears missing.

3. Spatial Correction of CAD Vectors to GCJ-02

A CAD-derived shapefile needed alignment with Gaode (GCJ-02) basemaps. Workflow:

  1. Define initial CRS as EPSG:3857 for ease of transformation.
  2. Perform rubber-sheeting in ArcGIS Pro using Tianditu imagery as reference.
  3. Apply fine-tuning via affine transformation.
  4. Reproject final result to GCJ-02 using QGIS plugins.

Critical note: Always assign a CRS before correction—undefined projections cause misalignment.

4–5, 7–8, 14–18, 20. Format Interconversions (Excel, MDB, CAD, SHP, KML)

Repeated conversions between common GIS formats were automated using FME and GDAL. Key considerations:

  • Excel → SHP: Ensure UTF-8 encoding and validate coordinate colums.
  • MDB → CAD: Project to a suitable projected CRS (e.g., UTM) for accurate length/area calculations.
  • CAD → SHP: Handle layer attributes like autocad_layer, fme_text_string. For Chinese CAD files:
    • X/Y with 6 digits: likely local coordinate system—requires 3/7-parameter transformation.
    • X=7 digits, Y=8 digits: includes zone number (first two digits of Y indicate 3° or 6° UTM zone).
  • KML → SHP with Attributes: Parse HTML tables embedded in the <description> field. Python implementation below:
import xml.etree.ElementTree as ET
from osgeo import ogr
import os

def parse_kml_description(html):
    # Extract key-value pairs from HTML table in KML description
    try:
        start = html.find('<table')
        end = html.find('</table>') + 9
        if start == -1 or end < 9: return {}
        table_html = html[start:end]
        root = ET.fromstring(f'<root>{table_html}</root>').find('.//table')
        if root is None: return {}
        attrs = {}
        for tr in root.findall('.//tr'):
            th = tr.find('th')
            td = tr.find('td')
            if th is not None and td is not None:
                key = th.text.strip() if th.text else ''
                val = td.text.strip() if td.text else ''
                if key: attrs[key] = val
        return attrs
    except Exception:
        return {}

def kml_to_shp(kml_path, shp_path):
    ds_in = ogr.Open(kml_path)
    lyr_in = ds_in.GetLayer()
    srs = lyr_in.GetSpatialRef()
    
    driver = ogr.GetDriverByName('ESRI Shapefile')
    if os.path.exists(shp_path): driver.DeleteDataSource(shp_path)
    ds_out = driver.CreateDataSource(shp_path)
    lyr_out = ds_out.CreateLayer('output', srs=srs, geom_type=ogr.wkbUnknown)

    # Scan all features to collect field names
    fields = set()
    features_data = []
    for feat in lyr_in:
        desc = feat.GetField('description')
        attrs = parse_kml_description(desc) if desc else {}
        fields.update(attrs.keys())
        features_data.append({'geom': feat.GetGeometryRef().Clone(), 'attrs': attrs})
    
    # Create fields (sanitize for Shapefile limits)
    for f in sorted(fields):
        name = ''.join(c if c.isalnum() else '_' for c in f)[:10]
        lyr_out.CreateField(ogr.FieldDefn(name, ogr.OFTString))
    
    # Write features
    for item in features_data:
        out_feat = ogr.Feature(lyr_out.GetLayerDefn())
        out_feat.SetGeometry(item['geom'])
        for k, v in item['attrs'].items():
            name = ''.join(c if c.isalnum() else '_' for c in k)[:10]
            out_feat.SetField(name, v)
        lyr_out.CreateFeature(out_feat)
    
    ds_in, ds_out = None, None

21. STL to GLB Conversion

Used Blender scripting or online converters for lightweight 3D web visualization (e.g., Three.js). GLB is preferred over GeoJSON for complex meshes.

22. Bivariate Mapping

Visualized dual variables (e.g., population density and temperature) using bivariate choropleth techniques in QGIS with the "Bivariate Legend" plugin. Data sources:

  • Population: LandScan Global
  • Climate: TerraClimate

Output saved as PDF to preserve vector quality.

NetCDF Time-Series Processing

Automated extraction of temporal averages from NetCDF climate data:

import xarray as xr
import rioxarray

def process_netcdf(nc_file, var_name, year):
    ds = xr.open_dataset(nc_file)
    da = ds[var_name]
    time_dim = [d for d in da.dims if 'time' in d.lower()][0]
    mean_da = da.mean(dim=time_dim)
    if mean_da.rio.crs is None:
        mean_da = mean_da.rio.write_crs("EPSG:4326")
    mean_da.rio.to_raster(f"{var_name}_avg_{year}.tif", dtype='float32')

23. Supervised Land Cover Classification (GEE)

Used Google Earth Engine with Landsat 5 imagery and training points. Key steps:

  1. Cloud masking using QA_PIXEL.
  2. Compute NDVI, NDBI, MNDWI indices.
  3. Train Random Forest classifier (50 trees).
  4. Evaluate accuracy via confusion matrix and Kappa.

Sample GEE code structure:

// Load image and mask clouds
var img = ee.ImageCollection('LANDSAT/LT05/C02/T1_L2')
  .filterDate('1990-01-01', '1990-12-31')
  .map(maskL5sr).median();

// Add spectral indices
var ndvi = img.normalizedDifference(['SR_B4', 'SR_B3']).rename('NDVI');
var classified = img.addBands([ndvi, ...]).select(bands).classify(classifier);

// Accuracy assessment
var test = samplePoints.classify(classifier);
print('Accuracy:', test.errorMatrix('landcover', 'classification').accuracy());

27. 3D Bar Charts

Generated interactive 3D bar visualizations using web-based tools like HasGG.

28–33. Thematic Mapping

Routine cartographic products created in QGIS:

  • Heavy metal concentration maps (interpolation + masking).
  • Location maps, historical rainfall animations, emergency response layouts.
  • Tourism route heatmaps using kernel density estimation.

34. Raster Gap-Filling

Filled NoData areas in a small raster using values from a larger overlapping raster:

  • ArcGIS Pro: Con(IsNull("small"), Con(IsNull("large"), "large", 0), "small")
  • QGIS Raster Calculator: ``` ("small@1" IS NOT NULL) * "small@1" + ("small@1" IS NULL AND "large@1" IS NOT NULL) * "large@1" + ("small@1" IS NULL AND "large@1" IS NULL) * 0
    
    

36. Facility Location Analysis

Applied weighted overlay and cost-distance analysis in ArcGIS Pro to identify optimal distribution center sites based on demand, transport networks, and land use constraints.

37. Landslide Probability (FOS)

Calculated Factor of Safety (FOS) using slope stability models incorporating soil properties, groundwater levels, and seismic coefficients.

Confusion Matrix & Sankey Diagrams

Evaluated classification accuracy and visualized land cover transitions:

  • Computed Producer’s (Recall) and User’s (Precision) accuracies per class.
  • Built interactive Sankey diagrams using ECharts to show area flows between classes over time (2015–2025).

Python snippet for accuracy metrics:

import numpy as np
from sklearn.metrics import cohen_kappa_score

cm = np.array([...])  # Confusion matrix
oa = np.trace(cm) / cm.sum()
kappa = cohen_kappa_score(
    np.repeat(range(8), cm.sum(axis=1)),
    np.concatenate([np.full(cm[i].sum(), i) for i in range(8)])
)

68. PostgreSQL + QGIS Integration

Connected QGIS to a PostgreSQL database containing transportation survey data. Executed SQL queries to:

  • Retrieve student population by planning district (PD).
  • Aggregate trip counts by mode and destination PD.
  • Create a query layer joined to PD geometries for mapping origin-destination flows.

Label expression in QGIS: "pid" || ': ' || "total_trips_to_destination"

69. MySQL Spatial + Python Visualization

Stored hydrological station coordinates as POINT in MySQL:

CREATE TABLE stations (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(255),
  location POINT SRID 4326
);

Loaded Excel data via Python (mysql-connector) and visualized using Matplotlib Basemap or Folium.

70, 83. ArcGIS Online Workflows

Published feature layers from shapefiles and GeoPackages. Configured:

  • Classified symbology based on population.
  • Labels using $feature["name"].
  • Pop-ups with embedded photos via URL fields.

Shared maps within organizational accounts and distributed public links.

82. Field Evidence Documentation

Collected ground-truth points in QGIS:

  • Created point layer with attributes: point_id, evidnctype (H/L/T/O), short_desc, long_desc.
  • Symbology categorized by evidence type.
  • Labeled as "1: Quiet residential street" etc.
  • Photos named by point ID and description.

Tags: gis data conversion spatial analysis map production task log

Posted on Sat, 09 May 2026 05:08:32 +0000 by Archbob