Concentration & Co-localization | Sebastián Baeza-González

This project examines the spatial patterns of inventive activity in Japan, analyzing how technological innovation concentrates and co-localizes across different fields. Using patent data from the Japan Patent Office (1975-2014), we reveal how Japan’s innovation geography shifted from a highly integrated system to more specialized, isolated clusters.

Research Overview

Key Questions

Concentration: Is there more or less technological agglomeration than expected?
Co-localization: Which technologies spatially interact with others?
Policy Impact: How did Japan’s regionalization policy affect innovation patterns?

Main Findings

Increased Concentration: After 1995, technologies became more spatially concentrated at the local level
Decreased Co-localization: Spatial proximity between different technologies significantly declined
Policy Trade-off: Regionalization policies successfully promoted local specialization but reduced cross-field interactions

Methodology

The analysis employs the K^emp function (Duranton & Overman, 2005, 2008), a distance-based measure that:

Treats space as continuous, avoiding the Modifiable Areal Unit Problem (MAUP)
Assesses statistical significance through Monte Carlo simulations
Enables comparison across technologies and time periods
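
As a rough illustration of the Monte Carlo step, the base-R sketch below builds a pointwise envelope from simulated curves and compares an observed curve against it. The statistic (`pair_share`), the helper `pointwise_envelope`, the toy data, and the uniform null are all assumptions made for the example; the published analysis uses K^emp with its own counterfactual.

```r
# Toy statistic: share of point pairs no farther apart than each distance in r.
pair_share <- function(xy, r) {
  d <- as.vector(dist(xy))                 # all bilateral distances
  sapply(r, function(ri) mean(d <= ri))
}

# Pointwise envelope: nrank-th lowest and nrank-th highest simulated value at each r.
pointwise_envelope <- function(sim_curves, nrank = 1) {
  nsim <- nrow(sim_curves)
  sorted <- apply(sim_curves, 2, sort)     # sort every distance column
  list(lo = sorted[nrank, ], hi = sorted[nsim - nrank + 1, ])
}

set.seed(1)
r <- seq(0.1, 0.5, by = 0.1)
n <- 30

# Observed pattern: one tight cluster (hypothetical data).
observed_xy <- cbind(0.5 + runif(n, -0.02, 0.02),
                     0.5 + runif(n, -0.02, 0.02))
obs <- pair_share(observed_xy, r)

# Null hypothesis for this sketch: n points placed uniformly at random.
sim_curves <- t(replicate(99, pair_share(cbind(runif(n), runif(n)), r)))
env <- pointwise_envelope(sim_curves, nrank = 2)

obs > env$hi   # TRUE where clustering exceeds the simulated upper band
```

Because every distance is tested against its own simulated distribution, the same procedure can be repeated for each technology and period, which is what makes the results comparable.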

Data Pipeline

┌─────────────────────┐     ┌──────────────────┐     ┌─────────────────────┐
│   IIP Patent DB     │────▶│   Geocoding      │────▶│  Point Database     │
│   (14.3M patents)   │     │   (Block-level)  │     │  (524,714 points)   │
└─────────────────────┘     └──────────────────┘     └─────────────────────┘
                                                              │
                                                              ▼
                            ┌──────────────────┐     ┌─────────────────────┐
                            │  Visualization   │◀────│  K^emp Analysis     │
                            │  & Mapping       │     │  (Concentration &   │
                            └──────────────────┘     │   Co-localization)  │
                                                     └─────────────────────┘

Code Highlights

1. Computing the K^emp Function

The weighted K-density function measures spatial concentration by analyzing the distribution of bilateral distances between patent locations:

# Load required libraries
library(dbmss)
library(sf)
library(tidyverse)

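# Create weighted point pattern from patent data
# Each point is weighted by the number of patents at that location
create_weighted_ppp <- function(data, tech_category) {
  data %>%
    filter(category == tech_category) %>%
    mutate(weight = n_patents) %>%
    st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
    # as.ppp() needs projected coordinates in metres; EPSG:32654
    # (WGS 84 / UTM zone 54N) is an assumed choice for Japan
    st_transform(32654) %>%
    select(weight) %>%   # keep only the weight so it becomes the point mark
    as.ppp()
}

# Calculate the K function for a technology category
# This identifies whether clustering exceeds random expectation
# Note: Kest() estimates Ripley's K and ignores the marks; the weighted
# Duranton-Overman K-density is available in dbmss as Kdhat() on wmppp objects
# max_distance is in metres (180 km, evaluated in 1 km steps)
calculate_kemp <- function(ppp_object, max_distance = 180000) {
  Kest(ppp_object,
       correction = "best",
       r = seq(0, max_distance, by = 1000))
}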

Why this matters: Unlike traditional measures that rely on administrative boundaries, K^emp captures the continuous nature of spatial clustering, revealing patterns that polygon-based methods miss.
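
To make the idea of a weighted density of bilateral distances concrete, here is a minimal base-R sketch. It assumes a Gaussian kernel, multiplicative pair weights, and a fixed bandwidth, and it skips the boundary correction at distance zero. The function name `kd_weighted` and the toy coordinates are hypothetical, and the code is not the estimator used in the repository.

```r
# Weighted kernel density of bilateral distances (simplified sketch).
# xy: two-column matrix of projected coordinates; w: positive weight per point.
kd_weighted <- function(xy, w, r, bw) {
  d  <- as.vector(dist(xy))          # distance for every pair i < j
  ij <- combn(nrow(xy), 2)           # pair indices, in the same order as dist()
  pw <- w[ij[1, ]] * w[ij[2, ]]      # multiplicative pair weights
  pw <- pw / sum(pw)                 # normalise so the weights sum to 1
  # Gaussian kernel evaluated at each distance in r
  sapply(r, function(ri) sum(pw * dnorm((ri - d) / bw)) / bw)
}

# Three hypothetical locations; bilateral distances are 5, 10 and 5.
xy <- rbind(c(0, 0), c(3, 4), c(6, 8))
w  <- c(1, 2, 1)                     # e.g. patents per location
r  <- seq(0, 15, by = 0.5)

kd <- kd_weighted(xy, w, r, bw = 1)
r[which.max(kd)]                     # 5: most weighted pairs are 5 units apart
```

No administrative boundary enters the calculation: only the distances between points and their weights matter.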

2. Concentration Index (Γ and Ψ)

The gamma (Γ) and psi (Ψ) indices quantify localization and dispersion strength:

# Calculate localization index (Gamma)
# Captures positive deviation from upper confidence band
# kemp_result must carry the observed curve in $obs, as envelope() output does
calculate_gamma <- function(kemp_result, upper_ci) {
  gamma <- sum(pmax(kemp_result$obs - upper_ci, 0))
  return(gamma)
}

# Calculate dispersion index (Psi)
# Only computed when Gamma = 0
calculate_psi <- function(kemp_result, lower_ci, gamma) {
  if (gamma > 0) {
    return(0)
  }
  psi <- sum(pmax(lower_ci - kemp_result$obs, 0))
  return(psi)
}

# Monte Carlo simulation for confidence envelopes
# With 99 simulations, nrank = 2 gives a two-sided pointwise
# significance level of 2 * 2 / (99 + 1) = 0.04
run_monte_carlo <- function(ppp_object, n_sim = 99) {
  envelope(ppp_object,
           Kest,
           nsim = n_sim,
           nrank = 2,  # Use the 2nd lowest and 2nd highest simulated values
           verbose = FALSE)
}
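
A small worked example may help when reading the two indices. The curves and band values below are invented for illustration, and the helper names (`gamma_index`, `psi_index`, `pointwise_level`) are hypothetical stand-ins rather than functions from the repository.

```r
# Gamma: total excess of the observed curve over the upper band.
gamma_index <- function(obs, upper) sum(pmax(obs - upper, 0))

# Psi: total shortfall below the lower band, reported only when Gamma is zero.
psi_index <- function(obs, lower, gamma) {
  if (gamma > 0) 0 else sum(pmax(lower - obs, 0))
}

# Hypothetical curves on five distance steps.
upper <- rep(0.6, 5)
lower <- rep(0.3, 5)
localized <- c(0.9, 0.7, 0.5, 0.2, 0.1)   # above the band at short distances
dispersed <- c(0.2, 0.1, 0.4, 0.5, 0.5)   # never above, below at short distances

g1 <- gamma_index(localized, upper)        # 0.3 + 0.1 = 0.4
p1 <- psi_index(localized, lower, g1)      # 0, because Gamma > 0
g2 <- gamma_index(dispersed, upper)        # 0
p2 <- psi_index(dispersed, lower, g2)      # 0.1 + 0.2 = 0.3

# Two-sided pointwise level implied by spatstat's envelope() settings.
pointwise_level <- function(nsim, nrank) 2 * nrank / (nsim + 1)
pointwise_level(99, 2)                     # 0.04
```

The first curve also dips below the lower band at long distances, yet its Ψ is zero: a technology that is localized at some distance is not additionally counted as dispersed.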

3. Co-localization Analysis

Measuring spatial relationships between technology pairs indicates where cross-field knowledge flows are most likely:

# Calculate co-localization between two technology categories
# Uses the same K^emp framework applied to pairs
calculate_colocalization <- function(data, tech_A, tech_B, n_sim = 99) {

  # Combine points from both categories
  combined_data <- data %>%
    filter(category %in% c(tech_A, tech_B))

  # Create bivariate point pattern
  biv_ppp <- combined_data %>%
    mutate(mark = factor(category)) %>%
    st_as_sf(coords = c("lon", "lat"), crs = 4326) %>%
    st_transform(32654) %>%   # projected CRS (assumed: WGS 84 / UTM zone 54N)
    select(mark) %>%          # the category becomes the point mark
    as.ppp()

  # Calculate cross-K function with a Monte Carlo envelope
  # Tests whether tech_A points cluster near tech_B points
  # beyond what's expected from their combined distribution,
  # simulated here by randomly relabelling the combined points
  envelope(biv_ppp,
           Kcross,
           i = tech_A,
           j = tech_B,
           correction = "isotropic",
           nsim = n_sim,
           nrank = 2,
           simulate = expression(rlabel(biv_ppp)),
           verbose = FALSE)
}

# Iterate over all technology pairs (30 categories = 435 pairs)
calculate_all_pairs <- function(data, categories) {
  pairs <- combn(categories, 2, simplify = FALSE)

  map_dfr(pairs, function(pair) {
    result <- calculate_colocalization(data, pair[1], pair[2])
    # Envelope bounds: result$hi (upper) and result$lo (lower)
    gamma <- calculate_gamma(result, result$hi)
    tibble(
      tech_A = pair[1],
      tech_B = pair[2],
      gamma = gamma,
      psi = calculate_psi(result, result$lo, gamma)
    )
  })
}
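
The comparison against the "combined distribution" can be pictured as a random-labelling test: keep every location fixed and shuffle which technology sits where. The base-R sketch below illustrates that logic on an invented layout. `cross_share` is a hypothetical toy statistic, not the cross-K function, and the site layout is an assumption chosen so that the two technologies are perfectly paired.

```r
# Share of A-B pairs separated by at most d.
cross_share <- function(xy, labels, a, b, d) {
  dm <- as.matrix(dist(xy))
  mean(dm[labels == a, labels == b] <= d)
}

set.seed(42)
n <- 20
# Hypothetical layout: n sites 10 units apart, each hosting one A and one B point.
site_x <- seq(0, by = 10, length.out = n)
xy <- rbind(cbind(site_x, 0), cbind(site_x + 0.01, 0))
labels <- rep(c("A", "B"), each = n)

obs <- cross_share(xy, labels, "A", "B", d = 0.05)

# Random labelling: locations stay put, technology labels are shuffled.
sim <- replicate(199, cross_share(xy, sample(labels), "A", "B", d = 0.05))

obs                # 0.05: each A has exactly one B within range (20 of 400 pairs)
mean(sim >= obs)   # share of relabelled patterns at least as co-located as observed
```

In this layout no relabelling can exceed the observed value, so a small share on the last line signals co-localization beyond what the joint set of locations alone would produce.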

4. Visualization of Results

The plotting functions below produce publication-ready figures of the spatial patterns:

# Plot concentration curves by technology
plot_concentration_curves <- function(kemp_results, tech_categories) {
  
  ggplot(kemp_results, aes(x = distance_km, y = kemp_value)) +
    # Confidence envelope (Monte Carlo)
    geom_ribbon(aes(ymin = lower_ci, ymax = upper_ci), 
                alpha = 0.3, fill = "grey70") +
    # Observed K^emp curve
    geom_line(color = "#E74C3C", linewidth = 0.8) +
    # Reference (random expectation)
    geom_line(aes(y = theoretical), 
              linetype = "dashed", color = "grey40") +
    facet_wrap(~technology, scales = "free_y", ncol = 5) +
    labs(
      x = "Distance (km)",
      y = expression(K^emp),
      title = "Concentration Patterns by Technology Category"
    ) +
    theme_minimal(base_family = "Helvetica") +
    theme(
      strip.text = element_text(size = 9, face = "bold"),
      panel.grid.minor = element_blank()
    )
}

# Heatmap for co-localization matrix
plot_colocalization_matrix <- function(coloc_results) {
  
  # Transform psi values to negative for co-dispersion
  coloc_matrix <- coloc_results %>%
    mutate(value = ifelse(gamma > 0, gamma, -psi))
  
  ggplot(coloc_matrix, aes(x = tech_A, y = tech_B, fill = value)) +
    geom_tile() +
    scale_fill_gradient2(
      low = "#3498DB",      # Blue for co-dispersion
      mid = "white",
      high = "#E74C3C",     # Red for co-localization
      midpoint = 0,
      name = "Γ/Ψ"
    ) +
    coord_fixed() +
    theme_minimal() +
    theme(
      axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
      axis.text.y = element_text(size = 7),
      axis.title = element_blank()
    )
}
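
Since each technology pair is computed once, a heatmap drawn directly from the pair list fills only one triangle. One possible way to obtain a full symmetric matrix of signed values is sketched below in base R; the three-technology results are invented, and mirroring the matrix is a presentation choice assumed here rather than taken from the repository.

```r
# Hypothetical pairwise results for three technologies.
coloc <- data.frame(
  tech_A = c("A", "A", "B"),
  tech_B = c("B", "C", "C"),
  gamma  = c(0.4, 0.0, 0.0),
  psi    = c(0.0, 0.2, 0.0),
  stringsAsFactors = FALSE
)

# Signed value: positive for co-localization, negative for co-dispersion.
coloc$value <- ifelse(coloc$gamma > 0, coloc$gamma, -coloc$psi)

# Fill both triangles so that cell (A, B) equals cell (B, A).
techs <- sort(unique(c(coloc$tech_A, coloc$tech_B)))
m <- matrix(NA_real_, length(techs), length(techs),
            dimnames = list(techs, techs))
m[cbind(coloc$tech_A, coloc$tech_B)] <- coloc$value
m[cbind(coloc$tech_B, coloc$tech_A)] <- coloc$value

m   # diagonal stays NA: a technology is not paired with itself
```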

Key Results Summary

Period	Concentrated Technologies	Dispersed Technologies
1975-1994	Electronics, Mining, Nuclear Physics, Clocks/Computers	Agriculture, Non-organic Chemistry
1995-2014	Same as 1975-1994, plus additional categories	Agriculture only

The shift from 1975-1994 to 1995-2014 shows:

More concentrated categories (strengthening of regional clusters)
Fewer dispersed categories (loss of national integration)
Reduced co-localization (isolation of technological domains)

Repository Structure

Concentration-colocalization-Japan/
├── 00_Data/
│   ├── patent_points.csv          # Geocoded inventor locations
│   ├── technology_lookup.csv      # IPC to NBER classification
│   └── japan_boundaries.shp       # Administrative boundaries
├── 01_Scripts/
│   ├── 01_Concentration.R         # K^emp concentration analysis
│   ├── 02_Colocalization.R        # Pairwise co-localization
│   └── 03_Visualization.R         # Figures and maps
└── README.md

Citation

If you use this code or methodology, please cite:

@article{baeza2026concentration,
  title={Concentration and co-localization dynamics of technological 
         innovation: The Japanese case},
  author={Baeza-Gonz{\'a}lez, Sebasti{\'a}n and Kamakura, Natsuki},
  journal={Applied Geography},
  volume={186},
  pages={103819},
  year={2026},
  publisher={Elsevier},
  doi={10.1016/j.apgeog.2025.103819}
}

Links

GitHub Repository
Published Paper (doi:10.1016/j.apgeog.2025.103819)