As Jekyll sites scale to thousands of pages, client-side search solutions like Lunr.js hit performance limits: the full index must be downloaded and held in browser memory. A distributed search architecture using Cloudflare Workers and R2 storage enables sub-100ms search across large content collections while preserving the static nature of Jekyll. This guide details the implementation of a sharded, distributed search index that partitions content across multiple R2 buckets and uses Worker-based query processing to deliver fast, relevance-ranked search for static sites.

In This Guide

- Distributed Search Architecture and Sharding Strategy
- Jekyll Index Generation and Content Processing Pipeline
- R2 Storage Optimization for Search Index Files
- Worker-Based Query Processing and Result Aggregation
- Relevance Ranking and Result Scoring Implementation
- Query Performance Optimization and Caching

Distributed Search Architecture and Sharding Strategy

The distributed search architecture partitions the search index across multiple R2 buckets based on content characteristics, enabling parallel query execution and efficient memory usage. The system comprises three main components: the index generation pipeline (Jekyll plugin), the storage layer (R2 buckets), and the query processor (Cloudflare Workers).

Index sharding follows a multi-dimensional strategy: primary sharding by content type (posts, pages, documentation) and secondary sharding by alphabetical ranges or date ranges within each type. This approach ensures balanced distribution while maintaining logical grouping of related content. Each shard contains a complete inverted index for its content subset, along with metadata for relevance scoring and result aggregation.


// Sharding Strategy:
// posts/a-f.json    [65MB]  → R2 Bucket 1
// posts/g-m.json    [58MB]  → R2 Bucket 1  
// posts/n-t.json    [62MB]  → R2 Bucket 2
// posts/u-z.json    [55MB]  → R2 Bucket 2
// pages/*.json      [45MB]  → R2 Bucket 3
// docs/*.json       [120MB] → R2 Bucket 4 (further sharded)

// Query Flow:
// 1. Query → Cloudflare Worker
// 2. Worker identifies relevant shards
// 3. Parallel fetch from multiple R2 buckets
// 4. Result aggregation and scoring
// 5. Response with ranked results
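To make the shard boundaries concrete, here is a minimal sketch of how a shard key might be derived from a document. The function name and the exact alphabetical boundaries are illustrative assumptions, not part of a fixed API:


// Illustrative shard-key derivation (names and ranges are assumptions).
// Primary dimension: content type; secondary: first letter of the slug.
function determineShardKey(doc) {
  // doc = { type: 'posts' | 'pages' | 'docs', slug: 'my-post-title' }
  const ranges = [
    { match: /^[a-f]/, label: 'a-f' },
    { match: /^[g-m]/, label: 'g-m' },
    { match: /^[n-t]/, label: 'n-t' },
    { match: /^[u-z]/, label: 'u-z' }
  ];
  const slug = doc.slug.toLowerCase();
  const range = ranges.find(r => r.match.test(slug));
  // Digits and other leading characters fall into a catch-all shard
  return `${doc.type}/${range ? range.label : 'other'}.json`;
}

// determineShardKey({ type: 'posts', slug: 'hello-world' }) → 'posts/g-m.json'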

Jekyll Index Generation and Content Processing Pipeline

Index generation happens during the Jekyll build through a custom plugin that processes content, builds inverted indices, and writes sharded index files. The pipeline covers text extraction, tokenization, stemming, and index optimization.

Here's the core Jekyll plugin for distributed index generation:


# _plugins/search_index_generator.rb
require 'json'
require 'nokogiri'
require 'zlib'

class SearchIndexGenerator < Jekyll::Generator
  # Abbreviated stop word list; load a full list in practice.
  STOP_WORDS = %w[a an and are as at be by for from has in is it of on or the to with].freeze

  def generate(site)
    @shards = Hash.new { |h, k| h[k] = {} }

    site.documents.each do |doc|
      next unless should_index?(doc)

      content = extract_searchable_content(doc)
      tokens = process_content(content)
      add_to_shards(doc, tokens)
    end

    generate_shard_files(site)
  end

  private

  def process_content(content)
    # HTML stripping and text extraction
    text = Nokogiri::HTML(content).text
    # Tokenization and normalization
    tokens = text.downcase.split(/[^\w]+/)
    # Stop word removal (plus empty tokens from leading punctuation) and stemming
    tokens.reject! { |t| t.empty? || STOP_WORDS.include?(t) }
    tokens.map! { |t| stem(t) }
    # Frequency analysis: map each token to its count in the document
    token_freq = Hash.new(0)
    tokens.each { |t| token_freq[t] += 1 }
    token_freq
  end

  def add_to_shards(document, token_freq)
    shard_key = determine_shard(document)
    doc_id = document.url

    @shards[shard_key][doc_id] = {
      title: document.data['title'],
      url: document.url,
      content: token_freq,
      metadata: extract_metadata(document),
      boost: calculate_boost_factor(document)
    }
  end

  def generate_shard_files(site)
    @shards.each do |shard_name, shard_data|
      # Deflate keeps build output small; the upload step re-encodes
      # each shard with brotli before pushing it to R2.
      compressed_data = Zlib::Deflate.deflate(JSON.generate(shard_data))
      site.pages << SearchIndexPage.new(site, shard_name, compressed_data)
    end
  end

  # should_index?, extract_searchable_content (assumed to return rendered
  # HTML), determine_shard, extract_metadata, calculate_boost_factor, and
  # stem (e.g. a wrapper around a stemmer gem) are omitted for brevity,
  # as is the SearchIndexPage subclass that writes shards to the build output.
end

R2 Storage Optimization for Search Index Files

R2 storage configuration optimizes for both storage efficiency and query performance. The implementation uses compression, intelligent partitioning, and cache headers to minimize latency and costs.

Index files are compressed using brotli compression with custom dictionaries tailored to the site's content. Each shard includes a header with metadata for quick query planning and shard selection. The R2 bucket structure organizes shards by content type and update frequency, enabling different caching strategies for static vs. frequently updated content.


// R2 Bucket Structure:
// search-indices/
//   ├── posts/
//   │   ├── shard-001.br.json
//   │   ├── shard-002.br.json
//   │   └── manifest.json
//   ├── pages/
//   │   ├── shard-001.br.json  
//   │   └── manifest.json
//   └── global/
//       ├── stopwords.json
//       ├── stemmer-rules.json
//       └── analytics.log

// Upload script with optimization; runs in a Worker with an R2 binding.
// compressWithBrotli is a helper (assumed) that serializes the shard and
// brotli-encodes it, since the Workers runtime has no built-in brotli API.
async function uploadShard(shardName, shardData) {
  const compressed = compressWithBrotli(shardData);
  const key = `search-indices/posts/${shardName}.br.json`;

  await env.SEARCH_BUCKET.put(key, compressed, {
    httpMetadata: {
      contentType: 'application/json',
      contentEncoding: 'br',
      cacheControl: 'public, max-age=86400' // cache header; value is illustrative
    },
    // R2 custom metadata values must be strings
    customMetadata: {
      'shard-size': String(compressed.length),
      'document-count': String(shardData.documentCount),
      'avg-doc-length': String(shardData.avgLength)
    }
  });
}
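Each content type's manifest.json gives the Worker enough information to plan a query without fetching shard bodies. The fields below are an illustrative assumption of what such a manifest might hold; adapt them to whatever the index generator actually emits:


// Illustrative manifest structure for the posts shards (field names are assumptions)
const postsManifest = {
  contentType: 'posts',
  updatedAt: '2024-01-15T00:00:00Z',
  shards: [
    {
      key: 'search-indices/posts/shard-001.br.json',
      range: ['a', 'f'],        // slug range covered by this shard
      documentCount: 1200,
      sizeBytes: 65000000
    },
    {
      key: 'search-indices/posts/shard-002.br.json',
      range: ['g', 'm'],
      documentCount: 1100,
      sizeBytes: 58000000
    }
  ]
};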

Worker-Based Query Processing and Result Aggregation

The query processor handles search requests by identifying relevant shards, executing parallel searches, and aggregating results, using the Workers runtime's concurrent fetch capabilities.

Here's the core query processing implementation:


export default {
  async fetch(request, env, ctx) {
    const { query, page = 1, limit = 10 } = await getSearchParams(request);
    
    if (!query || query.length < 2) {
      return jsonResponse({ error: 'Query too short' }, 400);
    }
    
    const startTime = Date.now();
    const searchTerms = parseQuery(query);
    const relevantShards = await identifyRelevantShards(searchTerms, env);
    
    // Execute parallel searches across shards
    const shardResults = await Promise.allSettled(
      relevantShards.map(shard => searchShard(shard, searchTerms, env))
    );
    
    // Aggregate and rank results
    const allResults = aggregateResults(shardResults);
    const rankedResults = rankResults(allResults, searchTerms);
    const paginatedResults = paginateResults(rankedResults, page, limit);
    
    const responseTime = Date.now() - startTime;
    
    return jsonResponse({
      query,
      results: paginatedResults,
      total: rankedResults.length,
      page,
      limit,
      responseTime,
      shardsQueried: relevantShards.length
    });
  }
}

async function searchShard(shardKey, searchTerms, env) {
  const shardObject = await env.SEARCH_BUCKET.get(shardKey);
  if (!shardObject) return [];

  // decompressBrotli is an assumed helper that inflates the R2 body
  // (e.g. from shardObject.arrayBuffer()) back into a JSON string
  const decompressed = await decompressBrotli(shardObject);
  const index = JSON.parse(decompressed);

  // One hit per (document, term) pair; duplicates for multi-term
  // queries are merged during aggregation
  return searchTerms.flatMap(term =>
    Object.entries(index)
      .filter(([, doc]) => doc.content[term])
      .map(([docId, doc]) => ({
        docId,
        score: calculateTermScore(doc.content[term], doc.boost, term),
        document: doc
      }))
  );
}
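The fetch handler above leans on several helpers that the listing omits; parseQuery, jsonResponse, and paginateResults are routine, but the two below are worth sketching. This is a minimal version, assuming the manifest layout shown earlier; names and fields are illustrative:


// Sketch: choose shards worth querying, using per-type manifests in R2.
// Assumes manifests follow the structure shown earlier.
async function identifyRelevantShards(searchTerms, env) {
  const manifestKeys = [
    'search-indices/posts/manifest.json',
    'search-indices/pages/manifest.json'
  ];
  const shards = [];
  for (const key of manifestKeys) {
    const obj = await env.SEARCH_BUCKET.get(key);
    if (!obj) continue;
    const manifest = await obj.json();
    // Without a per-term index, every shard of each type is a candidate;
    // a term → shard bloom filter could prune this further.
    for (const shard of manifest.shards) shards.push(shard.key);
  }
  return shards;
}

// Sketch: merge the Promise.allSettled output, summing scores when the
// same document matched in more than one (shard, term) pair.
function aggregateResults(shardResults) {
  const byDoc = new Map();
  for (const result of shardResults) {
    if (result.status !== 'fulfilled') continue; // skip failed shards
    for (const hit of result.value) {
      const existing = byDoc.get(hit.docId);
      if (existing) {
        existing.score += hit.score;
      } else {
        byDoc.set(hit.docId, { ...hit });
      }
    }
  }
  return [...byDoc.values()];
}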

Relevance Ranking and Result Scoring Implementation

The ranking algorithm combines TF-IDF scoring with content-based boosting and freshness signals. The implementation calculates relevance scores from multiple factors, including term frequency, document length, title and URL matches, and content age.

Here's the ranking implementation:


function rankResults(results, searchTerms) {
  return results
    .map(result => {
      const score = calculateRelevanceScore(result, searchTerms);
      return { ...result, finalScore: score };
    })
    .sort((a, b) => b.finalScore - a.finalScore);
}

function calculateRelevanceScore(result, searchTerms) {
  let score = 0;

  // TF-IDF base scoring. globalStats (corpus-wide term and document
  // counts) is assumed to be loaded at module scope from the manifests.
  searchTerms.forEach(term => {
    const tf = result.document.content[term] || 0;
    const idf = calculateIDF(term, globalStats);
    score += (tf / result.document.metadata.wordCount) * idf;
  });

  // Content-based boosting
  score *= result.document.boost;

  // Title match boosting: +30% per matched term
  const titleMatches = searchTerms.filter(term =>
    result.document.title.toLowerCase().includes(term)
  ).length;
  score *= (1 + (titleMatches * 0.3));

  // URL structure boosting: reward slugs containing the full phrase
  if (result.document.url.includes(searchTerms.join('-'))) {
    score *= 1.2;
  }

  // Freshness boosting: linear decay over a year, floored at 0.5
  const daysOld = (Date.now() - new Date(result.document.metadata.date)) / (1000 * 3600 * 24);
  const freshnessBoost = Math.max(0.5, 1 - (daysOld / 365));
  score *= freshnessBoost;

  return score;
}

function calculateIDF(term, globalStats) {
  // globalStats.termFrequency maps each term to the number of documents
  // containing it; the floor of 1 avoids division by zero
  const docFrequency = globalStats.termFrequency[term] || 1;
  return Math.log(globalStats.totalDocuments / docFrequency);
}
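As a worked example of the base term score: a term that appears 3 times in a 500-word document (tf / length = 0.006) and occurs in 50 of 10,000 indexed documents (IDF = ln(200) ≈ 5.3) contributes roughly 0.006 × 5.3 ≈ 0.032 before the boost multipliers are applied. These numbers are purely illustrative.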

Query Performance Optimization and Caching

Query performance optimization involves multiple caching layers, query planning, and result prefetching. The system uses a multi-layer caching strategy that balances freshness with performance.

The caching architecture includes:


// Multi-layer caching strategy. Bindings (env.*) are only available inside
// the fetch handler, so env-backed layers are resolved per request.
const memoryCache = new Map(); // L1: in-memory cache for hot queries (1 minute TTL, per isolate)

function buildCacheStrategy(env) {
  return {
    memory: memoryCache,

    // L2: Workers KV cache for frequent queries (1 hour TTL)
    kv: env.QUERY_CACHE,

    // L3: R2-based shard cache with compression
    shard: env.SEARCH_BUCKET,

    // L4: Edge cache for popular result sets
    edge: caches.default
  };
}

async function executeQueryWithCaching(query, env, ctx) {
  const cache = buildCacheStrategy(env);
  const cacheKey = generateCacheKey(query);

  // Check L1 memory cache
  if (cache.memory.has(cacheKey)) {
    return cache.memory.get(cacheKey);
  }

  // Check L2 KV cache (parsed once, then promoted to L1)
  const cachedResult = await cache.kv.get(cacheKey, { type: 'json' });
  if (cachedResult) {
    cache.memory.set(cacheKey, cachedResult);
    return cachedResult;
  }

  // Execute fresh query
  const results = await executeFreshQuery(query, env);

  // Cache results at multiple levels without blocking the response
  ctx.waitUntil(cacheQueryResults(cacheKey, results, env));

  return results;
}
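
// Sketch of cacheQueryResults (not shown in the original listing): write
// the result set to the L1 and L2 layers. TTL enforcement for the in-memory
// map and edge-cache population are omitted for brevity.
async function cacheQueryResults(cacheKey, results, env) {
  memoryCache.set(cacheKey, results);
  await env.QUERY_CACHE.put(cacheKey, JSON.stringify(results), {
    expirationTtl: 3600 // one hour, matching the L2 TTL above
  });
}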

// Query planning optimization. shardMetadata is assumed to expose
// getShardsForTerm(term) → [{ key, cost }] and getShardCost(term).
function optimizeQueryPlan(searchTerms, shardMetadata) {
  const plan = {
    shards: [],
    estimatedCost: 0,
    executionStrategy: 'parallel'
  };

  // Dedupe by shard key; a Set of objects would compare by reference
  const shardsByKey = new Map();
  searchTerms.forEach(term => {
    const termShards = shardMetadata.getShardsForTerm(term);
    termShards.forEach(shard => shardsByKey.set(shard.key, shard));
    plan.estimatedCost += termShards.length * shardMetadata.getShardCost(term);
  });
  plan.shards = [...shardsByKey.values()];

  // For high-cost queries, use sequential execution with early termination
  if (plan.estimatedCost > 1000) {
    plan.executionStrategy = 'sequential';
    plan.shards.sort((a, b) => a.cost - b.cost); // cheapest shards first
  }

  return plan;
}
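A consumer for that plan is not shown above; a minimal sketch of sequential execution with early termination might look like this (the result threshold is an assumption):


// Sketch: execute a query plan, stopping early once enough results have
// accumulated in sequential mode. searchShard and aggregateResults are
// defined earlier; minResults is an illustrative cutoff.
async function executePlan(plan, searchTerms, env, minResults = 50) {
  if (plan.executionStrategy === 'parallel') {
    const settled = await Promise.allSettled(
      plan.shards.map(shard => searchShard(shard.key, searchTerms, env))
    );
    return aggregateResults(settled);
  }

  // Sequential: cheapest shards first, terminate early when satisfied
  const hits = [];
  for (const shard of plan.shards) {
    hits.push(...await searchShard(shard.key, searchTerms, env));
    if (hits.length >= minResults) break;
  }
  return aggregateResults([{ status: 'fulfilled', value: hits }]);
}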

This distributed search architecture lets Jekyll sites offer sub-100ms search over collections far larger than client-side indexes can handle. The system scales horizontally by adding R2 buckets and shards, while Worker-based processing keeps performance consistent as query complexity grows. The result is fast, relevance-ranked search with the cost efficiency and simplicity of static site generation.