NYC’s 311 system generated ~3.5 million service requests in 2024, a perfect dataset for testing whether local LLMs can handle large-scale text analysis without API costs or rate limits. In this post, I’ll walk through how I used a local LLM to categorize and analyze these requests, revealing insights into the city’s pulse.

Instead of sending each request to the model individually, I reduced the dataset to ~1,200 unique combinations of the agency, complaint type, and descriptor fields, then used a local LLM to categorize those patterns. The pre-computed categories can then be joined back onto the full dataset for analysis, such as identifying hotspots or trends.
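
To get a sense of the reduction, here is a minimal sketch of the deduplication step, using the same Parquet export and column names referenced later in the post:

import duckdb

con = duckdb.connect()

# Count the distinct (agency, complaint_type, descriptor) combinations
# in the raw 2024 export.
unique_combinations = con.execute("""
    SELECT count(*)
    FROM (
        SELECT agency, complaint_type, descriptor
        FROM './data/cityofnewyork/service_requests_2024.parquet'
        GROUP BY 1, 2, 3
    )
""").fetchone()[0]
print(unique_combinations)  # ~1,200 for the 2024 data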

Visualizing the Baseline Data

Before diving into LLM categorization, let’s examine the spatial and temporal patterns in the raw data:

2024 311 Requests Heatmap

The heatmap shows the expected pattern: request density largely tracks population density. More interesting is the per-capita analysis:

2024 311 Requests Choropleth

Some ZIP codes show anomalously high request rates; the table below lists each hotspot's peak month:

| ZIP Code | Neighborhood | Month | Requests per 1,000 Residents |
| --- | --- | --- | --- |
| 10466 | Wakefield | Dec 2024 | 275 |
| 10004 | Financial District | Oct 2024 | 241 |
| 11239 | East New York | Dec 2024 | 166 |
| 11366 | Fresh Meadows | Jun 2024 | 158 |
| 10006 | Financial District | Jul 2024 | 109 |
| 11101 | Long Island City | Sep 2024 | 101 |
| 10018 | Garment District | Oct 2024 | 95 |
| 10007 | Tribeca | Sep 2024 | 92 |
| 10464 | City Island | Jun 2024 | 86 |
| 10036 | Hell’s Kitchen | Oct 2024 | 81 |

The Financial District’s high per-capita rates likely reflect its small residential population and heavy commercial activity rather than residential issues, while residential areas like Wakefield and East New York point to genuine quality-of-life problems.
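
For reference, here is a hedged sketch of how a per-capita metric like this could be computed. The zip_population table, its values, and the incident_zip / created_date column names are assumptions; substitute your actual census source and schema:

import duckdb

con = duckdb.connect()

# Hypothetical population lookup; the figures below are placeholders,
# not real census numbers.
con.execute("""
    CREATE TABLE zip_population AS
    SELECT * FROM (VALUES ('10466', 50000), ('10004', 5000)) AS t(zip_code, population)
""")

# Requests per 1,000 residents by ZIP code and month.
per_capita = con.execute("""
    SELECT
        r.incident_zip AS zip_code,
        date_trunc('month', r.created_date) AS month,
        count(*) * 1000.0 / p.population AS requests_per_1000
    FROM './data/cityofnewyork/service_requests_2024.parquet' AS r
    JOIN zip_population AS p ON p.zip_code = r.incident_zip
    GROUP BY 1, 2, p.population
    ORDER BY requests_per_1000 DESC
""").fetchall()
print(per_capita[:10])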

LLM-Powered Categorization

To extract insights from this massive dataset, I integrated a local LLM directly into DuckDB using a custom User Defined Function (UDF). This approach combines SQL’s analytical power with modern language models, processing and categorizing data at scale within a single query.

DuckDB + LLM Integration

I created a Python UDF that connects DuckDB to a local install of LM Studio, which serves the gemma-3-27b-it model through its OpenAI-compatible API.

Here’s the core implementation:

# This file contains the UDF for calling the local LLM API.
# Register the Python function as a scalar UDF
# con.create_function("prompt", prompt, [str, str, str, float], str)

import json

import requests

MODEL = "gemma-3-27b-it"
MODEL_TEMP = 0.4  # default temperature when the caller does not pass one
MODEL_MAX_TOKENS = -1  # -1 means no token limit in LM Studio


# Define the UDF
def prompt(
    prompt_text: str,
    system_message: str | None = None,
    json_schema: str | None = None,
    temperature: float | None = None,
) -> str:
    """
    Calls the local LLM API and returns the response.
    """
    url = "http://localhost:1234/v1/chat/completions"
    headers = {"Content-Type": "application/json"}
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "user", "content": prompt_text},
        ],
        "temperature": MODEL_TEMP,
        "max_tokens": MODEL_MAX_TOKENS,
        "stream": False,
    }

    # Prepend the system message if a non-empty one was provided
    if system_message:
        payload["messages"].insert(0, {"role": "system", "content": system_message})

    # Set temperature if provided
    if temperature is not None:
        payload["temperature"] = temperature

    if json_schema:
        payload["response_format"] = {
            "type": "json_schema",
            "json_schema": json.loads(json_schema),
        }

    try:
        # Generous timeout: local inference can take several seconds per call
        response = requests.post(
            url, headers=headers, data=json.dumps(payload), timeout=300
        )
        response.raise_for_status()  # Raise an error for HTTP codes 4xx/5xx

        return (
            response.json()
            .get("choices", [{}])[0]
            .get("message", {})
            .get("content", "No response")
        )
    except requests.exceptions.RequestException as e:
        # Raise an exception to ensure the query fails
        raise RuntimeError(f"API request failed: {str(e)}") from e
    except Exception as e:
        # Handle any other exceptions
        raise RuntimeError(f"An error occurred: {str(e)}") from e

This UDF handles the API communication, error handling, and JSON schema enforcement. The open_prompt DuckDB extension provides similar functionality, but I opted for a custom UDF to ensure I had full control over the model usage and response formatting.

SQL-Powered Categorization

The following Python code shows how to use the UDF within a DuckDB query to categorize service requests based on agency, complaint type, and descriptor. The LLM processes each unique combination and returns a JSON object with the assigned category and subcategory.

First, load the UDF and the other required libraries, connect to DuckDB, and register the function as a scalar UDF:

import json

import duckdb
from jinja2 import Template

from src.duckdb_prompt_udf import prompt

# Connect to DuckDB
con = duckdb.connect(database=":memory:", read_only=False)

# Register the Python function as a scalar UDF
con.create_function("prompt", prompt, [str, str, str, float], str)
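
With the UDF registered, a quick smoke test (assuming LM Studio is running locally on port 1234 with the model loaded) looks like this:

# Smoke test: one round trip through SQL. Empty strings skip the optional
# system message and JSON schema; 0.0 overrides the default temperature.
print(con.execute(
    "SELECT prompt('Reply with the single word: pong', '', '', 0.0)"
).fetchone()[0])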

Define the system prompt and JSON schema for the LLM. The system prompt instructs the model to categorize service requests based on predefined categories and subcategories, while the JSON schema ensures valid responses:

system_prompt = (
    "You are a government auditor reviewing New York's 311 service request system.\n\n"
    "Your task:\n"
    "1. Review the given service request details (agency, complaint type, and description)\n"
    "2. Choose the most appropriate category and subcategory from the provided CATEGORIES json structure\n"
    "3. The sample data in CATEGORIES json serves as guidance for classification\n\n"
    "Important rules:\n"
    "- Only use CATEGORIES and SUBCATEGORIES that exist in the provided CATEGORIES json block\n"
    "- Do NOT use the sample labels as categories or subcategories\n"
    "- Do NOT create new categories\n\n"
    "Response format:\n"
    "```json\n"
    '{"category": "STRING", "subcategory": "STRING"}\n'
    "```\n"
    "Note: Your response must contain ONLY the JSON object, nothing else."
)

# Define the JSON schema for the response
json_schema = json.dumps(
    {
        "name": "category_response",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "category": {"type": "string"},
                "subcategory": {"type": "string"},
            },
            "required": ["category", "subcategory"],
        },
    }
)

Finally, we define the SQL query template that uses the UDF to categorize service requests. The inner subquery deduplicates the dataset, the prompt call sends each unique combination of agency, complaint type, and descriptor to the LLM, and the regexp_replace strips any stray code fences from the response before the category and subcategory fields are extracted.

# Define the SQL query template
query_template = """
COPY (
    WITH llm_categorization AS (
        SELECT
            regexp_replace(
                prompt(
                    '# CATEGORIES:\n'
                    || '```json\n'
                    ||categories::VARCHAR || '\n'
                    || '```\n'
                    ||'# SERVICE REQUEST:\n'
                    ||'AGENCY: ' || IFNULL(agency, 'N/A') || '\n'
                    ||'COMPLAINT TYPE: ' || IFNULL(complaint_type, 'N/A')  || '\n'
                    ||'DESCRIPTION: ' || IFNULL(descriptor, 'N/A') || '\n',
                    '{{ system_prompt }}'::VARCHAR,
                    '{{ json_schema }}'::VARCHAR,
                    {{ temperature }}::FLOAT
                ),
                '```json|```',
                '',
                'g'
            )::VARCHAR AS raw_llm_response,
            json_extract_string(raw_llm_response, '$.category')::VARCHAR AS category,
            json_extract_string(raw_llm_response, '$.subcategory')::VARCHAR AS subcategory,
            agency,
            complaint_type,
            descriptor,
            request_count,
        FROM (
            SELECT 
                agency,
                complaint_type,
                descriptor,
                count(*) AS request_count
            FROM "" 
                group by 1,2,3
                order by 4 desc
                limit 
        ) CROSS JOIN read_json('./data/categories.json')
    )

    SELECT
        agency,
        complaint_type,
        descriptor,
        category,
        subcategory,
        raw_llm_response,
        request_count
    FROM llm_categorization 
) TO '{{ output_file }}';
"""

# Render the template with variables
template = Template(query_template)
query = template.render(
    system_prompt=system_prompt,
    json_schema=json_schema,
    temperature=0.2,
    data_file="./data/cityofnewyork/service_requests_2024.parquet",
    output_file="./output/llm_categorize_output_2024.csv",
    limit=1200,
)

# Execute the query
print("Executing query...")

result = con.execute(query).fetchall()
print(result)

The input dataset is a Parquet file of NYC 311 service requests for 2024, read directly into DuckDB. The inner subquery deduplicates it to unique combinations of agency, complaint type, and descriptor, cutting the workload from 3.5 million records to ~1,200 LLM calls.

The categories.json file contains the predefined categories and subcategories the LLM must choose from; it is cross-joined onto every row so the full category structure is included in each prompt.

The categorized output is written to a CSV file, which can be loaded back into DuckDB (or any other analysis tool) for further work.
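
For example, here is a sketch of joining the categorized combinations back onto the full dataset so every individual request carries its category (file paths follow the render call above; rows with NULL descriptors would need extra handling):

import duckdb

con = duckdb.connect()

# Enrich every raw request with its LLM-assigned category and subcategory.
con.execute("""
    CREATE OR REPLACE TABLE categorized_requests AS
    SELECT r.*, c.category, c.subcategory
    FROM './data/cityofnewyork/service_requests_2024.parquet' AS r
    LEFT JOIN read_csv_auto('./output/llm_categorize_output_2024.csv') AS c
        USING (agency, complaint_type, descriptor)
""")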

Processing Results

Processing this SQL query with the LLM integration took about 60 minutes on a MacBook Pro. Each LLM call took 2-4 seconds, with the bottleneck being model inference rather than data processing.

The model achieved high categorization accuracy. I manually reviewed a random sample of 200 requests and found ~196 correctly assigned. This 98% figure should be taken as indicative rather than rigorous.
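
One way to draw such a review sample from the categorized output is DuckDB's built-in sampling; a minimal sketch:

import duckdb

con = duckdb.connect()

# Pull 200 random categorized combinations for manual review.
sample = con.execute("""
    SELECT agency, complaint_type, descriptor, category, subcategory
    FROM read_csv_auto('./output/llm_categorize_output_2024.csv')
    USING SAMPLE 200 ROWS
""").fetchall()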

Breaking Down the Data

Now that we have the data categorized, we can start to break it down by category and subcategory to identify trends.
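
Because the output CSV already carries a request_count for each combination, the breakdown is a simple aggregation; here is a sketch of how the table below can be produced:

import duckdb

con = duckdb.connect()

# Total request volume per LLM-assigned category and subcategory.
top_categories = con.execute("""
    SELECT category, subcategory, sum(request_count) AS request_count
    FROM read_csv_auto('./output/llm_categorize_output_2024.csv')
    GROUP BY 1, 2
    ORDER BY 3 DESC
    LIMIT 10
""").fetchall()
print(top_categories)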

The top 10 request categories/subcategories for 2024 are:

| Category | Subcategory | Request Count |
| --- | --- | --- |
| Public Safety & Order | Parking | 796,805 |
| Public Safety & Order | Noise & Disturbances | 752,910 |
| Housing & Infrastructure | Building & Utilities | 715,222 |
| Environmental Health & Sanitation | Waste Management & Sanitation | 247,084 |
| Housing & Infrastructure | Street & Sidewalk Conditions | 234,890 |
| Environmental Health & Sanitation | Animals & Pests | 129,931 |
| Hazardous Conditions | Water Quality & Leaks | 113,746 |
| Government & Community Services | Parks & Community | 104,311 |
| Public Safety & Order | Non-Emergency Police Matters | 95,088 |
| Consumer & Business Services | Consumer Complaints | 64,080 |

These results align well with the primary functions of NYC’s 311 system, showing that our LLM-powered categorization has successfully identified the most common types of citizen requests across the city.

Conclusion

This analysis demonstrates how local LLMs can categorize large-scale government datasets without API costs. By integrating a local LLM into DuckDB through a custom UDF, we categorized 3.5 million NYC 311 requests in about 60 minutes on consumer hardware, with roughly 98% accuracy on a manual spot check.

The key technical insight, deduplicating before processing, cut the LLM workload from 3.5 million records to about 1,200 calls (roughly 0.03% of the original volume, a ~99.97% reduction) while preserving the analytical value. This shows that sophisticated text analysis can be done cost-effectively without relying on cloud APIs.

Performance characteristics:

  • Input: 3.5M records → ~1,200 unique combinations
  • Processing: ~60 minutes on an M4 Max MacBook Pro with 36GB RAM
  • Accuracy: ~98% on a 200-request manual validation sample
  • Cost: $0 in API fees (local inference only)

The same methodology could be applied to any large-scale text classification task where the records can be deduplicated into a manageable number of unique patterns. This opens up new possibilities for real-time categorization of citizen requests, customer feedback, and other large text datasets.

What’s Next?

Having established a working methodology for LLM-powered categorization of NYC 311 data, several analytical directions could provide deeper insights:

Seasonal Patterns: Analyze how complaint categories vary throughout the year—do heating complaints surge in winter while noise complaints peak in summer?
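
As a starting point for that seasonal breakdown, here is a sketch that joins the categorized output back onto the raw data and counts requests per category per month (the created_date column name is an assumption about the Parquet export):

import duckdb

con = duckdb.connect()

# Monthly request volume per LLM-assigned category.
seasonal = con.execute("""
    SELECT c.category, month(r.created_date) AS month, count(*) AS requests
    FROM './data/cityofnewyork/service_requests_2024.parquet' AS r
    JOIN read_csv_auto('./output/llm_categorize_output_2024.csv') AS c
        USING (agency, complaint_type, descriptor)
    GROUP BY 1, 2
    ORDER BY 1, 2
""").fetchall()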

Neighborhood Clustering: Combine our categorized data with the geographic hotspots to identify which neighborhoods consistently report specific types of issues.

Response Time Analysis: Use LLMs to extract urgency indicators from complaint text and correlate with actual resolution times across different categories.

Predictive Modeling: Train machine learning models on the categorized data to predict future complaint trends based on historical patterns.

The complete code is available in the markusos/llm_duck repository.