Characterizing Unfamiliar JSON Without Reading It

I have a badly behaved MCP server that returns gigantic JSON objects. Since I’m running it through a gateway, I might as well save money and context by simplifying its output.

When an AI agent needs to understand the shape of a JSON response before writing code against it, the naive approach is to read the file. A sample response from the MCP came to 22 kB, and dumping the entire thing into context would cost about 8.5k tokens. You’ll get the answer, but you pay for every repeated field name, every redundant value, every null you already suspected was there.

jqi is a structural profiler for JSON. Instead of showing you the data, it shows you the shape of the data: field paths, types, value ranges, cardinalities, and enum-like distributions. It’s designed to be used by an LLM (not a human) alongside jq to interactively explore and understand arbitrary JSON structures.
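To make "the shape of the data" concrete, here is a minimal Python sketch of the general idea: walk a document and record which types appear at each path. It is not jqi's implementation, the function name `profile` is hypothetical, and the sample values are made up (only the field names echo the MCP response).

```python
import json
from collections import defaultdict


def profile(value, path="", out=None):
    """Record the set of JSON types seen at each path (illustrative only)."""
    if out is None:
        out = defaultdict(set)
    here = path or "."
    if isinstance(value, dict):
        out[here].add("object")
        for field, child in value.items():
            profile(child, f"{path}.{field}", out)
    elif isinstance(value, list):
        out[here].add("array")
        for child in value:
            # All elements share one path, so mixed-type arrays show up
            # as a path with more than one type.
            profile(child, f"{path}[]", out)
    elif isinstance(value, bool):  # bool before int: bool subclasses int
        out[here].add("bool")
    elif isinstance(value, int):
        out[here].add("int")
    elif isinstance(value, float):
        out[here].add("float")
    elif isinstance(value, str):
        out[here].add("string")
    else:
        out[here].add("null")
    return out


# Made-up sample; not taken from the real response.
sample = json.loads(
    '{"node": {"_id": "a1", "archived": false, "labels": ["x", "y"]}}'
)
shape = profile(sample)
for p in sorted(shape):
    print(f"{p:16}: {' | '.join(sorted(shape[p]))}")
```

The summary grows with the number of distinct paths, not with the number of records, which is why it stays small when the same structure repeats many times.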

When I gave Sonnet 4.6 this (relatively small) file and asked it to find opportunities to optimize the output, using jqi saved 6.5k tokens, around 20% of the total context. Without jqi, a fresh instance of Sonnet 4.6 read most of the file in order to reason about the data directly.

Category	With `jqi`	Without `jqi`
Messages	10.7k	17.2k
Total context used	26.1k	32.6k

(This includes the overhead of reading the skill and installing the tool.)
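The savings can be checked with a few lines of arithmetic on the table above. This is only a sanity check of the reported numbers (in thousands of tokens); the variable names are arbitrary.

```python
# Token counts from the table, in thousands of tokens.
with_jqi = {"messages": 10.7, "total": 26.1}
without_jqi = {"messages": 17.2, "total": 32.6}

saved = without_jqi["total"] - with_jqi["total"]
share_of_total = saved / without_jqi["total"]

msg_saved = without_jqi["messages"] - with_jqi["messages"]
share_of_messages = msg_saved / without_jqi["messages"]

print(f"saved {saved:.1f}k tokens ({share_of_total:.0%} of total context)")
print(f"message tokens down {msg_saved:.1f}k ({share_of_messages:.0%})")
```

The whole difference comes from the Messages row, so measured against message tokens alone the reduction is closer to 38%.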

The skill (including instructions to install it) is available on GitHub.

The Workflow

# Step 1: Orient
jqi --nest-depth 1 logs/test.json
# Step 2: Profile the repeated structure (jq -c emits one edge per line, piped into jqi)
jq -c '.response_payload.edges[]' logs/test.json | jqi -

.node               : object
.node._id           : string  len=17..17  [all distinct]
.node.adjustedDeadline: bool
.node.agileStoryPoints: int  min=0  max=0
.node.archived      : bool
.node.assignedBy    : "arhAoLQYgXeZv3gM7"  "arhAoLQYgXeZv3gM7{...}arhAoLQYgXeZv3gM7"
.node.customStatus  : "In Progress" | "To do" | "Consider Doing" | "Do Next" | "Immediate" | "To Do"  len=5..14
...
.node.isBlocked     : bool
.node.isRisk        : bool
.node.labels        : array  len=1 (constant)  of string
.node.linkTargets   : array  len=1 (constant)  of object  (in 8%)

Key discoveries included redundant values (markdownDescription and description), dead-weight objects (timeTracking, privacy), and even structural dependencies (customStatus is strictly more informative than status). The few things that needed a second look were resolved with focused jq + jqi calls, each taking under a second and returning a few lines of output.
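The three kinds of finding can each be stated as a simple check over the records. The Python sketch below is one way to express them, not how jqi or Sonnet actually arrived at the conclusions. The helper names are hypothetical, and the records are invented for illustration (the field names come from the response; the values, including the `status` values, do not).

```python
import json
from itertools import combinations


def key(value):
    """Hashable stand-in for any JSON value."""
    return json.dumps(value, sort_keys=True)


def constant_fields(records):
    """Fields with one value everywhere (a missing field counts as null)."""
    fields = set().union(*records)
    return sorted(
        f for f in fields if len({key(r.get(f)) for r in records}) == 1
    )


def duplicate_pairs(records):
    """Pairs of fields whose values are equal in every record."""
    fields = sorted(set().union(*records))
    return [
        (a, b)
        for a, b in combinations(fields, 2)
        if all(key(r.get(a)) == key(r.get(b)) for r in records)
    ]


def determines(records, a, b):
    """True if each value of field a always maps to one value of field b."""
    seen = {}
    for r in records:
        if seen.setdefault(key(r.get(a)), key(r.get(b))) != key(r.get(b)):
            return False
    return True


# Invented records; only the field names echo the real response.
records = [
    {"description": "Fix login", "markdownDescription": "Fix login",
     "status": "open", "customStatus": "To do", "agileStoryPoints": 0},
    {"description": "Ship docs", "markdownDescription": "Ship docs",
     "status": "open", "customStatus": "Do Next", "agileStoryPoints": 0},
    {"description": "Tune cache", "markdownDescription": "Tune cache",
     "status": "active", "customStatus": "In Progress", "agileStoryPoints": 0},
]

print(constant_fields(records))
print(duplicate_pairs(records))
print(determines(records, "customStatus", "status"),
      determines(records, "status", "customStatus"))
```

A field that determines another without being determined by it is what "strictly more informative" means here: in these invented records, knowing customStatus gives you status, but not the other way around.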

Sonnet was able to make complete recommendations without ever reading a contiguous chunk from the file.

Why This Matters for Agents

The standard advice for keeping agent context lean is: “don’t read files you don’t need to.” But sometimes you do need to understand a file — you just don’t need to read all of it. The structural summary produced by jqi is almost always sufficient to:

Identify which fields are always null/empty/constant
Find enum-like fields and their actual values
Spot optional fields and their presence rates
Detect type inconsistencies or mixed-type arrays

None of that requires the raw data in context. A 22 kB file becomes a 30-line summary. At scale (longer files, more tool calls, longer conversations), the compounding effect is significant.
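As a rough illustration of why these four questions need only aggregates, the sketch below computes presence rates, observed types, always-null fields, and enum-like string fields over a list of records. It is a simplified stand-in, not jqi's algorithm. `summarize` and `enum_limit` are hypothetical names, the records are invented, and `dueDate` and `estimate` are made-up fields.

```python
from collections import defaultdict


def type_name(value):
    """Map a decoded JSON value to a JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool before int: bool subclasses int
        return "bool"
    return {int: "int", float: "float", str: "string",
            list: "array", dict: "object"}[type(value)]


def summarize(records, enum_limit=5):
    """Per-field presence rate, types, null-ness, and enum-like values."""
    seen = defaultdict(list)
    for record in records:
        for field, value in record.items():
            seen[field].append(value)

    report = {}
    for field, values in seen.items():
        types = sorted({type_name(v) for v in values})
        strings = [v for v in values if isinstance(v, str)]
        distinct = set(strings)
        # Enum-like: few distinct strings, with at least one repeat.
        is_enum = strings and len(distinct) <= enum_limit \
            and len(distinct) < len(strings)
        report[field] = {
            "presence": len(values) / len(records),
            "types": types,
            "always_null": types == ["null"],
            "enum": sorted(distinct) if is_enum else None,
        }
    return report


# Invented records for illustration only.
records = [
    {"customStatus": "To do", "archived": False,
     "linkTargets": [{"id": 1}], "dueDate": None, "estimate": 3},
    {"customStatus": "Do Next", "archived": False,
     "dueDate": None, "estimate": "3h"},
    {"customStatus": "To do", "archived": True,
     "dueDate": None, "estimate": 5},
    {"customStatus": "To do", "archived": False,
     "dueDate": None, "estimate": 8},
]

report = summarize(records)
for field, info in report.items():
    print(field, info)
```

Each entry in the report is a handful of numbers and labels per field, so its size depends on how many fields exist rather than on how many records were scanned.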

The right tool for the job isn’t always the one that gets you the most information. Sometimes it’s the one that gets you exactly the information you need, and no more.
