There’s a simple way to get LLMs to produce better outputs: give the model more context.
And that context can take many different shapes and sizes — raw text, PDFs, screenshots and PowerPoint presentations.
But what happens when you want to give an LLM a large structured data set?
One with many properties and interdependencies?
You can’t just upload a database. (And if you could, who has a random database just sitting on your Google Drive?)
Thankfully there’s a funky file format to the rescue.
But first, the steadfast and earnest CSV
Before we get into JSON, we must salute one of the most dependable file formats in our toolkit: the CSV.
The comma-separated values file is clean and lightweight. Because it’s just text (with a comma “delimiter” separating the columns), CSV files play very nicely with the underlying tokenization technology of an LLM.
Here’s a simple example of a csv file:
Symbol,Shares,Price,Sector
AAPL,100,203.74,Technology
JPM,50,289.63,Financials
TSLA,40,307.23,Technology
However, the csv isn’t infallible. Its simplicity begets its shortcomings.
Specifically, its two-dimensional structure of rows and columns can’t represent nested relationships or more complex data sets.
Introducing CSV’s sturdier counterpart
JSON, which stands for JavaScript Object Notation, represents complex, nested data in a “flat” text file. Claude explained JSON as:
A “filing system” for digital data — just like how you organize physical documents in folders, JSON organizes digital information in a structured, readable way that any computer system can understand.
If we expand on our portfolio holdings example, here’s how we could represent a single stock, like AAPL, using JSON:
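Below is a minimal sketch of what that record could look like; the field names and figures are illustrative stand-ins, not real data:

```json
{
  "ticker": "AAPL",
  "company": "Apple Inc.",
  "sector": "Technology",
  "shares": 100,
  "price": 203.74,
  "fundamentals": {
    "revenue_usd_bn": 391.0,
    "pe_ratio": 31.2,
    "dividend_yield_pct": 0.5
  },
  "comparables": [
    { "ticker": "MSFT", "company": "Microsoft", "pe_ratio": 35.1 },
    { "ticker": "GOOGL", "company": "Alphabet", "pe_ratio": 22.4 }
  ],
  "risks": ["Hardware demand cycles", "App Store regulation"],
  "notes": "Internal notes can hold entire paragraphs of free-form text."
}
```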
Here you can see JSON putting some order to the chaos.
First, all the data corresponds to AAPL. You have company fundamentals (revenue, P/E ratio), descriptive data (comparable companies and risks) and internal notes (which could contain paragraphs of text).
Next you’ll see nested data (e.g. fundamentals inside a company) and arrays (e.g. each comparable company is represented with its own set of “columns”).
It’s also really easy to extend this format to include another ticker, TSLA:
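Sketching it with the same illustrative fields as above (values again made up), the records simply get wrapped in a top-level array and a new entry is appended:

```json
{
  "holdings": [
    { "ticker": "AAPL", "sector": "Technology", "fundamentals": { "pe_ratio": 31.2 } },
    {
      "ticker": "TSLA",
      "sector": "Technology",
      "shares": 40,
      "price": 307.23,
      "fundamentals": { "revenue_usd_bn": 97.7, "pe_ratio": 68.5 },
      "risks": ["EV price competition"],
      "notes": "Same structure as AAPL; no schema changes required."
    }
  ]
}
```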
And finally, here’s a JSON file encoding Slack messages:
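The example below is a simplified sketch, loosely modeled on Slack’s export format; the IDs, names and messages are made up for illustration:

```json
{
  "channel": { "id": "C024BE91L", "name": "portfolio-reviews" },
  "messages": [
    {
      "user": { "handle": "U0G9QF9C6", "display_name": "khe" },
      "text": "Q3 holdings file is updated. Thoughts on the TSLA position?",
      "ts": "1723129200.000200",
      "reactions": [
        { "name": "thumbsup", "count": 3 },
        { "name": "eyes", "count": 1 }
      ]
    }
  ]
}
```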
Here you’ll see:
Message text
User handles and display names
Reacji (Slack’s emoji reactions)
Channel information
What’s wonderful about this format is that it all gets saved as a flat text file — just like a CSV.
How would you create a JSON file?
I know what you’re thinking.
You’re seeing this file with curly brackets, strange indentations, Christmas colors, commas, quotes and colons — and wondering:
Who on earth would create a file like this?
Thankfully, you don’t write JSON manually.
JSON is predominantly a format exported from other apps.
It’s typically created for passing data between APIs (as well as the LLM-friendly Model Context Protocol).
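To make that concrete, here’s a hedged sketch of pulling JSON from an API in Python; the endpoint is hypothetical, and requests is a common third-party HTTP library:

```python
import requests  # third-party HTTP library: pip install requests

# Hypothetical endpoint; most web APIs respond with JSON
response = requests.get("https://api.example.com/v1/portfolio/holdings")
response.raise_for_status()  # fail loudly on HTTP errors

data = response.json()  # parse the JSON body into Python dicts and lists
print(data["holdings"][0]["ticker"])  # assumes the holdings structure sketched earlier
```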
For example, my entire blogging history at RadReads can be exported from the blogging platform WordPress as a 38-megabyte JSON file.
The full text of each blog post was stored in a single field titled body.
(Fun fact: I used this JSON file when I migrated 400 old posts over to khehy.com using the AI coding assistant, Cursor.)
How do you work with JSON?
Remember, despite the complexity of its underlying structure, JSON is just a plain old text file.
So JSON can be easily read by an LLM (and ChatGPT’s o3 can automatically run complex analyses on it).
I expanded our portfolio holdings file to include 20 companies (which ended up being 2,100 lines of text¹) and then ran the following prompt:
“You are an equity-data analyst.
Build a ‘Fundamentals — At-a-Glance’ table from the attached 20-company JSON with columns: Ticker, Sector, Revenue (USD bn), YoY Revenue Growth (%), Net Income (USD bn), EPS, P/E, Dividend Yield (%), Free Cash Flow (USD bn), Net Debt (USD bn; negative = net cash).
Round appropriately and add three bullets: highest/lowest P/E, biggest net-cash position, fastest grower.
No extra commentary.”
Here was ChatGPT o3’s output:
Next, I used the new and improved Claude Opus 4.1 to analyze the quality of each company’s profitability. Here’s the prompt:
Analyze the attached financial data to identify potential earnings quality issues and investment red flags:
1. Calculate the quality of earnings score for each company by comparing: net income to FCF conversion, margin trends, and revenue growth sustainability
2. Identify companies where PE ratios seem disconnected from fundamentals (too high or suspiciously low)
3. Flag companies with concerning combinations: high revenue growth but declining margins, high PE but low growth, negative FCF despite positive earnings
4. Rank companies by "accounting aggression risk" based on the divergence between reported metrics and cash generation
5. Identify the 3 companies most likely to disappoint in next earnings and 3 most likely to surprise positively
Present as a risk report with specific metrics, warning signals, and recommended actions for each flag identified.
Claude returned a full assessment with quality scores and recommendations:
For my last analysis, I decided to leave ChatGPT/Claude and use Python’s programming and visualization tools. While I don’t know Python, I am familiar with the AI coding platform Cursor and gave it the following prompt:
In python, create a detailed dashboard of this json file
Cursor worked for 10 minutes and created a dynamic dashboard using the graphing library Plotly and the web app framework Streamlit:
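I won’t reproduce Cursor’s full script here, but a minimal sketch of that kind of dashboard, assuming the holdings structure sketched earlier lives in a hypothetical portfolio_holdings.json, looks something like this:

```python
import json

import pandas as pd
import plotly.express as px
import streamlit as st

# Load the holdings JSON (hypothetical filename)
with open("portfolio_holdings.json") as f:
    data = json.load(f)

# Flatten the nested records into a table; nested keys become dotted columns
df = pd.json_normalize(data["holdings"])

st.title("Portfolio Fundamentals Dashboard")

# One illustrative chart: P/E ratio by ticker, colored by sector
fig = px.bar(df, x="ticker", y="fundamentals.pe_ratio", color="sector")
st.plotly_chart(fig)
```

Saving that as dashboard.py and running streamlit run dashboard.py serves it as a local web app.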
It’s your turn to try JSON
Try going into one of your frequently used apps and data sources to see if they have a JSON export. Some ideas include:
LinkedIn history
Google “Takeout” (spanning Gmail, Calendar and Contacts)
Federal Reserve data (via FRED)
WordPress archive
Once you have the file, first try opening it to understand its content and structure. Then run a simple prompt using a powerful model like o3.
Next, sit back, and watch the magic ensue.
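If you’d rather peek at the structure programmatically before handing the file to a model, a few lines of Python will do it (swap in your own filename):

```python
import json

# Swap in whatever file your app exported
with open("export.json") as f:
    data = json.load(f)

# Top-level shape: is it an object or a list, and what does it contain?
if isinstance(data, dict):
    print("Top-level keys:", list(data.keys()))
else:
    print(f"List of {len(data)} records; first one:")
    print(json.dumps(data[0], indent=2)[:500])  # preview the first record
```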
¹ For purposes of this demo, I asked ChatGPT to create the data. I have not sanity-checked it and suspect that portions are inaccurate or hallucinated. Therefore, the downstream analyses are also inaccurate. This was an editorial decision made purely to explain how to use JSON and Python.
I'm curious: how are y'all using JSON?