Abstract
Querying JSON data within Oracle improves year by year. For nearly 50 years, Oracle has been a relational database relying on linking data between tables. JSON often contains arrays of data within elements. Querying JSON successfully relies on more than knowing the syntax; it also requires understanding how the data structures differ. We will explore SQL select statements to parse JSON data, including arrays. You’ll need these skills while developing Application Programming Interfaces (APIs).
This is part of a series of articles on developing and supporting API written with Oracle PL/SQL and Oracle APEX.
Keywords
Oracle Native JSON, Oracle JSON_TABLE, Oracle apex_json, PL/SQL, PL/SQL API, RESTful API
Introduction and History
Parsing JSON data may be accomplished within an Oracle select statement. Years ago, we relied on parsing the data with APEX functions such as apex_json.parse and apex_json.get_varchar2. When I compare an older API I wrote in PL/SQL using these apex_json functions to the current technique involving the select statement, the select statement reduces the amount of code needed by 75%. Reducing code with the select statement reduces the risk of errors and reduces the cost of developing APIs.
One can observe a trend at Oracle that continues to include JSON as an important and ongoing source of data. With Oracle 12c, the support for native JSON improved. This was released in July of 2013. Publishing a blog about this in 2021 does not push the envelope on early adoption of technology. Instead, I intend to increase parity between Oracle's JSON datatype support and its increasing support for RESTful APIs.
RESTful API development and rapid parsing of JSON data must travel hand-in-hand.
I write with a slight bias that involves fetching data via a RESTful GET. In an effort to provide a single point of reference, I’ll touch on resources that permit hosting your own data in a RESTful format for others to perform a GET.
API Series
This article is part of a series of articles about developing an API with Oracle PL/SQL and Oracle APEX.
- APEX Debug Messages – standardizing error collection, debugging, and trace
- Oracle Pl/SQL Template for RESTful API
Structure
Nearly all of the Oracle documentation, guidance, and related blog posts start with the awkward premise of populating the data. I am going to skip this step. Finding interesting JSON data on the web is easier than creating JSON data. Additionally, creating JSON data from a blog post may introduce an error that will be impossible to find. Most of us who attend “Internet University” have experienced copying text from a website only to find unprintable (hidden) ASCII characters that waste an hour of troubleshooting. Sometimes even pasting into a text-based notepad and then copying again won’t avoid these problems.
You can fetch some JSON data from public sources such as https://api.publicapis.org/entries, which is used in the examples below.
And if you are reading this article, then you likely have an API target selected. Go pull data from there with your browser, cURL, Postman or something.
I tend to query JSON from our API_STAGING table. You can find this table structure on my GitHub:
https://github.com/cmoore-sp/plsql-api/blob/f66d0e6865bdc2a40f40f0c7560ed6820bfb27d9/api_setup.sql
Table API_STAGING
As mentioned, you can copy/paste the API_STAGING table structure from the GitHub link above. This table includes a CLOB for JSON data. I do not put in the JSON integrity checks, just as a personal preference. Because the data we are working with comes from a remote connection, I decided it is better to accept the data as presented than to generate an error. It is easier to have the data and then diagnose the issue; the alternative is to get an error before ever seeing the problem.
CREATE TABLE API_STAGING (
STAGING_PK NUMBER generated by default on null as identity,
SCHEMA_NAME VARCHAR2(30),
API_NAME VARCHAR2(100),
API_MODULE VARCHAR2(100),
DATA_TYPE VARCHAR2(100),
ACTION VARCHAR2(100),
ACTION_DATE DATE,
BASE_URL VARCHAR2(1000),
APPEND VARCHAR2(2000),
URL VARCHAR2(4000),
STATUS_CODE VARCHAR2(10),
JSON_RESPONSE CLOB,
HTTP_RESPONSE CLOB,
BODY CLOB,
DELETE_OK DATE );
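For reference, if you did want the integrity check mentioned above, a constraint along these lines would add it (a sketch; the constraint name is illustrative):
alter table api_staging
add constraint api_staging_json_chk check (json_response is json);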
This table is used for the data coming from a remote site. JSON data drops into the JSON_RESPONSE clob. In the event that we get errors or an HTML response, we put that into the HTTP_RESPONSE field. When preparing data to be pushed to a remote site, we can use the BODY clob for a payload. Our DELETE_OK field aids us with clean up, which is done regularly via a scheduler job. The table serves as a log for the RESTful activity between Oracle and the RESTful target.
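As a rough illustration of how a row lands in this table, here is a minimal sketch using apex_web_service.make_rest_request. The URL, module name, and retention period are assumptions for illustration; real code adds error handling and logging.
declare
l_url varchar2(4000) := 'https://api.publicapis.org/entries';
l_response clob;
begin
-- call the remote API with a GET and capture the raw response
l_response := apex_web_service.make_rest_request(
p_url => l_url,
p_http_method => 'GET');
-- stage the raw response so we can explore it with SQL later
insert into api_staging (api_module, action, action_date, url, status_code, json_response)
values ('api_openapi.entry_get', 'GET', sysdate, l_url,
to_char(apex_web_service.g_status_code), l_response);
commit;
-- clean up (typically run from a scheduler job): remove rows flagged as safe to delete
delete from api_staging where delete_ok is not null and delete_ok < sysdate - 7; -- 7-day retention is an assumption
commit;
end;
/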
When I find JSON I wish to explore, I copy the JSON into a new row, pasting it into the JSON_RESPONSE clob. Now I don’t worry about typos (my own anyway). The second advantage is that the query/code I develop to explore the data will dovetail into my final product. I don’t have to redo efforts. The query starts with:
select
s.staging_pk
from api_staging s,
json_table (s.json_response, '$' columns
-- column definitions go here
) json
JSON Data Types
As a courtesy and quick reference, because JSON data types and Oracle data types differ, I’ll provide a quick list.
- String
- Boolean (true or false)
- Number
Some list Arrays as a data type – fine, you are correct too. They are a data type in as much as JSON is a data type or Object is a data type. I’ll return to this in a bit.
Oracle does not have a boolean datatype within tables (at least not through 19c). Oracle 21c has numerous JSON features; we can explore these in another post during 2021. Furthermore, Oracle DATE fields are powerful and useful. JSON dates tend to be in the ISO-8601 format, which we normally convert with a function based on this model:
return to_date(substr(P_JSON_DATE,1,16),'YYYY-MM-DD"T"HH24:MI') ;
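Wrapped up as a helper, it might look like this (a minimal sketch; the later examples in this article call a similar helper named json_to_date):
function json_to_date (P_JSON_DATE in varchar2) return date is
begin
if P_JSON_DATE is null then
return null;
end if;
-- keep only the YYYY-MM-DD"T"HH24:MI portion (first 16 characters) and convert
return to_date(substr(P_JSON_DATE,1,16),'YYYY-MM-DD"T"HH24:MI');
end json_to_date;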
Frankly, we store the JSON boolean data as varchar until we migrate the data further into our Oracle tables. In our intermediate staging tables, the data is a hybrid of Oracle structures (VARCHAR2, NUMBER, DATE) and JSON stuff such as booleans.
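When that migration happens, the conversion can be as simple as a CASE expression. A minimal sketch using the HTTPS boolean from the sample data shown in the next section (the staging_pk is hypothetical):
select
case when json_value(s.json_response, '$.entries[0].HTTPS') = 'true' then 'Y' else 'N' end as https_yn
from api_staging s
where s.staging_pk = 123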
JSON Objects & Arrays
I came to JSON with an Oracle brain that had spent years frustrated with poor solutions, including CSV. My words and thoughts are wrong, I know that, but when I see JSON arrays I think “rows”. When I see JSON objects I think “columns”. Sure, wrong and also not universal, but really close. Many people use arrays for small data sets like latitude and longitude.
The squiggle (curly) brackets define the limits of objects. The square brackets provide the limits for arrays. Here is a classic example (and a lovely example too) – note this is partial JSON and not expected to validate:
{
"count": 834,
"entries": [{
"API": "Cat Facts",
"Description": "Daily cat facts",
"Auth": "",
"HTTPS": true,
"Cors": "no",
"Link": "https://alexwohlbruck.github.io/cat-facts/",
"Category": "Animals"
}, {
"API": "Cataas",
"Description": "Cat as a service (cats pictures and gifs)",
"Auth": "",
"HTTPS": true,
"Cors": "unknown",
"Link": "https://cataas.com/",
"Category": "Animals"
}, {
"API": "catAPI",
"Description": "Random pictures of cats",
"Auth": "",
These data came from https://api.publicapis.org/entries.
I love that the JSON object starts with a count. The array has the name “entries”. The array has the square brackets. Array elements are separated with commas. In this case, the phrase “arrays are rows” holds up. In some cases, people use arrays to represent sets of data, such as the latitude and longitude noted below. Yes, an array! No, not really data rows.
"geometry": {
"type": "Point",
"coordinates": [
-71.049999999999997,
42.350000000000001
]
},
Querying JSON data
Numerous on-line articles and Oracle manuals provide great guidance on chasing down the dotted notation for data such as geometry.type.
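For example, against the geometry snippet above, the dotted paths can be pulled with json_value. This sketch assumes the document sits in API_STAGING with geometry at its root, and the staging_pk is hypothetical:
select
json_value(s.json_response, '$.geometry.type') as geometry_type
,json_value(s.json_response, '$.geometry.coordinates[0]') as longitude
,json_value(s.json_response, '$.geometry.coordinates[1]') as latitude
from api_staging s
where s.staging_pk = 456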
The challenge that keeps us looking things up within our own code is managing arrays. Some APIs provide a count or other metadata in the first elements. These first elements give you a handle from which to start picking data. Many folks also name their arrays, such as:
"entries": [{
"API": "Cat Facts",
The array name is entries, so the query looks like this:
select
s.staging_pk
,json.count
,json.api_name
,json.description
,json.auth
,json.https
,json.cors
,json.link
,json.category
from api_staging s,
json_table
(s.json_response, '$' columns
count number path '$.count',
nested path '$.entries[*]' columns
(
api_name varchar2(255) path '$.API'
,description varchar2(255) path '$.Description'
,auth varchar2(255) path '$.Auth'
,https varchar2(255) path '$.HTTPS'
,cors varchar2(255) path '$.Cors'
,link varchar2(255) path '$.Link'
,category varchar2(255) path '$.Category'
)
) json
where s.api_module = 'api_openapi.entry_get' and s.delete_ok is null and s.status_code = '200'
order by json.api_name
The $ indicates the starting point. The nested path dives deeper into the elements. The [*] creates a new row for each element within the array. The api_name column is basically entries.API.
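As a contrast with [*], a specific index pulls just one element of the array instead of one row per element. A quick sketch with json_value:
select
json_value(s.json_response, '$.entries[0].API') as first_api_name
from api_staging s
where s.api_module = 'api_openapi.entry_get' and s.delete_ok is null and s.status_code = '200'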
What if [{ is first?
Sometimes the JSON data does not contain metadata such as a count, the URI, or other useful stuff at the front. Sometimes API authors do not name their arrays. When this happens, the first two characters in the JSON data are [{. We experienced this when writing an API for Okta. In this case, the Oracle syntax compresses to ‘$[*]’. This situation is very difficult to search the internet for, and I find the syntax hard to remember. The resultant query looks like this (an example from our Okta API):
for j in (
select
json.okta_id
,json.status
,json.status_changed
,json.first_name
,json.last_name
,json.email
from api_staging s,
json_table
(s.json_response, '$[*]' columns
okta_id varchar2(128) path '$.id'
,status varchar2(128) path '$.status'
,status_changed varchar2(128) path '$.statusChanged'
,first_name varchar2(128) path '$.profile.firstName'
,last_name varchar2(128) path '$.profile.lastName'
,email varchar2(128) path '$.profile.email'
) json
where staging_pk = P_STAGING_PK
) loop
Sample JSON data related to this query is (again this is a partial selection and will not validate as JSON):
[{
"id": "00u2iy2tcgELZEoUR357",
"status": "PROVISIONED",
"created": "2020-01-17T20:30:51.000Z",
"activated": "2020-01-17T20:30:51.000Z",
"statusChanged": "2020-01-17T20:30:51.000Z",
"lastLogin": null,
"lastUpdated": "2020-01-17T20:30:51.000Z",
"passwordChanged": null,
"profile": {
"firstName": "One",
"lastName": "Two",
Multiple Arrays?
Sometimes the JSON data will map to two or more tables within your Oracle structure, classically a header and detail table. When examining JSON data we may find an array for the “header” data, then a nested array for the child or detail data. As we extract the data from the API staging table, we use nested loops with implicit cursors (for loops with a select statement).
When querying JSON data, we flatten out the complexity we see. We then face the classic task of remodelling the relational structure that was present within the JSON data into the structure we want in our Oracle tables.
Sometimes we use one loop for the data set and track whether the header data changed as we spin through the JSON array. It is both a matter of style and a matter of data presentation. While writing this, I am looking at an example of an invoice from an API by Recurly. The invoice has an invoice header and invoice detail (line items).
Here is the code (it is a bit long, sorry):
for j in (
select
object
,invoice_id
,account_id
,invoice_number
,subscription_id
,plan_id
,addon_id
,addon_code
,accounting_code
.... abbreviated
,line_item_id
,item_id
.... abbreviated
from api_staging s,
json_table
(s.json_response, '$' columns
object varchar2(128) path '$.object'
,account_id varchar2(255) path '$.account.id'
,collect_method varchar2(50) path '$.collection_method'
,po_number varchar2(50) path '$.po_number'
.... abbreviated
,nested path '$.line_items.data[*]' columns
(
line_item_id varchar2(128) path '$.id'
,subscription_id varchar2(255) path '$.subscription_id'
... abbreviated
)
) js
where staging_pk = P_STAGING_PK
order by invoice_id
) loop
Then, as we spin through the rows, we test on each loop iteration to see whether the header is the same as the prior one or is new.
--
-- I N V O I C E H E A D E R
--
-- if the invoice_id is "new" (does not match the current), then insert or update the invoice header data
if j.invoice_id <> l_current_invoice_id then -- this is a new invoice header for this run of the code
l_current_invoice_id := j.invoice_id ;
l_loop_counter := 1;
-- does the invoice exist?
select count(*) into l_select_count from rc_invoice where invoice_id = j.invoice_id;
-- convert the date
r_invoice.invoiced_on := json_to_date(j.invoiced_on_json);
-- invoice does not exist, so insert it
if l_select_count = 0 then
.... continued
As we plod along we process each invoice detail row.
--
-- I N V O I C E D E T A I L
--
-- we need the invoice_pk
select count(*) into l_select_count from rc_invoice where invoice_id = j.invoice_id;
if l_select_count = 1 then
select * into r_invoice from rc_invoice where invoice_id = j.invoice_id;
else
raise_application_error (-20000,'Invoice PK for: ' || j.invoice_id || ' is null in: ' || l_procedure); -- blow up; we can't be here without an invoice header row!
end if;
-- does the invoice line item exist?
select count(*) into l_select_count from rc_invoice_item where line_item_id = j.line_item_id;
l_item_pk := item_pk(j.item_id);
l_start_date := rc_staging.json_to_date(j.start_date);
l_end_date := rc_staging.json_to_date(j.end_date);
The Right Way
There is no right way. First, JSON is not a protocol but a data format. We don’t get to argue right/not-right in such an environment. Furthermore, REST (representational state transfer) is an “architectural style”. It is not a protocol that requires adherence to a limited set of rules. My answer is to play and have fun. Maybe your API host provides nice metadata and counters at the top. Maybe they name their arrays and keep the nesting of the data to something that doesn’t break my Oracle-centric brain. I see some 1980s and 1990s legacy dependency on code and loops in my style. And I see the value of embracing JSON. To each their strength.
Have some fun and stay safe.
cmoore