Abstract
Querying JSON data within Oracle improves year by year. For nearly 50 years, Oracle has been a relational database relying on linking data between tables. JSON often contains arrays of data within elements. Querying JSON successfully relies on more than knowing the syntax; it also requires understanding how the data structures differ. We will explore SQL select statements to parse JSON data, including arrays. You’ll need these skills while developing Application Programming Interfaces (APIs).
This is part of a series of articles on developing and supporting API written with Oracle PL/SQL and Oracle APEX.
Keywords
Oracle Native JSON, Oracle JSON_TABLE, Oracle apex_json, PL/SQL, PL/SQL API, RESTful API
Introduction and History
Parsing JSON data may be accomplished within an Oracle select statement. Years ago, we relied on parsing the data with APEX functions such as apex_json.parse and apex_json.get_varchar2. When I compare an older API I wrote in PL/SQL using these apex_json functions to the current technique involving the select statement, the select statement reduces the amount of code needed by 75%. Reducing code with the select statement reduces the risk of errors and reduces the cost of developing APIs.
One can observe a trend at Oracle that continues to include JSON as an important and ongoing source of data. With Oracle 12c, the support for native JSON improved. This was released in July of 2013. Publishing a blog about this in 2021 does not push the envelope on early adoption of technology. Instead, I intend to increase parity between Oracle's JSON datatype support and its increasing support for RESTful APIs.
RESTful API development and rapid parsing of JSON data must travel hand-in-hand.
I write with a slight bias that involves fetching data via a RESTful GET. In an effort to provide a single point of reference, I’ll touch on resources that permit hosting your own data in a RESTful format for others to perform a GET.
API Series
This article is part of a series of articles about developing an API with Oracle PL/SQL and Oracle APEX.
- APEX Debug Messages – standardizing error collection, debugging, and trace
- Oracle Pl/SQL Template for RESTful API
Structure
Nearly all of the Oracle documentation, guidance, and related blog posts start with the awkward premise of populating the data. I am going to skip this step. Finding interesting JSON data on the web is easier than creating JSON data. Additionally, creating JSON data from a blog post may introduce an error that will be impossible to find. Most of us who attend “Internet University” have experienced copying text from a website only to find unprintable (hidden) ASCII characters that waste an hour of troubleshooting. Sometimes even pasting into a text-based notepad and then copying again won’t avoid these problems.
You can fetch some JSON data from public sources such as https://api.publicapis.org/entries, which is used in the examples below.
And if you are reading this article, then you likely have an API target selected. Go pull data from there with your browser, cURL, Postman or something.
I tend to query JSON from our API_STAGING table. You can find this table structure on my GitHub:
https://github.com/cmoore-sp/plsql-api/blob/f66d0e6865bdc2a40f40f0c7560ed6820bfb27d9/api_setup.sql
Table API_STAGING
As mentioned, you can copy/paste the API_STAGING table structure from the GitHub link above. This table includes a CLOB for JSON data. I do not put in the JSON integrity checks, just as a personal preference. Because the data we are working with comes from a remote connection, I decided it is better to accept the data as presented than to generate an error. It is easier to have the data and then diagnose the issue; the alternative is to get an error before ever seeing the problem.
CREATE TABLE API_STAGING (
STAGING_PK NUMBER generated by default on null as identity,
SCHEMA_NAME VARCHAR2(30),
API_NAME VARCHAR2(100),
API_MODULE VARCHAR2(100),
DATA_TYPE VARCHAR2(100),
ACTION VARCHAR2(100),
ACTION_DATE DATE,
BASE_URL VARCHAR2(1000),
APPEND VARCHAR2(2000),
URL VARCHAR2(4000),
STATUS_CODE VARCHAR2(10),
JSON_RESPONSE CLOB,
HTTP_RESPONSE CLOB,
BODY CLOB,
DELETE_OK DATE );
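For reference, if you did want the integrity check mentioned above, a constraint along these lines would add it (a sketch; the constraint name is illustrative):
alter table api_staging
add constraint api_staging_json_chk check (json_response is json);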
This table is used for the data coming from a remote site. JSON data drops into the JSON_RESPONSE clob. In the event that we get errors or an HTML response, we put that into the HTTP_RESPONSE field. When preparing data to be pushed to a remote site, we can use the BODY clob for a payload. Our DELETE_OK field aids us with clean up, which is done regularly via a scheduler job. The table serves as a log for the RESTful activity between Oracle and the RESTful target.
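As a rough illustration of how a row lands in this table, here is a minimal sketch using apex_web_service.make_rest_request. The URL, module name, and retention period are assumptions for illustration; real code adds error handling and logging.
declare
l_url varchar2(4000) := 'https://api.publicapis.org/entries';
l_response clob;
begin
-- call the remote API with a GET and capture the raw response
l_response := apex_web_service.make_rest_request(
p_url => l_url,
p_http_method => 'GET');
-- stage the raw response so we can explore it with SQL later
insert into api_staging (api_module, action, action_date, url, status_code, json_response)
values ('api_openapi.entry_get', 'GET', sysdate, l_url,
to_char(apex_web_service.g_status_code), l_response);
commit;
-- clean up (typically run from a scheduler job): remove rows flagged as safe to delete
delete from api_staging where delete_ok is not null and delete_ok < sysdate - 7; -- 7-day retention is an assumption
commit;
end;
/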
When I find JSON I wish to explore, I copy the JSON into a new row, pasting it into the JSON_RESPONSE clob. Now I don’t worry about typos (my own anyway). The second advantage is that the query/code I develop to explore the data will dovetail into my final product. I don’t have to redo efforts. The query starts with:
select
s.staging_pk
from api_staging s,
json_table (s.json_response, '$' columns
-- column definitions go here
) json
JSON Data Types
As a courtesy and quick reference, because JSON data types and Oracle data types differ, I’ll provide a quick list.
- String
- Boolean (true or false)
- Number
Some list Arrays as a data type – fine, you are correct too. They are a data type in as much as JSON is a data type or Object is a data type. I’ll return to this in a bit.
Oracle does not have a boolean datatype within tables (at least not through 19c). Oracle 21c has numerous JSON features; we can explore these in another post during 2021. Furthermore, Oracle DATE fields are powerful and useful. JSON dates tend to be in the ISO-8601 format, which we normally convert with a function based on this model:
return to_date(substr(P_JSON_DATE,1,16),'YYYY-MM-DD"T"HH24:MI') ;
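Wrapped up as a helper, it might look like this (a minimal sketch; the later examples in this article call a similar helper named json_to_date):
function json_to_date (P_JSON_DATE in varchar2) return date is
begin
if P_JSON_DATE is null then
return null;
end if;
-- keep only the YYYY-MM-DD"T"HH24:MI portion (first 16 characters) and convert
return to_date(substr(P_JSON_DATE,1,16),'YYYY-MM-DD"T"HH24:MI');
end json_to_date;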
Frankly, we store the JSON boolean data as varchar until we migrate the data further into our Oracle tables. In our intermediate staging tables, the data is a hybrid of Oracle structures (VARCHAR2, NUMBER, DATE) and JSON stuff such as booleans.
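When that migration happens, the conversion can be as simple as a CASE expression. A minimal sketch using the HTTPS boolean from the sample data shown in the next section (the staging_pk is hypothetical):
select
case when json_value(s.json_response, '$.entries[0].HTTPS') = 'true' then 'Y' else 'N' end as https_yn
from api_staging s
where s.staging_pk = 123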
JSON Objects & Arrays
I came to JSON with an Oracle brain that had spent years frustrated with poor solutions, including CSV. My words and thoughts are wrong, I know that, but when I see JSON arrays I think “rows”. When I see JSON objects I think “columns”. Sure, wrong and also not universal, but really close. Many people use arrays for small data sets like latitude and longitude.
The squiggle (curly) brackets define the limits of objects. The square brackets provide the limits for arrays. Here is a classic example (and a lovely example too) – note this is partial JSON and not expected to validate:
{
"count": 834,
"entries": [{
"API": "Cat Facts",
"Description": "Daily cat facts",
"Auth": "",
"HTTPS": true,
"Cors": "no",
"Link": "https://alexwohlbruck.github.io/cat-facts/",
"Category": "Animals"
}, {
"API": "Cataas",
"Description": "Cat as a service (cats pictures and gifs)",
"Auth": "",
"HTTPS": true,
"Cors": "unknown",
"Link": "https://cataas.com/",
"Category": "Animals"
}, {
"API": "catAPI",
"Description": "Random pictures of cats",
"Auth": "",
These data came from https://api.publicapis.org/entries.
I love that the JSON object starts with a count. The array has the name “entries”. The array has the square brackets. Array elements are separated with commas. In this case, the phrase “arrays are rows” holds up. In some cases, people use arrays to represent sets of data, such as the latitude and longitude noted below. Yes, an array! No, not really data rows.
"geometry": {
"type": "Point",
"coordinates": [
-71.049999999999997,
42.350000000000001
]
},
Querying JSON data
Numerous on-line articles and Oracle manuals provide great guidance on chasing down the dotted notation for data such as geometry.type.
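For example, against the geometry snippet above, the dotted paths can be pulled with json_value. This sketch assumes the document sits in API_STAGING with geometry at its root, and the staging_pk is hypothetical:
select
json_value(s.json_response, '$.geometry.type') as geometry_type
,json_value(s.json_response, '$.geometry.coordinates[0]') as longitude
,json_value(s.json_response, '$.geometry.coordinates[1]') as latitude
from api_staging s
where s.staging_pk = 456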
The challenge that keeps us looking things up within our own code is managing arrays. Some APIs provide a count or other metadata in the first elements. These first elements give you a handle from which to start picking data. Many folks also name their arrays, such as:
"entries": [{
"API": "Cat Facts",
The array name is entries, so the query looks like this:
select
s.staging_pk
,json.count
,json.api_name
,json.description
,json.auth
,json.https
,json.cors
,json.link
,json.category
from api_staging s,
json_table
(s.json_response, '$' columns
count number path '$.count',
nested path '$.entries[*]' columns
(
api_name varchar2(255) path '$.API'
,description varchar2(255) path '$.Description'
,auth varchar2(255) path '$.Auth'
,https varchar2(255) path '$.HTTPS'
,cors varchar2(255) path '$.Cors'
,link varchar2(255) path '$.Link'
,category varchar2(255) path '$.Category'
)
) json
where s.api_module = 'api_openapi.entry_get' and s.delete_ok is null and s.status_code = '200'
order by json.api_name
The $ indicates the starting point. The nested path dives deeper into the elements. The [*] creates a new row for each element within the array. The api_name column is basically entries.API.
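As a contrast with [*], a specific index pulls just one element of the array instead of one row per element. A quick sketch with json_value:
select
json_value(s.json_response, '$.entries[0].API') as first_api_name
from api_staging s
where s.api_module = 'api_openapi.entry_get' and s.delete_ok is null and s.status_code = '200'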
What if [{ is first?
Sometimes the JSON data does not contain metadata such as a count, the URI, or other useful stuff at the front. Sometimes API authors do not name their arrays. When this happens, the first two characters in the JSON data are [{. We experienced this when writing an API for Okta. In this case, the Oracle syntax compresses to ‘$[*]’. This situation is very difficult to search the internet for, and I find the syntax hard to remember. The resultant query looks like this (an example from our Okta API):
for j in (
select
json.okta_id
,json.status
,json.status_changed
,json.first_name
,json.last_name
,json.email
from api_staging s,
json_table
(s.json_response, '$[*]' columns
okta_id varchar2(128) path '$.id'
,status varchar2(128) path '$.status'
,status_changed varchar2(128) path '$.statusChanged'
,first_name varchar2(128) path '$.profile.firstName'
,last_name varchar2(128) path '$.profile.lastName'
,email varchar2(128) path '$.profile.email'
) json
where staging_pk = P_STAGING_PK
) loop
Sample JSON data related to this query is (again this is a partial selection and will not validate as JSON):
[{
"id": "00u2iy2tcgELZEoUR357",
"status": "PROVISIONED",
"created": "2020-01-17T20:30:51.000Z",
"activated": "2020-01-17T20:30:51.000Z",
"statusChanged": "2020-01-17T20:30:51.000Z",
"lastLogin": null,
"lastUpdated": "2020-01-17T20:30:51.000Z",
"passwordChanged": null,
"profile": {
"firstName": "One",
"lastName": "Two",
Multiple Arrays?
Sometimes the JSON data will map to two or more tables within your Oracle structure, classically a header and detail table. When examining JSON data we may find an array for the “header” data, then a nested array for the child or detail data. As we extract the data from the API staging table, we use nested loops with implicit cursors (for loops with a select statement).
When querying JSON data, we flatten out the complexity we see. We then face the classic task of remodelling the relational structure that was present within the JSON data into the structure we want in our Oracle tables.
Sometimes we use one loop for the data set and track whether the header data changed as we spin through the JSON array. It is both a matter of style and a matter of data presentation. While writing this, I am looking at an example of an invoice from an API by Recurly. The invoice has an invoice header and invoice detail (line items).
Here is the code (it is a bit long, sorry):
for j in (
select
object
,invoice_id
,account_id
,invoice_number
,subscription_id
,plan_id
,addon_id
,addon_code
,accounting_code
.... abbreviated
,line_item_id
,item_id
.... abbreviated
from api_staging s,
json_table
(s.json_response, '$' columns
object varchar2(128) path '$.object'
,account_id varchar2(255) path '$.account.id'
,collect_method varchar2(50) path '$.collection_method'
,po_number varchar2(50) path '$.po_number'
.... abbreviated
,nested path '$.line_items.data[*]' columns
(
line_item_id varchar2(128) path '$.id'
,subscription_id varchar2(255) path '$.subscription_id'
... abbreviated
)
) js
where staging_pk = P_STAGING_PK
order by invoice_id
) loop
Then, as we spin through the rows, we test on each loop iteration to see whether the header is the same as the prior one or is new.
--
-- I N V O I C E H E A D E R
--
-- if the invoice_id is "new" (does not match the current), then insert or update the invoice header data
if j.invoice_id <> l_current_invoice_id then -- this is a new invoice header for this run of the code
l_current_invoice_id := j.invoice_id ;
l_loop_counter := 1;
-- does the invoice exist?
select count(*) into l_select_count from rc_invoice where invoice_id = j.invoice_id;
-- convert the date
r_invoice.invoiced_on := json_to_date(j.invoiced_on_json);
-- invoice does not exist, so insert it
if l_select_count = 0 then
.... continued
As we plod along we process each invoice detail row.
--
-- I N V O I C E D E T A I L
--
-- we need the invoice_pk
select count(*) into l_select_count from rc_invoice where invoice_id = j.invoice_id;
if l_select_count = 1 then
select * into r_invoice from rc_invoice where invoice_id = j.invoice_id;
else
raise_application_error (-20000,'Invoice PK for: ' || j.invoice_id || ' is null in: ' || l_procedure); -- blow up; we can't be here without an invoice header row!
end if;
-- does the invoice line item exist?
select count(*) into l_select_count from rc_invoice_item where line_item_id = j.line_item_id;
l_item_pk := item_pk(j.item_id);
l_start_date := rc_staging.json_to_date(j.start_date);
l_end_date := rc_staging.json_to_date(j.end_date);
The Right Way
There is no right way. First, JSON is not a protocol but a data format. We don’t get to argue right/not-right in such an environment. Furthermore, REST (representational state transfer) is an “architectural style”. It is not a protocol that requires adherence to a limited set of rules. My answer is to play and have fun. Maybe your API host provides nice metadata and counters at the top. Maybe they name their arrays and keep the nesting of the data to something that doesn’t break my Oracle-centric brain. I see some 1980s and 1990s legacy dependency on code and loops in my style. And I see the value of embracing JSON. To each their strength.
Have some fun and stay safe.
cmoore