First Journey with SelectAI

April 3, 2024

cloud, oracle, SelectAI, SQL

Large Language Models (LLM) are everywhere now, now you can even get help with your SQL. A few months back, Oracle introduced a feature in the Autonomous Database where you can ask normal questions in order to query your data.

First Journey with SelectAI

Brendan Tierney at Oralytics has written two good articles, SelectAI – the beginning of a journey and SelectAI – Doing something useful on a cool feature in the Oracle Autonomous database; SelectAI. He shows how you in a few steps can query your data using natural language. I followed the steps in his posts, and in a few minutes got it to work in my own Autonomous Database that I am using in the Oracle Free Tier.

There are plenty of interesting data sets out there, but I wanted to explore some of my own data, because I obviously know them better and have other motivations to play with them.

I have the simplest data model you can imagine and the data are collected with a made-in-2-minutes APEX app I have had on my mobile for a while.

Data model with two tables, category and event

The app allows me to record where I am with a NAME (Name or description) at different times (EVENT_TIME) with location stored in LATITUDE and LONGITUDE, and I may upload a picture into the IMAGE column. Obviously I had to make it slightly fancy and added a second table CATEGORY whose purpose is quite obvious.

The tables are stored in a schema called WKSP_EVENT and this is the code to create the profile.

BEGIN
  --DBMS_CLOUD_AI.drop_profile(profile_name => 'OPEN_AI');
  DBMS_CLOUD_AI.create_profile(
      profile_name => 'OPEN_AI',
      attributes => '{"provider": "openai",
                      "credential_name": "OPENAPI_CRED",
                      "object_list": [{"owner": "WKSP_EVENT", "name": "EVENT"},  
                                      {"owner": "WKSP_EVENT", "name": "CATEGORY"}]
       }');
END;

The rest of the commands are the same as shown in Brendan’s posts.

Since I have a paid subscription to OpenAI I decided to use the openai provider. It failed at first attempt with HTTP code 429 and I figured out it was because my monthly subscription did not include API access. After adding $10 to my credit I was up and running.

My first attempt did not go well:

select ai how many event;

When it fails it returns one row with an apology and some kind of suggestions with SQL, in this case:

“Sorry, unfortunately a valid SELECT statement could not be generated for your natural language prompt. Here is some more information to help you further: To count the number of events in the “WKSP_EVENT” table, you can use the following Oracle SQL query:

SELECT COUNT(*) AS event_count
FROM WKSP_EVENT

Looking at this SQL it is obvious what is missing. Peeking back at Brendan’s post I reckoned I should express myself better:

select ai how many events exists;

In this case it returns with an answer. In order to see the SQL it generated I can get it by adding an action after the ai keyword as documented here and demonstrated in his second blog post.

select ai showsql how many events exists;

It looked right to me. These two also worked fine:

select ai when is the earliest event; 
select ai how many events in each category;

Time to try something more difficult. Two of my categories are Home and Conference. I asked:

select ai what is the average distance between events at home and events at conference;

At one attempt it returned just one row and a column with NULL, later it failed. When I asked it to show the SQL using the same question it reported back:

SELECT AVG(distance) AS average_distance
FROM (
    SELECT SQRT((e1.LONGITUDE - e2.LONGITUDE) * (e1.LONGITUDE - e2.LONGITUDE) + (e1.LATITUDE - e2.LATITUDE) * (e1.LATITUDE - e2.LATITUDE)) AS distance
    FROM WKSP_EVENT.EVENT e1
    JOIN WKSP_EVENT.EVENT e2 ON e1.CATEGORY_ID = e2.CATEGORY_ID
    WHERE e1.NAME = 'Home' AND e2.NAME = 'Conference'
) subquery

But the SQL has two errors; it uses NAME from EVENT rather than from CATEGORY, and the formula used is wrong (unless you believe the Earth is flat). I had now idea about the formula, but ChatGPT gave me a lesson on the Haversine formula

Screen shot of the Haversine formula as shown by ChatGPT

My expectations were perhaps too high for this tool.

I work from home a lot and wondered what is the longest stretch I have stayed at home:

select ai what is the longest period  I stayed at home;

Didn’t work either, and it came up with what suggestions about what kind of tables and columns I needed to answer the question, what an insult.

My two tables did not have any comments, and I wondered if that would make any difference. So I added comments to the tables and the columns as best as I could. Then I recreated the profile with an extra attribute "comments": true:

BEGIN
  DBMS_CLOUD_AI.drop_profile(profile_name => 'OPEN_AI');
  DBMS_CLOUD_AI.create_profile(
      profile_name => 'OPEN_AI',
      attributes => '{"provider": "openai",
                      "credential_name": "OPENAPI_CRED",
                      "comments": true,
                      "object_list": [{"owner": "WKSP_EVENT", "name": "EVENT"},  
                                      {"owner": "WKSP_EVENT", "name": "CATEGORY"}]
       }');
END;
/
EXEC DBMS_CLOUD_AI.set_profile('OPEN_AI');

But it did not make any difference, sigh.

I tried to reformulate my question a bit, but no luck. At one point I think OpenAI got tired of SQL altogether and returned with:

Sorry, unfortunately a valid SELECT statement could not be generated for your natural language prompt. Here is some more information to help you further:

The longest period of events in the category of Home can vary depending on the specific context or perspective. However, some possible examples of long-lasting events in the category of Home could include:

  1. Homeownership: The period of time during which an individual or family owns and resides in a particular home. This can span several decades or even a lifetime.

  2. Home renovations: The process of making significant changes or improvements to a home. Depending on the scale and complexity of the renovations, this can take months or even years to complete.

  3. Home construction: The duration of time it takes to build a new home from scratch. This can range from several months to a couple of years, depending on the size and complexity of the project.

  4. Home restoration: The process of restoring an older or damaged home to its original condition. This can be a lengthy process, especially for historical homes or properties requiring extensive repairs.

  5. Home gardening: The ongoing cultivation and maintenance of a garden or outdoor space at home. This can be a continuous activity that spans multiple seasons or even years.

It’s important to note that the length of these events can vary greatly depending on individual circumstances, resources, and goals.

Anyway, I noticed in the OpenAI dashboard at platform.openai.com that all requests to the API used the GPT-3.5 Turbo model. Reading the documentation is a good idea. You can specify which model to use like this:

BEGIN
  DBMS_CLOUD_AI.drop_profile(profile_name => 'OPEN_AI2');
  DBMS_CLOUD_AI.create_profile(
      profile_name => 'OPEN_AI2',
      attributes => '{"provider": "openai",
                      "model": "gpt-4",
                      "credential_name": "OPENAPI_CRED2",
                      "object_list": [{"owner": "WKSP_EVENT", "name": "EVENT"},  
                                      {"owner": "WKSP_EVENT", "name": "CATEGORY"}]
       }');
END;

The available models are listed in the page I linked to above. OpenAI explains this:

For many basic tasks, the difference between GPT-4 and GPT-3.5 models is not significant. However, in more complex reasoning situations, GPT-4 is much more capable than any of our previous models.

One could think GPT-4 will make a difference in this situation. The new profile was activated with EXEC DBMS_CLOUD_AI.set_profile('OPEN_AI2');.

New attempt:

select ai showsql what is the longest chronological sequence of events with the same category;

and new dissapointment:

Sorry, unfortunately a valid SELECT statement could not be generated for your natural language prompt. Here is some more information to help you further: To find the longest chronological sequence of events with the same category, you can use the LAG and LEAD functions to compare the current row with the previous and next rows. Then, use the ROW_NUMBER function to assign a unique number to each row in the sequence. Finally, use the MAX function to find the longest sequence.

Here is the SQL query:

WITH seq AS (
  SELECT e1.ID, e1.EVENT_TIME, e1.CATEGORY_ID, 
         CASE WHEN e1.CATEGORY_ID = LAG(e1.CATEGORY_ID) OVER (ORDER BY e1.EVENT_TIME) 
              THEN 0 
              ELSE 1 
         END AS start_of_sequence,
         CASE WHEN e1.CATEGORY_ID = LEAD(e1.CATEGORY_ID) OVER (ORDER BY e1.EVENT_TIME) 
              THEN 0 
              ELSE 1 
         END AS end_of_sequence
  FROM WKSP_EVENT.EVENT e1
),
seq2 AS (
  SELECT s1.*, 
         SUM(s1.start_of_sequence) OVER (ORDER BY s1.EVENT_TIME) AS sequence_number
  FROM seq s1
),
seq3 AS (
  SELECT s2.sequence_number, 
         COUNT(*) AS sequence_length, 
         MIN(s2.EVENT_TIME) AS sequence_start, 
         MAX(s2.EVENT_TIME) AS sequence_end, 
         s2.CATEGORY_ID
  FROM seq2 s2
  WHERE s2.start_of_sequence = 1 OR s2.end_of_sequence = 1
  GROUP BY s2.sequence_number, s2.CATEGORY_ID
)
SELECT MAX(s3.sequence_length) AS longest_sequence, 
       s3.sequence_start, 
       s3.sequence_end, 
       c.NAME AS category_name
FROM seq3 s3
JOIN WKSP_EVENT.CATEGORY c ON c.ID = s3.CATEGORY_ID
ORDER BY longest_sequence DESC
FETCH FIRST ROW ONLY;

Certainly more fancy but still not correct, even after correcting the error in the main select.

At this point I think I have the following options:

  1. Learn to ask better (English) questions
  2. Wait for an improvement of SelectAI
  3. Lower my expectations
  4. Find more relevant data

While I was writing 1. above I wondered, does it understand Norwegian?

select ai Hvor mange rader finnes det i event;

It does! What about that sequence I tried earlier?

select ai Hva er den lengste kronologiske rekken av rader i event med samme kategori;

And the SQL is:

WITH event_sequence AS (
  SELECT e.*, 
         ROW_NUMBER() OVER (ORDER BY event_time) AS seq_num,
         ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY event_time) AS seq_num_category
  FROM wksp_event.event e
)
SELECT MIN(event_time) AS start_time, 
       MAX(event_time) AS end_time, 
       COUNT(*) AS sequence_length
FROM event_sequence
GROUP BY seq_num - seq_num_category, category_id
ORDER BY sequence_length DESC
FETCH FIRST ROW ONLY

The result looks right from listing the events in order, but it took me a few seconds (not telling you how many) to understand the group by expression.

I think this can actually become a nice feature.

There is more to explore in this feature, like varying the temperature attribute in the profile.

By the way, writing this post cost me $0.06 to OpenAI, and $0.00 to Oracle since I am using the Free Tier (See Your Own Oracle Database Lab in 1-2-3 and in Norwegian Gratis infrastruktur med Oracle Free Tier )