Thursday, February 12, 2026

Using Autonomous DB - Data Pump

The Data Pump tool in Autonomous DB is designed to import DMP files stored as objects in a bucket. By default, the database does not have access to the buckets, which is why we need the following steps. Similar steps are required for external tables based on buckets (not covered here).

A few relevant links:

Oracle Doc - importing data using data pump

CarlosAL - how to import dump files into oci autonomous database using database actions

Stuart Coggins (Coggs) - cloud credentials with oci

If you try using Data Pump with no setup, you should expect an error:


Unable to load the import tool
A resource principal or a cloud service credential must be configured in order to access the Object Storage bucket. For detailed information please refer to the documentation.


In most cases in Autonomous DB things are rather autonomous or one click away. This is not one of those cases.

Part of the setup is done in the DB and part in OCI itself. Since I use it only once, I'm going to use the Admin user in the DB and my own user in OCI. If you plan to use Data Pump as a repeating process, you might want to consider otherwise, especially regarding the OCI user, so you are not dependent on a specific person.

As Admin user in SQL I run: 

EXEC DBMS_CLOUD_ADMIN.ENABLE_RESOURCE_PRINCIPAL();

This creates the credential "OCI$RESOURCE_PRINCIPAL" in the Database.
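
If you want to confirm the credential was created, a quick check against the DBA_CREDENTIALS view should do (run as Admin):

-- verify the resource principal credential exists
SELECT owner, credential_name
FROM dba_credentials
WHERE credential_name = 'OCI$RESOURCE_PRINCIPAL';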


Don't bother with Data Pump yet... You won't get the previous error and you will be able to select the Credential Name (OCI$RESOURCE_PRINCIPAL), but you will not be able to select a Compartment or a Bucket.

To access the OCI bucket I will 
  • Create a Dynamic Group & relevant Policy in OCI
  • Create API keys for my user in OCI
  • Use them in a relevant SQL 
Only then, I can use the Data Pump.

Create a Dynamic Group & relevant Policy in OCI

In OCI go to your Autonomous AI DB, click the ... (3 dots) menu and copy the OCID. We will use it in the dynamic group rule.


For managing dynamic groups, you must have one of the following privileges:
  • You are a member of the Administrators group.
  • You are granted the Identity Domain Administrator role or the Security Administrator role.
  • You are a member of a group that is granted manage identity-domains or manage dynamic-groups.

Under OCI console: Identity and Security, Domains  

In the domain (I used the default one) click Dynamic Groups and create a dynamic group.


Set the Rule in the format 

any { resource.id = 'Copied DB OCID'}

Press Create

Under OCI console: Identity and Security, Policies  

Press Create Policy

The documentation names the policy ObjectStorageReadersPolicy; I will do the same.

Select the relevant compartment level.

In the Policy Builder switch to manual and enter

Allow dynamic-group YOUR_DYNAMIC_GROUP_NAME to read buckets in tenancy

Press Create, then add a second statement

Allow dynamic-group YOUR_DYNAMIC_GROUP_NAME to read objects in tenancy
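
If you prefer a narrower scope, the same statements can target a specific compartment instead of the whole tenancy, for example (YOUR_COMPARTMENT_NAME is a placeholder):

Allow dynamic-group YOUR_DYNAMIC_GROUP_NAME to read buckets in compartment YOUR_COMPARTMENT_NAME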

  


The flow is: the Dynamic Group is connected to the DB and the Policy is connected to the Dynamic Group.


Create API keys for my user in OCI

The documentation advises creating a dedicated user for this task. I used my own.

Click the profile icon on the right and click your username.

This brings you to Identity & Security, My Profile

Select Tokens and Keys and press Add API key. 

Download the Private key. Now we can press Add.


Copy and save the configuration data from the next screen, then close it.

The user, fingerprint and tenancy information will be used later to create the credential in the database.


Use them in a relevant SQL 

Return to SQL in your Autonomous AI Database (in your database, select the SQL Database Action).

Use the values of the API keys from the previous section in the code. All come from the configuration data you saved, except the private key, where you use only the text between -----BEGIN PRIVATE KEY----- and -----END PRIVATE KEY-----:

BEGIN
  DBMS_CLOUD.CREATE_CREDENTIAL (
    credential_name => 'API_KEY',
    user_ocid       => 'YOUR User OCID',
    tenancy_ocid    => 'YOUR Tenancy OCID',
    private_key     => 'M. . .T=',
    fingerprint     => 'YOUR fingerprint');
END;

(It might also work with an AUTH_TOKEN, as described in Stuart Coggins (Coggs) - cloud credentials with oci; I didn't try it myself.)
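
Before going back to the UI, you can optionally verify that the new credential really sees the bucket. A minimal sketch, assuming your own bucket URI (the region, namespace and bucket name below are placeholders):

-- list the bucket content using the new credential
SELECT object_name, bytes
FROM DBMS_CLOUD.LIST_OBJECTS(
       'API_KEY',
       'https://objectstorage.YOUR_REGION.oraclecloud.com/n/YOUR_NAMESPACE/b/YOUR_BUCKET/o/');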


Now it's time to go back to the Data Pump tool in the database and press Import. In my case it took a minute or two for Data Pump to be able to actually see the bucket.




Thursday, August 7, 2025

Hacking OAS dataset files

We had a request to save a copy of every dataset created and saved in Oracle Analytics Server (OAS). I will cover only local OAS datasets that are not in the database. Usually, this kind of request comes from a legal demand to keep such data on an external server.

In theory I can create a script that periodically copies the dataset files to the backup server/FTP.

The dataset files are saved on the server under:

  • Domain_Home/bidata/components/DSS/storage/ssi
  • Domain_Home/servers/obis1/cache/xsastorage
  • Domain_Home/servers/obis1/cache/xsacache
NEVER update or do anything directly on those files. Only deal with copies of the files, located in your own folders.
NEVER manually change RCU database data.

The files I want are in Domain_Home/bidata/components/DSS/storage/ssi. Only files with the DSS extension are relevant. The file names are strings that have nothing to do with the dataset names in OAS.



My datasets can come from 2 different sources:

  • A file (CSV/Text/Excel) loaded as dataset from the UI.
  • A dataset created by Data Flow.

The DSS files of loaded datasets are usually in the original file format, so renaming the DSS file makes it usable; a second copy is kept in Parquet format.

The Data Flow based DSS files come in CSV format and in Parquet format.

For Parquet files we usually need a special reader utility. 


How can I know what those DSS file formats are?

How can I know the real Dataset name?

The secret is in the XXX_BIPLATFORM schema table S_NQ_DSS_DATASETS.
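
To get a first feel for the table, a simple select of the columns used below can help (a sketch; dev is my RCU prefix, adjust it to yours):

-- key columns of the datasets table
SELECT dataset_id, dataset_name, datamodel_id, mimetype,
       filestore_id, file_length, last_modified
FROM dev_biplatform.s_nq_dss_datasets
ORDER BY last_modified DESC;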


In that table I see different behavior for loaded files vs. Data Flow based ones.

While the Data Flow based datasets are represented by one row, the loaded datasets are represented by a few rows (2 in my case).

Data Flow based dataset

For a Data Flow based dataset, the DATASET_NAME column holds the real Dataset name, and the DATAMODEL_ID is NULL. The MIMETYPE column value is "text/csv".

In the FILESTORE_ID column I see the DSS file names: the CSV file and the Parquet file. For example:

{"src":"06a918f7-916f-49de-be33-9c046d53e869","src_parquet":"f76f107c-cb48-4853-a0d7-52839d4f194e"}

Checking the Domain_Home/bidata/components/DSS/storage/ssi I can see 2 DSS files:

  • 06a918f7-916f-49de-be33-9c046d53e869.dss - actually a CSV file
  • f76f107c-cb48-4853-a0d7-52839d4f194e.dss - actually a Parquet file

Now I have 2 options to handle the files:

  • Checking the S_NQ_DSS_DATASETS table for updates (LAST_MODIFIED column), finding the relevant file names in FILESTORE_ID, and copying them with the correct filenames and extensions (see the sketch after this list).
  • Checking the folder for new files at the Linux/Windows level and finding the file name in the S_NQ_DSS_DATASETS table for the proper dataset name and extension.
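
For the first option, here is a minimal sketch for Data Flow based datasets, assuming FILESTORE_ID always holds JSON like the example above (dev is my RCU prefix):

-- Data Flow based datasets: map the DSS file names back to the dataset name
SELECT dataset_name,
       JSON_VALUE(filestore_id, '$.src') || '.dss'         AS csv_dss_file,
       JSON_VALUE(filestore_id, '$.src_parquet') || '.dss' AS parquet_dss_file,
       last_modified
FROM dev_biplatform.s_nq_dss_datasets
WHERE mimetype = 'text/csv'
ORDER BY last_modified DESC;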


Loaded-file based dataset

For a loaded-file based dataset, we have a dataset row and the actual data row in the S_NQ_DSS_DATASETS table. It is very similar to what we see in the UI, where we can have several tables or files in a single dataset.

For the row representing the entire dataset, the DATASET_NAME column holds the real Dataset name, the DATAMODEL_ID is NULL and the MIMETYPE column value is NULL. We can disregard the FILESTORE_ID column here. The FILE_LENGTH column value is -1.

For the data row in S_NQ_DSS_DATASETS, the DATAMODEL_ID column holds the DATASET_ID value of its dataset row. By joining the two we can get the original dataset name for the data row.

In the data row, the FILESTORE_ID column holds the DSS file names: the original file and the Parquet file. For example:

{"src":"8f815cc1-044a-4caa-9026-431be828e347","src_parquet":"261dfba0-9adc-4572-a7d7-9cd4c95dccf5"}

Checking the Domain_Home/bidata/components/DSS/storage/ssi I can see 2 DSS files:

  • 8f815cc1-044a-4caa-9026-431be828e347.dss - actually a CSV, Excel or Text file
  • 261dfba0-9adc-4572-a7d7-9cd4c95dccf5.dss - actually a Parquet file

How do I know the proper format? By using the MIMETYPE column, where I have these values:

  • application/csv for CSV file.
  • application/vnd.openxmlformats-officedocument.spreadsheetml.sheet for Excel file.

Now I have 2 options to handle the files:

  • Checking the S_NQ_DSS_DATASETS table for updates (LAST_MODIFIED column), finding the relevant file names in FILESTORE_ID, and copying them with the correct filenames and extensions (see the sketch after this list).
  • Checking the folder for new files at the Linux/Windows level and finding the file name in the S_NQ_DSS_DATASETS table for the proper dataset name and extension.
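
For loaded-file datasets, a sketch of the same idea, joining the data row to its dataset row and mapping MIMETYPE to a target extension (the ELSE branch for plain text files is my assumption):

-- loaded-file datasets: dataset name, source DSS file and target extension
SELECT p.dataset_name || '-' || d.dataset_name AS full_name,
       JSON_VALUE(d.filestore_id, '$.src') || '.dss' AS source_dss,
       CASE d.mimetype
         WHEN 'application/csv' THEN '.csv'
         WHEN 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet' THEN '.xlsx'
         ELSE '.txt'
       END AS target_extension,
       d.last_modified
FROM dev_biplatform.s_nq_dss_datasets d
JOIN dev_biplatform.s_nq_dss_datasets p ON d.datamodel_id = p.dataset_id
WHERE d.file_length <> -1
ORDER BY d.last_modified DESC;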


I recommend a select of the following type to work with the data; this way you have the proper names of the DSS files and, in FULL_NAME, the name of the dataset:


SELECT
    a.dataset_id,
    a.stripe_id,
    a.dataset_name,
    b.dataset_name DS_name,
    b.dataset_name||(case when b.dataset_name IS NULL then '' else '-' end)||a.dataset_name FULL_NAME,
    a.filestore_id,
    substr(a.filestore_id, 9, instr(a.filestore_id, ',')-1-9)||'.dss' Orig_file,
    case when substr(a.filestore_id, instr(a.filestore_id, ':', -1)+1,4)='null' then NULL else substr(a.filestore_id, instr(a.filestore_id, ':', -1)+2, length(a.filestore_id)-(instr(a.filestore_id, ':', -1)+2)-1 )||'.dss' end Parquet_file,
    a.acl_id,
    a.filetype_id,
    a.mimetype,
    a.owner_id,
    a.createdby_id,
    a.provider_id,
    a.type_options_xml,
    a.provider_options_xml,
    a.filetype_knobs,
    a.file_length,
    a.storage_file_length,
    a.last_modified,
    a.created_date,
    a.biservermetadata_last_modified,
    a.rawfile_last_modified,
    a.is_embryonic,
    a.is_user_striped,
    a.internal_id,
    a.namespace,
    a.connection_id,
    a.partition_key,
    a.refresh_option,
    a.schedule_xml,
    a.datamodel_id,
    a.error_code,
    a.error_message,
    a.parent_dataset_id,
    a.child_dataset_type,
    a.app_module_id,
    a.properties_last_modified
FROM
    dev_biplatform.s_nq_dss_datasets A, dev_biplatform.s_nq_dss_datasets B
where A.file_length<>-1 and a.datamodel_id=b.dataset_id(+)
ORDER BY a.last_modified DESC;






Please notice the Parquet files don't have the actual column names; they use names like F0, F1. You can find the real names in the following 2 columns:

  • TYPE_OPTIONS_XML
  • FILETYPE_KNOBS

The XML in both columns is similar; the column names appear 3 times, in different formats. Here is a sample of a dataset from TYPE_OPTIONS_XML (only the first and last columns - "CUSTOMER_ID" and "date1"):

<ds:datasetImportOptions xmlns:ds="com.oracle.bi/datasetimport/v1">
<ds:tables><ds:table name="Data" totalRowNumber="-1">


<ds:columns><ds:column><ds:name>CUSTOMER_ID</ds:name><ds:datatype>number</ds:datatype></ds:column>
 .... 
<ds:column><ds:name>date1</ds:name><ds:datatype>timestamp</ds:datatype></ds:column>
</ds:columns><ds:delimiter>comma</ds:delimiter><ds:headerRow>true</ds:headerRow></ds:table></ds:tables>


<ds:prepareTables><ds:table name="Data" transformed="false">
<ds:columns><ds:column><ds:name>CUSTOMER_ID</ds:name></ds:column>
 .... 
<ds:column><ds:name>date1</ds:name></ds:column>
</ds:columns></ds:table></ds:prepareTables>


<ds:outputTables><ds:table name="Data">
<ds:columns><ds:column exclude="false">
<ds:name>CUSTOMER_ID</ds:name><ds:display_names><ds:display_name><ds:name>CUSTOMER_ID</ds:name><ds:locale>sys</ds:locale></ds:display_name></ds:display_names><ds:datatype>number</ds:datatype><ds:sourceexpr>&quot;CUSTOMER_ID&quot;</ds:sourceexpr><ds:aggr_rule>none</ds:aggr_rule></ds:column><ds:column exclude="false">
 .... 
<ds:name>date1</ds:name><ds:display_names><ds:display_name><ds:name>date1</ds:name><ds:locale>sys</ds:locale></ds:display_name></ds:display_names><ds:datatype>timestamp</ds:datatype><ds:sourceexpr>&quot;date1&quot;</ds:sourceexpr><ds:aggr_rule>none</ds:aggr_rule></ds:column></ds:columns>
</ds:table></ds:outputTables>

<ds:uiOptions>null</ds:uiOptions>
<ds:version>v5</ds:version>
</ds:datasetImportOptions>
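
If you need those column names programmatically, one option is to parse TYPE_OPTIONS_XML with XMLTABLE. A sketch, assuming the column is stored as text and uses the namespace shown above (not tested on every OAS version):

-- extract the real column names and datatypes from TYPE_OPTIONS_XML
SELECT d.dataset_name, c.column_name, c.datatype
FROM dev_biplatform.s_nq_dss_datasets d,
     XMLTABLE(XMLNAMESPACES('com.oracle.bi/datasetimport/v1' AS "ds"),
              '/ds:datasetImportOptions/ds:tables/ds:table/ds:columns/ds:column'
              PASSING XMLTYPE(d.type_options_xml)
              COLUMNS column_name VARCHAR2(128) PATH 'ds:name',
                      datatype    VARCHAR2(30)  PATH 'ds:datatype') c
WHERE d.type_options_xml IS NOT NULL;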




Thursday, July 31, 2025

Oracle APEX - Using user input text in an AI RAG Source - APEX$AI_LAST_USER_PROMPT

With APEX we can create a chatbot and connect it to an AI data source. There are lots of good materials about it, but I was struggling with how to pass the user input text as part of the RAG code. My colleague Yohay Asraf gave me the tip of using APEX$AI_LAST_USER_PROMPT, which solved my problem. Thanks, Yohay!

In APEX we can now use AI options and configure RAG sources, Oracle 23ai Database features like SelectAI, and OCI AI Agents. See links to a few labs at the end of the post.


In an APEX application, under "Shared Components" - "Generative AI Configuration", I created a RAG query that uses the Vector option of 23ai. But how can we make it dynamic?


In my case, I loaded various car guides into a table and created chunks of the books and vectors from those chunks. Next, I'm looking for chunks of text that are similar to a specific text ("Bla Bla Bla" in the next example).

In the example:

  • I'm using an existing encoder named doc_model to convert the input string to a vector and name it "embedding".
  • I compare the input vector to the existing text chunks, looking for minimal distance, and limit the results to the selected Vehicle Make and Model.
  • I return the 4 best results.

WITH query_vector AS (
    SELECT VECTOR_EMBEDDING(doc_model USING 'Bla Bla Bla' AS data) as embedding)
SELECT embed_id, TEXT_CHUNK
FROM WKSP_APEX.VECTOR_STORE, query_vector
WHERE VECTOR_DISTANCE(EMBED_VECTOR, query_vector.embedding, COSINE) < 0.6
  AND VEHICLE_MAKE = :P2_MAKE AND VEHICLE_MODEL = :P2_MODEL
ORDER BY VECTOR_DISTANCE(EMBED_VECTOR, query_vector.embedding, COSINE)
FETCH APPROX FIRST 4 ROWS ONLY


How do I make the input string dynamic? Here we have 2 options:

  • APEX$AI_LAST_USER_PROMPT - Retrieves the most recent user-entered prompt.
  • APEX$AI_ALL_USER_PROMPTS - Retrieves all user prompts, concatenated into a single string.

 https://docs.oracle.com/en/database/oracle/apex/24.2/htmdb/managing-ai-configurations-and-rag-sources.html#GUID-D1274017-381C-4988-8E6E-36095BA06655


Here is the new code:

WITH query_vector AS (
    SELECT VECTOR_EMBEDDING(doc_model USING :APEX$AI_LAST_USER_PROMPT AS data) as embedding)
SELECT embed_id, TEXT_CHUNK
FROM WKSP_APEX.VECTOR_STORE, query_vector
WHERE VECTOR_DISTANCE(EMBED_VECTOR, query_vector.embedding, COSINE) < 0.6
  AND VEHICLE_MAKE = :P2_MAKE AND VEHICLE_MODEL = :P2_MODEL
ORDER BY VECTOR_DISTANCE(EMBED_VECTOR, query_vector.embedding, COSINE)
FETCH APPROX FIRST 4 ROWS ONLY


As you can see, we are using 2 RAG sources: a Car Problems table and a Books table.



There is a nice blog by Harry Snart that uses it - RAG based Chatbots made easy 


If you want to start using AI or RAG with APEX, there are many blogs and great labs in Oracle LiveLabs.