Monday, July 31, 2023

Oracle DB - CLOB file in English loaded with Chinese characters

 I tried to follow AskTom blog - Load JSON data file into Oracle 12c. Unfortunately my was one of the rare cases that appear where the English file, loaded to CLOB in the Oracle database was loaded in Chinese characters. Since I didn't find detailed solution, I wrote this post.

Readers Digest:  dbms_lob.loadfromfile() does not have a character set parameter. Use dbms_lob.loadclobfromfile () instead.  


Starting position Oracle DB 19 on my Windows laptop with the standard AL32UTF8 character set:

To simplify the story, lets assume I'm loading a simple file. In my case I placed in a directory 'TEST_DIR' the file test.csv with the content of a single string:
I created a table with a CLOB column (my_json is just a name in this case):

create table t2 ( my_json clob);

Running the loading process following the above mentioned post:

declare
    b bfile := bfilename('TEST_DIR','test.csv');
    c clob;
  begin
    dbms_lob.open(b);
    dbms_lob.createtemporary(c,true);

    dbms_lob.loadfromfile(c,b,dbms_lob.getlength(b));
    insert into t2 values (c);
    commit;
    dbms_lob.freetemporary(c);
  end;
  / 

The result is in Chinese:

After a fair amount of digging found a clue hidden in the middle of this stackoverflow case and a detailed syntax example in this very old (8i) oracle-base page.

As I mentioned above the problem is because dbms_lob.loadfromfile() does not have a character set parameter. Using dbms_lob.loadclobfromfile () instead, solved the problem. This has nothing to do with the database NLS_Characterset parameter.

 

The changes are in bold:

 declare
    b bfile := bfilename('TEST_DIR','test.csv');
    c clob;
    l_dest_offset   integer := 1;
    l_src_offset    integer := 1;
    l_bfile_csid    number  := 0;
    l_lang_context  integer := 0;
    l_warning       integer := 0;

  begin
    dbms_lob.open(b);
    dbms_lob.createtemporary(c,true);
    dbms_lob.loadclobfromfile (
    dest_lob      => c,
    src_bfile     => b,
    amount        => dbms_lob.getlength(b),
    dest_offset   => l_dest_offset,
    src_offset    => l_src_offset,
    bfile_csid    => l_bfile_csid ,
    lang_context  => l_lang_context,
    warning       => l_warning);

    insert into t2 values (c);
    commit;
    dbms_lob.freetemporary(c);
  end;
  /


If we return to the JSON loading example from AskTom, this is the JSON loading that solved my problem:

create table t ( my_json clob, constraint chk check ( my_json is json ));

declare
    b bfile := bfilename('TEST_DIR','test.json');
    c clob;
  l_dest_offset   integer := 1;
  l_src_offset    integer := 1;
  l_bfile_csid    number  := 0;
  l_lang_context  integer := 0;
  l_warning       integer := 0;
  begin
    dbms_lob.open(b);
    dbms_lob.createtemporary(c,true);
    --dbms_lob.loadclobfromfile(c,b,dbms_lob.getlength(b),1,1,1,873,0);
    dbms_lob.loadclobfromfile (
    dest_lob      => c,
    src_bfile     => b,
    amount        => dbms_lob.getlength(b),
    dest_offset   => l_dest_offset,
    src_offset    => l_src_offset,
    bfile_csid    => l_bfile_csid ,
    lang_context  => l_lang_context,
    warning       => l_warning);
    insert into t values (c);
    commit;
    dbms_lob.freetemporary(c);
  end;
  /



Sunday, July 9, 2023

Hackathon Posts - Creating Oracle Analytics Cloud (OAC) instance

Few posts planned for use in expected Hackathons. 

Oracle Analytics Cloud (OAC) is usually a very useful component of a Hackaton. It is great both as Analytic tool and as Data preprocessing and Understanding tool as well.

 


How to create an OAC instance?

Under Analytics & AI, select Analytics Cloud

 

There make sure you are in the correct compartment (on the left), preferably not the root compartment and press "Create Instance"


Give the instance a name (it will later be part of the URL)

For Hackaton it's usually better to have OCPU based 2 OCPU instance.


For most cases Professional Edition is sufficient (the Enterprise one includes all Professional capabilities).

In most case Public access is fine. Press Create.
 

It might take 10-20 minutes to create an OAC instance.
 

Once it is created you can upload file (Create - Dataset)


Or if you want to work with Autonomous Database (or anything else) create a new connection:


Select the Autonomous Database

Press Select.. button to upload the Autonomous Database wallet zip file (* see bellow how to obtain it).

Fill the Connection name, Username and Password (only).

If successfully saved, we can create new datasets based on DB tables, views, Manual SQL....



* How to obtain Autonomous Database wallet?

In the main OCI Menu go to Oracle Database and select the Autonomous Database:

In the specific database you work with, press "Database Connection"

 

Press "Download wallet"

In the next screen enter any password (we will not need it) twice and press "Download".

This will download the wallet zip file we will use (no need to extract for OAC).



Hackathon Posts - Oracle AI Service: Vision

 

 Few posts planned for use in expected Hackathons. 

We want to use Oracle AI  Service of Vision. Please note there is a specific Document Understanding Service as well.

OCI Vision is an AI service for performing deep-learning–based image analysis at scale. With prebuilt models available out of the box, developers can easily build image recognition and text recognition into their applications without machine learning (ML) expertise. For industry-specific use cases, developers can automatically train custom vision models with their own data. These models can be used to detect visual anomalies in manufacturing, organize digital media assets, and tag items in images to count products or shipments.

This is the documentation home, with most of the essentials.


Permissions

First, we need to gain access to the service. It can be granted to all OCI users of the tenancy or specific group you belong to. For this we need to create a Policy that allows to use ai-service-vision-family in tenancy.

    To create a Policy, in the burger menu go to "Identity & Security" and select Policies. There press "Create Policy". Once you did enter Name and description of the Policy and turn on the "show manual editor" under Policy Builder. Now you have 2 options to allow use for specific group or all the users. Enter one of the following:
  •  allow group <group-name> to use ai-service-vision-family in tenancy
  • allow any-user to use ai-service-vision-family in tenancy 
 
For vision we probably need access to object storage where the source files can be located.
In Hackaton we might be careless and grant the general access to object storage:
  • allow group <group_name> to use object-family in tenancy
 In real life we will probably restrict it to a specific compartment:
  • allow group <group_name> to use object-family in compartment <input_bucket_located_object_storage_compartment>
And a policy to access output location in object storage of a specific compartment
  • allow group <group_name> to manage object-family in compartment <output_bucket_located_object_storage_compartment>
 
In case you plan to use later the Oracle Cloud shell, you might want to do the same with "cloud-shell in tenancy". For example: allow any-user to use cloud-shell in tenancy 
 

 Intro

Now we can access the AI Service - Vision (under the Analytics & AI menu).

 


I really liked the way the actual product in OCI seems to be self documented:


 

Links

 

 

At the bottom of the page we see the Service capabilities:

  • Image classification - Categorizes scene-based features and objects within an image
  • Object detection - Locates and identifies objects within an image
  • Text detection - Provides word-level and line-level text as well as the bounding box coordinates of where the text is located
  • Custom image classification and object detection - Image classification and object detection models trained on your own labeled dataset



On the left part of the screen they are the link to use the console UI:

 

 See also:

 







Hackathon Posts - Oracle AI Service: Document Understanding

 Few posts planned for use in expected Hackathons. 

We want to use Oracle AI  Service of Document Understanding. Personally it seem like a nice combination of the Language and Vision Services.

Oracle Cloud Infrastructure (OCI) Document Understanding is an AI service that enables developers to extract text, tables, and other key data from document files through APIs and command line interface tools. With OCI Document Understanding, you can automate tedious business processing tasks with prebuilt AI models and customize document extraction to fit your industry-specific needs. 

This is the documentation home, with most of the essentials.

 


Permissions

First, we need to gain access to the service. It can be granted to all OCI users of the tenancy or specific group you belong to. For this we need to create a Policy that allows to use ai-service-document-family in tenancy.

    To create a Policy, in the burger menu go to "Identity & Security" and select Policies. There press "Create Policy". Once you did enter Name and description of the Policy and turn on the "show manual editor" under Policy Builder. Now you have 2 options to allow use for specific group or all the users. Enter one of the following:
  •  allow group <group-name> to use ai-service-document-family in tenancy
  • allow any-user to use ai-service-document-family in tenancy 
 
For document we probably need access to object storage where the source document can be located.
In Hackaton we might be careless and grant the general access to object storage:
  • allow group <group_name> to use object-family in tenancy
 In real life we will probably restrict it to a specific compartment:
  • allow group <group_name> to use object-family in compartment <input_bucket_located_object_storage_compartment>
We are required to have a policy to access output location in object storage of a specific compartment
  • allow group <group_name> to manage object-family in compartment <output_bucket_located_object_storage_compartment>
 
In case you plan to use later the Oracle Cloud shell, you might want to do the same with "cloud-shell in tenancy". For example: allow any-user to use cloud-shell in tenancy 
 

 Intro

Now we can access the AI Service - Document Understanding (under the Analytics & AI menu).
 


I really liked the way the actual product in OCI seems to be self documented:


On the service page you can see links to:

 




 

At the bottom of the page we see the Service capabilities:

  • Text extraction -Provides word-level and line level text as well as the bounding box coordinates of where the text is located. Optionally, you can create a searchable PDF which embeds a transparent layer on top of a document image in PDF format to make it searchable by keywords.
  • Table extraction - Identifies tables and individual cells in order to extract content in tabular format.
  • Key value extraction - Identifies a predefined set of key fields from documents such as receipts, invoices, driver licenses, and passports. Note: some supported documents may be in limited availability (LA) only.
  • Document classification - Classifies documents into different types based on their visual appearance and high-level features, including invoice, receipt, bank statement, driver license, passport, tax form, and resume.
  • Custom key value extraction - Key value extraction model trained on your own labeled dataset for documents like domain-specific forms and intake documents.
  • Custom document classification - Document classification model trained on your own labeled dataset for industry-specific document types or granular types of one document


On the left part of the screen they are the link to use the console UI:


 The sources can be original Demo files or your Local files ot you Object storage.



 See also: