Thursday, December 31, 2020

Analytical Systems Design (UX/UI) Conference on January 26

We will end 2020 looking ahead to the coming year.

On January 26 we will hold a conference (on Zoom) about designing analytical systems.

At the conference we will discuss practical recommendations and rules for interface design. We will also cover rules for choosing visualizations and go over a variety of examples.

Since we are trying to cover a large amount of material, the conference will be longer than usual and will run over 3 hours.

https://oracle.zoom.us/webinar/register/2816087961719/WN_f9UqTKjVRA26iaByLzQ0Ig

The conference will include demos in Oracle tools, but will not focus on them; it is intended for a general audience.

Update: if you missed the conference or want to watch it again, you can do so from this link to the Hebrew-language Oracle Cloud channel.

https://oracle.zoom.us/webinar/register/2816087961719/WN_f9UqTKjVRA26iaByLzQ0Ig

 

For our worldwide audience - the lectures will be in Hebrew.


 

 

Wednesday, November 18, 2020

OML4SQL - (Binary) Classification model, Once More, with feeling

We had to build a binary classification model. There is plenty of material on the topic on various Oracle and non-Oracle blogs, but I wanted to do a slightly deeper than usual investigation of creating such a model with SQL and PL/SQL. In this example we have a well-organized dataset of 100K loan credit-score records. This is a rather long post.

What to expect:

1. Creating a basic model, with scoring of results.

2. Comparing possible classification models.

3. Finding optimal Hyper-parameters.

4. Testing Partition on a model.

  

Creating a basic model, with scoring of results.

In Oracle DB 19, classification models can use Naive Bayes, Neural Network, Decision Tree, Logistic Regression, Random Forest, Support Vector Machines, or Explicit Semantic Analysis. The default is Naive Bayes. See more in the documentation.

Classification is considered an instance of supervised learning, i.e., learning where a training set of correctly identified observations is available. It specifies the class to which data elements belong and is best used when the output has finite and discrete values. In short, it predicts a class for each input record.

The original DB table is credit_scoring.

Check the target values:

select distinct credit_score_bin from credit_scoring;

Other Credit
Good Credit

We have 2 target values, so this is a specific case called binary classification. In OML4SQL there is no special handling of the binary case.
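
To get a feel for the class balance before modeling, we can also count the rows per target value (a minimal sketch against the same credit_scoring table):

select credit_score_bin, count(*) cnt
from credit_scoring
group by credit_score_bin
order by cnt desc;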

The first thing to do is split the data into train and test sets.

In Oracle we can split train / test data using ORA_HASH or SAMPLE (for additional reading see here
and here). I used the SAMPLE option, which creates the train_credit_scoring and test_credit_scoring tables with a 70/30 split. The seed is used so the split is repeatable between runs:

create table train_credit_scoring as
  select * from (
    select * from credit_scoring SAMPLE (70) SEED (7));  
 
create table test_credit_scoring as
select * from (
   select * from credit_scoring
   MINUS
   select * from train_credit_scoring);
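
For reference, a split based on ORA_HASH might look like the sketch below. I'm assuming CUSTOMER_ID as the hashing key, and the _h table names are hypothetical; the hash maps each row to one of 10 buckets (0-9), and taking 7 of them gives roughly 70%:

-- Roughly 70/30 split using ORA_HASH(expr, max_bucket, seed)
create table train_credit_scoring_h as
  select * from credit_scoring
  where ora_hash(CUSTOMER_ID, 9, 7) <= 6;

create table test_credit_scoring_h as
  select * from credit_scoring
  where ora_hash(CUSTOMER_ID, 9, 7) > 6;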


The relevant algorithms are documented here; four of them already existed in 11g/12c DB and three are new:

  • ALGO_NAIVE_BAYES  
  • ALGO_NEURAL_NETWORK  - new
  • ALGO_DECISION_TREE 
  • ALGO_GENERALIZED_LINEAR_MODEL
  • ALGO_RANDOM_FOREST  - new
  • ALGO_SUPPORT_VECTOR_MACHINES
  • ALGO_EXPLICIT_SEMANTIC_ANALYS  - new

 

The usual process from here is to create a settings table where we enter the model parameters, run the model creation script, and then evaluate / use the model.

 

Let's start with a Decision Tree. For the model settings documentation see here.

We will enter 2 values:

The selected classification algorithm (the default is Naive Bayes) and a setting telling Oracle to auto-prepare the data. When you enable Automatic Data Preparation, the model uses heuristics to transform the build data according to the requirements of the algorithm. Instead of fully automatic data preparation, the user can request that the data be shifted and/or scaled with the PREP_SCALE* and PREP_SHIFT* settings. The default is prep_auto_on.

create table CR_SCR_set_DT (
setting_name varchar2(50),
setting_value VARCHAR2(50)
);

 

Begin

Delete from CR_SCR_set_DT;

INSERT INTO CR_SCR_set_DT (setting_name, setting_value)
Values(dbms_data_mining.algo_name, dbms_data_mining.algo_Decision_Tree);

INSERT INTO CR_SCR_set_DT (setting_name, setting_value)
Values(dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on);

Commit;

End;
/


We might want to delete the model if it exists from previous tests:



BEGIN
  DBMS_DATA_MINING.DROP_MODEL(model_name => 'credit_DT');
  EXCEPTION WHEN OTHERS THEN NULL; -- to prevent the error when no model
END;
/

Then create the model; we must have a primary key / case id column defined:

BEGIN
   DBMS_DATA_MINING.CREATE_MODEL(
     model_name          => 'credit_DT',
     mining_function     => dbms_data_mining.classification,
     data_table_name     => 'train_credit_scoring',
     case_id_column_name => 'CUSTOMER_ID',
     target_column_name  => 'credit_score_bin',
     settings_table_name => 'CR_SCR_set_DT');
END;
/

On my laptop, for 70K rows, the task completed in 4.362 seconds.

Let's evaluate the model against the test table, using a confusion matrix.

I'm looking for the Good Credit customers, so let's treat the customers predicted to be Good Credit as the positive (True) class.



TP - True Positives are the positive customers I predicted correctly.

FN - False Negatives are customers I predicted as negative (Other Credit) but was wrong about.

FP - False Positives are customers I predicted as positive but was wrong about.

TN - True Negatives are customers I predicted correctly as negative.

Calculations based on them:

Accuracy = (TP+TN)/(TP+FP+FN+TN)  (ratio of correctly predicted observations to the total observations).

Precision = TP/(TP+FP)  (ratio of correctly predicted positive observations to the total predicted positive observations).

Recall (Sensitivity) = TP/(TP+FN)  (ratio of correctly predicted positive observations to all observations in the actual positive class).

F1 Score = 2*(Recall * Precision) / (Recall + Precision)  (the harmonic mean of Precision and Recall).

 

In different cases we might prefer one calculation over the other. 


SELECT TP, FN, FP,  TN,  TP/(TP+FP) PRECISION, TP/(FN + TP) RECALL, 2*TP/(FN+2*TP+FP) F1, (TP+TN)/(TP+FP+TN+FN) Model_Acuracy, FP/(FP+TN) False_Positive
FROM (
    SELECT SUM(TP) TP, SUM(FP) FP, SUM(TN) TN, SUM(FN) FN
        FROM (
                SELECT CASE WHEN pred = 'Good Credit' AND target = 'Good Credit' THEN 1 ELSE 0 END AS TP,
                    CASE WHEN pred = 'Good Credit' AND target = 'Other Credit' THEN 1 ELSE 0 END AS FP,
                    CASE WHEN pred = 'Other Credit' AND target = 'Other Credit' THEN 1 ELSE 0 END AS TN,
                    CASE WHEN pred = 'Other Credit' AND target = 'Good Credit' THEN 1 ELSE 0 END AS FN
                FROM (
                    SELECT CUSTOMER_ID, prediction(credit_DT using *) pred, credit_score_bin target
                    FROM test_credit_scoring) a ) b ) c;

The result is not bad:

        TP         FN         FP         TN  PRECISION   RECALL      F1          MODEL_ACURACY  FALSE_POSITIVE
      6487        766       2674      20031  .7081104    .8943885    .7904228    .8851726       .1177714


We can use the model with the PREDICTION and PREDICTION_PROBABILITY functions:

select
    customer_id,
    PREDICTION(credit_DT using *) pred,
    credit_score_bin target,
    PREDICTION_PROBABILITY(credit_DT using *) prob
FROM test_credit_scoring;

 

There are several general views built into the system for checking information about your models (see here and here for better explanations).

 

ALL_MINING_MODELS - general information about each model.


ALL_MINING_MODEL_ATTRIBUTES - a view with information about the columns in the training data that were used to build the model.


ALL_MINING_MODEL_SETTINGS - a view describing the settings of the mining model. (You can check possible setting values here instead of the documentation.)


ALL_MINING_MODEL_VIEWS - a view describing the model-specific views. See examples below.
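
For example, a quick way to see which algorithm and which effective settings a model ended up with (a minimal sketch; note the model name is stored in upper case):

-- General model information
select model_name, mining_function, algorithm
from all_mining_models
where model_name = 'CREDIT_DT';

-- Effective settings of the model, defaults included
select setting_name, setting_value
from all_mining_model_settings
where model_name = 'CREDIT_DT';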


Oracle creates Model Detail Views for our convenience that describe the model.

Each view is named DM$V + One_Letter + Your_Model_Name. For example: DM$VGcredit_DT

Here are a few views relevant to our model:

DM$VOcredit_DT  (the tree nodes)


DM$VPcredit_DT  (tree nodes and attributes, including surrogate splits. A surrogate split is used when the main split attribute has no value.)


DM$VTcredit_DT (row counts split by target value, and the target weight if one exists).

DM$VIcredit_DT (with tree nodes and target support count).
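
A small sketch for locating and querying these views; as far as I know the generated view names are stored in upper case (e.g. DM$VOCREDIT_DT):

-- List the model-specific views generated for our model
select view_name, view_type
from all_mining_model_views
where model_name = 'CREDIT_DT';

-- Then query one of them, for example the tree-node view
select * from DM$VOCREDIT_DT;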



Comparing possible classification models.

As mentioned above, in Oracle DB 19 classification models can use Naive Bayes, Neural Network, Decision Tree, Logistic Regression, Random Forest, Support Vector Machines, or Explicit Semantic Analysis, with Naive Bayes as the default (see the documentation for more). Here are their formal names:

  • ALGO_NAIVE_BAYES  
  • ALGO_NEURAL_NETWORK
  • ALGO_DECISION_TREE 
  • ALGO_GENERALIZED_LINEAR_MODEL
  • ALGO_RANDOM_FOREST
  • ALGO_SUPPORT_VECTOR_MACHINES
  • ALGO_EXPLICIT_SEMANTIC_ANALYS

In the previous section we created a model based on a Decision Tree, using the following SQL code:

Updated the setting table:

Begin

Delete from CR_SCR_set_DT;

INSERT INTO CR_SCR_set_DT (setting_name, setting_value)
Values(dbms_data_mining.algo_name, dbms_data_mining.algo_Decision_Tree);

INSERT INTO CR_SCR_set_DT (setting_name, setting_value)
Values(dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on);

Commit;

End;
/

Created the Model:
BEGIN
   DBMS_DATA_MINING.CREATE_MODEL(
     model_name          => 'credit_DT',
     mining_function     => dbms_data_mining.classification,
     data_table_name     => 'train_credit_scoring',
     case_id_column_name => 'CUSTOMER_ID',
     target_column_name  => 'credit_score_bin',
     settings_table_name => 'CR_SCR_set_DT');
END;
/

Evaluated the model:

SELECT TP, FN, FP,  TN,  TP/(TP+FP) PRECISION, TP/(FN + TP) RECALL, 2*TP/(FN+2*TP+FP) F1, (TP+TN)/(TP+FP+TN+FN) Model_Acuracy, FP/(FP+TN) False_Positive
FROM (
    SELECT SUM(TP) TP, SUM(FP) FP, SUM(TN) TN, SUM(FN) FN
        FROM (
                SELECT CASE WHEN pred = 'Good Credit' AND target = 'Good Credit' THEN 1 ELSE 0 END AS TP,
                    CASE WHEN pred = 'Good Credit' AND target = 'Other Credit' THEN 1 ELSE 0 END AS FP,
                    CASE WHEN pred = 'Other Credit' AND target = 'Other Credit' THEN 1 ELSE 0 END AS TN,
                    CASE WHEN pred = 'Other Credit' AND target = 'Good Credit' THEN 1 ELSE 0 END AS FN
                FROM (
                    SELECT CUSTOMER_ID, prediction(credit_DT using *) pred, credit_score_bin target
                    FROM test_credit_scoring) a ) b ) c;


If you want to test other algorithms, one option is to reuse the existing model. In that case all I need to do is change the algorithm value in the setting table.
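
A minimal sketch of that reuse option: update the algo_name row in the existing setting table (here switching to Naive Bayes), then drop credit_DT and run the same CREATE_MODEL call again:

Begin
  -- Switch the algorithm in the existing setting table
  Update CR_SCR_set_DT
  Set setting_value = dbms_data_mining.algo_Naive_Bayes
  Where setting_name = dbms_data_mining.algo_name;
  Commit;
End;
/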


I prefer to create a new setting table for each algorithm and update the code accordingly: a new setting table name, a new model name, and the new algorithm value. For example, here is my code for Naive Bayes (the changes from the previous version are the setting table CR_SCR_set_NB, the model name credit_NB, and algo_Naive_Bayes):


create table CR_SCR_set_NB (
setting_name varchar2(50),
setting_value VARCHAR2(50)
);

 

Begin

Delete from CR_SCR_set_NB;

INSERT INTO CR_SCR_set_NB (setting_name, setting_value)
Values(dbms_data_mining.algo_name, dbms_data_mining.algo_Naive_Bayes);

INSERT INTO CR_SCR_set_NB (setting_name, setting_value)
Values(dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on);

Commit;

End;
/

 

BEGIN
   DBMS_DATA_MINING.CREATE_MODEL(
     model_name          => 'credit_NB',
     mining_function     => dbms_data_mining.classification,
     data_table_name     => 'train_credit_scoring',
     case_id_column_name => 'CUSTOMER_ID',
     target_column_name  => 'credit_score_bin',
     settings_table_name => 'CR_SCR_set_NB');
END;
/

 

SELECT TP, FN, FP,  TN,  TP/(TP+FP) PRECISION, TP/(FN + TP) RECALL, 2*TP/(FN+2*TP+FP) F1, (TP+TN)/(TP+FP+TN+FN) Model_Acuracy, FP/(FP+TN) False_Positive
FROM (
    SELECT SUM(TP) TP, SUM(FP) FP, SUM(TN) TN, SUM(FN) FN
        FROM (
                SELECT CASE WHEN pred = 'Good Credit' AND target = 'Good Credit' THEN 1 ELSE 0 END AS TP,
                    CASE WHEN pred = 'Good Credit' AND target = 'Other Credit' THEN 1 ELSE 0 END AS FP,
                    CASE WHEN pred = 'Other Credit' AND target = 'Other Credit' THEN 1 ELSE 0 END AS TN,
                    CASE WHEN pred = 'Other Credit' AND target = 'Good Credit' THEN 1 ELSE 0 END AS FN
                FROM (
                    SELECT CUSTOMER_ID, prediction(credit_NB using *) pred, credit_score_bin target
                    FROM test_credit_scoring) a ) b ) c;

The NB results are quite good in our case:

        TP         FN         FP         TN  PRECISION    RECALL       F1           MODEL_ACURACY  FALSE_POSITIVE
      7008        245       3154      19551  .689628026   .966220874   .804823428   .886541158     .138912134

If you are curious, here are the results of all 7 algorithms for this case:

 

 

Naturally, the Model Detail Views are different for every algorithm. We can find more info in the documentation here.

 

 

DBMS_DATA_MINING.CREATE_MODEL2 (update)

There is a newer version of CREATE_MODEL named CREATE_MODEL2. It has slightly different parameters: the main advantages are the ability to pass a settings list instead of a physical settings table, and to use a data query instead of a fixed data table.


BEGIN
  DBMS_DATA_MINING.DROP_MODEL(model_name => 'credit_DT');
  EXCEPTION WHEN OTHERS THEN NULL; -- to prevent the error when no model
END;
/

DECLARE
    v_setlst DBMS_DATA_MINING.SETTING_LIST;
   
BEGIN
    v_setlst('PREP_AUTO') := 'ON';
    v_setlst('ALGO_NAME') := 'ALGO_DECISION_TREE';
   
    DBMS_DATA_MINING.CREATE_MODEL2(
     model_name         => 'credit_DT',
     mining_function    => dbms_data_mining.classification,
     data_query         => 'select * from train_credit_scoring',
     set_list           => v_setlst,
     case_id_column_name=> 'CUSTOMER_ID',
     target_column_name => 'credit_score_bin'
);

END;
/

 

I wanted to create a standard script to find my optimal algorithm. For that purpose I will combine CREATE_MODEL2 with a FOR loop: in each iteration the loop creates a model with a different algorithm and updates a table with the performance of that algorithm. I named the table alg_results.

drop table alg_results;
 
create table alg_results (algo varchar2(60), TP Number(16,6), FN Number(16,6), FP Number(16,6),  TN Number(16,6),  PRECISION Number(16,6), RECALL Number(16,6), F1 Number(16,6), Model_Acuracy Number(16,6), False_Positive Number(16,6));

 

Next I wrap the previous code in a FOR loop, use a different algorithm each time, and insert the results into our alg_results table.

 

DECLARE
    v_setlst DBMS_DATA_MINING.SETTING_LIST;
BEGIN
delete from alg_results;
FOR I IN 1..7 LOOP

BEGIN
  DBMS_DATA_MINING.DROP_MODEL(model_name => 'credit_tmp');
  EXCEPTION WHEN OTHERS THEN NULL;
END;
 
BEGIN
    v_setlst('PREP_AUTO') := 'ON';
    v_setlst('ALGO_NAME') := case i when 1 then 'ALGO_DECISION_TREE' when 2 then 'ALGO_NAIVE_BAYES' when 3 then 'ALGO_NEURAL_NETWORK' when 4 then 'ALGO_GENERALIZED_LINEAR_MODEL' when 5 then 'ALGO_RANDOM_FOREST' when 6 then 'ALGO_SUPPORT_VECTOR_MACHINES' else 'ALGO_EXPLICIT_SEMANTIC_ANALYS' end;
   
    DBMS_DATA_MINING.CREATE_MODEL2(
     model_name         => 'credit_tmp',
     mining_function    => dbms_data_mining.classification,
     data_query         => 'select * from train_credit_scoring',
     set_list           => v_setlst,
     case_id_column_name=> 'CUSTOMER_ID',
     target_column_name => 'credit_score_bin'
);

END;

BEGIN
insert into alg_results (ALGO, TP, FN, FP,  TN,  PRECISION, RECALL, F1, Model_Acuracy, False_Positive)
SELECT case i when 1 then 'ALGO_DECISION_TREE' when 2 then 'ALGO_NAIVE_BAYES' when 3 then 'ALGO_NEURAL_NETWORK' when 4 then 'ALGO_GENERALIZED_LINEAR_MODEL' when 5 then 'ALGO_RANDOM_FOREST' when 6 then 'ALGO_SUPPORT_VECTOR_MACHINES' else 'ALGO_EXPLICIT_SEMANTIC_ANALYS' end ALGO,
TP, FN, FP,  TN,  TP/(TP+FP) PRECISION, TP/(FN + TP) RECALL, 2*TP/(FN+2*TP+FP) F1, (TP+TN)/(TP+FP+TN+FN) Model_Acuracy, FP/(FP+TN) False_Positive
FROM (
    SELECT SUM(TP) TP, SUM(FP) FP, SUM(TN) TN, SUM(FN) FN
        FROM (
                SELECT CASE WHEN pred = 'Good Credit' AND target = 'Good Credit' THEN 1 ELSE 0 END AS TP,
                    CASE WHEN pred = 'Good Credit' AND target = 'Other Credit' THEN 1 ELSE 0 END AS FP,
                    CASE WHEN pred = 'Other Credit' AND target = 'Other Credit' THEN 1 ELSE 0 END AS TN,
                    CASE WHEN pred = 'Other Credit' AND target = 'Good Credit' THEN 1 ELSE 0 END AS FN
                FROM (
                    SELECT CUSTOMER_ID, prediction(credit_tmp using *) pred, credit_score_bin target
                    FROM test_credit_scoring) a ) b ) c;
commit;
END;



END LOOP;
end;
/
 

The entire process took less than 2 minutes on my laptop (109 seconds).

Now it is easy to compare the performance of each algorithm:
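
For example, with a query like this (a sketch; sort by whichever metric matters most in your case):

select algo, precision, recall, f1, model_acuracy, false_positive
from alg_results
order by f1 desc;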

 

 


Finding optimal Hyper-parameters

We can define various parameters in the setting table, with a different list for every algorithm (see here for details).

I will stay with the Decision Tree, since it is easier to understand.

 Here is the list of its specific hyper-parameters:


I can always set specific setting values, such as tree max depth (TREE_TERM_MAX_DEPTH, default 7) and the minimum number of rows in a node (TREE_TERM_MINREC_NODE, default 10). For example, to try specific values:

    Begin
        Delete from CR_SCR_set_DT;
        INSERT INTO CR_SCR_set_DT (setting_name, setting_value)
        Values(dbms_data_mining.algo_name, dbms_data_mining.algo_Decision_Tree);
        INSERT INTO CR_SCR_set_DT (setting_name, setting_value)
        Values(dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on);
        INSERT INTO CR_SCR_set_DT (setting_name, setting_value)
        Values(dbms_data_mining.TREE_TERM_MAX_DEPTH, 10);
        INSERT INTO CR_SCR_set_DT (setting_name, setting_value)
        Values(dbms_data_mining.TREE_TERM_MINREC_NODE, 20);
        Commit;
    End;

 

 

I would like to check the optimal values of these two: TREE_TERM_MAX_DEPTH (values 5-15) and TREE_TERM_MINREC_NODE (values 10-50, in steps of 10).

I can run the 4 steps in a FOR loop with PL/SQL: update the setting table, delete the previous model (assuming one exists), build a new model based on the latest settings, and place my model evaluation results in a table.

First I drop and create the results table:

drop table DT_results;
 

create table DT_results (Depth Number, Split NUMBER(16,6), TP Number(16,6), FN Number(16,6), FP Number(16,6),  TN Number(16,6),  PRECISION Number(16,6), RECALL Number(16,6), F1 Number(16,6), Model_Acuracy Number(16,6), False_Positive Number(16,6));

Then run 2 nested "FOR" loops:

declare J1 number(16,6); -- no reason for this format
BEGIN
Delete from DT_RESULTS;
FOR I IN 5..15 LOOP
FOR J IN 1..5 LOOP

J1:=J*10;
    Begin
        Delete from CR_SCR_set_DT;
        INSERT INTO CR_SCR_set_DT (setting_name, setting_value)
        Values(dbms_data_mining.algo_name, dbms_data_mining.algo_Decision_Tree);
        INSERT INTO CR_SCR_set_DT (setting_name, setting_value)
        Values(dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on);
        INSERT INTO CR_SCR_set_DT (setting_name, setting_value)
        Values(dbms_data_mining.TREE_TERM_MAX_DEPTH, I);
        INSERT INTO CR_SCR_set_DT (setting_name, setting_value)
        Values(dbms_data_mining.TREE_TERM_MINREC_NODE, J1);
        Commit;
    End;


BEGIN
  DBMS_DATA_MINING.DROP_MODEL(model_name => 'credit_DT');
  EXCEPTION WHEN OTHERS THEN NULL; -- to prevent the error when no model
END;

    BEGIN
        DBMS_DATA_MINING.CREATE_MODEL(
            model_name          => 'credit_DT',
            mining_function     => dbms_data_mining.classification,
            data_table_name     => 'train_credit_scoring',
            case_id_column_name => 'CUSTOMER_ID',
            target_column_name  => 'credit_score_bin',
            settings_table_name => 'CR_SCR_set_DT');
    END;

BEGIN
insert into DT_results (Depth, Split, TP, FN, FP,  TN,  PRECISION, RECALL, F1, Model_Acuracy, False_Positive)
SELECT I DEPTH, J1 Split, TP, FN, FP,  TN,  TP/(TP+FP) PRECISION, TP/(FN + TP) RECALL, 2*TP/(FN+2*TP+FP) F1, (TP+TN)/(TP+FP+TN+FN) Model_Acuracy, FP/(FP+TN) False_Positive
FROM (
    SELECT SUM(TP) TP, SUM(FP) FP, SUM(TN) TN, SUM(FN) FN
        FROM (
                SELECT CASE WHEN pred = 'Good Credit' AND target = 'Good Credit' THEN 1 ELSE 0 END AS TP,
                    CASE WHEN pred = 'Good Credit' AND target = 'Other Credit' THEN 1 ELSE 0 END AS FP,
                    CASE WHEN pred = 'Other Credit' AND target = 'Other Credit' THEN 1 ELSE 0 END AS TN,
                    CASE WHEN pred = 'Other Credit' AND target = 'Good Credit' THEN 1 ELSE 0 END AS FN
                FROM (
                    SELECT CUSTOMER_ID, prediction(credit_DT using *) pred, credit_score_bin target
                    FROM test_credit_scoring) a ) b ) c;
commit;
END;

END LOOP;
END LOOP;
END;
/

Now I can examine the results:
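
A sketch of the comparison query (ordering by F1, but any of the stored metrics works):

select depth, split, precision, recall, f1, model_acuracy
from dt_results
order by f1 desc;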

Apparently, TREE_TERM_MAX_DEPTH works best with a shallow tree of depth 5-6, and changes to TREE_TERM_MINREC_NODE don't influence the results.



Testing Partition on a model

The idea of partitioning is to have a sub-model for specific values. It was nicely covered here by Brendan Tierney, who also suggested running it in parallel here.

I decided to try partitioning on Loan_type and Marital_status. That should create 20 sub-models. The only change is adding the odms_partition_columns line to the setting table.

Begin
Delete from CR_SCR_set_DT;
INSERT INTO CR_SCR_set_DT (setting_name, setting_value)
Values(dbms_data_mining.algo_name, dbms_data_mining.algo_Decision_Tree);
INSERT INTO CR_SCR_set_DT (setting_name, setting_value)
Values(dbms_data_mining.prep_auto, dbms_data_mining.prep_auto_on);
INSERT INTO CR_SCR_set_DT (setting_name, setting_value)
Values(dbms_data_mining.TREE_TERM_MAX_DEPTH, 6);
INSERT INTO CR_SCR_set_DT (setting_name, setting_value)
VALUES(dbms_data_mining.odms_partition_columns, 'LOAN_TYPE, MARITAL_STATUS');

Commit;
End;
/
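
The model itself is then rebuilt with the same CREATE_MODEL call as before. As a sketch for verifying the generated sub-models, I believe the partitions are exposed through the ALL_MINING_MODEL_PARTITIONS dictionary view, and the PREDICTION function picks the matching partition automatically:

-- One row per sub-model / partition key value
select model_name, partition_name, column_name, column_value
from all_mining_model_partitions
where model_name = 'CREDIT_DT';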

Surprisingly, it did not improve the results in this case.



What else can we do?

There are many more options I could try, for example specifying cost as described here.

A cost matrix is a mechanism for influencing the decision making of a model. A cost matrix can cause the model to minimize costly misclassifications. It can also cause the model to maximize beneficial accurate classifications.

For example, if a model classifies a customer with poor credit as low risk, this error is costly. A cost matrix could bias the model to avoid this type of error. The cost matrix might also be used to bias the model in favor of the correct classification of customers who have the worst credit history.

ROC is a useful metric for evaluating how a model behaves with different probability thresholds. You can use ROC to help you find optimal costs for a given classifier given different usage scenarios. You can use this information to create cost matrices to influence the deployment of the model.

Like a confusion matrix, a cost matrix is an n-by-n matrix, where n is the number of classes. Both confusion matrices and cost matrices include each possible combination of actual and predicted results based on a given set of test data.

A confusion matrix is used to measure accuracy, the ratio of correct predictions to the total number of predictions. A cost matrix is used to specify the relative importance of accuracy for different predictions. In most business applications, it is important to consider costs in addition to accuracy when evaluating model quality.
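
As a hedged sketch, a cost matrix can be attached to our model with DBMS_DATA_MINING.ADD_COST_MATRIX. The cost table needs the ACTUAL_TARGET_VALUE, PREDICTED_TARGET_VALUE and COST columns; the cost values below are made up for illustration only:

create table credit_costs (
  actual_target_value    varchar2(30),
  predicted_target_value varchar2(30),
  cost                   number);

-- Made-up costs: predicting Good Credit for an Other Credit customer is 5 times worse than the opposite error
insert into credit_costs values ('Good Credit',  'Good Credit',  0);
insert into credit_costs values ('Good Credit',  'Other Credit', 1);
insert into credit_costs values ('Other Credit', 'Good Credit',  5);
insert into credit_costs values ('Other Credit', 'Other Credit', 0);
commit;

begin
  dbms_data_mining.add_cost_matrix(
    model_name             => 'credit_DT',
    cost_matrix_table_name => 'credit_costs');
end;
/

After that, scoring can take the costs into account with the COST MODEL clause, e.g. PREDICTION(credit_DT COST MODEL using *).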



Oracle Autonomous Data Warehouse (ADW)

You can see the same example in the Oracle Autonomous Data Warehouse machine learning notebook classification example. If you don't have access to ADW see the workshop here.






Monday, November 16, 2020

Restore / Recover unsaved SQL code in SQLDeveloper

I'm working with Oracle SQLDeveloper 19. Windows crashed... Where can I find the SQL code that I'm sure I saved a few times but, apparently, didn't?

It is good you asked. After a short panic I found it in C:\Users\YOUR_USER\AppData\Roaming\SQL Developer, in files with no type / extension.


Hope it helps.


Tuesday, November 3, 2020

OAS - Oracle Analytics Server 5.5 Bundle Patch 5.5.0.0.201012 is available + Weblogic Security Patch

OAS - Oracle Analytics Server 5.5 Bundle Patch 5.5.0.0.201012 is available at support site.

This patch, 32003790, is for Windows and Linux and includes a long list of bug fixes.

 

You can find the patch here: https://support.oracle.com/epmos/faces/PatchResultsNDetails?patchId=32003790

And the readme here: https://updates.oracle.com/Orion/Services/download?type=readme&aru=23857617



There is also a critical security patch for WebLogic, described here: Security Alert CVE-2020-14750 Patch Availability Document for Oracle WebLogic Server (Doc ID 2724951.1)

Sunday, November 1, 2020

Dynamic Dashboard and Dashboard Page Title (and color) in OAC, OAS and OBIEE

Following the previous post (Font size of Dashboard name and Dashboard pages names in OAC, OAS and OBIEE), in the rare cases where we need it, we can have dynamic Dashboard and Dashboard Page names.

Please notice this trick only works with online dashboards - not with printing or exporting of the dashboard, or even the catalog.

I will use a simplified version of it here. Usually real cases will be more complex and involve using a URL to activate a specific dashboard page.


What we did in the previous post was control the Dashboard and page fonts using an HTML script. For example, this is the Dashboard code:

<script>
    document.getElementsByClassName("masterH1")[0].innerHTML =  '<font size="3" > Dashboard Name</font>'
</script>

We also noticed we can use an analysis that passes the script to dashboards.

In a similar fashion we can remove the font part of the code (or leave it if you prefer) and make the name part dynamic using variables.


For example, we can modify the previous script to this:

<script>
    document.getElementsByClassName("masterH1")[0].innerHTML =  '@1'
</script>


How is it done?

I created an analysis with one column that is a Presentation Variable (it could be any sort of variable, such as Session or Repository...). In this case the Presentation Variable is named P_Name and has a default value.

 

I used it in a Narrative view (@1 means the first column, Rows to display = 1, mark the HTML option).



Just to demo it, I created a prompt with that Presentation Variable:



And used it in a Dashboard (you can't see the analysis itself in the Dashboard, since it is just HTML script).






In a similar fashion we could control the Page Name with several Variables:


<script>
    document.getElementsByClassName("masterH3")[0].innerHTML =  '@1'
    document.getElementsByClassName("masterH3")[1].innerHTML =  '@2'
    document.getElementsByClassName("masterH3")[2].innerHTML =  '@3'
</script>




Of course, this can work with the dynamic color as well:

 <script>
    document.getElementsByClassName("masterH1")[0].innerHTML =  '<span style=\"color:@2\">@1'
</script>