Wednesday, November 27, 2024

Last occurrence of sub-string in a string - The locate function secret part in Oracle Analytics

 I need to find the last occurrence of a specific character in a string (and using only the part that follows that character). For example in a string 'abc-de-fghi-jklm' I want to locate the last '-' and get the sub-string of 'jklm'. In my cases there is unknown number of '-' characters.

The basic function to locate a sub-string in a string is "Locate". 

While it seems Locate has only 2 parameters (the string we are searching and the full string), actually they are 3 parameters. The third one is optional: start position.

LOCATE(CharacterExpression, CharacterExpression [, StartPosition] ) 

The Start Position can accept negative values as well, meaning search from the end of the string.

So in my case Locate('-', 'abc-de-fghi-jklm', -1) is the solution.

To get the sub-string 'jklm' I will use 

SUBSTRING( 'abc-de-fghi-jklm'

                   FROM  Locate('-', 'abc-de-fghi-jklm', -1)+1)

The SUBSTRING function parameters are:

SUBSTRING(CharacterExpression FROM StartPosition [FOR length]) 










Friday, October 25, 2024

Installing OAS on mobile laptop

Once in a while I install OAS (Oracle Analytics Server) on my laptop. I do it for testing purposes mostly... I don't want anyone but myself to connect to this OAS. Sort of rich man's OA Desktop.

 First of all, don't do it. Even if you want OAS on a laptop, you would better use a VM or Docker. This type of installation is not supported and usually is problematic.

This post is for myself, so I will stop reinventing the wheel each time I do it...

Problem: 

  • OAS is a server that requires a static IP. 
  • My laptop changes the IPs each time I move from one network to another. 
  • During the installation the IP is hard coded in some of the config files.

Work around: 

  • Create a MS loopback adapter with a static IP. Example from Oracle DB 21c documentation is here.
  • Set that IP as Ethernet network on the laptop.
  • Before OAS installation turn of Wi-Fi or any other network, so the installation is forced to use the loopback IP.

Once done remember to stop and start OAS on the same network.

Thursday, October 24, 2024

Oracle OAS 2024 (7.6) October 2024 patch and the missing OPATCH

 Oracle has released the Oracle Analytics Server - OAS 2024 (7.6) October 2024 bundle patch, specifically 09-Oct-2024 patch. You can find it at the regular place: Doc ID 2832967.2 in the OAS tab.



The patch requires upgrade of OPatch,  at least to version 13.9.4.2.13. For some reason I couldn't find the relevant OPatch version at the regular place (https://updates.oracle.com/download/6880880.html) where the FMW patch is described as Obsolete, (correct for 24 Oct. 2024).

You can visit this support note (Patch 28186730: OPATCH 13.9.4.2.17 FOR EM 13.5 AND FMW/WLS 12.2.1.4.0 AND 14.1.1.0.0) to download OPatch 13.9.4.2.17 that should do the trick.



Tuesday, June 4, 2024

Oracle Spatial - Millions of Points in Thousands of Polygons

I wrote about Oracle Spatial in the past. (Intro, data loading Points, Lines, Polygons). Now I had a new challenge.

We had a POC. Customer wanted to test performance of 7 million data points in 3200 polygons. In the POC I learned there is difference in performance, based on the way we create the query. I was lucky to gain some guidance from Daniel Geringer (who wrote this guide) of Oracle Spatial. Bellow is the summary of the process. It was done on Autonomous Database (ADW), but should be similar on any Oracle DB.

First part will cover the performance benefits from using the proper function. Second, some advisable data setting and cleaning steps we did. 


Part 1 

Customer loaded ~7 million data points (lng/lat) of USA accidents from a similar data source and loaded to ADW. Loaded ~3200 US counties from this or similar source

We could update the longitude and latitude (lng/lat) data and place them in a new spatial format column. For many reasons we preferred not to.  We worked directly with the lng/lat points and the get_geometry function (see page 25 here or slightly longer version here). Next created index (see pages 26, 27 here)... and fixed some inconsistencies (see second part, bellow).

My default intention was to use the SDO_CONTAINS function. Instead of SDO_CONTAINS, Daniel suggested top use  SDO_ANYINTERACT or SDO_INSIDE instead and to switch the order of the geometry parameter to the spatial operator. 

SDO_ANYINTERACT can be used if it's ok to return points on the polygon boundary.  If the requirement is only to return points inside the polygon, SDO_INSIDE should be used. In switching the geometry order, there is a huge difference between running 3200 times comparisons for 7 million points and running 7 million times comparisons for 3200 polygons. In our tests it was ~390 seconds vs. ~5 seconds.

In all cases the following format was used:

select count(*) from accident a, counties c where SDO_... ='TRUE';

Let's compare the following:

  • SDO_CONTAINS(C.GEOMETRY, GET_GEOMETRY(LNG,LAT)) = 'TRUE'
  • SDO_ANYINTERACT(GET_GEOMETRY(LNG,LAT),c.geometry)='TRUE'
  • SDO_ANYINTERACT(c.geometry,GET_GEOMETRY(LNG,LAT))='TRUE'
  • SDO_INSIDE(GET_GEOMETRY(LNG,LAT),c.geometry)='TRUE'

In all cases we had level 8 of parallel execution:

  • SDO_CONTAINS(C.GEOMETRY, GET_GEOMETRY(LNG,LAT)) = 'TRUE' returned in 387.7 sec.
  • SDO_ANYINTERACT(GET_GEOMETRY(LNG,LAT),c.geometry)='TRUE' returned in 4.8 sec.
  • SDO_ANYINTERACT(c.geometry,GET_GEOMETRY(LNG,LAT))='TRUE' returned in 389.2 sec.
  • SDO_INSIDE(GET_GEOMETRY(LNG,LAT),c.geometry)='TRUE' returned in 5.0 sec.

This is 80 times faster!

Both SDO_INSIDE and SDO_CONTAINS return the same number (6,973,650) while both SDO_ANYINERACT return slightly higher number (6,973,846).


So we have 2 winners that can use the better parameters order. 

I will focus on this option:

select count(*) 
from accident a, counties c
where SDO_ANYINTERACT(GET_GEOMETRY(LNG,LAT),c.geometry)='TRUE';


I wanted to see how does the parallelism influence this query performance. I rerun it with various combinations using ADW High connection (bellow 8 ECPU explain plan): 



The results are impressive:



What happens if we increase the amount of data? 

I duplicated the original data 21 times, while moving each data points randomly by 10-100 meters each time towards the center of USA.  I didn't want duplicated locations for the test. 

The result was a table of over 146 million locations. I used similar query and tested it with various amounts of ECPU. Here are the results:







Part 2 

For better performance , before execution, it is advisable to:

  • Validate geometry (fix if needed)
  • Create Index

We validated the Counties table

select name,  sdo_geom.validate_geometry_with_context(GEOMETRY, 0.005) status
from  COUNTIES

and received lots of errors of the type:  13367 [Element <1>] [Ring <1>]

Daniel explained that it is probably due to polygon directionality. It was solved by running the sdo_util.rectify_geometry like this:

update counties set geometry=sdo_util.rectify_geometry(geometry,0.005);

Good code for similar problems is in the blog post by Hans Viehmann here


See also the Spatial Data Validation chapter (slides 3-8) in the  Oracle Spatial Best Practices and Tuning Tips by Daniel Geringer here.


Last part was the index creation. While I covered index creation for  PointsLinesPolygons earlier, I learned there is performance benefit in creating CBTREE organized index for streaming point data. See slides 29, 30 here

Our code was:

INSERT INTO USER_SDO_GEOM_METADATA VALUES (
  'ACCIDENTS',  -- table name
  'ADMIN.GET_GEOMETRY(LNG,LAT)',    -- geometry column name
  SDO_DIM_ARRAY(
    SDO_DIM_ELEMENT('X', -180, 180, 0.005),  -- 'X',[min x],[max x],[tolerance]
    SDO_DIM_ELEMENT('Y', -90, 90, 0.005)),    -- 'Y',[min y],[max y],[tolerance]
   4326           -- indicates longitude/latitude coordinates (coordinate system)
 );


CREATE INDEX ACCIDENTS ON ACCIDENTS_2023_03 ( ADMIN.GET_GEOMETRY(LNG,LAT) )
INDEXTYPE IS MDSYS.SPATIAL_INDEX_V2 PARAMETERS('layer_gtype=point cbtree_index=true');

Please note the INDEXTYPE part that is different from the regular: 
INDEXTYPE IS MDSYS.SPATIAL_INDEX_V2;


P.S. 

One more interesting option to test is the SDO_POINTINPOLYGON function that was described Jean Ihm here and Karin Patange here