Saturday, January 21, 2017

Using SAP HANA Cloud Platform from ABAP to find root causes for runtime bottlenecks

Hi guys,

continuing my previous experiments, I have now set aside AWS Machine Learning for a while and turned to the SAP HANA Cloud Platform. SAP HCP also offers machine learning capabilities under the label "Predictive Analysis". While reading up on it, I found that it goes beyond prediction: it offers introspection mechanisms on the models you create within SAP HCP Predictive Analysis, such as finding the key influencing factors for target figures in your data. So far this is not offered by Amazon's Machine Learning, so it got me excited.

Goals

The goal of this experiment is to enable every ABAP developer to use SAP HCP Predictive Analysis via a native API layer, without the hassle of having to understand the architecture, infrastructure and details of the implementation. This brings the capabilities much closer to where the hidden data treasure lies, waiting for insights: the ERP system - even if it's not S/4 HANA or not even sitting on top of a HANA database itself.

This ultimately enables ABAP developers to extend existing business functionality with Predictive Analysis capabilities.




Use Case

I decided to keep the topic the same as in my previous articles: predicting the runtime of the SNP System Scan. This time, however, I wanted to take advantage of the introspection capabilities and ask which factors influence the runtime the most.

I am going in with the expectation that the runtime will most likely depend on the database size, the database vendor and the database version. Of course the scan version should also play a significant role, as we are constantly trying to improve performance. However, this may be in a battle with the features we are adding. The industry could be of interest as well, because different processes being used may lead to different areas of the database being populated with different amounts of data. Let's see what HCP finds out.

Preparation

In order to make use of SAP HCP Predictive Services you have to fulfill some prerequisites. Rather than describing them completely, I will reference the tutorials I followed to set this up within my company. If you have already set up an SAP HCP account and the SAP HANA Cloud Connector, you only need to perform steps 2, 4 and 5.
  1. Create an SAP HCP trial account.
  2. Deploy SAP HANA Cloud Platform predictive services on your trial account (see this tutorial).
  3. Install and set up the SAP HANA Cloud Connector in your corporate system landscape.
  4. Configure access in the SAP HANA Cloud Connector to the HANA database you have set up in step 2. This makes your HCP database appear to be in your own network.
  5. Configure that database as an ADBC resource in transaction DBCO of the SAP NetWeaver system on which you will execute your ABAP code later.
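
Once the connection is maintained in DBCO, a quick ADBC round trip can confirm that the HCP database is reachable from ABAP. The following is only a minimal sketch and not part of my API: the report name and the connection name 'HCP' are assumptions, so use whatever name you entered in DBCO.

```abap
REPORT zhcp_connection_check.

DATA: lr_con    TYPE REF TO cl_sql_connection,
      lr_stmt   TYPE REF TO cl_sql_statement,
      lr_result TYPE REF TO cl_sql_result_set,
      lr_ex     TYPE REF TO cx_sql_exception,
      lr_value  TYPE REF TO data,
      lv_value  TYPE c LENGTH 128,
      lv_msg    TYPE string.

START-OF-SELECTION.
  TRY.
      "open the secondary connection maintained in DBCO
      "('HCP' is an assumed connection name)
      lr_con  = cl_sql_connection=>get_connection( 'HCP' ).
      lr_stmt = lr_con->create_statement( ).

      "DUMMY is the built-in single row table of SAP HANA
      lr_result = lr_stmt->execute_query( `SELECT CURRENT_USER FROM DUMMY` ).

      "bind an output variable and fetch the first row
      GET REFERENCE OF lv_value INTO lr_value.
      lr_result->set_param( lr_value ).
      IF lr_result->next( ) > 0.
        WRITE: / 'Connected as', lv_value.
      ENDIF.

      lr_result->close( ).
      lr_con->close( ).

    CATCH cx_sql_exception INTO lr_ex.
      "e.g. unknown connection name or unreachable database
      lv_msg = lr_ex->get_text( ).
      WRITE: / lv_msg.
  ENDTRY.
```

If this fails, the Cloud Connector mapping from step 4 and the DBCO entry from step 5 are the first places to check.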

Architecture

After having set up the infrastructure, let's think about the architecture of the application. It's not one of the typical extension patterns that SAP foresees: the application logic resides on the ABAP system and makes use of an application in the cloud, rather than a new application sitting in SAP HCP that treats your on-premise SAP systems as data sources.



Example Implementation

So by this time all necessary prerequisites are fulfilled and it's time to have some fun with ABAP - hard to believe ;-) But as long as you can build a REST client, you can extend your core with basically anything you can imagine nowadays. So here we go:

FORM main.
*"--- DATA DEFINITION -------------------------------------------------
  DATA: lr_scan_data TYPE REF TO data.
  DATA: lr_prepared_data TYPE REF TO data.
  DATA: lr_ml TYPE REF TO /snp/hcp01_cl_ml.
  DATA: lv_dataset_id TYPE i.
  DATA: lr_ex TYPE REF TO cx_root.
  DATA: lv_msg TYPE string.

  FIELD-SYMBOLS: <lt_data> TYPE table.

*"--- PROCESSING LOGIC ------------------------------------------------
  TRY.
      "fetch the data into an internal table
      PERFORM get_system_scan_data CHANGING lr_scan_data.
      ASSIGN lr_scan_data->* TO <lt_data>.

      "prepare data (e.g. convert, select features)
      PERFORM prepare_data USING <lt_data> CHANGING lr_prepared_data.
      ASSIGN lr_prepared_data->* TO <lt_data>.

      "create a dataset (called a model on other platforms like AWS)
      CREATE OBJECT lr_ml.
      PERFORM create_dataset USING lr_ml <lt_data> CHANGING lv_dataset_id.

      "check if...
      IF lr_ml->is_ready( lv_dataset_id ) = abap_true.

        "...creation was successful
        PERFORM find_key_influencers USING lr_ml lv_dataset_id.

      ELSEIF lr_ml->is_failed( lv_dataset_id ) = abap_true.

        "...creation failed
        lv_msg = /snp/cn00_cl_string_utils=>text( iv_text = 'Model &1 has failed' iv_1 = lv_dataset_id ).
        MESSAGE lv_msg TYPE 'S' DISPLAY LIKE 'E'.

      ENDIF.

    CATCH cx_root INTO lr_ex.

      "output errors
      lv_msg = lr_ex->get_text( ).
      PERFORM display_lines USING lv_msg.

  ENDTRY.

ENDFORM.
This is basically the same procedure as last time, when connecting to AWS Machine Learning. Again I fetched the data via a REST service from the SNP Data Cockpit instance I am using to keep statistics on all executed SNP System Scans. However, you can fetch the data that will serve as the data source for your model in any way you like. Most probably you will be using OpenSQL SELECTs. Just as a reminder, the results looked somewhat like this:


Prepare the Data

This is the raw data and it's not perfect! In the shape that it's in, the data quality is not good enough. According to this article, there are some improvements that I need to make in order to improve its quality.

  1. Normalizing values (e.g. lower casing, mapping values or clustering values). E.g.
    • Combining the database vendor and the major version of the database because those two values only make sense when treated in combination and not individually
    • Clustering the database size to 1.5TB chunks as these values can be guessed easier when executing predictions
    • Clustering the runtime into exponentially increasing categories does not work with HCP Predictive Services: so far it only solves regression problems, which rely on numeric target values.
  2. Filling up empty values with reasonable defaults. E.g.
    • treating all unknown SAP client types as test clients
  3. Make values and field names more human readable. This is not necessary for the machine learning algorithms, but it makes for better manual result interpretation
  4. Removing fields that do not make good features, like 
    • IDs
    • fields that cannot be provided for later predictions, because values cannot be determined easily or intuitively
  5. Remove records that still do not have good data quality. E.g. missing values in
    • database vendors
    • SAP system types
    • customer industry
  6. Remove records that are not representative. E.g. 
    • they refer to scans with exceptionally short runtimes probably due to intentionally limiting the scope
    • small database sizes that are probably due to non productive systems
So the resulting coding to do this preparation and data cleansing looks almost the same as in the AWS Example:

FORM prepare_data USING it_data TYPE table CHANGING rr_data TYPE REF TO data.
*"--- DATA DEFINITION -------------------------------------------------
  DATA: lr_q TYPE REF TO /snp/cn01_cl_itab_query.

*"--- PROCESSING LOGIC ------------------------------------------------
  CREATE OBJECT lr_q.

  "selecting the fields that make good features
  lr_q->select( iv_field = 'COMP_VERSION'       iv_alias = 'SAP_SYSTEM_TYPE' ).
  lr_q->select( iv_field = 'DATABASE'           iv_uses_fields = 'NAME,VERSION' iv_cb_program = sy-repid iv_cb_form = 'ON_VIRTUAL_FIELD' ).
  lr_q->select( iv_field = 'DATABASE_SIZE'      iv_uses_fields = 'DB_USED' iv_cb_program = sy-repid iv_cb_form = 'ON_VIRTUAL_FIELD' ).
  lr_q->select( iv_field = 'OS'                 iv_alias = 'OPERATING_SYSTEM' ).
  lr_q->select( iv_field = 'SAP_CLIENT_TYPE'    iv_uses_fields = 'CCCATEGORY' iv_cb_program = sy-repid iv_cb_form = 'ON_VIRTUAL_FIELD'  ).
  lr_q->select( iv_field = 'COMPANY_INDUSTRY1'  iv_alias = 'INDUSTRY' ).
  lr_q->select( iv_field = 'IS_UNICODE'         iv_cb_program = sy-repid iv_cb_form = 'ON_VIRTUAL_FIELD' ).
  lr_q->select( iv_field = 'SCAN_VERSION' ).
  lr_q->select( iv_field = 'RUNTIME_MINUTES'    iv_ddic_type = 'INT4' ). "make sure this column is converted into a number

  "perform the query on the defined internal table
  lr_q->from( it_data ).

  "filter records that are not good for results
  lr_q->filter( iv_field = 'DATABASE'           iv_filter = '-' ). "no empty values in the database
  lr_q->filter( iv_field = 'SAP_SYSTEM_TYPE'    iv_filter = '-' ). "no empty values in the SAP System Type
  lr_q->filter( iv_field = 'INDUSTRY'           iv_filter = '-' ). "no empty values in the Industry
  lr_q->filter( iv_field = 'RUNTIME_MINUTES'    iv_filter = '>=10' ). "Minimum of 10 minutes runtime
  lr_q->filter( iv_field = 'DATABASE_GB_SIZE'   iv_filter = '>=50' ). "Minimum of 50 GB database size

  "sort by runtime
  lr_q->sort( 'RUNTIME_MINUTES' ).

  "execute the query
  rr_data = lr_q->run( ).

ENDFORM.
Basically the magic is done using the /SNP/CN01_CL_ITAB_QUERY class, which is part of the SNP Transformation Backbone framework. It enables SQL-like query capabilities on ABAP internal tables. This includes transforming field values, which is done using callback mechanisms.

FORM on_virtual_field USING iv_field is_record TYPE any CHANGING cv_value TYPE any.
*"--- DATA DEFINITION -------------------------------------------------
  DATA: lv_database TYPE string.
  DATA: lv_database_version TYPE string.
  DATA: lv_tmp TYPE string.
  DATA: lv_int TYPE i.
  DATA: lv_p(16) TYPE p DECIMALS 1.

  FIELD-SYMBOLS: <lv_value> TYPE any.

*"--- MACRO DEFINITION ------------------------------------------------
  DEFINE mac_get_field.
    clear: &2.
    assign component &1 of structure is_record to <lv_value>.
    if sy-subrc = 0.
      &2 = <lv_value>.
    else.
      return.
    endif.
  END-OF-DEFINITION.

*"--- PROCESSING LOGIC ------------------------------------------------
  CASE iv_field.
    WHEN 'DATABASE'.

      "combine database name and major version to one value
      mac_get_field 'NAME' lv_database.
      mac_get_field 'VERSION' lv_database_version.
      SPLIT lv_database_version AT '.' INTO lv_database_version lv_tmp.
      CONCATENATE lv_database lv_database_version INTO cv_value SEPARATED BY space.

    WHEN 'DATABASE_SIZE'.

      "categorize the database size into 1.5 TB chunks (e.g. "up to 4.5 TB")
      mac_get_field 'DB_USED' cv_value.
      lv_p = ( floor( cv_value / 1500 ) + 1 ) * '1.5'. "simple round to full 1.5TB chunks
      cv_value = /snp/cn00_cl_string_utils=>text( iv_text = 'up to &1 TB' iv_1 = lv_p ).
      TRANSLATE cv_value USING ',.'. "translate commas to dots so the CSV does not get confused

    WHEN 'SAP_CLIENT_TYPE'.

      "fill up the client category type with a default value
      mac_get_field 'CCCATEGORY' cv_value.
      IF cv_value IS INITIAL.
        cv_value = 'T'. "default to (T)est SAP client
      ENDIF.

    WHEN 'IS_UNICODE'.

      "convert the unicode flag into more human readable values
      IF cv_value = abap_true.
        cv_value = 'unicode'.
      ELSE.
        cv_value = 'non-unicode'.
      ENDIF.

  ENDCASE.

ENDFORM.
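
How the query class calls back into the report is not shown here. One plausible mechanism, and this is purely an assumption about the framework's internals, is a dynamic subroutine call with the program and form name that were passed as iv_cb_program and iv_cb_form. The report below is a hypothetical, stripped-down sketch of that technique; its on_virtual_field only copies the NAME component.

```abap
REPORT zcallback_sketch.

TYPES: BEGIN OF ty_record,
         name    TYPE string,
         version TYPE string,
       END OF ty_record.

DATA: gs_record  TYPE ty_record,
      gv_value   TYPE string,
      gv_program TYPE sy-repid,
      gv_form    TYPE string.

START-OF-SELECTION.
  gs_record-name    = 'ORACLE'.
  gs_record-version = '12.1.0.2'.

  "program and form name are only known at runtime,
  "just like iv_cb_program and iv_cb_form in the query definition
  gv_program = sy-repid.
  gv_form    = 'ON_VIRTUAL_FIELD'.

  "dynamic call; IF FOUND avoids a runtime error if the form is missing
  PERFORM (gv_form) IN PROGRAM (gv_program) IF FOUND
    USING 'DATABASE' gs_record
    CHANGING gv_value.

  WRITE: / gv_value.

"simplified callback: copies the NAME component into the virtual field
FORM on_virtual_field USING iv_field  TYPE csequence
                            is_record TYPE any
                   CHANGING cv_value  TYPE any.
  FIELD-SYMBOLS: <lv_name> TYPE any.

  IF iv_field = 'DATABASE'.
    ASSIGN COMPONENT 'NAME' OF STRUCTURE is_record TO <lv_name>.
    IF sy-subrc = 0.
      cv_value = <lv_name>.
    ENDIF.
  ENDIF.
ENDFORM.
```

Because parameters of a dynamic PERFORM are only checked at runtime, the callback signature has to be agreed on by convention.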
After the preparation, the data looks nice and cleaned up, like this:


Creating the Dataset

In SAP HANA Cloud Platform Predictive Services you create a dataset rather than a model. This is basically split up into:

  1. Creating a Database Table in a HANA Database 
  2. Uploading Data into that Database Table
  3. Registering the dataset 

This mainly corresponds to creating a datasource with the AWS Machine Learning API. However, you do not explicitly train the dataset or create a model. This is done implicitly - maybe. We'll discover more about that soon.

FORM create_dataset USING ir_hcp_machine_learning TYPE REF TO /snp/hcp01_cl_ml
                          it_table TYPE table
                 CHANGING rv_dataset_id.

  rv_dataset_id = ir_hcp_machine_learning->create_dataset(

    "...by creating a temporary table in a HCP HANA database
    "   instance from an internal table
    "   and inserting the records, so it can be used
    "   as a machine learning data set
    it_table = it_table

  ).

ENDFORM.

My API will create a temporary table for each internal table you are creating a dataset on. It's a column table without a primary key. All column types are determined automatically using runtime type inspection. If columns of the internal table are strings, I determine the length by scanning the content rather than creating CLOBs, which are not well suited for Predictive Services.
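
To illustrate the idea, here is a rough sketch of how column definitions could be derived with RTTI. It is not the actual implementation of my API: the report, the form build_ddl, the table name ZML_DEMO and the type mapping are all made up for this example, and every type that is not I, P or F is simply treated as text.

```abap
REPORT zhcp_ddl_sketch.

TYPES: BEGIN OF ty_row,
         operating_system TYPE string,
         runtime_minutes  TYPE i,
       END OF ty_row.

DATA: gt_rows TYPE STANDARD TABLE OF ty_row WITH DEFAULT KEY,
      gs_row  TYPE ty_row,
      gv_ddl  TYPE string.

START-OF-SELECTION.
  gs_row-operating_system = 'Linux'.
  gs_row-runtime_minutes  = 90.
  APPEND gs_row TO gt_rows.

  PERFORM build_ddl USING gt_rows 'ZML_DEMO' CHANGING gv_ddl.
  WRITE: / gv_ddl.

FORM build_ddl USING it_table TYPE ANY TABLE
                     iv_name  TYPE csequence
            CHANGING cv_ddl   TYPE string.
  DATA: lr_table  TYPE REF TO cl_abap_tabledescr,
        lr_struct TYPE REF TO cl_abap_structdescr,
        ls_comp   TYPE abap_compdescr,
        lv_type   TYPE string,
        lv_text   TYPE string,
        lv_len    TYPE i,
        lv_max    TYPE i,
        lv_max_c  TYPE string,
        lv_col    TYPE string,
        lv_cols   TYPE string.
  FIELD-SYMBOLS: <ls_row> TYPE any,
                 <lv_val> TYPE any.

  "runtime type inspection of the line type (must be a flat structure)
  lr_table  ?= cl_abap_typedescr=>describe_by_data( it_table ).
  lr_struct ?= lr_table->get_table_line_type( ).

  LOOP AT lr_struct->components INTO ls_comp.
    CASE ls_comp-type_kind.
      WHEN cl_abap_typedescr=>typekind_int.
        lv_type = 'INTEGER'.
      WHEN cl_abap_typedescr=>typekind_packed
        OR cl_abap_typedescr=>typekind_float.
        lv_type = 'DOUBLE'.
      WHEN OTHERS.
        "treated as text: scan the content for the longest value
        lv_max = 1.
        LOOP AT it_table ASSIGNING <ls_row>.
          ASSIGN COMPONENT ls_comp-name OF STRUCTURE <ls_row> TO <lv_val>.
          lv_text = <lv_val>.
          lv_len  = strlen( lv_text ).
          IF lv_len > lv_max.
            lv_max = lv_len.
          ENDIF.
        ENDLOOP.
        lv_max_c = lv_max.
        CONDENSE lv_max_c.
        CONCATENATE 'NVARCHAR(' lv_max_c ')' INTO lv_type.
    ENDCASE.

    "quote the column name and append its type
    CONCATENATE '"' ls_comp-name '"' INTO lv_col.
    CONCATENATE lv_col lv_type INTO lv_col SEPARATED BY space.
    IF lv_cols IS INITIAL.
      lv_cols = lv_col.
    ELSE.
      CONCATENATE lv_cols ',' INTO lv_cols.
      CONCATENATE lv_cols lv_col INTO lv_cols SEPARATED BY space.
    ENDIF.
  ENDLOOP.

  CONCATENATE 'CREATE COLUMN TABLE "' iv_name '" (' lv_cols ')' INTO cv_ddl.
ENDFORM.
```

Scanning the content costs one extra pass per text column, but it keeps the columns narrow, which matters for the reasons described under "Challenges" below.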

Please note that upload speed suffers significantly if you are inserting content line by line, which is the case if cl_sql_statement does not support set_param_table on your release. This was the case for my system, so I had to build that functionality myself.
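
On releases where set_param_table is available, a mass insert can look roughly like the sketch below. The connection name 'HCP', the table ZML_DEMO and its two columns are assumptions for illustration, and the target table is expected to exist already.

```abap
REPORT zhcp_mass_insert.

TYPES: BEGIN OF ty_row,
         database_name   TYPE c LENGTH 30,
         runtime_minutes TYPE i,
       END OF ty_row.

DATA: lt_rows TYPE STANDARD TABLE OF ty_row WITH DEFAULT KEY,
      ls_row  TYPE ty_row,
      lr_rows TYPE REF TO data,
      lr_con  TYPE REF TO cl_sql_connection,
      lr_stmt TYPE REF TO cl_sql_statement,
      lr_ex   TYPE REF TO cx_sql_exception,
      lv_rows TYPE i,
      lv_msg  TYPE string.

START-OF-SELECTION.
  "two sample records
  ls_row-database_name   = 'ORACLE 12'.
  ls_row-runtime_minutes = 240.
  APPEND ls_row TO lt_rows.
  ls_row-database_name   = 'SAP HANA DB 1'.
  ls_row-runtime_minutes = 75.
  APPEND ls_row TO lt_rows.

  TRY.
      lr_con  = cl_sql_connection=>get_connection( 'HCP' ).
      lr_stmt = lr_con->create_statement( ).

      "bind the whole internal table once instead of one row per call
      GET REFERENCE OF lt_rows INTO lr_rows.
      lr_stmt->set_param_table( lr_rows ).

      "one statement execution inserts all bound rows
      lv_rows = lr_stmt->execute_update(
        `INSERT INTO "ZML_DEMO" ("DATABASE_NAME", "RUNTIME_MINUTES") VALUES (?, ?)` ).

      lr_con->commit( ).
      lr_con->close( ).
      WRITE: / lv_rows, 'rows inserted'.

    CATCH cx_sql_exception INTO lr_ex.
      lv_msg = lr_ex->get_text( ).
      WRITE: / lv_msg.
  ENDTRY.
```

The components of the line type are matched to the ? placeholders by position, so the order of the structure fields has to follow the column list of the INSERT statement.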

After that it is finally time to find the key influencers, that affect the runtime of the SNP System Scan the most...

FORM find_key_influencers USING ir_ml TYPE REF TO /snp/hcp01_cl_ml
                                iv_dataset_id TYPE i.
*"--- DATA DEFINITION -------------------------------------------------
  DATA: lt_key_influencers TYPE /snp/hcp00_tab_key_influencer.

*"--- PROCESSING LOGIC ------------------------------------------------
  "...introspect the model, e.g. finding the features (=columns) that influence
  "   a given figure (=target column) in descending order
  lt_key_influencers = ir_ml->get_key_influencers(

    "which dataset should be inspected
    iv_dataset_id = iv_dataset_id

    "what is the target column, for which the key influencers
    "should be calculated
    iv_target_column = 'RUNTIME_MINUTES'

    "how many key influencers should be calculated?
    iv_number_of_influencers = 5

  ).

  "DATABASE_SIZE:    37% Influence
  "DATABASE:         23% Influence (e.g. ORACLE 12, SAP HANA DB etc.)
  "SCAN_VERSION:     15% Influence
  "OPERATING_SYSTEM: 10% Influence
  "SAP_SYSTEM_TYPE    5% Influence (e.g. SAP R/3 4.6c; SAP ECC 6.0; S/4 HANA 16.10 etc.)

  "...remove dataset afterwards
  ir_ml->remove_dataset( iv_dataset_id ).

ENDFORM.


As mentioned above, database size, database vendor and scan version were not a surprise. I didn't think that the operating system would have such a big influence, as SAP NetWeaver abstracts it away. I expected the SAP system type to have more of an influence, as I figured that different data models would have a bigger impact on performance. So all in all not many surprises, but then again, that makes the model trustworthy...

Challenges

Along the way I have found some challenges.


  • Authentication: I always seem to have a problem finding the simplest things, like which flags to set in the authentication mechanism. Just make sure to switch on "Trusted SAML 2.0 identity provider", "User name and password", "Client Certificate" and "Application-to-Application SSO" on the FORM card of the Authentication Configuration and do not waste hours like me.
  • Upload Speed: As stated above, if you are inserting the contents of your internal table line by line, performance suffers significantly. On the other hand, inserting several hundred thousand records was not much of a problem once you tap into mass insert/update. It may not be available in your ADBC implementation, depending on the version of your SAP NetWeaver stack, so consider backporting it from a newer system. It's definitely worth it.
  • Table creation: I am a big fan of dynamic programming, despite the performance penalties it sometimes has. However, when you are creating database tables to persist your dataset in your HCP HANA database, you have to make sure that columns are as narrow as possible for good performance or even relevance of your results.

Features of SAP HCP Predictive Analysis


  • Key Influencers: This is the use case that I have shown in this article
  • Scoring Equation: You can get the code that calculates the predictions either as an SQL query executable on any HANA database or as a score card. The first is basically a decision tree, which can easily be transpiled into other languages and thereby be used for on-premise deliveries. On the other hand, this shows that the mechanics underneath the SAP HCP Predictive Analysis application are currently quite simple, which I will dig into more in the conclusion below.
  • Forecast: based on a time based column you can predict future values
  • Outliers: You can find exceptions in your dataset. While key influencers are more focused on the structure, as they represent the columns that influence a result, outliers show the rows that are exceptions to the calculated equation.
  • WhatIf: Simulates a planned action and returns the significant changes resulting from it
  • Recommendations: Finding interesting products based on a purchase history by product and/or user. This can also be transferred to other recommendation scenarios.


Conclusion

So after this rather lengthy article I have come to a conclusion about SAP HCP Predictive Services, especially compared to AWS Machine Learning capabilities:

Pros
  • Business oriented scenarios: You definitely do not have to think as much about good use cases. The API presents them to you, as shown in the "Features" section above.
  • Fast: Calculation is really fast. Predictions are available almost instantaneously, especially if benchmarked against AWS, where building a datasource and training a model took a good 10 minutes. But do not overestimate this, as the Cons will show you.
  • Introspection: Many services are about looking into the scenario. AWS is just about prediction at the moment. This transparency about dependencies inside the dataset was most interesting for me.
  • On-premise delivery of models via decision trees: The fact that models are exposed as executable decision trees that can easily be transpiled into any other programming language makes on-premise delivery possible. Basically, prediction is effortless after doing so. But then again, you have to manage the model life cycle and how updates to it are rolled out.
Cons
  • Higher Cost of Infrastructure: At least on the fixed cost side, a productive HCP account is not cheap. But then again, there is no variable cost if you are able to deploy your models on premise.
  • Only Regression: Currently target figures have to be numeric, so only regression problems can be solved. No classification problems. Of course HANA also has natural language processing on board, but this is not available for machine learning purposes per se.
  • Few options for manipulating learning: You just register a model. Nothing is said about how training and validation are to be performed, how to normalize data on the platform and so on.
  • Trading speed for quality: As stated, registering a model is fast, and introspecting it etc. is also very fast. But then again, I was able to achieve different results with the same dataset when I sorted it differently. Consistently. And not just offsetting the model by 2%, but rather big time. For example, when sorting my dataset differently, the key influencers turned out to be completely different ones. This is actually quite concerning. Maybe I am missing something, but maybe this is why training AWS models takes significantly longer: they scramble datasets and run multiple passes over them to determine the best and most stable model.
While SAP HCP Predictive Services looks very promising, has good use cases and is appealing especially for its transparency, its stability and reliability have to improve before it's safe to rely on it for business decisions. Then again, I only have an HCP trial account at the moment, so maybe this is intentional. Let's see how predictive services on-premise on a HANA 2.0 database are doing...

Thursday, January 05, 2017

Creating AWS Machine Learning Models from ABAP

Hi guys,

extending my previous article about "Using AWS Machine Learning from ABAP to predict runtimes" I have now been able to extend the ABAP based API to create models from ABAP internal tables (which is like a collection of records, for the Non-ABAPers ;-).

This basically enables ABAP developers to utilize Machine Learning full cycle without ever having to leave their home turf or worry about the specifics of the AWS Machine Learning implementations.

My use case still is the same: Predicting runtimes of the SNP System Scan based on well known parameters like database vendor (e.g. Oracle, MaxDB), database size, SNP System Scan version and others. But since my first model was not quite meeting my expectations I wanted to be able to play around easily, adding and removing attributes from the model with a nice ABAP centric workflow. This probably makes it most effective for other ABAP developers to utilize Machine Learning. So let's take a look at the basic structure of the example program:

REPORT /snp/aws01_ml_create_model.

START-OF-SELECTION.
  PERFORM main.

FORM main.
*"--- DATA DEFINITION -------------------------------------------------
  DATA: lr_scan_data TYPE REF TO data.
  DATA: lr_prepared_data TYPE REF TO data.
  DATA: lr_ml TYPE REF TO /snp/aws00_cl_ml.
  DATA: lv_model_id TYPE string.
  DATA: lr_ex TYPE REF TO cx_root.
  DATA: lv_msg TYPE string.

  FIELD-SYMBOLS: <lt_data> TYPE table.

*"--- PROCESSING LOGIC ------------------------------------------------
  TRY.
      "fetch the data into an internal table
      PERFORM get_system_scan_data CHANGING lr_scan_data.
      ASSIGN lr_scan_data->* TO <lt_data>.

      "prepare data (e.g. convert, select features)
      PERFORM prepare_data USING <lt_data> CHANGING lr_prepared_data.
      ASSIGN lr_prepared_data->* TO <lt_data>.

      "create a model
      CREATE OBJECT lr_ml.
      PERFORM create_model USING lr_ml <lt_data> CHANGING lv_model_id.

      "check if...
      IF lr_ml->is_ready( lv_model_id ) = abap_true.

        "...creation was successful
        lv_msg = /snp/cn00_cl_string_utils=>text( iv_text = 'Model &1 is ready' iv_1 = lv_model_id ).
        MESSAGE lv_msg TYPE 'S'.

      ELSEIF lr_ml->is_failed( lv_model_id ) = abap_true.

        "...creation failed
        lv_msg = /snp/cn00_cl_string_utils=>text( iv_text = 'Model &1 has failed' iv_1 = lv_model_id ).
        MESSAGE lv_msg TYPE 'S' DISPLAY LIKE 'E'.

      ENDIF.

    CATCH cx_root INTO lr_ex.

      "output errors
      lv_msg = lr_ex->get_text( ).
      PERFORM display_lines USING lv_msg.

  ENDTRY.

ENDFORM.

And now let's break it down into its individual parts:

Fetch Data into an Internal Table

In my particular case I was fetching the data via a REST Service from the SNP Data Cockpit instance I am using to keep statistics on all executed SNP System Scans. However, you can basically fetch your data that will be used as a data source for your model in any way that you like. Most probably you will be using OpenSQL SELECTs to fetch the data accordingly. Resulting data looks somewhat like this:

Prepare Data

This is the raw data and it's not perfect! In the shape that it's in, the data quality is not good enough. According to this article, there are some improvements that I need to make in order to improve its quality.
  • Normalizing values (e.g. lower casing, mapping values or clustering values). E.g.
    • Combining the database vendor and the major version of the database because those two values only make sense when treated in combination and not individually
    • Clustering the database size to 1.5TB chunks as these values can be guessed easier when executing predictions
    • Clustering the runtime into exponentially increasing categories (although this may also hurt accuracy...)
  • Filling up empty values with reasonable defaults. E.g.
    • treating all unknown SAP client types as test clients
  • Make values and field names more human readable. This is not necessary for the machine learning algorithms, but it makes for better manual result interpretation
  • Removing fields that do not make good features, like 
    • IDs
    • fields that cannot be provided for later predictions, because values cannot be determined easily or intuitively
  • Remove records that still do not have good data quality. E.g. missing values in
    • database vendors
    • SAP system types
    • customer industry
  • Remove records that are not representative. E.g. 
    • they refer to scans with exceptionally short runtimes probably due to intentionally limiting the scope
    • small database sizes that are probably due to non productive systems
FORM prepare_data USING it_data TYPE table CHANGING rr_data TYPE REF TO data.
*"--- DATA DEFINITION -------------------------------------------------
  DATA: lr_q TYPE REF TO /snp/cn01_cl_itab_query.

*"--- PROCESSING LOGIC ------------------------------------------------
  CREATE OBJECT lr_q.

  "selecting the fields that make good features
  lr_q->select( iv_field = 'COMP_VERSION'       iv_alias = 'SAP_SYSTEM_TYPE' ).
  lr_q->select( iv_field = 'DATABASE'           iv_uses_fields = 'NAME,VERSION' iv_cb_program = sy-repid iv_cb_form = 'ON_VIRTUAL_FIELD' ).
  lr_q->select( iv_field = 'DATABASE_SIZE'      iv_uses_fields = 'DB_USED' iv_cb_program = sy-repid iv_cb_form = 'ON_VIRTUAL_FIELD' ).
  lr_q->select( iv_field = 'OS'                 iv_alias = 'OPERATING_SYSTEM' ).
  lr_q->select( iv_field = 'SAP_CLIENT_TYPE'    iv_uses_fields = 'CCCATEGORY' iv_cb_program = sy-repid iv_cb_form = 'ON_VIRTUAL_FIELD' ).
  lr_q->select( iv_field = 'COMPANY_INDUSTRY1'  iv_alias = 'INDUSTRY' ).
  lr_q->select( iv_field = 'IS_UNICODE'         iv_cb_program = sy-repid iv_cb_form = 'ON_VIRTUAL_FIELD' ).
  lr_q->select( iv_field = 'SCAN_VERSION' ).
  lr_q->select( iv_field = 'RUNTIME'            iv_uses_fields = 'RUNTIME_HOURS' iv_cb_program = sy-repid iv_cb_form = 'ON_VIRTUAL_FIELD' ).

  "perform the query on the defined internal table
  lr_q->from( it_data ).

  "filter records that are not good for results
  lr_q->filter( iv_field = 'DATABASE'           iv_filter = '-' ). "no empty values in the database
  lr_q->filter( iv_field = 'SAP_SYSTEM_TYPE'    iv_filter = '-' ). "no empty values in the SAP System Type
  lr_q->filter( iv_field = 'INDUSTRY'           iv_filter = '-' ). "no empty values in the Industry
  lr_q->filter( iv_field = 'RUNTIME_MINUTES'    iv_filter = '>=10' ). "Minimum of 10 minutes runtime
  lr_q->filter( iv_field = 'DATABASE_GB_SIZE'   iv_filter = '>=50' ). "Minimum of 50 GB database size

  "sort by runtime
  lr_q->sort( 'RUNTIME_MINUTES' ).

  "execute the query
  rr_data = lr_q->run( ).

ENDFORM.

Basically the magic is done using the /SNP/CN01_CL_ITAB_QUERY class, which is part of the SNP Transformation Backbone framework. It enables SQL-like query capabilities on ABAP internal tables. This includes transforming field values, which is done using callback mechanisms.


FORM on_virtual_field USING iv_field is_record TYPE any CHANGING cv_value TYPE any.

  "...

  CASE iv_field.
    WHEN 'DATABASE'.

      "combine database name and major version to one value
      mac_get_field 'NAME' lv_database.
      mac_get_field 'VERSION' lv_database_version.
      SPLIT lv_database_version AT '.' INTO lv_database_version lv_tmp.
      CONCATENATE lv_database lv_database_version INTO cv_value SEPARATED BY space.

    WHEN 'DATABASE_SIZE'.

      "categorize the database size into 1.5 TB chunks (e.g. "up to 4.5 TB")
      mac_get_field 'DB_USED' cv_value.
      lv_p = ( floor( cv_value / 1500 ) + 1 ) * '1.5'. "simple round to full 1.5TB chunks
      cv_value = /snp/cn00_cl_string_utils=>text( iv_text = 'up to &1 TB' iv_1 = lv_p ).
      TRANSLATE cv_value USING ',.'. "translate commas to dots so the CSV does not get confused

    WHEN 'SAP_CLIENT_TYPE'.

      "fill up the client category type with a default value
      mac_get_field 'CCCATEGORY' cv_value.
      IF cv_value IS INITIAL.
        cv_value = 'T'. "default to (T)est SAP client
      ENDIF.

    WHEN 'IS_UNICODE'.

      "convert the unicode flag into more human readable values
      IF cv_value = abap_true.
        cv_value = 'unicode'.
      ELSE.
        cv_value = 'non-unicode'.
      ENDIF.

    WHEN 'RUNTIME'.

      "categorize the runtime into human readable chunks
      mac_get_field 'RUNTIME_HOURS' lv_int.
      IF lv_int <= 1.
        cv_value = 'up to 1 hour'.
      ELSEIF lv_int <= 2.
        cv_value = 'up to 2 hours'.
      ELSEIF lv_int <= 3.
        cv_value = 'up to 3 hours'.
      ELSEIF lv_int <= 4.
        cv_value = 'up to 4 hours'.
      ELSEIF lv_int <= 5.
        cv_value = 'up to 5 hours'.
      ELSEIF lv_int <= 6.
        cv_value = 'up to 6 hours'.
      ELSEIF lv_int <= 12.
        cv_value = 'up to 12 hours'.
      ELSEIF lv_int <= 24.
        cv_value = 'up to 1 day'.
      ELSEIF lv_int <= 48.
        cv_value = 'up to 2 days'.
      ELSEIF lv_int <= 72.
        cv_value = 'up to 3 days'.
      ELSE.
        cv_value = 'more than 3 days'.
      ENDIF.

  ENDCASE.

ENDFORM.

After running all those preparations, the data is transformed into a record set that looks like this:


Create a Model

Ok, preparing data for a model is something the developer has to do for each individual problem to be solved. But I guess this is done better in a well-known environment. After all, this is the whole purpose of the ABAP API. Now we get to the part that's easy, as creating the model based on the internal table we have prepared so far is fully automated. As a developer you are completely relieved of the following tasks:

  • Converting the internal table into CSV
  • Uploading it into an AWS S3 bucket and assigning the correct privileges, so it can be used for machine learning
  • Creating a data source based on the just uploaded AWS S3 object and providing the input schema (e.g. which fields are category fields, which ones are numeric etc.), as this information can automatically be derived from DDIC information
  • Creating a model from the datasource
  • Training the model
  • Creating a URL endpoint so the model can be used for predictions as seen in the previous article.
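
To give an idea of what the first of these tasks involves, here is a minimal sketch of a generic internal-table-to-CSV conversion. It is not the code used inside my API: the report and the form itab_to_csv are made up for this example, it writes no header line and does not quote values, so values must not contain commas.

```abap
REPORT zitab_to_csv_sketch.

TYPES: BEGIN OF ty_row,
         database_name   TYPE string,
         runtime_minutes TYPE i,
       END OF ty_row.

DATA: gt_rows TYPE STANDARD TABLE OF ty_row WITH DEFAULT KEY,
      gs_row  TYPE ty_row,
      gt_csv  TYPE string_table,
      gv_line TYPE string.

START-OF-SELECTION.
  gs_row-database_name   = 'ORACLE 12'.
  gs_row-runtime_minutes = 240.
  APPEND gs_row TO gt_rows.

  PERFORM itab_to_csv USING gt_rows CHANGING gt_csv.

  LOOP AT gt_csv INTO gv_line.
    WRITE: / gv_line.
  ENDLOOP.

FORM itab_to_csv USING it_table TYPE ANY TABLE
              CHANGING ct_csv   TYPE string_table.
  DATA: lv_line TYPE string,
        lv_cell TYPE string.
  FIELD-SYMBOLS: <ls_row> TYPE any,
                 <lv_val> TYPE any.

  LOOP AT it_table ASSIGNING <ls_row>.
    CLEAR lv_line.

    "walk over all components of the row by position
    DO.
      ASSIGN COMPONENT sy-index OF STRUCTURE <ls_row> TO <lv_val>.
      IF sy-subrc <> 0.
        EXIT. "no more components
      ENDIF.

      "convert to text and strip padding blanks
      lv_cell = <lv_val>.
      CONDENSE lv_cell.

      IF sy-index = 1.
        lv_line = lv_cell.
      ELSE.
        CONCATENATE lv_line lv_cell INTO lv_line SEPARATED BY ','.
      ENDIF.
    ENDDO.

    APPEND lv_line TO ct_csv.
  ENDLOOP.
ENDFORM.
```

This is also why the data preparation translates commas in values to dots: without quoting, a comma inside a value would shift all following columns.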
That's quite a lot of stuff, that you do not need to do anymore. Doing all this is just one API call away:

FORM create_model USING ir_aws_machine_learning TYPE REF TO /snp/aws00_cl_ml
                        it_table TYPE table
               CHANGING rv_model_id.

  rv_model_id = ir_aws_machine_learning->create_model(

    "...by creating a CSV file from an internal table
    "   and upload it to AWS S3, so it can be used
    "   as a machine learning data source
    it_table = it_table

    "...by defining a target field that is used
    iv_target_field = 'RUNTIME'

    "...(optional) by defining a title
    iv_title = 'Model for SNP System Scan Runtimes'

    "...(optional) to create an endpoint, so the model
    "   can be used for predictions. This defaults to
    "   true, but you may want to switch it off

    " IV_CREATE_ENDPOINT = ABAP_FALSE

    "...(optional) by defining fields that should be
    "   treated as text rather than as a category.
    "   By default all character based fields are treated
    "   as categorical fields

    " IV_TEXT_FIELDS = 'COMMA,SEPARATED,LIST,OF,FIELDNAMES'

    "...(optional) by defining fields that should be
    "   treated as numerical fields rather than categorical
    "   fields. By default the type will be derived from the
    "   underlying data type, but for convenience reasons
    "   you may want to use this instead of creating and
    "   filling a completely new structure

    " IV_NUMERIC_FIELDS = 'COMMA,SEPARATED,LIST,OF,FIELDNAMES'

    "...(optional) by defining if you want to create the model
    "   synchronously or asynchronously. By default the
    "   datasource, model, evaluation and endpoint are created
    "   synchronously so that after returning from the method call
    "   you can immediately start with predictions.

    " IV_WAIT = ABAP_TRUE by default
    " IV_SHOW_PROGRESS = ABAP_TRUE by default
    " IV_REFRESH_RATE_IN_SECS = 5 seconds by default

  ).

ENDFORM.

As you see, most of this is optional. Sane default values are provided that assume the data is uploaded and the datasource, model, training and endpoint are created synchronously, so you can perform predictions directly afterwards. Creating all of this asynchronously is also possible, in case you do not need to perform predictions right away. After all, the whole process takes 10 to 15 minutes, which is why showing progress becomes important, especially since you do not want to run into timeouts when doing this in online mode with a GUI connected.
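
The waiting behaviour described above can be pictured as a simple polling loop. The following is a minimal sketch in Python, not the code behind the ABAP API: the status names, the timeout value and the `get_status` callback are assumptions, and a canned list of statuses stands in for real status requests.

```python
import time

def wait_until_done(get_status, refresh_rate_in_secs=5, timeout_in_secs=1800,
                    show_progress=True, sleep=time.sleep):
    """Poll get_status() until it reports COMPLETED or FAILED.

    Elapsed time is counted in sleep intervals, not measured by wall clock.
    """
    waited = 0
    while True:
        status = get_status()
        if show_progress:
            print(f"{waited:>5}s: {status}")
        if status in ("COMPLETED", "FAILED"):
            return status
        if waited >= timeout_in_secs:
            raise TimeoutError(f"still {status} after {waited} seconds")
        sleep(refresh_rate_in_secs)
        waited += refresh_rate_in_secs

# Stand-in for a real status request: the model is ready on the fourth poll
fake_statuses = iter(["PENDING", "INPROGRESS", "INPROGRESS", "COMPLETED"])
final_status = wait_until_done(lambda: next(fake_statuses), sleep=lambda secs: None)
print(final_status)
```

Running asynchronously then simply means skipping this loop and checking the status later.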

The Result

After all is done, you can perform predictions. Right, let's just hop over to the AWS Machine Learning console and see the results:

A CSV file was created in an AWS S3 bucket...


...then a datasource, ML model and an evaluation for training the model were created (also an endpoint, but the screenshot does not show it) ...


...and finally we can inspect the model performance.

Conclusion

This is a big step towards making Machine Learning available to many without the explicit need to cope with vendor-specific aspects. However, understanding the principles of machine learning is still a requirement, especially which problems you can apply it to and what good data quality means for good predictions.

Machine Learning Recipes

A cool series for learning the principles of machine learning...














Friday, December 09, 2016

Voice User Interface - The New UI



It's amazing how quickly the AI topic develops. In the past few months, I have seen so much new stuff emerging.

When I first saw this, I was amazed (happens a lot lately - maybe I am just too easy to motivate ;-). Then I thought: well, voice enabled computing isn't really new, I mean all the hotlines do it. So while the Conversation UI basically serves the same purpose as traditional hotline conversation machines, the new thing about it is really the context based natural language interaction, as opposed to the plain request - response pattern of the past.


Goal for user:
Triggering a particular service


Goal for the AI:
Fetching all necessary parameters from a natural language interaction in order to invoke the service.


Benefits for the User:
Natural language conversation, where the AI considers context: what has been said during the conversation and what environment the user is currently in (e.g. location, time). Habits of the user are also taken into account, as the history of how the user has been interacting with Google devices and services is tracked.


Benefits for the Developer:
Declarative approach to defining the conversation. It's based on examples rather than fixed rules, which the service can then turn into algorithms using its AI.
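
The goal for the AI, collecting all necessary parameters across several turns of a conversation, can be illustrated with a tiny sketch in Python. Everything in it is an assumption for illustration: the two parameters, the questions and the regular expressions, which are only a crude stand-in for what a real service would learn from example phrases.

```python
import re

# Hypothetical service: it can only be invoked once both parameters are known
REQUIRED_SLOTS = ("city", "day")

# Crude stand-in for language understanding learned from sample phrases
PATTERNS = {
    "city": re.compile(r"\bin ([A-Z][a-z]+)"),
    "day": re.compile(r"\b(today|tomorrow)\b", re.IGNORECASE),
}

def update_slots(slots, utterance):
    """Fill every slot that is still empty and that the utterance mentions."""
    for name, pattern in PATTERNS.items():
        match = pattern.search(utterance)
        if match and name not in slots:
            slots[name] = match.group(1)
    return slots

def next_prompt(slots):
    """Return a follow-up question for the first missing slot, or None if complete."""
    for name in REQUIRED_SLOTS:
        if name not in slots:
            return f"Which {name}?"
    return None

slots = {}  # the conversation context that survives between turns
update_slots(slots, "What's the weather like tomorrow?")
print(next_prompt(slots))  # the city is still missing, so the AI asks for it
update_slots(slots, "in Heidelberg")
print(next_prompt(slots), slots)
```

The point is the `slots` dictionary: the second answer only makes sense because the first turn is remembered.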









Tuesday, November 29, 2016

Enterprise Machine Learning in a Nutshell

What I have stumbled across a lot lately is the issue of clearly identifying business problems that machine learning can be applied to, especially as opposed to traditional programming solutions... The openSAP course "Enterprise Machine Learning in a Nutshell" does an excellent job of exactly that!

I will probably dive much deeper into this topic - many of my company's issues in knowledge management seem to be perfectly suited to machine learning. Many issues in that field concern classification, which enables effective communication of knowledge in a fast-growing organization.

Generally speaking, machine learning is best applied to highly repetitive tasks (much data available) which have been done for a while. You can define success and failure for the training and testing data, but solving the task is performed more along the lines of human intuition and experience.




Thursday, November 17, 2016

A.I. Experiments

Sometimes I stumble across things that are just jaw-dropping. I have read a lot of articles and watched a lot of YouTube videos about AI. Google has great platforms and technology to power this, but what is really fascinating is when you find something that steps from behind the glass wall of "me staring at it astonished" to "oh, hi there, let's play". This is what


A.I. Experiments

is. An experimental site where you can find an A.I. driven virtual playground. It may seem a little childish at first, but then again children grow up really fast and become adults. This is something you don't want to be missing... not just watching but being a part of it.

I got caught on watching this...



...and then this...



...and guess what: you can try that for yourself!



And this here could be something really useful for analytic technology fields. Probably one can find ways to apply this kind of pattern recognition to Business Intelligence related questions. Isn't BI all about people trying to find patterns in data using visual tools like graphs and charts? Well, maybe machines can do that job quite well, finding patterns in massive amounts of data...




Thursday, November 10, 2016

MarI/O - Machine Learning for Video Games

For everybody who wants to get started with the concepts of machine learning in a fun way... As far as I get it, the basic idea is to mimic evolution, or learning based on experience.

A computer player in this scenario basically needs:

  1. Senses like sight: Abstracting the actual screen to individual items like safe ground, blocks/walls, gaps, enemies. Fortunately platforming games like Mario support abstraction really well, as sprites can be thought of as blocks of the above mentioned types
  2. The ability to push buttons on the controller: Well there are APIs for that, right ;-)
  3. Knowing what success means: e.g. advancing to the right side of the level without getting killed in the shortest amount of time
  4. Patience, Endurance or the ability to massively parallelize the process of trying, trying, trying... as you are basically brute forcing every combination of button presses. But fortunately when you are failing you will have a sense of how successful you were based on how far you got in the level and how quickly you got there. Then you can take that as a starting point for further tries... or the next generation when shifting to evolutionary terms...
Worth a read is this paper which explains it much better than I tried (maybe my children will be more successful in explaining ;-)
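
To make the "try, measure how far you got, start the next generation from the best tries" loop a bit more tangible, here is a toy sketch in Python. It is deliberately not what MarI/O does (MarI/O evolves neural networks that look at the screen); it only evolves fixed sequences of button presses on a made-up level. The level layout, the buttons and all parameters are assumptions, and the time needed is ignored, so success is measured by distance only.

```python
import random

BUTTONS = ["RIGHT", "JUMP", "WAIT"]
STEP = {"RIGHT": 1, "JUMP": 2, "WAIT": 0}
GAPS = {5, 11, 20, 26}   # made-up level: fields that kill the player
LEVEL_LENGTH = 30

def fitness(presses):
    """Distance reached before falling into a gap or running out of presses."""
    position = 0
    for button in presses:
        if position + STEP[button] in GAPS:
            return position  # fell into a gap: the run ends here
        position = min(position + STEP[button], LEVEL_LENGTH)
    return position

def mutate(presses, rng, rate=0.1):
    """Copy a sequence, replacing each press with a random one at the given rate."""
    return [rng.choice(BUTTONS) if rng.random() < rate else b for b in presses]

def evolve(generations=60, population_size=40, sequence_length=40, seed=1):
    """Return the best fitness of each generation."""
    rng = random.Random(seed)
    population = [[rng.choice(BUTTONS) for _ in range(sequence_length)]
                  for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        survivors = population[: population_size // 4]
        # Survivors stay unchanged, the rest are mutated copies of survivors
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(population_size - len(survivors))]
        population = survivors + children
    return best_per_generation

history = evolve()
print("first generation:", history[0], "last generation:", history[-1])
```

Because the best quarter of each generation is carried over unchanged, the best distance can never get worse from one generation to the next.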



And while we are at it... here goes Mario Kart

Software Development on SAP HANA (Update Q4/2016)



In this course, we will focus on the new and improved features that were introduced in SAP HANA SPS 11 and 12. Developers taking this course should be able to get up to speed quickly and begin leveraging these new features to enhance their own productivity, as well…

Having just finished the first week's lectures and assignments, I must admit I finally feel at home. With XSA the platform finally opens up and industry standards are adopted, rather than reinventing the wheel over and over again:

  • Using Git as a versioning repository instead of storing design time artefacts in database tables of the production system
  • Incorporating concepts of container based isolation as opposed to "single-process-rules-them-all" approaches
  • Building on Cloud Foundry and thereby being much more open for existing applications and runtimes being migrated and then run on SAP HANA
  • Abstracting application users from technical users for accessing system or database resources while unifying authentication and authorization
Now the HANA Platform opens itself to the rich ecosystem that open source has to offer. As a developer I can use existing know-how from non-SAP backgrounds, and maybe even new patterns and technologies will emerge that you can take away from HANA and apply in other areas and on other platforms.

Yes, it finally feels like the right direction.



Wednesday, November 09, 2016

Google Slides API may power my future Slide Applications

So now Google is publishing its Slides API for programmatic use. This opens up a whole new world of slide generation.



This is especially interesting because I have previously been building my own REST Services for a Slide Service that we use for our own SaaS products at SNP Schneider-Neureither & Partner AG - such as the SNP System Scan. Results may look like this and are fully generated using a home-grown REST Service API.




As far as the REST API is concerned, it takes a slide definition in JSON format that looks something like this:


{
   "title":"Fun with auto generated Slide Show",
   "author":"Dominik Wittenbeck",
   "subtitle":"",
   "header":"SNP Slideshow",
   "footer":"<br/>",
   "slides":[
      {
         "id":"005056BF5BE41EE6A9D91E8EC1102DD3",
         "title":"First Slide with some HTML",
         "topline":"SNP Slides Showcase",
         "html":[
            "..."
         ]
      },
      {
         "id":"005056BF5BE41EE6A9D91E8EC1104DD3",
         "title":"Second Slide with child slides",
         "topline":"SNP Slides Showcase",
         "items":[
            {
               "id":"005056BF5BE41EE6A9D91E8EC1106DD3",
               "title":"Second Slide with child slides",
               "topline":"SNP Slides Showcase",
               "html":[
                  "..."
               ]
            },
            {
               "id":"005056BF5BE41EE6A9D91E8EC1108DD3",
               "title":"Child Slide 2",
               "topline":"SNP Slides Showcase",
               "html":[
                  "..."
               ]
            }
         ]
      },
      {
         "id":"005056BF5BE41EE6A9D91E8EC110ADD3",
         "title":"Third Slide with a Chart",
         "topline":"SNP Slides Showcase",
         "layout":"vertical",
         "html":[
            "...",
            "..."
         ]
      }
   ]
}
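
A definition like this is easy to assemble programmatically. The following minimal sketch in Python builds the same structure and serializes it; the helper names and the short ids are assumptions for illustration, and the sketch stops short of calling the REST service itself.

```python
import json

def slide(slide_id, title, topline, html=None, items=None, layout=None):
    """Build one slide entry: child slides go into 'items', content into 'html'."""
    entry = {"id": slide_id, "title": title, "topline": topline}
    if layout:
        entry["layout"] = layout
    if items:
        entry["items"] = items
    else:
        entry["html"] = html or []
    return entry

def count_slides(slides):
    """Count displayable slides, descending into child slides."""
    return sum(count_slides(s["items"]) if "items" in s else 1 for s in slides)

topline = "SNP Slides Showcase"
deck = {
    "title": "Fun with auto generated Slide Show",
    "author": "Dominik Wittenbeck",
    "subtitle": "",
    "header": "SNP Slideshow",
    "footer": "<br/>",
    "slides": [
        slide("s1", "First Slide with some HTML", topline, html=["..."]),
        slide("s2", "Second Slide with child slides", topline, items=[
            slide("s2a", "Second Slide with child slides", topline, html=["..."]),
            slide("s2b", "Child Slide 2", topline, html=["..."]),
        ]),
        slide("s3", "Third Slide with a Chart", topline,
              html=["...", "..."], layout="vertical"),
    ],
}

payload = json.dumps(deck, indent=3)
print(payload)
print(count_slides(deck["slides"]), "slides in total")
```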


Besides the REST API I have built additional higher-level APIs that can be used e.g. directly in ABAP.

REPORT zdwi_generate_slides_test.
*"--- DATA DEFINITION -------------------------------------------------
TYPE-POOLS: abap.

*"--- PROCESSING LOGIC ------------------------------------------------
START-OF-SELECTION.
  PERFORM main.

FORM main.
*"--- DATA DEFINITION -------------------------------------------------
  DATA: lr_deck TYPE REF TO /snp/cn02_cl_slidedeck.
  DATA: lr_slide TYPE REF TO /snp/cn02_cl_slide.
  DATA: lr_sub_slide TYPE REF TO /snp/cn02_cl_slide.
  DATA: lr_chart TYPE REF TO /snp/cn02_cl_slide_chart.
  DATA: lv_id TYPE string.
  DATA: lt_t000 TYPE TABLE OF t000.
  DATA: lv_layout TYPE string.
  DATA: lv_html TYPE string.
  DATA: lv_url TYPE string.

*"--- PROCESSING LOGIC ------------------------------------------------
  lv_id = /snp/cn00_cl_string_utils=>uuid( ).

  "Generate the slidedeck
  lr_deck = /snp/cn02_cl_slidedeck=>create(
    iv_id = lv_id
    iv_title = 'Fun with auto generated Slide Show'
    iv_author = 'Dominik Wittenbeck'
    iv_header = 'SNP Slideshow'
    iv_footer = '<br/>'
  ).

  "--- Add first slide with some HTML Content
  lr_slide = /snp/cn02_cl_slide=>create(
    iv_title = 'First Slide with some HTML'
    iv_topline = 'SNP Slides Showcase'
  ).

  CONCATENATE
    '<h1>' 'Headline' '</h1>'
    '<p>'
      'Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam'
      'nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam'
      'erat, sed diam voluptua. At vero eos et accusam et justo duo dolores'
      'et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est'
      'Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur'
      'sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore'
      'et dolore magna aliquyam erat, sed diam voluptua. At vero eos'
      'et accusam et justo duo dolores et ea rebum. Stet clita kasd'
      'gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.'
    '</p>'
  INTO lv_html SEPARATED BY space.
  lr_slide->add_html( lv_html ).
  lr_deck->add_slide( lr_slide ).


  "--- Create a second Slide with child slides
  lr_slide = /snp/cn02_cl_slide=>create(
    iv_title = 'Second Slide with child slides'
    iv_topline = 'SNP Slides Showcase'
  ).

  "...with one child slide...
  lr_sub_slide = /snp/cn02_cl_slide=>create(
    iv_title = 'Second Slide with child slides'
    iv_topline = 'SNP Slides Showcase'
  ).

  CONCATENATE
    '<p>'
      'Check out the arrows on the lower right, this slide has another child slide'
    '</p>'
  INTO lv_html.

  lr_sub_slide->add_html( lv_html ).
  lr_slide->add_slide( lr_sub_slide ).

  "...and a second child slide...
  lr_sub_slide = /snp/cn02_cl_slide=>create(
    iv_title = 'Child Slide 2'
    iv_topline = 'SNP Slides Showcase'
  ).

  lr_sub_slide->add_html( 'Content of child slide 2' ).
  lr_slide->add_slide( lr_sub_slide ).

  "...oh, and don't forget to add the main slide to the deck ;-)
  lr_deck->add_slide( lr_slide ).


  "--- On the 3rd Slide let's incorporate some data
  "Let's just fetch basic information about all clients...
  SELECT * FROM t000 INTO TABLE lt_t000.

  "also split that slide into several parts using a layout
  lr_slide = /snp/cn02_cl_slide=>create(
    iv_title = 'Third Slide with a Chart'
    iv_topline = 'SNP Slides Showcase'
    iv_layout = 'vertical'
  ).

  "...and put that data in a bar chart in the
  " first part of the layout (=left side)
  lr_chart = /snp/cn02_cl_slide_chart=>create_bar( ).
  lr_chart->set_data(
    it_data = lt_t000
    iv_x_columns = 'ORT01' "Show number of clients per location
  ).
  lr_slide->add_chart( lr_chart ).

  "...and put some descriptive text to the second part of
  " the layout (=right side)
  CONCATENATE
    '<p>'
      'This is some descriptive text for the chart'
    '</p>'
    '<ul>'
      '<li>' 'and while' '</li>'
      '<li>' 'we''re at it, let''s' '</li>'
      '<li>' 'have a few bullet points' '</li>'
    '</ul>'
  INTO lv_html SEPARATED BY space.

  lr_slide->add_html( lv_html ).


  "...oh, and don't forget to add the main slide to the
  " deck... again  ;-)
  lr_deck->add_slide( lr_slide ).

  "Publish the slide deck via the REST Service and Report
  " back the URL that would show it in a browser
  lv_url = lr_deck->get_url( ).
  WRITE: / lv_url.

ENDFORM.

So with the newly published Google Slides API maybe I could take this one step further....
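
One conceivable way to do that, sketched in Python and purely as an assumption on my side, is to translate a slide definition like the JSON above into the request objects that the Google Slides API accepts in a presentations.batchUpdate call. The sketch only maps slide titles; the HTML content and the charts would need a translation of their own. The `createSlide` and `insertText` request shapes follow the Slides API documentation, while the id scheme and the tiny sample deck are made up.

```python
def to_slides_requests(deck):
    """Translate the displayable slides of a deck into batchUpdate requests."""
    requests = []

    def visit(entry):
        if "items" in entry:
            # A parent slide only groups its children, so descend instead
            for child in entry["items"]:
                visit(child)
            return
        title_id = f"title_{entry['id']}"
        requests.append({"createSlide": {
            "objectId": f"slide_{entry['id']}",
            "slideLayoutReference": {"predefinedLayout": "TITLE_ONLY"},
            # Give the title placeholder a known id so text can be inserted
            "placeholderIdMappings": [{
                "layoutPlaceholder": {"type": "TITLE", "index": 0},
                "objectId": title_id,
            }],
        }})
        requests.append({"insertText": {"objectId": title_id,
                                        "text": entry["title"]}})

    for entry in deck["slides"]:
        visit(entry)
    return requests

# Made-up miniature deck in the format of the home-grown REST service
deck = {"slides": [
    {"id": "s1", "title": "First Slide with some HTML", "html": ["..."]},
    {"id": "s2", "title": "Second Slide with child slides", "items": [
        {"id": "s2a", "title": "Child Slide 2", "html": ["..."]},
    ]},
]}

body = {"requests": to_slides_requests(deck)}
print(len(body["requests"]), "requests")
```

The resulting `body` would then be sent to presentations.batchUpdate for an existing presentation, which requires OAuth credentials and is left out here.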

Friday, October 21, 2016

Semantic Notation – The Next Big Thing in BI?


From a methodological viewpoint, the visualization of management information is very similar to visualization in engineering drawings, the visualization of music, and visualizations in many other disciplines such