Class PentahoDataTransmuter

  • All Implemented Interfaces:
    IPentahoDataTypes

    public class PentahoDataTransmuter
    extends DataUtilities
    Author:
    wseyler
    Provides various methods to transmute an IPentahoResultSet so that the dimensionality of the resulting IPentahoResultSet can be used for different purposes (e.g. creating a pie or bar chart).
    • Constructor Detail

      • PentahoDataTransmuter

        public PentahoDataTransmuter​(IPentahoResultSet resultSet)
    • Method Detail

      • getSourceResultSet

        public IPentahoResultSet getSourceResultSet()
        Returns:
        Returns the sourceResultSet.
      • transmute

        public static IPentahoResultSet transmute​(IPentahoResultSet source,
                                                  String[] columnForRowHeaders,
                                                  String[] rowForColumnHeaders,
                                                  String[][] rowsToInclude,
                                                  String[][] columnsToInclude,
                                                  boolean pivot)
        Returns a memory result set that represents a grid of data and its associated headers.
        Parameters:
        source - The source IPentahoResultSet.
        columnForRowHeaders - a 0-based column number whose data will be used as the row headers. If null and rowHeaders exist in the source, the rowHeaders will be returned; otherwise the first column (column 0) will be used for the row headers.
        rowForColumnHeaders - a 0-based row number whose data will be used as the columnHeaders. If null and columnHeaders exist in the source, the columnHeaders will be returned; otherwise the first row (row 0) will be used for the columnHeaders.
        columnsToInclude - A 2D String array where each row is a String[] holding the column header of a column to include in the result.
        rowsToInclude - A 2D String array where each row is a String[] holding the row header of a row to include in the result.
        pivot - pivot the entire IPentahoResultSet. NOTE: this occurs last in the process. Other params referencing rows and columns are based on the source before it is pivoted.
        Returns:
        a copy of the result set containing the requested data.
      • transmute

        public static IPentahoResultSet transmute​(IPentahoResultSet source,
                                                  Integer columnForRowHeaders,
                                                  Integer rowForColumnHeaders,
                                                  Integer[] rowsToInclude,
                                                  Integer[] columnsToInclude,
                                                  boolean pivot)
        Returns a memory result set that represents a grid of data and its associated headers.
        Parameters:
        source - The source IPentahoResultSet.
        columnForRowHeaders - a 0-based column number whose data will be used as the row headers. If null and rowHeaders exist in the source, the rowHeaders will be returned; otherwise the first column (column 0) will be used for the row headers.
        rowForColumnHeaders - a 0-based row number whose data will be used as the columnHeaders. If null and columnHeaders exist in the source, the columnHeaders will be returned; otherwise the first row (row 0) will be used for the columnHeaders.
        columnsToInclude - An Integer array of columns to include in the result.
        rowsToInclude - An Integer array of rows to include in the result.
        pivot - pivot the entire IPentahoResultSet. NOTE: this occurs last in the process. Other params referencing rows and columns are based on the source before it is pivoted.
        Returns:
        a copy of the result set containing the requested data.
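The ordering rule in the pivot parameter (row and column selections refer to the un-pivoted source, and the pivot is applied last) can be illustrated on a plain Object[][] grid. This is only a minimal sketch under stated assumptions, not the Pentaho implementation: TransmuteSketch, select and transpose are hypothetical names, headers are left out, and treating a null index array as "keep everything" is an assumption made for the example.

```java
/** Hypothetical illustration of "select first, pivot last"; not Pentaho code. */
final class TransmuteSketch {

    /** Keeps the requested rows and columns (0-based). Assumption: null means "keep all". */
    static Object[][] select(Object[][] grid, Integer[] rows, Integer[] cols) {
        int rowCount = (rows == null) ? grid.length : rows.length;
        Object[][] out = new Object[rowCount][];
        for (int r = 0; r < rowCount; r++) {
            Object[] src = grid[(rows == null) ? r : rows[r]];
            int colCount = (cols == null) ? src.length : cols.length;
            out[r] = new Object[colCount];
            for (int c = 0; c < colCount; c++) {
                out[r][c] = src[(cols == null) ? c : cols[c]];
            }
        }
        return out;
    }

    /** Swaps rows and columns of a rectangular grid. */
    static Object[][] transpose(Object[][] grid) {
        if (grid.length == 0) {
            return new Object[0][0];
        }
        Object[][] out = new Object[grid[0].length][grid.length];
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                out[c][r] = grid[r][c];
            }
        }
        return out;
    }

    /** Index arguments are interpreted against the source; the optional pivot runs last. */
    static Object[][] transmute(Object[][] grid, Integer[] rows, Integer[] cols, boolean pivot) {
        Object[][] selected = select(grid, rows, cols);
        return pivot ? transpose(selected) : selected;
    }
}
```

In this sketch, asking for rows 0 and 2 and columns 1 and 2 of a 3x3 grid with pivot set to true first yields a 2x2 selection and then its transpose, so the caller never has to translate indexes into post-pivot coordinates.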
      • pivot

        public static IPentahoResultSet pivot​(IPentahoResultSet resultSet)
        This rotates an IPentahoResultSet such that the row and column data are reversed and the data headers are also rotated, i.e. the source:

                                         |CHeader1,0|CHeader1,1|CHeader1,2|CHeader1,3|
                                         |CHeader0,0|CHeader0,1|CHeader0,2|CHeader0,3|
                                         ---------------------------------------------
        |RHeader0,2|RHeader0,1|RHeader0,0|Data0,0   |Data0,1   |Data0,2   |Data0,3   |
        |RHeader1,2|RHeader1,1|RHeader1,0|Data1,0   |Data1,1   |Data1,2   |Data1,3   |
        |RHeader2,2|RHeader2,1|RHeader2,0|Data2,0   |Data2,1   |Data2,2   |Data2,3   |
        |RHeader3,2|RHeader3,1|RHeader3,0|Data3,0   |Data3,1   |Data3,2   |Data3,3   |
        |RHeader4,2|RHeader4,1|RHeader4,0|Data4,0   |Data4,1   |Data4,2   |Data4,3   |
        |RHeader5,2|RHeader5,1|RHeader5,0|Data5,0   |Data5,1   |Data5,2   |Data5,3   |

        becomes:

                              |RHeader0,2|RHeader1,2|RHeader2,2|RHeader3,2|RHeader4,2|RHeader5,2|
                              |RHeader0,1|RHeader1,1|RHeader2,1|RHeader3,1|RHeader4,1|RHeader5,1|
                              |RHeader0,0|RHeader1,0|RHeader2,0|RHeader3,0|RHeader4,0|RHeader5,0|
                              -------------------------------------------------------------------
        |CHeader1,0|CHeader0,0|Data0,0   |Data1,0   |Data2,0   |Data3,0   |Data4,0   |Data5,0   |
        |CHeader1,1|CHeader0,1|Data0,1   |Data1,1   |Data2,1   |Data3,1   |Data4,1   |Data5,1   |
        |CHeader1,2|CHeader0,2|Data0,2   |Data1,2   |Data2,2   |Data3,2   |Data4,2   |Data5,2   |
        |CHeader1,3|CHeader0,3|Data0,3   |Data1,3   |Data2,3   |Data3,3   |Data4,3   |Data5,3   |
        Parameters:
        resultSet - an IPentahoResultSet on which to operate. Note that this parameter remains unchanged after this call; the original data is preserved.
        Returns:
        an IPentahoResultSet that represents the rotated IPentahoResultSet argument
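The rotation described for pivot amounts to three transpositions: the data block, the row-header block (which becomes the column headers) and the column-header block (which becomes the row headers). The following is a minimal sketch on plain arrays, assuming rectangular blocks laid out as they are drawn in the description; PivotSketch and its fields are hypothetical names and do not mirror IPentahoResultSet or IPentahoMetaData.

```java
/** Hypothetical grid with header blocks; an illustration only, not IPentahoResultSet. */
final class PivotSketch {
    final Object[][] columnHeaders; // [headerRow][column], as drawn above the data
    final Object[][] rowHeaders;    // [row][headerColumn], as drawn left of the data
    final Object[][] data;          // [row][column]

    PivotSketch(Object[][] columnHeaders, Object[][] rowHeaders, Object[][] data) {
        this.columnHeaders = columnHeaders;
        this.rowHeaders = rowHeaders;
        this.data = data;
    }

    /** Returns a new, rotated grid; the receiver and its arrays are left unchanged. */
    PivotSketch pivot() {
        return new PivotSketch(
            transpose(rowHeaders),     // old row headers become the column headers
            transpose(columnHeaders),  // old column headers become the row headers
            transpose(data));
    }

    /** Swaps rows and columns of a rectangular block. */
    static Object[][] transpose(Object[][] in) {
        int rows = in.length;
        int cols = (rows == 0) ? 0 : in[0].length;
        Object[][] out = new Object[cols][rows];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                out[c][r] = in[r][c];
            }
        }
        return out;
    }
}
```

Because every block is copied into fresh arrays, the sketch matches the documented contract that the argument is preserved.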
      • rowNamesToIndexes

        public Integer[] rowNamesToIndexes​(String[][] names)
      • rowNamesToIndexes

        public static Integer[] rowNamesToIndexes​(IPentahoResultSet source,
                                                  String[][] names)
        Returns an array of row indexes that match the names parameter
        Parameters:
        source - IPentahoResultSet to operate against
        names - a 2D string array[rows][columns] where each row is a string array to compare against the row headers
        Returns:
        an array of Integers that represents the selected rows
      • columnNamesToIndexes

        public Integer[] columnNamesToIndexes​(String[][] names)
      • columnNamesToIndexes

        public static Integer[] columnNamesToIndexes​(IPentahoResultSet source,
                                                     String[][] names)
        Returns an array of column Indexes that match the names parameter
        Parameters:
        source - IPentahoResultSet to operate against
        names - a 2D string array[rows][columns] where each row is a string array to compare against the column headers
        Returns:
        an array of Integers that represents the selected columns
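Both name lookups follow the same idea: each entry of names is one complete (possibly multi-level) header, and the result is the position of every header that matches it. Here is a minimal sketch for the row case on a plain Object[][] header block. HeaderLookupSketch is a hypothetical name, and the matching rules used here (exact string equality on every level, results returned in the order of names) are assumptions, not documented Pentaho behaviour.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical header-name lookup on a plain array; not the Pentaho implementation. */
final class HeaderLookupSketch {

    /** Returns the 0-based indexes of the rows whose full header equals one of the names. */
    static Integer[] rowNamesToIndexes(Object[][] rowHeaders, String[][] names) {
        List<Integer> matches = new ArrayList<>();
        for (String[] name : names) {
            for (int row = 0; row < rowHeaders.length; row++) {
                if (sameHeader(rowHeaders[row], name)) {
                    matches.add(row);
                }
            }
        }
        return matches.toArray(new Integer[0]);
    }

    /** Assumption: a header matches only if every level matches exactly. */
    static boolean sameHeader(Object[] header, String[] name) {
        if (header.length != name.length) {
            return false;
        }
        for (int i = 0; i < header.length; i++) {
            if (!String.valueOf(header[i]).equals(name[i])) {
                return false;
            }
        }
        return true;
    }
}
```

The resulting Integer[] has the shape expected by the Integer-based transmute overload, which suggests how the String-based overload could delegate to it.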
      • constructColumnHeaders

        protected static Object[][] constructColumnHeaders​(IPentahoResultSet source,
                                                           Integer rowForColumnHeaders)
        Parameters:
        source -
        rowForColumnHeaders -
        Returns:
      • constructRowHeaders

        protected static Object[][] constructRowHeaders​(IPentahoResultSet source,
                                                        Integer columnForRowHeaders)
        Constructs the headers based on the rule as stated in the columnForRowHeaders param.
        Parameters:
        source - - The source IPentahoResultSet
        columnForRowHeaders - - a 0 based column number whose data will be used as the row headers. If null and rowHeaders exist in the source then the rowHeaders will be flattened and returned otherwise the first column (column 0) will be used for row headers
        Returns:
        2D array representing the manufactured rowHeaders
      • swapAndPivotRowAndColumnHeaders

        protected static IPentahoMetaData swapAndPivotRowAndColumnHeaders​(IPentahoMetaData metaData)
        Parameters:
        metaData - source
        Returns:
        IPentahoMetaData that contains the swapped and pivoted row and column headers
      • getCollapsedHeaders

        public String[] getCollapsedHeaders​(int axis,
                                            char seperator)
                                     throws Exception
        Throws:
        Exception
      • getCollapsedHeaders

        public static String[] getCollapsedHeaders​(int axis,
                                                   IPentahoResultSet resultSet,
                                                   char seperator)
                                            throws Exception
        Returns a string array where each element represents the concatenation of the headers for a single column. The parts of each concatenation are separated by the "seperator" parameter.
        Parameters:
        axis - row or column headers
        resultSet - the result set to get the headers from
        seperator - a character that represents the separator between entities of a column header
        Returns:
        a String array that represents the fully qualified column headers
        Throws:
        Exception
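Collapsing multi-level headers can be pictured as joining, for each column, the header values from every header row with the separator character. The sketch below works on a plain Object[][] block laid out as [headerRow][column]. CollapsedHeadersSketch is a hypothetical name, and joining from the top header row downwards is an assumption about the ordering.

```java
/** Hypothetical header-collapsing helper on a plain array; not the Pentaho implementation. */
final class CollapsedHeadersSketch {

    /** Joins the header values of each column, top row first, using the separator. */
    static String[] collapse(Object[][] columnHeaders, char separator) {
        if (columnHeaders.length == 0) {
            return new String[0];
        }
        String[] out = new String[columnHeaders[0].length];
        for (int column = 0; column < out.length; column++) {
            StringBuilder joined = new StringBuilder();
            for (int level = 0; level < columnHeaders.length; level++) {
                if (level > 0) {
                    joined.append(separator);
                }
                joined.append(columnHeaders[level][column]);
            }
            out[column] = joined.toString();
        }
        return out;
    }
}
```

With header rows {"2023", "2023", "2024"} and {"Q1", "Q2", "Q1"} and '/' as the separator, the sketch produces one fully qualified name per column, for example "2023/Q1" for the first column.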
      • flattenResultSet

        public static IPentahoResultSet flattenResultSet​(IPentahoResultSet resultSet,
                                                         int squashColumn)
        Flattens a result set based on a particular column; new values for each row trigger a new row in the flattened result set.
        Returns:
        IPentahoResultSet
      • crossTab

        public static IPentahoResultSet crossTab​(IPentahoResultSet source,
                                                 int columnToPivot,
                                                 int measureColumn,
                                                 Format pivotDataFormatter,
                                                 boolean orderedMaps)
        This method takes a column of data, and turns it into multiple columns based on the values within the column. The measure column specified is then distributed among the newly created columns. Sparse data is handled by populating missing cells with nulls.
        Parameters:
        source - The starting IPentahoResultSet
        columnToPivot - The column that becomes multiple columns
        measureColumn - The measures column to distribute to the new columns created
        pivotDataFormatter - If the column to pivot requires formatting, this is the formatter to use
        orderedMaps - If true, will sort the new column names alphabetically. If false, the columns will be created in the order of appearance in the rows
        Returns:
        IPentahoResultSet containing crosstabbed data.
      • crossTab

        public static IPentahoResultSet crossTab​(IPentahoResultSet source,
                                                 int columnToPivot,
                                                 int measureColumn,
                                                 int columnToSortColumnsBy,
                                                 Format pivotDataFormatter,
                                                 Format sortDataFormatter,
                                                 boolean orderedMaps)
        This method takes a column of data, and turns it into multiple columns based on the values within the column. The measure column specified is then distributed among the newly created columns. Sparse data is handled by populating missing cells with nulls. This version of the method also takes two additional parameters - the column to sort the new columns by, and a formatter for that column.
        Parameters:
        source - The starting IPentahoResultSet
        columnToPivot - The column that becomes multiple columns
        measureColumn - The measures column to distribute to the new columns created
        columnToSortColumnsBy - The column to use to sort the newly created columns by
        pivotDataFormatter - If the column to pivot requires formatting, this is the formatter to use
        sortDataFormatter - The formatter to use to convert the sort column to a string
        orderedMaps - If true, will sort the new column names alphabetically. If false, the columns will be created in the order of appearance in the rows.
        Returns:
        IPentahoResultSet containing crosstabbed data.
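The effect of orderedMaps and pivotDataFormatter on the generated column names can be sketched with standard collections: a sorted set gives alphabetical names, an insertion-ordered set keeps the order of appearance. This is a minimal sketch on a list of Object[] rows, not the Pentaho implementation. PivotColumnNamesSketch is a hypothetical name, and treating a null formatter as "use the value's string form" is an assumption.

```java
import java.text.Format;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that derives the new column names for a crosstab; not Pentaho code. */
final class PivotColumnNamesSketch {

    /**
     * Collects the distinct values of the pivot column as column names.
     * ordered == true sorts them alphabetically; false keeps the order of appearance.
     */
    static List<String> newColumnNames(List<Object[]> rows, int columnToPivot,
                                       Format pivotDataFormatter, boolean ordered) {
        Set<String> names = ordered ? new TreeSet<String>() : new LinkedHashSet<String>();
        for (Object[] row : rows) {
            Object value = row[columnToPivot];
            // Assumption: no formatter means the plain string form of the value is used.
            names.add(pivotDataFormatter == null
                ? String.valueOf(value)
                : pivotDataFormatter.format(value));
        }
        return new ArrayList<>(names);
    }
}
```

For rows whose pivot column reads West, Central, East, West, the sketch returns Central, East, West when ordered is true and West, Central, East when it is false.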
      • crossTabOrdered

        public static IPentahoResultSet crossTabOrdered​(IPentahoResultSet source,
                                                        int columnToPivot,
                                                        int measureColumn,
                                                        Format pivotDataFormatter)
        Marc Batchelor Diatribe

        It's 6am, I've been up all night, but I gotta write this down. So, here goes.

        This crosstab function is similar to the "old" version, except that it works in many more cases than the old one did. The only requirement is that either the data is ordered by the left-over columns that will indicate that the row is a continuation of the previous row, or the method is handed a column that will be used to find unique rows in the result set.

        The old crosstab function used maps for each data element of a row. This only works in very certain conditions, and is extremely fragile. But that approach allowed the same row of data to be spread out all over the result set. So, with better capability and performance comes a limitation. This can best be seen by attempting to crosstab a simple query from the sample data (crosstabbing on REGION):

            select REGION, DEPARTMENT, SUM(ACTUAL) as ACTUAL
            from quadrant_actuals
            group by REGION, DEPARTMENT
            order by REGION, DEPARTMENT

            REGION    DEPARTMENT            ACTUAL
            Central   Executive Management  1776282
            Central   Finance               3106680
            ... more rows ...
            Eastern   Executive Management  1507580
            Eastern   Finance               3039180
            ... more rows ...
            Southern  Executive Management  1507580
            Southern  Finance               3039180
            ... more rows ...

        Note that when the data is ordered this way, we would have to keep coming back to the Executive Management row multiple times to fill in the empty holes. The old version of the crossTab utility did this by using a complex series of maps to hold the data not being crosstabbed. This worked OK on very simple result sets, but absolutely fell apart once the same values appeared in the same row (like a 2 in one column and a 2 in another column), or when null values appeared in the data.

        Due to customer requirements, I needed to re-create the crosstab functionality, and I chose to do it a little more sensibly. Now, instead of maps of maps, I simply traverse the dataset twice: once to gather all the new column headers, and then again to fill out the data. The down side to this approach is that the above data set could not be crosstabbed in that format. Crosstabbing the above format would result in data that looks like this:

            DEPARTMENT            Central  Eastern  ....
            Executive Management  1776282  ---
            Finance               3106680  ---
            ... more rows ...
            Executive Management  ---      1507580
            Finance               ---      3039180
            ... more rows ...

        A completely useless crosstab. So, with this new function, a simple change needs to be made to the query as follows:

            select REGION, DEPARTMENT, SUM(ACTUAL) as ACTUAL
            from quadrant_actuals
            group by DEPARTMENT, REGION -- Switched the order
            order by DEPARTMENT, REGION -- Switched the order

        By simply making sure that the column to be crosstabbed is not the primary sort, the data comes out more sensibly for crosstabbing:

            REGION    DEPARTMENT            ACTUAL
            Central   Executive Management  1776282
            Eastern   Executive Management  1507580
            ... more rows ...
            Central   Finance               3106680
            Eastern   Finance               3039180
            ... more rows ...

        In other words, the remaining data that identifies a single row is grouped together.

        The other alternative is to pass in a uniqueRowIdentifierColumn. This column will be used to determine whether the row has been seen before. If it has, then the already written row is fetched and updated. So, in the above example, you would specify the uniqueRowIdentifierColumn as column 1 (DEPARTMENT). Note that this doesn't perform as well, since every row will result in a map lookup for this column. However, for XML resultsets, there may be no other way to accomplish what you need.
        Parameters:
        source -
        columnToPivot -
        measureColumn -
        columnToSortColumnsBy -
        pivotDataFormatter -
        sortDataFormatter -
        orderedMaps -
        uniqueRowIdentifierColumn -
        Returns:
      • crossTabOrdered

        public static IPentahoResultSet crossTabOrdered​(IPentahoResultSet source,
                                                        int columnToPivot,
                                                        int measureColumn,
                                                        int columnToSortColumnsBy,
                                                        Format pivotDataFormatter,
                                                        Format sortDataFormatter,
                                                        boolean orderedMaps,
                                                        int uniqueRowIdentifierColumn)
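The two-pass approach described for crossTabOrdered (one traversal to gather the new column headers, a second to fill in the data) and the role of uniqueRowIdentifierColumn can be sketched on a list of Object[] rows. This is a minimal sketch under stated assumptions, not the Pentaho implementation: TwoPassCrossTabSketch is a hypothetical name, sorting and formatting are left out, output rows hold the left-over columns followed by one cell per new column, and a negative uniqueRowIdColumn is used here to mean "rely on the row ordering instead".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical two-pass crosstab on plain rows; not the Pentaho implementation. */
final class TwoPassCrossTabSketch {

    static List<Object[]> crossTab(List<Object[]> rows, int columnToPivot,
                                   int measureColumn, int uniqueRowIdColumn) {
        // Pass 1: gather the new column headers in order of appearance.
        List<Object> pivotValues = new ArrayList<>();
        for (Object[] row : rows) {
            if (!pivotValues.contains(row[columnToPivot])) {
                pivotValues.add(row[columnToPivot]);
            }
        }
        // Pass 2: fill in the data, reusing an output row when the source row continues it.
        List<Object[]> out = new ArrayList<>();
        Map<Object, Object[]> seen = new HashMap<>(); // only used with a unique row identifier
        Object[] current = null;
        for (Object[] row : rows) {
            Object[] key = leftover(row, columnToPivot, measureColumn);
            Object[] target;
            if (uniqueRowIdColumn >= 0) {
                target = seen.get(row[uniqueRowIdColumn]); // one map lookup per source row
            } else {
                boolean continues = current != null
                    && Arrays.equals(Arrays.copyOf(current, key.length), key);
                target = continues ? current : null;
            }
            if (target == null) {
                // New output row: left-over values, then null cells for sparse data.
                target = Arrays.copyOf(key, key.length + pivotValues.size());
                out.add(target);
                if (uniqueRowIdColumn >= 0) {
                    seen.put(row[uniqueRowIdColumn], target);
                }
            }
            current = target;
            target[key.length + pivotValues.indexOf(row[columnToPivot])] = row[measureColumn];
        }
        return out;
    }

    /** The columns that are neither pivoted nor measured identify an output row. */
    static Object[] leftover(Object[] row, int columnToPivot, int measureColumn) {
        List<Object> kept = new ArrayList<>();
        for (int c = 0; c < row.length; c++) {
            if (c != columnToPivot && c != measureColumn) {
                kept.add(row[c]);
            }
        }
        return kept.toArray();
    }
}
```

Fed the sample rows ordered by REGION, DEPARTMENT without a unique row identifier, the sketch produces one output row per source row, which is the "useless" layout described above; ordering by DEPARTMENT, REGION or passing column 1 as the identifier merges them into one row per department.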