Package org.pentaho.commons.connection
Class PentahoDataTransmuter
- java.lang.Object
-
- org.pentaho.commons.connection.DataUtilities
-
- org.pentaho.commons.connection.PentahoDataTransmuter
-
- All Implemented Interfaces:
IPentahoDataTypes
public class PentahoDataTransmuter extends DataUtilities
- Author:
- wseyler
- Provides various methods to transmute an IPentahoResultSet so that the dimensionality of the resulting IPentahoResultSet suits different purposes (e.g. creating a pie or bar chart).
-
-
Field Summary
Fields
protected IPentahoResultSet sourceResultSet
Fields inherited from interface org.pentaho.commons.connection.IPentahoDataTypes
AXIS_COLUMN, AXIS_ROW, DATE_FORMAT, TYPE_BOOLEAN, TYPE_DATE, TYPE_DECIMAL, TYPE_DOUBLE, TYPE_FLOAT, TYPE_INT, TYPE_LONG, TYPE_STRING
-
-
Constructor Summary
Constructors
PentahoDataTransmuter(IPentahoResultSet resultSet)
-
Method Summary
Integer[] columnNamesToIndexes(String[][] names)
static Integer[] columnNamesToIndexes(IPentahoResultSet source, String[][] names)
    Returns an array of column indexes that match the names parameter.
protected static Object[][] constructColumnHeaders(IPentahoResultSet source, Integer rowForColumnHeaders)
protected static Object[][] constructRowHeaders(IPentahoResultSet source, Integer columnForRowHeaders)
    Constructs the headers based on the rule as stated in the columnForRowHeaders param.
static IPentahoResultSet crossTab(IPentahoResultSet source, int columnToPivot, int measureColumn, int columnToSortColumnsBy, Format pivotDataFormatter, Format sortDataFormatter, boolean orderedMaps)
    This method takes a column of data, and turns it into multiple columns based on the values within the column.
static IPentahoResultSet crossTab(IPentahoResultSet source, int columnToPivot, int measureColumn, Format pivotDataFormatter)
static IPentahoResultSet crossTab(IPentahoResultSet source, int columnToPivot, int measureColumn, Format pivotDataFormatter, boolean orderedMaps)
    This method takes a column of data, and turns it into multiple columns based on the values within the column.
static IPentahoResultSet crossTabOrdered(IPentahoResultSet source, int columnToPivot, int measureColumn, int columnToSortColumnsBy, Format pivotDataFormatter, Format sortDataFormatter, boolean orderedMaps, int uniqueRowIdentifierColumn)
static IPentahoResultSet crossTabOrdered(IPentahoResultSet source, int columnToPivot, int measureColumn, Format pivotDataFormatter)
    Marc Batchelor Diatribe: It's 6am, I've been up all night, but I gotta write this down.
static IPentahoResultSet crossTabOrdered(IPentahoResultSet source, int columnToPivot, int measureColumn, Format pivotDataFormatter, boolean orderedMaps)
static String dump(IPentahoResultSet source)
static String dump(IPentahoResultSet source, boolean useOrBar)
static IPentahoResultSet flattenResultSet(IPentahoResultSet resultSet, int squashColumn)
    Flatten a resultset based on a particular column, new values for each row trigger a new row in the flattened resultset.
String[] getCollapsedHeaders(int axis, char seperator)
static String[] getCollapsedHeaders(int axis, IPentahoResultSet resultSet, char seperator)
    Returns a string array where each element represents the concatenation of the headers for a single column.
IPentahoResultSet getSourceResultSet()
IPentahoResultSet pivot()
static IPentahoResultSet pivot(IPentahoResultSet resultSet)
    This rotates an IPentahoResultSet such that the row and column data are reversed and the data headers are also rotated.
Integer[] rowNamesToIndexes(String[][] names)
static Integer[] rowNamesToIndexes(IPentahoResultSet source, String[][] names)
    Returns an array of row indexes that match the names parameter.
protected static IPentahoMetaData swapAndPivotRowAndColumnHeaders(IPentahoMetaData metaData)
IPentahoResultSet transmute(Integer[] rowsToInclude, Integer[] columnsToInclude, boolean pivot)
IPentahoResultSet transmute(Integer columnForRowHeaders, Integer rowForColumnHeaders, boolean pivot)
IPentahoResultSet transmute(Integer colForRowHeaders, Integer rowForColHeaders, Integer[] rowsToInclude, Integer[] columnsToInclude, boolean pivot)
IPentahoResultSet transmute(String[][] rowsToInclude, String[][] columnsToInclude, boolean pivot)
IPentahoResultSet transmute(String[] columnForRowHeaders, String[] rowForColumnHeaders, boolean pivot)
IPentahoResultSet transmute(String[] columnForRowHeaders, String[] rowForColHeaders, String[][] rowsToInclude, String[][] columnsToInclude, boolean pivot)
static IPentahoResultSet transmute(IPentahoResultSet source, boolean pivot)
static IPentahoResultSet transmute(IPentahoResultSet source, Integer[] rowsToInclude, Integer[] columnsToInclude, boolean pivot)
static IPentahoResultSet transmute(IPentahoResultSet source, Integer columnForRowHeaders, Integer rowForColumnHeaders, boolean pivot)
static IPentahoResultSet transmute(IPentahoResultSet source, Integer columnForRowHeaders, Integer rowForColumnHeaders, Integer[] rowsToInclude, Integer[] columnsToInclude, boolean pivot)
    Returns a memory result set that represents a grid of data and its associated headers.
static IPentahoResultSet transmute(IPentahoResultSet source, String[][] rowsToInclude, String[][] columnsToInclude, boolean pivot)
static IPentahoResultSet transmute(IPentahoResultSet source, String[] columnForRowHeaders, String[] rowForColumnHeaders, boolean pivot)
static IPentahoResultSet transmute(IPentahoResultSet source, String[] columnForRowHeaders, String[] rowForColumnHeaders, String[][] rowsToInclude, String[][] columnsToInclude, boolean pivot)
    Returns a memory result set that represents a grid of data and its associated headers.
Methods inherited from class org.pentaho.commons.connection.DataUtilities
filterData, filterDataByColumns, filterDataByRows, getXMLString, pivotDimensions, toNumber, toNumbers
-
-
-
-
Field Detail
-
sourceResultSet
protected final IPentahoResultSet sourceResultSet
-
-
Constructor Detail
-
PentahoDataTransmuter
public PentahoDataTransmuter(IPentahoResultSet resultSet)
-
-
Method Detail
-
getSourceResultSet
public IPentahoResultSet getSourceResultSet()
- Returns:
- Returns the sourceResultSet.
-
transmute
public static IPentahoResultSet transmute(IPentahoResultSet source, boolean pivot)
-
transmute
public static IPentahoResultSet transmute(IPentahoResultSet source, Integer columnForRowHeaders, Integer rowForColumnHeaders, boolean pivot)
-
transmute
public IPentahoResultSet transmute(Integer columnForRowHeaders, Integer rowForColumnHeaders, boolean pivot)
-
transmute
public static IPentahoResultSet transmute(IPentahoResultSet source, String[] columnForRowHeaders, String[] rowForColumnHeaders, boolean pivot)
-
transmute
public IPentahoResultSet transmute(String[] columnForRowHeaders, String[] rowForColumnHeaders, boolean pivot)
-
transmute
public static IPentahoResultSet transmute(IPentahoResultSet source, Integer[] rowsToInclude, Integer[] columnsToInclude, boolean pivot)
-
transmute
public IPentahoResultSet transmute(Integer[] rowsToInclude, Integer[] columnsToInclude, boolean pivot)
-
transmute
public static IPentahoResultSet transmute(IPentahoResultSet source, String[][] rowsToInclude, String[][] columnsToInclude, boolean pivot)
-
transmute
public IPentahoResultSet transmute(String[][] rowsToInclude, String[][] columnsToInclude, boolean pivot)
-
transmute
public IPentahoResultSet transmute(String[] columnForRowHeaders, String[] rowForColHeaders, String[][] rowsToInclude, String[][] columnsToInclude, boolean pivot)
-
transmute
public static IPentahoResultSet transmute(IPentahoResultSet source, String[] columnForRowHeaders, String[] rowForColumnHeaders, String[][] rowsToInclude, String[][] columnsToInclude, boolean pivot)
Returns a memory result set that represents a grid of data and its associated headers.
- Parameters:
source - The source IPentahoResultSet.
columnForRowHeaders - a 0-based column number whose data will be used as the row headers. If null and rowHeaders exist in the source, the rowHeaders are returned; otherwise the first column (column 0) is used for row headers.
rowForColumnHeaders - a 0-based row number whose data will be used as the columnHeaders. If null and columnHeaders exist in the source, the columnHeaders are returned; otherwise the first row (row 0) is used for the columnHeaders.
columnsToInclude - A 2D String array where each row is a String[] representing a column header[] of a column to include in the result.
rowsToInclude - A 2D String array where each row is a String[] representing a row header[] of a row to include in the result.
pivot - pivot the entire IPentahoResultSet. NOTE: this occurs last in the process. Other params referencing rows and columns are based on the source before it is pivoted.
- Returns:
- a copy of the result set containing the requested data.
-
transmute
public IPentahoResultSet transmute(Integer colForRowHeaders, Integer rowForColHeaders, Integer[] rowsToInclude, Integer[] columnsToInclude, boolean pivot)
-
transmute
public static IPentahoResultSet transmute(IPentahoResultSet source, Integer columnForRowHeaders, Integer rowForColumnHeaders, Integer[] rowsToInclude, Integer[] columnsToInclude, boolean pivot)
Returns a memory result set that represents a grid of data and its associated headers.
- Parameters:
source - The source IPentahoResultSet.
columnForRowHeaders - a 0-based column number whose data will be used as the row headers. If null and rowHeaders exist in the source, the rowHeaders are returned; otherwise the first column (column 0) is used for row headers.
rowForColumnHeaders - a 0-based row number whose data will be used as the columnHeaders. If null and columnHeaders exist in the source, the columnHeaders are returned; otherwise the first row (row 0) is used for the columnHeaders.
columnsToInclude - An integer array of columns to include in the result.
rowsToInclude - An integer array of rows to include in the result.
pivot - pivot the entire IPentahoResultSet. NOTE: this occurs last in the process. Other params referencing rows and columns are based on the source before it is pivoted.
- Returns:
- a copy of the result set containing the requested data.
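The note that pivoting occurs last can be illustrated on a plain array. This is a minimal sketch, not the library's implementation: the class TransmuteOrderSketch and its select method are hypothetical, headers are ignored, and an Object[][] grid stands in for an IPentahoResultSet.

```java
import java.util.Arrays;

// Hypothetical illustration only: not part of org.pentaho.commons.connection.
final class TransmuteOrderSketch {

    /**
     * Copies the requested rows and columns out of data and, only afterwards,
     * optionally transposes the copy. All indexes are 0-based and refer to
     * the source grid, not to the pivoted result.
     */
    static Object[][] select(Object[][] data, int[] rowsToInclude,
                             int[] columnsToInclude, boolean pivot) {
        Object[][] picked = new Object[rowsToInclude.length][columnsToInclude.length];
        for (int r = 0; r < rowsToInclude.length; r++) {
            for (int c = 0; c < columnsToInclude.length; c++) {
                picked[r][c] = data[rowsToInclude[r]][columnsToInclude[c]];
            }
        }
        if (!pivot) {
            return picked;
        }
        // Pivot last: rows of the selection become columns of the result.
        Object[][] rotated = new Object[columnsToInclude.length][rowsToInclude.length];
        for (int r = 0; r < rowsToInclude.length; r++) {
            for (int c = 0; c < columnsToInclude.length; c++) {
                rotated[c][r] = picked[r][c];
            }
        }
        return rotated;
    }

    public static void main(String[] args) {
        Object[][] data = {
            {"a", "b", "c"},
            {"d", "e", "f"},
            {"g", "h", "i"}
        };
        // Rows 0 and 2, columns 1 and 2 of the source, then pivoted.
        Object[][] out = select(data, new int[] {0, 2}, new int[] {1, 2}, true);
        System.out.println(Arrays.deepToString(out)); // prints [[b, h], [c, i]]
    }
}
```

In the sketch, row index 2 still means the third row of the source even though that row ends up as a column of the pivoted output.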
-
pivot
public IPentahoResultSet pivot()
-
pivot
public static IPentahoResultSet pivot(IPentahoResultSet resultSet)
This rotates an IPentahoResultSet such that the row and column data are reversed and the data headers are also rotated. For example, the source:

                                 |CHeader1,0|CHeader1,1|CHeader1,2|CHeader1,3|
                                 |CHeader0,0|CHeader0,1|CHeader0,2|CHeader0,3|
                                 ---------------------------------------------
|RHeader0,2|RHeader0,1|RHeader0,0|Data0,0   |Data0,1   |Data0,2   |Data0,3   |
|RHeader1,2|RHeader1,1|RHeader1,0|Data1,0   |Data1,1   |Data1,2   |Data1,3   |
|RHeader2,2|RHeader2,1|RHeader2,0|Data2,0   |Data2,1   |Data2,2   |Data2,3   |
|RHeader3,2|RHeader3,1|RHeader3,0|Data3,0   |Data3,1   |Data3,2   |Data3,3   |
|RHeader4,2|RHeader4,1|RHeader4,0|Data4,0   |Data4,1   |Data4,2   |Data4,3   |
|RHeader5,2|RHeader5,1|RHeader5,0|Data5,0   |Data5,1   |Data5,2   |Data5,3   |

becomes:

                      |RHeader0,2|RHeader1,2|RHeader2,2|RHeader3,2|RHeader4,2|RHeader5,2|
                      |RHeader0,1|RHeader1,1|RHeader2,1|RHeader3,1|RHeader4,1|RHeader5,1|
                      |RHeader0,0|RHeader1,0|RHeader2,0|RHeader3,0|RHeader4,0|RHeader5,0|
                      -------------------------------------------------------------------
|CHeader1,0|CHeader0,0|Data0,0   |Data1,0   |Data2,0   |Data3,0   |Data4,0   |Data5,0   |
|CHeader1,1|CHeader0,1|Data0,1   |Data1,1   |Data2,1   |Data3,1   |Data4,1   |Data5,1   |
|CHeader1,2|CHeader0,2|Data0,2   |Data1,2   |Data2,2   |Data3,2   |Data4,2   |Data5,2   |
|CHeader1,3|CHeader0,3|Data0,3   |Data1,3   |Data2,3   |Data3,3   |Data4,3   |Data5,3   |

- Parameters:
resultSet - an IPentahoResultSet on which to operate. This parameter remains unchanged after the call; the original data is preserved.
- Returns:
- an IPentahoResultSet that represents the rotated IPentahoResultSet argument
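The rotation can be sketched as three transpositions: the data grid, the row headers, and the column headers. The following is an assumption-laden illustration rather than the library's code: PivotSketch and its nested Grid class are hypothetical stand-ins for IPentahoResultSet and its metadata, with column headers stored as [level][column] and row headers as [row][level].

```java
// Hypothetical illustration only: not part of org.pentaho.commons.connection.
final class PivotSketch {

    /** Stand-in for a result set: header levels plus a rectangular data grid. */
    static final class Grid {
        final Object[][] columnHeaders; // [level][column]
        final Object[][] rowHeaders;    // [row][level]
        final Object[][] data;          // [row][column]

        Grid(Object[][] columnHeaders, Object[][] rowHeaders, Object[][] data) {
            this.columnHeaders = columnHeaders;
            this.rowHeaders = rowHeaders;
            this.data = data;
        }
    }

    /** Returns a transposed copy of a rectangular grid; the input is not modified. */
    static Object[][] transpose(Object[][] grid) {
        int rows = grid.length;
        int cols = rows == 0 ? 0 : grid[0].length;
        Object[][] out = new Object[cols][rows];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                out[c][r] = grid[r][c];
            }
        }
        return out;
    }

    /** Row headers become column headers, column headers become row headers. */
    static Grid pivot(Grid source) {
        return new Grid(
            transpose(source.rowHeaders),
            transpose(source.columnHeaders),
            transpose(source.data));
    }

    public static void main(String[] args) {
        Grid source = new Grid(
            new Object[][] {{"Q1", "Q2"}},
            new Object[][] {{"East"}, {"West"}, {"North"}},
            new Object[][] {{1, 2}, {3, 4}, {5, 6}});
        Grid rotated = pivot(source);
        System.out.println(java.util.Arrays.deepToString(rotated.data)); // prints [[1, 3, 5], [2, 4, 6]]
    }
}
```

As in the documented method, the sketch builds new arrays and leaves the source grid untouched.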
-
rowNamesToIndexes
public static Integer[] rowNamesToIndexes(IPentahoResultSet source, String[][] names)
Returns an array of row indexes that match the names parameter.
- Parameters:
source - IPentahoResultSet to operate against
names - a 2D string array[rows][columns] where each row is a string array to compare against the row headers
- Returns:
- an array of Integers that represents the selected rows
-
columnNamesToIndexes
public static Integer[] columnNamesToIndexes(IPentahoResultSet source, String[][] names)
Returns an array of column indexes that match the names parameter.
- Parameters:
source - IPentahoResultSet to operate against
names - a 2D string array[rows][columns] where each row is a string array to compare against the column headers
- Returns:
- an array of Integers that represents the selected columns
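A name-to-index lookup of this kind can be sketched as follows. The class HeaderLookupSketch is hypothetical and works on a plain String[][] of column headers stored as [level][column]. Skipping names that match no column is an assumption of the sketch; the library may behave differently in that case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: not part of org.pentaho.commons.connection.
final class HeaderLookupSketch {

    /**
     * For each entry of names (one String[] per wanted column, one element per
     * header level), finds the first column whose header levels all match.
     */
    static Integer[] columnNamesToIndexes(String[][] columnHeaders, String[][] names) {
        int levels = columnHeaders.length;
        int columnCount = levels == 0 ? 0 : columnHeaders[0].length;
        List<Integer> indexes = new ArrayList<>();
        for (String[] name : names) {
            for (int c = 0; c < columnCount; c++) {
                // Collect the header of column c across all levels.
                String[] header = new String[levels];
                for (int level = 0; level < levels; level++) {
                    header[level] = columnHeaders[level][c];
                }
                if (Arrays.equals(header, name)) {
                    indexes.add(c);
                    break; // assumption: the first match wins
                }
            }
        }
        return indexes.toArray(new Integer[0]);
    }

    public static void main(String[] args) {
        String[][] headers = {
            {"2023", "2023", "2024"},
            {"Q1", "Q2", "Q1"}
        };
        String[][] wanted = {{"2024", "Q1"}, {"2023", "Q2"}};
        System.out.println(Arrays.toString(columnNamesToIndexes(headers, wanted))); // prints [2, 1]
    }
}
```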
-
constructColumnHeaders
protected static Object[][] constructColumnHeaders(IPentahoResultSet source, Integer rowForColumnHeaders)
- Parameters:
source -
rowForColumnHeaders -
- Returns:
-
constructRowHeaders
protected static Object[][] constructRowHeaders(IPentahoResultSet source, Integer columnForRowHeaders)
Constructs the headers based on the rule as stated in the columnForRowHeaders param.
- Parameters:
source - The source IPentahoResultSet.
columnForRowHeaders - a 0-based column number whose data will be used as the row headers. If null and rowHeaders exist in the source, the rowHeaders are flattened and returned; otherwise the first column (column 0) is used for row headers.
- Returns:
- 2D array representing the manufactured rowHeaders
-
swapAndPivotRowAndColumnHeaders
protected static IPentahoMetaData swapAndPivotRowAndColumnHeaders(IPentahoMetaData metaData)
- Parameters:
metaData - source
- Returns:
- IPentahoMetaData that contains the swapped and pivoted row and column headers
-
getCollapsedHeaders
public String[] getCollapsedHeaders(int axis, char seperator) throws Exception
- Throws:
Exception
-
getCollapsedHeaders
public static String[] getCollapsedHeaders(int axis, IPentahoResultSet resultSet, char seperator) throws Exception
Returns a string array in which each element is the concatenation of the headers for a single column. The concatenated headers are separated by the "seperator" parameter.
- Parameters:
axis - row or column headers
resultSet - the result set to take the headers from
seperator - a character that represents the separator between entities of a column header
- Returns:
- a String array that represents the fully qualified column headers
- Throws:
Exception
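Collapsing multi-level headers into one string per column can be sketched as below. CollapsedHeadersSketch is a hypothetical helper working on a String[][] stored as [level][column]; it only shows the concatenation, not how the library reads headers from an IPentahoResultSet or selects the axis.

```java
import java.util.Arrays;

// Hypothetical illustration only: not part of org.pentaho.commons.connection.
final class CollapsedHeadersSketch {

    /** Joins the header levels of each column, top level first, with the separator. */
    static String[] collapse(String[][] columnHeaders, char separator) {
        int levels = columnHeaders.length;
        int columnCount = levels == 0 ? 0 : columnHeaders[0].length;
        String[] collapsed = new String[columnCount];
        for (int c = 0; c < columnCount; c++) {
            StringBuilder joined = new StringBuilder();
            for (int level = 0; level < levels; level++) {
                if (level > 0) {
                    joined.append(separator);
                }
                joined.append(columnHeaders[level][c]);
            }
            collapsed[c] = joined.toString();
        }
        return collapsed;
    }

    public static void main(String[] args) {
        String[][] headers = {
            {"2023", "2024"},
            {"Q1", "Q2"}
        };
        System.out.println(Arrays.toString(collapse(headers, '/'))); // prints [2023/Q1, 2024/Q2]
    }
}
```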
-
flattenResultSet
public static IPentahoResultSet flattenResultSet(IPentahoResultSet resultSet, int squashColumn)
Flattens a result set based on a particular column; each new value in that column triggers a new row in the flattened result set.
- Returns:
- IPentahoResultSet
-
dump
public static String dump(IPentahoResultSet source)
-
dump
public static String dump(IPentahoResultSet source, boolean useOrBar)
-
crossTab
public static IPentahoResultSet crossTab(IPentahoResultSet source, int columnToPivot, int measureColumn, Format pivotDataFormatter)
-
crossTab
public static IPentahoResultSet crossTab(IPentahoResultSet source, int columnToPivot, int measureColumn, Format pivotDataFormatter, boolean orderedMaps)
This method takes a column of data, and turns it into multiple columns based on the values within the column. The measure column specified is then distributed among the newly created columns. Sparse data is handled by populating missing cells with nulls.
- Parameters:
source - The starting IPentahoResultSet
columnToPivot - The column that becomes multiple columns
measureColumn - The measures column to distribute to the new columns created
pivotDataFormatter - If the column to pivot requires formatting, this is the formatter to use
orderedMaps - If true, will sort the new column names alphabetically. If false, the columns will be created in the order of appearance in the rows
- Returns:
- IPentahoResultSet containing crosstabbed data.
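The pivot-and-distribute behaviour described above can be sketched on plain arrays. This is a minimal illustration, not the library's implementation: the class CrossTabSketch, the columnNames parameter, and the List-based return value are assumptions, and the formatter argument is omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: not part of org.pentaho.commons.connection.
final class CrossTabSketch {

    /** Returns the header row followed by one row per distinct combination of remaining columns. */
    static List<List<Object>> crossTab(String[] columnNames, Object[][] rows,
                                       int columnToPivot, int measureColumn, boolean orderedMaps) {
        // Distinct pivot values become new columns: sorted, or in order of appearance.
        Set<String> pivotValues = orderedMaps ? new TreeSet<>() : new LinkedHashSet<>();
        for (Object[] row : rows) {
            pivotValues.add(String.valueOf(row[columnToPivot]));
        }
        List<String> newColumns = new ArrayList<>(pivotValues);

        // Columns that are neither pivoted nor measured identify an output row.
        List<Integer> keyColumns = new ArrayList<>();
        for (int c = 0; c < columnNames.length; c++) {
            if (c != columnToPivot && c != measureColumn) {
                keyColumns.add(c);
            }
        }
        List<Object> header = new ArrayList<>();
        for (int c : keyColumns) {
            header.add(columnNames[c]);
        }
        header.addAll(newColumns);

        Map<List<Object>, List<Object>> byKey = new LinkedHashMap<>();
        for (Object[] row : rows) {
            List<Object> key = new ArrayList<>();
            for (int c : keyColumns) {
                key.add(row[c]);
            }
            List<Object> out = byKey.get(key);
            if (out == null) {
                out = new ArrayList<>(key);
                for (int k = 0; k < newColumns.size(); k++) {
                    out.add(null); // sparse cells stay null
                }
                byKey.put(key, out);
            }
            int target = keyColumns.size() + newColumns.indexOf(String.valueOf(row[columnToPivot]));
            out.set(target, row[measureColumn]);
        }
        List<List<Object>> result = new ArrayList<>();
        result.add(header);
        result.addAll(byKey.values());
        return result;
    }

    public static void main(String[] args) {
        String[] names = {"REGION", "DEPARTMENT", "ACTUAL"};
        Object[][] rows = {
            {"Eastern", "Finance", 10},
            {"Central", "Finance", 20},
            {"Central", "Sales", 30}
        };
        for (List<Object> row : crossTab(names, rows, 0, 2, true)) {
            System.out.println(row);
        }
    }
}
```

With orderedMaps set to true the sketch emits the Central column before Eastern; with false, Eastern comes first because it appears first in the rows. The Sales row has no Eastern value, so that cell is null.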
-
crossTab
public static IPentahoResultSet crossTab(IPentahoResultSet source, int columnToPivot, int measureColumn, int columnToSortColumnsBy, Format pivotDataFormatter, Format sortDataFormatter, boolean orderedMaps)
This method takes a column of data, and turns it into multiple columns based on the values within the column. The measure column specified is then distributed among the newly created columns. Sparse data is handled by populating missing cells with nulls. This version of the method also takes two additional parameters: the column to sort the new columns by, and a formatter for that column.
- Parameters:
source - The starting IPentahoResultSet
columnToPivot - The column that becomes multiple columns
measureColumn - The measures column to distribute to the new columns created
columnToSortColumnsBy - The column to use to sort the newly created columns by
pivotDataFormatter - If the column to pivot requires formatting, this is the formatter to use
sortDataFormatter - The formatter to use to convert the sort column to a string
orderedMaps - If true, will sort the new column names alphabetically. If false, the columns will be created in the order of appearance in the rows.
- Returns:
- IPentahoResultSet containing crosstabbed data.
-
crossTabOrdered
public static IPentahoResultSet crossTabOrdered(IPentahoResultSet source, int columnToPivot, int measureColumn, Format pivotDataFormatter)
Marc Batchelor Diatribe

It's 6am, I've been up all night, but I gotta write this down. So, here goes. This crosstab function is similar to the "old" version, except that it works in many more cases than the old one did. The only requirement is that either the data is ordered by the left-over columns that will indicate that the row is a continuation of the previous row, or the method is handed a column that will be used to find unique rows in the result set.

The old crosstab function used maps for each data element of a row. This only works under very specific conditions and is extremely fragile. But that approach allowed the same row of data to be spread out all over the result set. So, with better capability and performance comes a limitation. This can best be seen by attempting to crosstab a simple query from the sample data (and crosstabbing on REGION):

    select REGION, DEPARTMENT, SUM(ACTUAL) as ACTUAL
    from quadrant_actuals
    group by REGION, DEPARTMENT
    order by REGION, DEPARTMENT

    REGION     DEPARTMENT             ACTUAL
    Central    Executive Management   1776282
    Central    Finance                3106680
    ... more rows ...
    Eastern    Executive Management   1507580
    Eastern    Finance                3039180
    ... more rows ...
    Southern   Executive Management   1507580
    Southern   Finance                3039180
    ... more rows ...

Note that when the data is ordered this way, we would have to keep coming back to the Executive Management row multiple times to fill in the empty holes. In the old version of the crossTab utility, it would do this by using a complex series of maps to hold the data not being crosstabbed. This worked OK on very simple result sets, but absolutely fell apart once the same values appeared in the same row (like a 2 in one column and a 2 in another column), or when null values appeared in the data.

Due to customer requirements, I needed to re-create the crosstab functionality, and I chose to do it a little more sensibly. Now, instead of maps of maps, I simply traverse the dataset twice: once to gather all the new column headers, and then again to fill out the data. The down side to this approach is that the above data set could not be crosstabbed in that format. Crosstabbing the above format would result in data that looks like this:

    DEPARTMENT             Central   Eastern   ....
    Executive Management   1776282   ---
    Finance                3106680   ---
    ... more rows ...
    Executive Management   ---       1507580
    Finance                ---       3039180
    ... more rows ...

A completely useless crosstab. So, with this new function, a simple change needs to be made to the query as follows:

    select REGION, DEPARTMENT, SUM(ACTUAL) as ACTUAL
    from quadrant_actuals
    group by DEPARTMENT, REGION -- Switched the order
    order by DEPARTMENT, REGION -- Switched the order

By simply making sure that the column to be crosstabbed is not the primary sort, the data comes out more sensibly for crosstabbing:

    REGION     DEPARTMENT             ACTUAL
    Central    Executive Management   1776282
    Eastern    Executive Management   1507580
    ... more rows ...
    Central    Finance                3106680
    Eastern    Finance                3039180
    ... more rows ...

In other words, the remaining data that identifies a single row is grouped together.

The other alternative is to pass in a uniqueRowIdentifierColumn. This column will be used to determine whether the row has been seen before. If it has, then it will grab the already written row, and update it. So, in the above example, you would specify the uniqueRowIdentifierColumn as column 1 (DEPARTMENT). Note that this doesn't perform as well since every row will result in a map lookup for this column. However, for XML resultsets, there may be no other way to accomplish what you need.
- Parameters:
source -
columnToPivot -
measureColumn -
columnToSortColumnsBy -
pivotDataFormatter -
sortDataFormatter -
orderedMaps -
uniqueRowIdentifierColumn -
- Returns:
-
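The two-pass approach and the optional unique-row lookup described above can be sketched on plain arrays. This is an illustration under stated assumptions, not the library's code: CrossTabOrderedSketch is hypothetical, formatters and sorting are omitted, and passing -1 to mean "no unique-row column" is a convention of the sketch only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not part of org.pentaho.commons.connection.
final class CrossTabOrderedSketch {

    static List<List<Object>> crossTabOrdered(Object[][] rows, int columnToPivot,
                                              int measureColumn, int uniqueRowIdentifierColumn) {
        // Pass 1: gather the new column headers in order of appearance.
        List<Object> newColumns = new ArrayList<>();
        for (Object[] row : rows) {
            if (!newColumns.contains(row[columnToPivot])) {
                newColumns.add(row[columnToPivot]);
            }
        }
        // Pass 2: fill in the data.
        List<List<Object>> result = new ArrayList<>();
        Map<Object, List<Object>> seen = new HashMap<>(); // used only with a unique-row column
        List<Object> previousKey = null;
        List<Object> current = null;
        for (Object[] row : rows) {
            // The left-over columns: everything except the pivot and measure columns.
            List<Object> key = new ArrayList<>();
            for (int c = 0; c < row.length; c++) {
                if (c != columnToPivot && c != measureColumn) {
                    key.add(row[c]);
                }
            }
            if (uniqueRowIdentifierColumn >= 0) {
                current = seen.get(row[uniqueRowIdentifierColumn]); // map lookup per row
            } else if (!key.equals(previousKey)) {
                current = null; // left-over values changed: not a continuation
            }
            if (current == null) {
                current = new ArrayList<>(key);
                for (int k = 0; k < newColumns.size(); k++) {
                    current.add(null);
                }
                result.add(current);
                if (uniqueRowIdentifierColumn >= 0) {
                    seen.put(row[uniqueRowIdentifierColumn], current);
                }
            }
            previousKey = key;
            current.set(key.size() + newColumns.indexOf(row[columnToPivot]), row[measureColumn]);
        }
        return result;
    }

    public static void main(String[] args) {
        Object[][] byRegion = {
            {"Central", "Executive Management", 1776282},
            {"Central", "Finance", 3106680},
            {"Eastern", "Executive Management", 1507580},
            {"Eastern", "Finance", 3039180}
        };
        System.out.println(crossTabOrdered(byRegion, 0, 2, -1).size()); // prints 4 (fragmented rows)
        System.out.println(crossTabOrdered(byRegion, 0, 2, 1).size());  // prints 2 (DEPARTMENT lookup)
    }
}
```

On data sorted by REGION first, the sketch reproduces the fragmented result shown above. Sorting by DEPARTMENT first, or passing column 1 as the unique-row column, yields one row per department.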
crossTabOrdered
public static IPentahoResultSet crossTabOrdered(IPentahoResultSet source, int columnToPivot, int measureColumn, Format pivotDataFormatter, boolean orderedMaps)
-
crossTabOrdered
public static IPentahoResultSet crossTabOrdered(IPentahoResultSet source, int columnToPivot, int measureColumn, int columnToSortColumnsBy, Format pivotDataFormatter, Format sortDataFormatter, boolean orderedMaps, int uniqueRowIdentifierColumn)
-
-