package optimizer

Type Members

  1. case class Cost(card: BigInt, size: BigInt) extends Product with Serializable

    This class defines the cost model for a plan.

    card: Cardinality (number of rows).
    size: Size in bytes.

  2. case class GetCurrentDatabase(catalogManager: CatalogManager) extends Rule[LogicalPlan] with Product with Serializable

    Replaces the expression of CurrentDatabase with the current database name.

  3. case class JoinGraphInfo(starJoins: Set[Int], nonStarJoins: Set[Int]) extends Product with Serializable

    Helper class that keeps information about the join graph as sets of item/plan ids. It currently stores the star/non-star plans. It can be extended with the set of connected/unconnected plans.

  4. case class NormalizeNaNAndZero(child: Expression) extends UnaryExpression with ExpectsInputTypes with Product with Serializable

  5. abstract class Optimizer extends RuleExecutor[LogicalPlan]

    Abstract class that all optimizers should inherit from. It contains the standard batches (extending optimizers can override these).

  6. case class OrderedJoin(left: LogicalPlan, right: LogicalPlan, joinType: JoinType, condition: Option[Expression]) extends BinaryNode with Product with Serializable

    This is a mimic class for a join node that has been ordered.

  7. class SimpleTestOptimizer extends Optimizer

Value Members

  1. object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper

    Simplifies boolean expressions:
    1. Simplifies expressions whose answer can be determined without evaluating both sides.
    2. Eliminates / extracts common factors.
    3. Merges identical expressions.
    4. Removes the Not operator.
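
    A minimal sketch of items 1, 3 and 4, assuming a toy expression tree. The types Expr, Attr, Lit, And, Or and Not are hypothetical stand-ins, not the Catalyst classes, and common-factor extraction (item 2) is left out.

```scala
object BoolSimplifySketch {
  // Hypothetical toy expression tree, not the Catalyst one.
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Lit(value: Boolean) extends Expr
  case class And(left: Expr, right: Expr) extends Expr
  case class Or(left: Expr, right: Expr) extends Expr
  case class Not(child: Expr) extends Expr

  // One bottom-up pass over the tree.
  def simplify(e: Expr): Expr = e match {
    case And(l, r) =>
      (simplify(l), simplify(r)) match {
        case (Lit(false), _) | (_, Lit(false)) => Lit(false) // answer known from one side
        case (Lit(true), x)                    => x
        case (x, Lit(true))                    => x
        case (x, y) if x == y                  => x          // merge identical expressions
        case (x, y)                            => And(x, y)
      }
    case Or(l, r) =>
      (simplify(l), simplify(r)) match {
        case (Lit(true), _) | (_, Lit(true)) => Lit(true)
        case (Lit(false), x)                 => x
        case (x, Lit(false))                 => x
        case (x, y) if x == y                => x
        case (x, y)                          => Or(x, y)
      }
    case Not(c) =>
      simplify(c) match {
        case Lit(b) => Lit(!b) // Not over a literal
        case Not(x) => x       // double negation
        case x      => Not(x)
      }
    case other => other
  }
}
```

    For instance, BoolSimplifySketch.simplify applied to And(Attr("a"), Lit(true)) returns Attr("a").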

  2. object CheckCartesianProducts extends Rule[LogicalPlan] with PredicateHelper

    Checks whether there are any cartesian products between joins of any type in the optimized plan tree. Throws an error if a cartesian product is found without an explicit cross join specified. This rule is effectively disabled if the CROSS_JOINS_ENABLED flag is true.

    This rule must be run AFTER the ReorderJoin rule since the join conditions for each join must be collected before checking if it is a cartesian product. If you have SELECT * from R, S where R.r = S.s, the join between R and S is not a cartesian product and therefore should be allowed. The predicate R.r = S.s is not recognized as a join condition until the ReorderJoin rule.

    This rule must be run AFTER the batch "LocalRelation", since a join with empty relation should not be a cartesian product.

  3. object CollapseProject extends Rule[LogicalPlan]

    Combines two Project operators into one and performs alias substitution, merging the expressions into a single expression in the following cases:
    1. When two Project operators are adjacent.
    2. When two Project operators have a LocalLimit/Sample/Repartition operator between them and the upper project consists of the same number of columns, each of which is equal or an alias. The GlobalLimit(LocalLimit) pattern is also considered.

  4. object CollapseRepartition extends Rule[LogicalPlan]

    Combines adjacent RepartitionOperation operators.

  5. object CollapseWindow extends Rule[LogicalPlan]

    Collapses adjacent Window expressions. If the partition specs and order specs are the same, and the window expressions are independent and of the same window function type, collapses them into the parent.

  6. object ColumnPruning extends Rule[LogicalPlan]

    Attempts to eliminate the reading of unneeded columns from the query plan.

    Since adding Project before Filter conflicts with PushPredicatesThroughProject, this rule will remove the Project p2 in the following pattern:

    p1 @ Project(_, Filter(_, p2 @ Project(_, child))) if p2.outputSet.subsetOf(p2.inputSet)

    p2 is usually inserted by this rule and is useless; p1 can prune the columns anyway.

  7. object CombineConcats extends Rule[LogicalPlan]

    Combine nested Concat expressions.

  8. object CombineFilters extends Rule[LogicalPlan] with PredicateHelper

    Combines two adjacent Filter operators into one, merging the non-redundant conditions into one conjunctive predicate.

  9. object CombineLimits extends Rule[LogicalPlan]

    Combines two adjacent Limit operators into one, merging the expressions into one single expression.

  10. object CombineTypedFilters extends Rule[LogicalPlan]

    Combines two adjacent TypedFilters that operate on the same object type in their conditions into one, merging the filter functions into one conjunctive function.

  11. object CombineUnions extends Rule[LogicalPlan]

    Combines all adjacent Union operators into a single Union.

  12. object ComputeCurrentTime extends Rule[LogicalPlan]

    Computes the current date and time to make sure we return the same result in a single query.

  13. object ConstantFolding extends Rule[LogicalPlan]

    Replaces Expressions that can be statically evaluated with equivalent Literal values.

  14. object ConstantPropagation extends Rule[LogicalPlan] with PredicateHelper

    Substitutes Attributes that can be statically evaluated with their corresponding values in conjunctive Expressions, e.g.:

    SELECT * FROM table WHERE i = 5 AND j = i + 3
    ==>  SELECT * FROM table WHERE i = 5 AND j = 8

    Approach used:
    - Populate a mapping of attribute => constant value by looking at all the equals predicates.
    - Using this mapping, replace occurrences of the attributes with the corresponding constant values in the AND node.
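
    The approach can be sketched on a toy model of one AND node. Everything here (Expr, Attr, Lit, Add, Eq, propagate) is a hypothetical stand-in rather than Catalyst code, and the literal folding that Catalyst leaves to ConstantFolding is done inline to keep the sketch short.

```scala
object ConstPropSketch {
  // Hypothetical toy expression tree, not the Catalyst one.
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class Lit(value: Int) extends Expr
  case class Add(left: Expr, right: Expr) extends Expr
  case class Eq(left: Expr, right: Expr) extends Expr

  // Replace attributes that have a known constant value.
  def substitute(e: Expr, env: Map[String, Int]): Expr = e match {
    case Attr(n) if env.contains(n) => Lit(env(n))
    case Add(l, r)                  => Add(substitute(l, env), substitute(r, env))
    case other                      => other
  }

  // Fold additions whose operands are both literals.
  def fold(e: Expr): Expr = e match {
    case Add(l, r) =>
      (fold(l), fold(r)) match {
        case (Lit(a), Lit(b)) => Lit(a + b)
        case (x, y)           => Add(x, y)
      }
    case other => other
  }

  // `conjuncts` are the equality predicates under a single AND node.
  def propagate(conjuncts: Seq[Eq]): Seq[Eq] = {
    // Step 1: attribute => constant mapping from predicates like `i = 5`.
    val env = conjuncts.collect { case Eq(Attr(n), Lit(v)) => n -> v }.toMap
    // Step 2: substitute into every other predicate.
    conjuncts.map {
      case keep @ Eq(Attr(_), Lit(_)) => keep // the defining predicate stays as is
      case Eq(l, r)                   => Eq(fold(substitute(l, env)), fold(substitute(r, env)))
    }
  }
}
```

    Applied to the conjuncts i = 5 and j = i + 3, propagate leaves i = 5 untouched and rewrites the second predicate to j = 8.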

  15. object ConvertToLocalRelation extends Rule[LogicalPlan]

    Converts local operations (i.e. ones that don't require data exchange) on LocalRelation to another LocalRelation.

  16. object CostBasedJoinReorder extends Rule[LogicalPlan] with PredicateHelper

    Cost-based join reorder. We may have several join reorder algorithms in the future. This class is the entry point for these algorithms, and chooses which one to use.

  17. object DecimalAggregates extends Rule[LogicalPlan]

    Speeds up aggregates on fixed-precision decimals by executing them on unscaled Long values.

    This uses the same rules for increasing the precision and scale of the output as org.apache.spark.sql.catalyst.analysis.DecimalPrecision.

  18. object EliminateDistinct extends Rule[LogicalPlan]

    Removes useless DISTINCT for MAX and MIN. This rule should be applied before RewriteDistinctAggregates.

  19. object EliminateMapObjects extends Rule[LogicalPlan]

    Removes MapObjects when the following conditions are satisfied:

    1. MapObjects(... LambdaVariable(..., false) ...), which means the input and output types are non-nullable primitive types.
    2. No custom collection class is specified.

  20. object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper

    Eliminates outer joins if the predicates can restrict the result sets so that all null-supplying rows are eliminated:

    - full outer -> inner if both sides have such predicates
    - left outer -> inner if the right side has such predicates
    - right outer -> inner if the left side has such predicates
    - full outer -> left outer if only the left side has such predicates
    - full outer -> right outer if only the right side has such predicates

    This rule should be executed before pushing down the Filter.
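
    The conversion table above can be written as a small decision function. This is only a sketch: the JoinType values and the two boolean flags (whether a predicate above the join rejects null-supplied rows of the left or right side) are hypothetical names, and detecting such predicates is not shown.

```scala
object OuterJoinSketch {
  // Hypothetical join types, not the Catalyst ones.
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType
  case object FullOuter extends JoinType

  // leftRejectsNulls: some predicate filters out rows whose left-side columns are all NULL.
  // rightRejectsNulls: the same for the right side.
  def newJoinType(jt: JoinType, leftRejectsNulls: Boolean, rightRejectsNulls: Boolean): JoinType =
    jt match {
      case FullOuter if leftRejectsNulls && rightRejectsNulls => Inner
      case FullOuter if leftRejectsNulls                      => LeftOuter
      case FullOuter if rightRejectsNulls                     => RightOuter
      case LeftOuter if rightRejectsNulls                     => Inner
      case RightOuter if leftRejectsNulls                     => Inner
      case other                                              => other // nothing to eliminate
    }
}
```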

  21. object EliminateResolvedHint extends Rule[LogicalPlan]

    Replaces ResolvedHint operators in the plan. Moves the HintInfo to the associated Join operators, or removes it if no Join operator is matched.

  22. object EliminateSerialization extends Rule[LogicalPlan]

    Removes cases where we are unnecessarily going between the object and serialized (InternalRow) representations of a data item, for example in back-to-back map operations.

  23. object EliminateSorts extends Rule[LogicalPlan]

    Removes Sort operations. This can happen:
    1) if the sort order is empty or the sort order does not have any reference
    2) if the child is already sorted
    3) if there is another Sort operator separated by 0...n Project/Filter operators
    4) if the Sort operator is within a Join, separated by 0...n Project/Filter operators only, and the Join condition is deterministic
    5) if the Sort operator is within a GroupBy, separated by 0...n Project/Filter operators only, and the aggregate function is order-irrelevant

  24. object ExtractPythonUDFFromJoinCondition extends Rule[LogicalPlan] with PredicateHelper

    A PythonUDF in a join condition can't be evaluated if it refers to attributes from both join sides. See ExtractPythonUDFs for details. This rule detects unevaluable PythonUDFs and pulls them out of the join condition.

  25. object FoldablePropagation extends Rule[LogicalPlan]

    Replaces attributes with aliases of the original foldable expressions if possible. Other optimizations will take advantage of the propagated foldable expressions. For example, this rule can optimize

    SELECT 1.0 x, 'abc' y, Now() z ORDER BY x, y, 3

    to

    SELECT 1.0 x, 'abc' y, Now() z ORDER BY 1.0, 'abc', Now()

    and other rules can further optimize it and remove the ORDER BY operator.

  26. object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelper with ConstraintHelper

    Generates a list of additional filters from an operator's existing constraints, but removes those that are either already part of the operator's condition or part of the operator's child constraints. These filters are currently inserted into the existing conditions in Filter operators and on either side of Join operators.

    Note: While this optimization is applicable to a lot of types of join, it primarily benefits Inner and LeftSemi joins.

  27. object JoinReorderDP extends PredicateHelper with Logging

    Reorder the joins using a dynamic programming algorithm. This implementation is based on the paper: Access Path Selection in a Relational Database Management System. http://www.inf.ed.ac.uk/teaching/courses/adbs/AccessPath.pdf

    First we put all items (basic joined nodes) into level 0, then we build all two-way joins at level 1 from plans at level 0 (single items), then build all 3-way joins from plans at previous levels (two-way joins and single items), then 4-way joins ... etc, until we build all n-way joins and pick the best plan among them.

    When building m-way joins, we only keep the best plan (with the lowest cost) for the same set of m items. E.g., for 3-way joins, we keep only the best plan for items {A, B, C} among plans (A J B) J C, (A J C) J B and (B J C) J A. We also prune cartesian product candidates when building a new plan if there exists no join condition involving references from both left and right. This pruning strategy significantly reduces the search space. E.g., given A J B J C J D with join conditions A.k1 = B.k1 and B.k2 = C.k2 and C.k3 = D.k3, plans maintained for each level are as follows:

    level 0: p({A}), p({B}), p({C}), p({D})
    level 1: p({A, B}), p({B, C}), p({C, D})
    level 2: p({A, B, C}), p({B, C, D})
    level 3: p({A, B, C, D})

    where p({A, B, C, D}) is the final output plan.

    For cost evaluation, since physical costs for operators are not available currently, we use cardinalities and sizes to compute costs.
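
    The level-by-level enumeration with cartesian-product pruning can be sketched as follows. This is a simplified illustration, not the Spark implementation: items are plain strings, join conditions are modelled as undirected edges, and the cost comparison that picks the best plan per item set is omitted, so only the surviving item sets are tracked. The names levels and connected are hypothetical.

```scala
object JoinLevelsSketch {
  // A join condition, modelled as an undirected edge between two items.
  type Edge = (String, String)

  // True if some join condition references one item from each side.
  def connected(left: Set[String], right: Set[String], edges: Set[Edge]): Boolean =
    edges.exists { case (a, b) =>
      (left(a) && right(b)) || (left(b) && right(a))
    }

  // Element k of the result holds the item sets for which a (k + 1)-way plan is kept.
  def levels(items: Seq[String], edges: Set[Edge]): Vector[Set[Set[String]]] = {
    val level0: Set[Set[String]] = items.map(i => Set(i)).toSet
    (1 until items.size).foldLeft(Vector(level0)) { (found, k) =>
      // Join a set from level j with a set from level k - 1 - j.
      val next = for {
        j <- 0 until k
        left <- found(j)
        right <- found(k - 1 - j)
        // Skip overlapping sets and cartesian products.
        if (left & right).isEmpty && connected(left, right, edges)
      } yield left ++ right
      found :+ next.toSet
    }
  }
}
```

    With items A, B, C, D and edges (A, B), (B, C), (C, D), the result reproduces the four levels listed in the example above.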

  28. object JoinReorderDPFilters extends PredicateHelper

    Implements optional filters to reduce the search space for join enumeration.

    1) Star-join filters: Plan star-joins together since they are assumed to have an optimal execution based on their RI relationship.
    2) Cartesian products: Defer their planning later in the graph to avoid large intermediate results (expanding joins, in general).
    3) Composite inners: Don't generate "bushy tree" plans to avoid materializing intermediate results.

    Filters (2) and (3) are not implemented.

  29. object LikeSimplification extends Rule[LogicalPlan]

    Simplifies LIKE expressions that do not need full regular expressions to evaluate the condition. For example, when the expression is just checking to see if a string starts with a given pattern.
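
    A rough sketch of the idea, assuming only a leading and/or trailing % wildcard is simplified. The result types (StartsWith, EndsWith, Contains, EqualTo, NeedsRegex) are hypothetical names for this illustration; any pattern containing _ or a backslash escape falls back to the regular-expression path.

```scala
object LikeSketch {
  // Hypothetical result types for this illustration.
  sealed trait Simplified
  case class StartsWith(prefix: String) extends Simplified
  case class EndsWith(suffix: String) extends Simplified
  case class Contains(infix: String) extends Simplified
  case class EqualTo(value: String) extends Simplified
  case object NeedsRegex extends Simplified

  def simplify(pattern: String): Simplified = {
    // A non-empty string without wildcards or escape characters.
    def plain(s: String): Boolean =
      s.nonEmpty && !s.exists(c => c == '%' || c == '_' || c == '\\')

    if (plain(pattern)) EqualTo(pattern)
    else if (pattern.length > 2 && pattern.startsWith("%") && pattern.endsWith("%") &&
             plain(pattern.substring(1, pattern.length - 1)))
      Contains(pattern.substring(1, pattern.length - 1))
    else if (pattern.endsWith("%") && plain(pattern.dropRight(1))) StartsWith(pattern.dropRight(1))
    else if (pattern.startsWith("%") && plain(pattern.drop(1))) EndsWith(pattern.drop(1))
    else NeedsRegex
  }
}
```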

  30. object LimitPushDown extends Rule[LogicalPlan]

    Pushes down LocalLimit beneath UNION ALL and beneath the streamed inputs of outer joins.

  31. object NestedColumnAliasing

    This aims to handle a nested column aliasing pattern inside the ColumnPruning optimizer rule. If a project or its child references nested fields, and not all the fields in a nested attribute are used, we can substitute them with alias attributes; then a project of the nested fields as aliases on the children of the child will be created.

  32. object NormalizeFloatingNumbers extends Rule[LogicalPlan]

    We need to take care of special floating numbers (NaN and -0.0) in several places:

    1. When comparing values, different NaNs should be treated as the same, and -0.0 and 0.0 should be treated as the same.
    2. In aggregate grouping keys, different NaNs should belong to the same group, and -0.0 and 0.0 should belong to the same group.
    3. In join keys, different NaNs should be treated as the same, and -0.0 and 0.0 should be treated as the same.
    4. In window partition keys, different NaNs should belong to the same partition, and -0.0 and 0.0 should belong to the same partition.

    Case 1 is fine, as we handle NaN and -0.0 well during comparison. For complex types, we recursively compare the fields/elements, so it's also fine.

    Cases 2, 3 and 4 are problematic, as Spark SQL turns grouping/join/window partition keys into binary UnsafeRow and compares the binary data directly. Different NaNs have different binary representations, and the same thing happens for -0.0 and 0.0.

    This rule normalizes NaN and -0.0 in window partition keys, join keys and aggregate grouping keys.

    Ideally we should do the normalization in the physical operators that compare the binary UnsafeRow directly. We don't need this normalization if the Spark SQL execution engine is not optimized to run on binary data. This rule is created to simplify the implementation, so that we have a single place to do normalization, which is more maintainable.

    Note that this rule must be executed at the end of the optimizer, because the optimizer may create new joins (the subquery rewrite) and new join conditions (the join reorder).
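
    The binary-representation problem is easy to reproduce with plain JVM doubles. The sketch below is an illustration only: bits, normalize and otherNaN are hypothetical helper names, and normalize mimics the intent of the rule rather than its implementation.

```scala
object FloatNormalizeSketch {
  import java.lang.{Double => JDouble}

  // Raw IEEE 754 bits, which is what a binary row comparison would see.
  def bits(d: Double): Long = JDouble.doubleToRawLongBits(d)

  // Map every NaN to the canonical NaN and -0.0 to 0.0; leave other values alone.
  def normalize(d: Double): Double =
    if (d.isNaN) Double.NaN
    else if (d == 0.0d) 0.0d // true for both 0.0 and -0.0
    else d

  // A NaN whose payload differs from the canonical Double.NaN.
  val otherNaN: Double = JDouble.longBitsToDouble(0x7ff8000000000001L)
}
```

    Here bits(otherNaN) differs from bits(Double.NaN), and bits(-0.0) differs from bits(0.0), even though value comparison treats -0.0 and 0.0 as equal. After normalize, the bit patterns in each pair are identical.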

  33. object NullPropagation extends Rule[LogicalPlan]

    Replaces Expressions that can be statically evaluated with equivalent Literal values. This rule is more specific with Null value propagation from bottom to top of the expression tree.

  34. object ObjectSerializerPruning extends Rule[LogicalPlan]

    Prunes unnecessary object serializers from query plan. This rule prunes both individual serializer and nested fields in serializers.

  35. object OptimizeIn extends Rule[LogicalPlan]

    Optimizes IN predicates:
    1. Converts the predicate to false when the list is empty and the value is not nullable.
    2. Removes literal repetitions.
    3. Replaces (value, seq[Literal]) with the optimized version (value, HashSet[Literal]), which is much faster.
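
    The three steps might be sketched as below on integer literals. The Pred types and the setThreshold parameter are assumptions made for this illustration (the text above does not say when the set conversion is applied), not the rule's actual signature.

```scala
object OptimizeInSketch {
  // Hypothetical predicate forms for this illustration.
  sealed trait Pred
  case object AlwaysFalse extends Pred
  case class InList(values: Seq[Int]) extends Pred
  case class InSet(values: Set[Int]) extends Pred

  // setThreshold is an assumed cut-off: longer lists are turned into a hash set.
  def optimize(list: Seq[Int], valueNullable: Boolean, setThreshold: Int): Pred = {
    val distinct = list.distinct                            // step 2: drop repeated literals
    if (distinct.isEmpty && !valueNullable) AlwaysFalse     // step 1: empty list, non-nullable value
    else if (distinct.size > setThreshold) InSet(distinct.toSet) // step 3: hash-set lookup
    else InList(distinct)
  }
}
```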

  36. object OptimizeLimitZero extends Rule[LogicalPlan]

    Replaces GlobalLimit 0 and LocalLimit 0 nodes (subtree) with an empty LocalRelation, as they don't return any rows.

  37. object PropagateEmptyRelation extends Rule[LogicalPlan] with PredicateHelper with CastSupport

    Collapses plans consisting of empty local relations generated by PruneFilters.

    1. Binary (or higher)-node logical plans
       • Union with all empty children.
       • Join with one or two empty children (including Intersect/Except).
    2. Unary-node logical plans
       • Project/Filter/Sample/Join/Limit/Repartition with all empty children.
       • Aggregate with all empty children and at least one grouping expression.
       • Generate(Explode) with all empty children. Others like Hive UDTF may return results.

  38. object PruneFilters extends Rule[LogicalPlan] with PredicateHelper

    Removes filters that can be evaluated trivially. This can be done in the following ways:
    1) by eliding the filter for cases where it will always evaluate to true.
    2) by substituting a dummy empty relation when the filter will always evaluate to false.
    3) by eliminating the always-true conditions given the constraints on the child's output.

  39. object PullupCorrelatedPredicates extends Rule[LogicalPlan] with PredicateHelper

    Pull out all (outer) correlated predicates from a given subquery. This method removes the correlated predicates from subquery Filters and adds the references of these predicates to all intermediate Project and Aggregate clauses (if they are missing) in order to be able to evaluate the predicates at the top level.

    TODO: Look to merge this rule with RewritePredicateSubquery.

  40. object PushDownLeftSemiAntiJoin extends Rule[LogicalPlan] with PredicateHelper

    This rule is a variant of PushPredicateThroughNonJoin which can handle pushing down left semi and left anti joins below the following operators:
    1) Project
    2) Window
    3) Union
    4) Aggregate
    5) Other permissible unary operators; see PushPredicateThroughNonJoin.canPushThrough.

  41. object PushDownPredicates extends Rule[LogicalPlan] with PredicateHelper

    The unified version for predicate pushdown of normal operators and joins. This rule improves performance of predicate pushdown for cascading joins such as: Filter-Join-Join-Join. Most predicates can be pushed down in a single pass.

  42. object PushLeftSemiLeftAntiThroughJoin extends Rule[LogicalPlan] with PredicateHelper

    This rule is a variant of PushPredicateThroughJoin which can handle pushing down left semi and left anti joins below a join operator. The allowable join types are:
    1) Inner
    2) Cross
    3) LeftOuter
    4) RightOuter

    TODO: Currently this rule can push down the left semi or left anti joins to either the left or right leg of the child join. This matches the behaviour of PushPredicateThroughJoin when the left semi or left anti join is in expression form. We need to explore the possibility of pushing the left semi/anti joins to both legs of the join if the join condition refers to both the left and right legs of the child join.

  43. object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper

    Pushes down Filter operators where the condition can be evaluated using only the attributes of the left or right side of a join. Other Filter conditions are moved into the condition of the Join.

    Also pushes down the join filter, where the condition can be evaluated using only the attributes of the left or right side of the subquery, when applicable.

    Check https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior for more details

  44. object PushPredicateThroughNonJoin extends Rule[LogicalPlan] with PredicateHelper

    Pushes Filter operators through many operators iff:
    1) the operator is deterministic
    2) the predicate is deterministic and the operator will not change any rows.

    This heuristic is valid assuming the expression evaluation cost is minimal.

  45. object PushProjectionThroughUnion extends Rule[LogicalPlan] with PredicateHelper

    Pushes the Project operator to both sides of a Union operator. Operations that are safe to push down are listed as follows.

    Union: Right now, Union means UNION ALL, which does not de-duplicate rows, so it is safe to push down Filters and Projections through it. Filter pushdown is handled by another rule, PushDownPredicates. Once we add UNION DISTINCT, we will not be able to push down Projections.

  46. object ReassignLambdaVariableID extends Rule[LogicalPlan]

    Reassigns per-query unique IDs to LambdaVariables, whose original IDs are globally unique. This can help Spark to hit codegen cache more often and improve performance.

  47. object RemoveDispensableExpressions extends Rule[LogicalPlan]

    Removes nodes that are not necessary.

  48. object RemoveLiteralFromGroupExpressions extends Rule[LogicalPlan]

    Removes literals from group expressions in Aggregate, as they have no effect on the result and only make the grouping key bigger.

  49. object RemoveNoopOperators extends Rule[LogicalPlan]

    Removes no-op operators, which do not make any modifications, from the query plan.

  50. object RemoveRedundantAliases extends Rule[LogicalPlan]

    Removes redundant aliases from a query plan. A redundant alias is an alias that does not change the name or metadata of a column, and does not deduplicate it.

  51. object RemoveRepetitionFromGroupExpressions extends Rule[LogicalPlan]

    Removes repetition from group expressions in Aggregate, as repeated expressions have no effect on the result and only make the grouping key bigger.

  52. object ReorderAssociativeOperator extends Rule[LogicalPlan]

    Reorders associative integral-type operators and folds all constants into one.

  53. object ReorderJoin extends Rule[LogicalPlan] with PredicateHelper

    Reorders the joins and pushes all the conditions into the joins, so that the bottom ones have at least one condition.

    The order of joins will not be changed if all of them already have at least one condition.

    If star schema detection is enabled, reorder the star join plans based on heuristics.

  54. object ReplaceDeduplicateWithAggregate extends Rule[LogicalPlan]

    Replaces logical Deduplicate operator with an Aggregate operator.

  55. object ReplaceDistinctWithAggregate extends Rule[LogicalPlan]

    Replaces logical Distinct operator with an Aggregate operator.

    SELECT DISTINCT f1, f2 FROM t  ==>  SELECT f1, f2 FROM t GROUP BY f1, f2

  56. object ReplaceExceptWithAntiJoin extends Rule[LogicalPlan]

    Replaces logical Except operator with a left-anti Join operator.

    SELECT a1, a2 FROM Tab1 EXCEPT SELECT b1, b2 FROM Tab2
    ==>  SELECT DISTINCT a1, a2 FROM Tab1 LEFT ANTI JOIN Tab2 ON a1<=>b1 AND a2<=>b2

    Note:
    1. This rule is only applicable to EXCEPT DISTINCT. Do not use it for EXCEPT ALL.
    2. This rule has to be done after de-duplicating the attributes; otherwise, the generated join conditions will be incorrect.
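
    To see why the rewritten query is equivalent, the anti join with null-safe equality (<=>) can be mimicked on in-memory rows. This is a sketch on plain Scala collections, not Spark code: NULL is modelled as None, and Row, nullSafeEq and exceptDistinct are hypothetical names.

```scala
object ExceptSketch {
  // A two-column row; None stands for SQL NULL.
  type Row = (Option[Int], Option[Int])

  // Null-safe equality (<=>): two NULLs compare equal, which is how Option's == behaves.
  def nullSafeEq(a: Option[Int], b: Option[Int]): Boolean = a == b

  // LEFT ANTI JOIN on a1 <=> b1 AND a2 <=> b2, followed by DISTINCT.
  def exceptDistinct(left: Seq[Row], right: Seq[Row]): Seq[Row] =
    left.filterNot { case (a1, a2) =>
      right.exists { case (b1, b2) => nullSafeEq(a1, b1) && nullSafeEq(a2, b2) }
    }.distinct
}
```

    With a plain = comparison instead of <=>, a left row containing NULL would never match a right row and would wrongly survive the EXCEPT.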

  57. object ReplaceExceptWithFilter extends Rule[LogicalPlan]

    If one or both of the datasets in the logical Except operator are purely transformed using Filter, this rule will replace logical Except operator with a Filter operator by flipping the filter condition of the right child.

    SELECT a1, a2 FROM Tab1 WHERE a2 = 12 EXCEPT SELECT a1, a2 FROM Tab1 WHERE a1 = 5
    ==>  SELECT DISTINCT a1, a2 FROM Tab1 WHERE a2 = 12 AND (a1 is null OR a1 <> 5)

    Note: Before flipping the filter condition of the right node, we should:
    1. Combine all its Filters.
    2. Update the attribute references to the left node.
    3. Add a Coalesce(condition, False) (to take into account NULL values in the condition).

  58. object ReplaceExpressions extends Rule[LogicalPlan]

    Finds all the expressions that are unevaluable and replaces/rewrites them with semantically equivalent expressions that can be evaluated. Currently we replace two kinds of expressions:
    1) RuntimeReplaceable expressions
    2) UnevaluableAggregate expressions such as Every, Some, Any, CountIf

    This is mainly used to provide compatibility with other databases. A few examples: we use this to support "nvl" by replacing it with "coalesce", and to replace Every and Any with Min and Max respectively.

    TODO: In future, explore an option to replace aggregate functions similar to how RuntimeReplaceable does.

  59. object ReplaceIntersectWithSemiJoin extends Rule[LogicalPlan]

    Replaces logical Intersect operator with a left-semi Join operator.

    SELECT a1, a2 FROM Tab1 INTERSECT SELECT b1, b2 FROM Tab2
    ==>  SELECT DISTINCT a1, a2 FROM Tab1 LEFT SEMI JOIN Tab2 ON a1<=>b1 AND a2<=>b2

    Note:
    1. This rule is only applicable to INTERSECT DISTINCT. Do not use it for INTERSECT ALL.
    2. This rule has to be done after de-duplicating the attributes; otherwise, the generated join conditions will be incorrect.

  60. object ReplaceNullWithFalseInPredicate extends Rule[LogicalPlan]

    A rule that replaces Literal(null, BooleanType) with FalseLiteral, if possible, in the search condition of the WHERE/HAVING/ON(JOIN) clauses, which contain an implicit Boolean operator "(search condition) = TRUE". The replacement is only valid when Literal(null, BooleanType) is semantically equivalent to FalseLiteral when evaluating the whole search condition.

    Please note that FALSE and NULL are not exchangeable in most cases, when the search condition contains NOT and NULL-tolerant expressions. Thus, the rule is very conservative and applicable in very limited cases.

    For example, Filter(Literal(null, BooleanType)) is equal to Filter(FalseLiteral).

    Another example containing branches is Filter(If(cond, FalseLiteral, Literal(null, _))); this can be optimized to Filter(If(cond, FalseLiteral, FalseLiteral)), and eventually Filter(FalseLiteral).

    Moreover, this rule also transforms predicates in all If expressions as well as branch conditions in all CaseWhen expressions, even if they are not part of the search conditions.

    For example, Project(If(And(cond, Literal(null)), Literal(1), Literal(2))) can be simplified into Project(Literal(2)).
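
    Why the replacement is safe under AND but not under NOT can be checked with a small model of SQL three-valued logic. This is a sketch, assuming NULL is modelled as None; Tri, and, not, filterKeeps and nullToFalse are hypothetical helper names.

```scala
object NullAsFalseSketch {
  // SQL three-valued boolean: Some(true), Some(false), or None for NULL.
  type Tri = Option[Boolean]

  def and(a: Tri, b: Tri): Tri = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  def not(a: Tri): Tri = a.map(!_)

  // A Filter keeps a row only when the search condition is TRUE.
  def filterKeeps(cond: Tri): Boolean = cond.contains(true)

  // The replacement under discussion: NULL becomes FALSE.
  def nullToFalse(a: Tri): Tri = a.orElse(Some(false))
}
```

    For every value of cond, filterKeeps(and(cond, None)) equals filterKeeps(and(cond, Some(false))), so the replacement does not change which rows pass. Under NOT it does: not(None) is NULL and the row is dropped, while not(Some(false)) is TRUE and the row is kept.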

  61. object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan]

    This rule rewrites correlated ScalarSubquery expressions into LEFT OUTER joins.

  62. object RewriteDistinctAggregates extends Rule[LogicalPlan]

    This rule rewrites an aggregate query with distinct aggregations into an expanded double aggregation in which the regular aggregation expressions and every distinct clause is aggregated in a separate group. The results are then combined in a second aggregate.

    First example: query without filter clauses (in Scala):

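    // Assumes an active SparkSession named `spark` (e.g. in spark-shell).
    import spark.implicits._
    import org.apache.spark.sql.functions.{countDistinct, sum}

    val data = Seq(
      ("a", "ca1", "cb1", 10),
      ("a", "ca1", "cb2", 5),
      ("b", "ca1", "cb1", 13))
      .toDF("key", "cat1", "cat2", "value")
    data.createOrReplaceTempView("data")

    // Two distinct aggregates plus one regular aggregate, grouped by key.
    val agg = data.groupBy($"key")
      .agg(
        countDistinct($"cat1").as("cat1_cnt"),
        countDistinct($"cat2").as("cat2_cnt"),
        sum($"value").as("total"))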

    This translates to the following (pseudo) logical plan:

    Aggregate(
       key = ['key]
       functions = [COUNT(DISTINCT 'cat1),
                    COUNT(DISTINCT 'cat2),
                    sum('value)]
       output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
      LocalTableScan [...]

    This rule rewrites this logical plan to the following (pseudo) logical plan:

    Aggregate(
       key = ['key]
       functions = [count(if (('gid = 1)) 'cat1 else null),
                    count(if (('gid = 2)) 'cat2 else null),
                    first(if (('gid = 0)) 'total else null) ignore nulls]
       output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
      Aggregate(
         key = ['key, 'cat1, 'cat2, 'gid]
         functions = [sum('value)]
         output = ['key, 'cat1, 'cat2, 'gid, 'total])
        Expand(
           projections = [('key, null, null, 0, cast('value as bigint)),
                          ('key, 'cat1, null, 1, null),
                          ('key, null, 'cat2, 2, null)]
           output = ['key, 'cat1, 'cat2, 'gid, 'value])
          LocalTableScan [...]

    Second example: aggregate function without distinct and with filter clauses (in SQL):

    SELECT
      COUNT(DISTINCT cat1) AS cat1_cnt,
      COUNT(DISTINCT cat2) AS cat2_cnt,
      SUM(value) FILTER (WHERE id > 1) AS total
    FROM
      data
    GROUP BY
      key

    This translates to the following (pseudo) logical plan:

    Aggregate(
       key = ['key]
       functions = [COUNT(DISTINCT 'cat1),
                    COUNT(DISTINCT 'cat2),
                    sum('value) with FILTER('id > 1)]
       output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
      LocalTableScan [...]

    This rule rewrites this logical plan to the following (pseudo) logical plan:

    Aggregate(
       key = ['key]
       functions = [count(if (('gid = 1)) 'cat1 else null),
                    count(if (('gid = 2)) 'cat2 else null),
                    first(if (('gid = 0)) 'total else null) ignore nulls]
       output = ['key, 'cat1_cnt, 'cat2_cnt, 'total])
      Aggregate(
         key = ['key, 'cat1, 'cat2, 'gid]
         functions = [sum('value) with FILTER('id > 1)]
         output = ['key, 'cat1, 'cat2, 'gid, 'total])
        Expand(
           projections = [('key, null, null, 0, cast('value as bigint), 'id),
                          ('key, 'cat1, null, 1, null, null),
                          ('key, null, 'cat2, 2, null, null)]
           output = ['key, 'cat1, 'cat2, 'gid, 'value, 'id])
          LocalTableScan [...]

    The rule does the following things here:

    1. Expand the data. There are three aggregation groups in this query:
       i. the non-distinct group;
       ii. the distinct 'cat1 group;
       iii. the distinct 'cat2 group.
       An expand operator is inserted to expand the child data for each group. The expand will null out all unused columns for the given group; this must be done in order to ensure correctness later on. Groups can be identified by a group id (gid) column added by the expand operator.
    2. De-duplicate the distinct paths and aggregate the non-distinct path. The group by clause of this aggregate consists of the original group by clause, all the requested distinct columns and the group id. Both the de-duplication of the distinct columns and the aggregation of the non-distinct group take advantage of the fact that we group by the group id (gid) and that we have nulled out all non-relevant columns for the given group.
    3. Aggregate the distinct groups and combine this with the results of the non-distinct aggregation. In this step we use the group id to filter the inputs for the aggregate functions. The results of the non-distinct group are 'aggregated' by using the first operator. It might be more elegant to use the native UDAF merge mechanism for this in the future.

    This rule duplicates the input data two or more times (the number of distinct groups plus an optional non-distinct group). This puts quite a bit of memory pressure on the aggregate and exchange operators used. Keeping the number of distinct groups as low as possible should be a priority; the current rule could be improved by applying more advanced expression canonicalization techniques.
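
    As a rough illustration of the three steps above, the following sketch replays the first example on plain Scala collections rather than on Catalyst plans. The names DistinctAggSketch, Row, Expanded, expand, firstAggregate, secondAggregate and rewrite are hypothetical and are not part of Spark; Option stands in for SQL null.

```scala
object DistinctAggSketch {
  // One input row of the example table: (key, cat1, cat2, value).
  final case class Row(key: String, cat1: String, cat2: String, value: Long)

  // Row produced by the Expand step; None plays the role of SQL null.
  final case class Expanded(
      key: String,
      cat1: Option[String],
      cat2: Option[String],
      gid: Int,
      value: Option[Long])

  // Step 1: emit one copy per aggregation group and null out unused columns.
  def expand(rows: Seq[Row]): Seq[Expanded] = rows.flatMap { r =>
    Seq(
      Expanded(r.key, None, None, 0, Some(r.value)), // non-distinct group
      Expanded(r.key, Some(r.cat1), None, 1, None),  // distinct cat1 group
      Expanded(r.key, None, Some(r.cat2), 2, None))  // distinct cat2 group
  }

  // Step 2: group by (key, cat1, cat2, gid). This de-duplicates the distinct
  // groups and computes the sum for the gid = 0 group.
  def firstAggregate(rows: Seq[Expanded]): Seq[Expanded] =
    rows.groupBy(e => (e.key, e.cat1, e.cat2, e.gid)).toSeq.map {
      case ((key, cat1, cat2, gid), grp) =>
        val values = grp.flatMap(_.value)
        Expanded(key, cat1, cat2, gid, if (values.isEmpty) None else Some(values.sum))
    }

  // Step 3: group by key and use gid to route each row to its aggregate.
  // Result per key: (cat1_cnt, cat2_cnt, total).
  def secondAggregate(rows: Seq[Expanded]): Map[String, (Long, Long, Option[Long])] =
    rows.groupBy(_.key).map { case (key, grp) =>
      val cat1Cnt = grp.count(e => e.gid == 1 && e.cat1.isDefined).toLong
      val cat2Cnt = grp.count(e => e.gid == 2 && e.cat2.isDefined).toLong
      val total = grp.collectFirst { case e if e.gid == 0 && e.value.isDefined => e.value.get }
      (key, (cat1Cnt, cat2Cnt, total))
    }

  def rewrite(rows: Seq[Row]): Map[String, (Long, Long, Option[Long])] =
    secondAggregate(firstAggregate(expand(rows)))
}
```

    For the three example rows, expand produces nine rows (three per input row), which is the data duplication described above.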

  63. object RewriteExceptAll extends Rule[LogicalPlan]

    Replaces logical Except operator using a combination of Union, Aggregate and Generate operator.

    Input Query :

    SELECT c1 FROM ut1 EXCEPT ALL SELECT c1 FROM ut2

    Rewritten Query:

    SELECT c1
    FROM (
      SELECT replicate_rows(sum_val, c1)
      FROM (
        SELECT c1, sum_val
        FROM (
          SELECT c1, sum(vcol) AS sum_val
          FROM (
            SELECT 1L AS vcol, c1 FROM ut1
            UNION ALL
            SELECT -1L AS vcol, c1 FROM ut2
          ) AS union_all
          GROUP BY union_all.c1
        )
        WHERE sum_val > 0
      )
    )
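
    The rewritten query can be read as a multiset difference. The sketch below mirrors its stages (tag, union, sum, filter, replicate) on plain Scala collections; ExceptAllSketch and exceptAll are hypothetical names used only for illustration, and the sketch ignores SQL null handling.

```scala
object ExceptAllSketch {
  // Multiset difference that follows the stages of the rewritten query.
  def exceptAll[A](ut1: Seq[A], ut2: Seq[A]): Seq[A] = {
    // UNION ALL of both inputs, tagged with vcol = 1 (left) or -1 (right).
    val unionAll: Seq[(Long, A)] =
      ut1.map(c1 => (1L, c1)) ++ ut2.map(c1 => (-1L, c1))

    unionAll
      .groupBy(_._2)                                               // GROUP BY c1
      .toSeq
      .map { case (c1, rows) => (c1, rows.map(_._1).sum) }         // sum(vcol) AS sum_val
      .filter { case (_, sumVal) => sumVal > 0 }                   // WHERE sum_val > 0
      .flatMap { case (c1, sumVal) => Seq.fill(sumVal.toInt)(c1) } // replicate_rows(sum_val, c1)
  }
}
```

    The output order is unspecified, as with the SQL operator, so callers that need a stable order should sort the result.
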
  64. object RewriteIntersectAll extends Rule[LogicalPlan]

    Replaces logical Intersect operator using a combination of Union, Aggregate and Generate operator.

    Input Query :

    SELECT c1 FROM ut1 INTERSECT ALL SELECT c1 FROM ut2

    Rewritten Query:

    SELECT c1
    FROM (
      SELECT replicate_rows(min_count, c1)
      FROM (
        SELECT c1, If (vcol1_cnt > vcol2_cnt, vcol2_cnt, vcol1_cnt) AS min_count
        FROM (
          SELECT c1, count(vcol1) AS vcol1_cnt, count(vcol2) AS vcol2_cnt
          FROM (
            SELECT true AS vcol1, null AS vcol2, c1 FROM ut1
            UNION ALL
            SELECT null AS vcol1, true AS vcol2, c1 FROM ut2
          ) AS union_all
          GROUP BY c1
          HAVING vcol1_cnt >= 1 AND vcol2_cnt >= 1
        )
      )
    )
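
    The same idea yields a multiset intersection: count how often each value appears on either side and replicate it the smaller number of times. The sketch below follows the stages of the rewritten query on plain Scala collections; IntersectAllSketch, Tagged and intersectAll are hypothetical names, Option stands in for SQL null, and null values in c1 are not modelled.

```scala
object IntersectAllSketch {
  // One row of the tagged UNION ALL: a marker per input side plus the value.
  final case class Tagged[A](vcol1: Option[Boolean], vcol2: Option[Boolean], c1: A)

  def intersectAll[A](ut1: Seq[A], ut2: Seq[A]): Seq[A] = {
    val unionAll: Seq[Tagged[A]] =
      ut1.map(c1 => Tagged[A](Some(true), None, c1)) ++
        ut2.map(c1 => Tagged[A](None, Some(true), c1))

    unionAll
      .groupBy(_.c1) // GROUP BY c1
      .toSeq
      .map { case (c1, rows) =>
        // count(vcol1), count(vcol2): only non-null markers are counted.
        (c1, rows.count(_.vcol1.isDefined), rows.count(_.vcol2.isDefined))
      }
      .filter { case (_, cnt1, cnt2) => cnt1 >= 1 && cnt2 >= 1 } // HAVING clause
      .flatMap { case (c1, cnt1, cnt2) =>
        Seq.fill(math.min(cnt1, cnt2))(c1) // replicate_rows(min_count, c1)
      }
  }
}
```
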
  65. object RewriteNonCorrelatedExists extends Rule[LogicalPlan]

    Rewrites a non-correlated EXISTS subquery to use a ScalarSubquery. For example, WHERE EXISTS (SELECT A FROM TABLE B WHERE COL1 > 10) will be rewritten to WHERE (SELECT 1 FROM (SELECT A FROM TABLE B WHERE COL1 > 10) LIMIT 1) IS NOT NULL.

  66. object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper

    This rule rewrites predicate sub-queries into left semi/anti joins. The following predicates are supported:

    a. EXISTS/NOT EXISTS will be rewritten as semi/anti join; unresolved conditions in the Filter will be pulled out as the join conditions.
    b. IN/NOT IN will be rewritten as semi/anti join; unresolved conditions in the Filter will be pulled out as join conditions, and value = selected column will also be used as a join condition.
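
    To make the target join types concrete, here is a minimal sketch of left semi and left anti joins over plain Scala collections, which is how an EXISTS or NOT EXISTS predicate behaves once its correlated condition has become the join condition. SemiAntiJoinSketch, leftSemi and leftAnti are hypothetical names; the sketch does not model the null-aware behaviour that NOT IN requires.

```scala
object SemiAntiJoinSketch {
  // Left semi join: keep each left row that has at least one match on the right.
  // Each left row appears at most once, however many right rows match.
  def leftSemi[L, R](left: Seq[L], right: Seq[R])(cond: (L, R) => Boolean): Seq[L] =
    left.filter(l => right.exists(r => cond(l, r)))

  // Left anti join: keep each left row that has no match on the right.
  def leftAnti[L, R](left: Seq[L], right: Seq[R])(cond: (L, R) => Boolean): Seq[L] =
    left.filterNot(l => right.exists(r => cond(l, r)))
}
```
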

  67. object SimpleTestOptimizer extends SimpleTestOptimizer

    An optimizer used in test code.

    To ensure extensibility, we leave the standard rules in the abstract optimizer, while specific rules go to the subclasses.

  68. object SimplifyBinaryComparison extends Rule[LogicalPlan] with PredicateHelper with ConstraintHelper

    Simplifies binary comparisons with semantically-equal expressions:

    1) Replace '<=>' with 'true' literal.
    2) Replace '=', '<=', and '>=' with 'true' literal if both operands are non-nullable.
    3) Replace '<' and '>' with 'false' literal if both operands are non-nullable.
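
    The three cases can be sketched on a toy expression tree. Everything in this block (BinaryComparisonSketch, Expr, Attr, Lit, Cmp, simplify) is hypothetical and much simpler than Catalyst's expression classes; in particular, structural equality of case classes stands in for semantic equality.

```scala
object BinaryComparisonSketch {
  sealed trait Expr { def nullable: Boolean }
  final case class Attr(name: String, nullable: Boolean) extends Expr
  final case class Lit(value: Boolean) extends Expr { val nullable: Boolean = false }
  final case class Cmp(op: String, left: Expr, right: Expr) extends Expr {
    // '<=>' (null-safe equality) never returns null; other comparisons may.
    val nullable: Boolean = op != "<=>" && (left.nullable || right.nullable)
  }

  def simplify(e: Expr): Expr = e match {
    // 1) x <=> x is always true, even when x is null.
    case Cmp("<=>", l, r) if l == r => Lit(true)
    // 2) x = x, x <= x, x >= x are true only if x cannot be null.
    case Cmp("=" | "<=" | ">=", l, r) if l == r && !l.nullable && !r.nullable => Lit(true)
    // 3) x < x, x > x are false only if x cannot be null.
    case Cmp("<" | ">", l, r) if l == r && !l.nullable && !r.nullable => Lit(false)
    case other => other
  }
}
```

    The non-nullable condition matters because a comparison such as x = x evaluates to null, not true, when x is null.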

  69. object SimplifyCaseConversionExpressions extends Rule[LogicalPlan]

    Removes the inner case conversion expressions that are unnecessary because the inner conversion is overwritten by the outer one.
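
    A minimal sketch of this simplification on a toy expression tree follows. CaseConversionSketch, Expr, Col, Upper, Lower and simplify are hypothetical names chosen for illustration and are not the Catalyst classes.

```scala
object CaseConversionSketch {
  sealed trait Expr
  final case class Col(name: String) extends Expr
  final case class Upper(child: Expr) extends Expr
  final case class Lower(child: Expr) extends Expr

  // The outer conversion overwrites whatever the inner one did,
  // so the inner conversion can be dropped.
  def simplify(e: Expr): Expr = e match {
    case Upper(Upper(c)) => simplify(Upper(c))
    case Upper(Lower(c)) => simplify(Upper(c))
    case Lower(Upper(c)) => simplify(Lower(c))
    case Lower(Lower(c)) => simplify(Lower(c))
    case Upper(c)        => Upper(simplify(c))
    case Lower(c)        => Lower(simplify(c))
    case other           => other
  }
}
```
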

  70. object SimplifyCasts extends Rule[LogicalPlan]

    Removes Casts that are unnecessary because the input is already the correct type.

  71. object SimplifyConditionals extends Rule[LogicalPlan] with PredicateHelper

    Simplifies conditional expressions (if / case).

  72. object SimplifyExtractValueOps extends Rule[LogicalPlan]

    Simplify redundant CreateNamedStruct, CreateArray and CreateMap expressions.

  73. object StarSchemaDetection extends PredicateHelper

    Encapsulates star-schema detection logic.

  74. object TransposeWindow extends Rule[LogicalPlan]

    Transpose Adjacent Window Expressions.

    - If the partition spec of the parent Window expression is compatible with the partition spec of the child window expression, transpose them.
