package analysis
Provides a logical query plan Analyzer and supporting classes for performing analysis. Analysis consists of translating UnresolvedAttributes and UnresolvedRelations into fully typed objects using information in a schema Catalog.
Type Members
-
case class
AnalysisContext(catalogAndNamespace: Seq[String] = Nil, nestedViewDepth: Int = 0, relationCache: Map[Seq[String], LogicalPlan] = mutable.Map.empty) extends Product with Serializable
Provides a way to keep state during the analysis; this lets us decouple the concerns of the analysis environment from the catalog. The state that is kept here is per-query.
Note that this is thread-local.
- catalogAndNamespace
The catalog and namespace used in the view resolution. This overrides the current catalog and namespace when resolving relations inside views.
- nestedViewDepth
The nesting depth in view resolution; this enables us to limit the depth of nested views.
- relationCache
A mapping from qualified table names to resolved relations. This ensures that a table is resolved only once, even if it is used multiple times in a query.
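The resolve-once behaviour of relationCache can be illustrated with a minimal, self-contained sketch. It does not use Spark's classes: ResolvedPlan, lookupInCatalog and the lookups counter are hypothetical stand-ins, and only the idea of a per-query map keyed by the qualified name comes from the description above.

```scala
import scala.collection.mutable

object RelationCacheSketch {
  // Hypothetical stand-in for a resolved logical plan.
  final case class ResolvedPlan(qualifiedName: Seq[String])

  // Counts how often the (pretend) catalog lookup actually runs.
  var lookups: Int = 0

  // Per-query cache keyed by the fully qualified table name.
  val relationCache: mutable.Map[Seq[String], ResolvedPlan] = mutable.Map.empty

  // Hypothetical expensive lookup against a catalog.
  private def lookupInCatalog(name: Seq[String]): ResolvedPlan = {
    lookups += 1
    ResolvedPlan(name)
  }

  // Resolve a table, consulting the cache first; the lookup runs only on a miss.
  def resolve(name: Seq[String]): ResolvedPlan =
    relationCache.getOrElseUpdate(name, lookupInCatalog(name))
}
```

Resolving the same qualified name twice in this sketch returns the cached plan and triggers a single lookup.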
- implicit class AnalysisErrorAt extends AnyRef
-
class
Analyzer extends RuleExecutor[LogicalPlan] with CheckAnalysis with LookupCatalog
Provides a logical query plan analyzer, which translates UnresolvedAttributes and UnresolvedRelations into fully typed objects using information in a SessionCatalog.
- class CannotReplaceMissingTableException extends AnalysisException
-
trait
CastSupport extends AnyRef
Mix-in trait for constructing valid Cast expressions.
-
trait
CheckAnalysis extends PredicateHelper
Throws user facing errors when passed invalid queries that fail to analyze.
-
class
DatabaseAlreadyExistsException extends NamespaceAlreadyExistsException
Thrown by a catalog when an item already exists. The analyzer will rethrow the exception as an org.apache.spark.sql.AnalysisException with the correct position information.
- class FunctionAlreadyExistsException extends AnalysisException
-
trait
FunctionRegistry extends AnyRef
A catalog for looking up user defined functions, used by an Analyzer.
Note: 1) The implementation should be thread-safe to allow concurrent access. 2) The database name is always case-sensitive here; callers are responsible for formatting the database name according to the case-sensitivity config.
- case class GetColumnByOrdinal(ordinal: Int, dataType: DataType) extends LeafExpression with Unevaluable with NonSQLExpression with Product with Serializable
-
case class
MultiAlias(child: Expression, names: Seq[String]) extends UnaryExpression with NamedExpression with Unevaluable with Product with Serializable
Used to assign new names to a Generator's output, such as a Hive UDTF. For example, the SQL expression "stack(2, key, value, key, value) as (a, b)" could be represented as follows: MultiAlias(stack_function, Seq(a, b))
- child
the computation being performed
- names
the names to be associated with each output of computing child.
-
trait
MultiInstanceRelation extends AnyRef
A trait that should be mixed into query operators where a single instance might appear multiple times in a logical query plan. It is invalid to have multiple copies of the same attribute produced by distinct operators in a query tree as this breaks the guarantee that expression ids, which are used to differentiate attributes, are unique.
During analysis, operators that include this trait may be asked to produce a new version of themselves with globally unique expression ids.
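As a rough illustration of "a new version with globally unique expression ids", the sketch below models an attribute as a name plus an id and re-creates a relation with fresh ids. Attr, Relation, freshId and relation are hypothetical names chosen for this example; they are not the Spark classes.

```scala
import java.util.concurrent.atomic.AtomicLong

object MultiInstanceSketch {
  // Hypothetical global id generator; ids are never handed out twice.
  private val nextId = new AtomicLong(0L)
  def freshId(): Long = nextId.getAndIncrement()

  // Hypothetical attribute: a name plus a unique expression id.
  final case class Attr(name: String, exprId: Long)

  // Hypothetical relation that can re-create itself with fresh ids.
  final case class Relation(output: Seq[Attr]) {
    // Same column names, but every attribute gets a new expression id.
    def newInstance(): Relation =
      Relation(output.map(a => a.copy(exprId = freshId())))
  }

  // Convenience constructor that assigns fresh ids to the given column names.
  def relation(names: String*): Relation =
    Relation(names.map(n => Attr(n, freshId())))
}
```

In a self-join, for example, calling newInstance() on one side would keep the two copies of each column distinguishable by id even though their names are identical.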
- trait NamedRelation extends LogicalPlan
- class NamespaceAlreadyExistsException extends AnalysisException
-
class
NoSuchDatabaseException extends NoSuchNamespaceException
Thrown by a catalog when an item cannot be found. The analyzer will rethrow the exception as an org.apache.spark.sql.AnalysisException with the correct position information.
- class NoSuchFunctionException extends AnalysisException
- class NoSuchNamespaceException extends AnalysisException
- class NoSuchPartitionException extends AnalysisException
- class NoSuchPartitionsException extends AnalysisException
- class NoSuchPermanentFunctionException extends AnalysisException
- class NoSuchTableException extends AnalysisException
- class NoSuchTempFunctionException extends AnalysisException
- class PartitionAlreadyExistsException extends AnalysisException
- class PartitionsAlreadyExistException extends AnalysisException
-
class
ResolveCatalogs extends Rule[LogicalPlan] with LookupCatalog
Resolves catalogs from the multi-part identifiers in SQL statements, and converts the statements to the corresponding v2 commands if the resolved catalog is not the session catalog.
-
case class
ResolveHigherOrderFunctions(catalog: SessionCatalog) extends Rule[LogicalPlan] with Product with Serializable
Resolves a higher-order function from the catalog. This is different from regular function resolution because lambda functions can only be resolved after the function has been resolved; so we need to resolve a higher-order function when all children are either resolved or a lambda function.
-
case class
ResolveInlineTables(conf: SQLConf) extends Rule[LogicalPlan] with CastSupport with Product with Serializable
An analyzer rule that replaces UnresolvedInlineTable with LocalRelation.
-
case class
ResolveLambdaVariables(conf: SQLConf) extends Rule[LogicalPlan] with Product with Serializable
Resolves the lambda variables exposed by a higher-order function.
This rule works in two steps:
[1]. Bind the anonymous variables exposed by the higher-order function to the lambda function's arguments; this creates named and typed lambda variables. The argument names are checked for duplicates and the number of arguments is checked during this step.
[2]. Resolve the lambda variables used in the lambda function's function expression tree. Note that we allow the use of variables from outside the current lambda; these can come either from a lambda function defined in an outer scope or from an attribute produced by the plan's child. If names are duplicated, the name defined in the innermost scope is used.
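The two steps can be sketched with plain Scala collections. This is a simplified model, not the rule's implementation: bind, resolve, and the use of strings for types and bindings are assumptions made for the example, and the duplicate check here is a plain case-sensitive comparison.

```scala
object LambdaScopeSketch {
  // Step 1 (sketch): bind the variables exposed by the higher-order function
  // to the lambda's argument names, checking for duplicates and arity.
  def bind(argNames: Seq[String], exposedTypes: Seq[String]): Map[String, String] = {
    require(argNames.distinct.size == argNames.size, "duplicate lambda argument names")
    require(argNames.size == exposedTypes.size, "wrong number of lambda arguments")
    argNames.zip(exposedTypes).toMap
  }

  // Step 2 (sketch): look a name up in the lambda scopes, innermost first,
  // then fall back to the attributes produced by the plan's child.
  def resolve(
      name: String,
      lambdaScopes: List[Map[String, String]],
      planAttributes: Map[String, String]): Option[String] =
    lambdaScopes
      .collectFirst { case scope if scope.contains(name) => scope(name) }
      .orElse(planAttributes.get(name))
}
```

Because the scope list is ordered innermost first, a name defined in both an inner and an outer lambda resolves to the inner definition.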
-
case class
ResolveTimeZone(conf: SQLConf) extends Rule[LogicalPlan] with Product with Serializable
Replaces a TimeZoneAwareExpression that has no time zone id with a copy that uses the session-local time zone.
-
case class
ResolvedNamespace(catalog: CatalogPlugin, namespace: Seq[String]) extends LeafNode with Product with Serializable
A plan containing a resolved namespace.
-
case class
ResolvedStar(expressions: Seq[NamedExpression]) extends Star with Unevaluable with Product with Serializable
Represents all the resolved input attributes to a given relational operator. This is used in the data frame DSL.
- expressions
Expressions to expand.
-
case class
ResolvedTable(catalog: TableCatalog, identifier: Identifier, table: Table) extends LeafNode with Product with Serializable
A plan containing a resolved table.
-
case class
ResolvedView(identifier: Identifier) extends LeafNode with Product with Serializable
A plan containing resolved (temp) views.
-
type
Resolver = (String, String) ⇒ Boolean
Resolver should return true if the first string refers to the same entity as the second string. For example, by using case insensitive equality.
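A minimal sketch of the Resolver type in use follows. The type alias matches the signature shown above, and the two value names mirror the package's caseInsensitiveResolution and caseSensitiveResolution members; their bodies here, and the findColumn helper, are assumptions made for illustration.

```scala
object ResolverSketch {
  // Same shape as the Resolver type alias documented above.
  type Resolver = (String, String) => Boolean

  // Assumed bodies: case-insensitive and exact string equality.
  val caseInsensitiveResolution: Resolver = (a, b) => a.equalsIgnoreCase(b)
  val caseSensitiveResolution: Resolver = (a, b) => a == b

  // Hypothetical helper: find a column by name using the given resolver.
  def findColumn(columns: Seq[String], name: String, resolver: Resolver): Option[String] =
    columns.find(c => resolver(c, name))
}
```

Passing the resolver as a plain function value lets the same lookup code serve both case-sensitive and case-insensitive sessions.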
- class SimpleFunctionRegistry extends FunctionRegistry with Logging
-
abstract
class
Star extends LeafExpression with NamedExpression
Represents all of the input attributes to a given relational operator, for example in "SELECT * FROM ...". A Star gets automatically expanded during analysis.
-
class
SubstituteUnresolvedOrdinals extends Rule[LogicalPlan]
Replaces ordinal in 'order by' or 'group by' with UnresolvedOrdinal expression.
- class TableAlreadyExistsException extends AnalysisException
- class TempTableAlreadyExistsException extends TableAlreadyExistsException
-
trait
TypeCheckResult extends AnyRef
Represents the result of Expression.checkInputDataTypes. We will throw AnalysisException in CheckAnalysis if isFailure is true.
- trait TypeCoercionRule extends Rule[LogicalPlan] with Logging
-
case class
UnresolvedAlias(child: Expression, aliasFunc: Option[(Expression) ⇒ String] = None) extends UnaryExpression with NamedExpression with Unevaluable with Product with Serializable
Holds the expression that has yet to be aliased.
- child
The computation that needs to be resolved during analysis.
- aliasFunc
The function, if specified, to be called to generate an alias to associate with the result of computing child.
-
case class
UnresolvedAttribute(nameParts: Seq[String]) extends Attribute with Unevaluable with Product with Serializable
Holds the name of an attribute that has yet to be resolved.
-
case class
UnresolvedDeserializer(deserializer: Expression, inputAttributes: Seq[Attribute] = Nil) extends UnaryExpression with Unevaluable with NonSQLExpression with Product with Serializable
Holds the deserializer expression and the attributes that are available during the resolution for it. A deserializer expression is a special kind of expression that is not always resolved by children output, but by given attributes, e.g. the keyDeserializer in MapGroups should be resolved by groupingAttributes instead of children output.
- deserializer
The unresolved deserializer expression
- inputAttributes
The input attributes used to resolve deserializer expression, can be empty if we want to resolve deserializer by children output.
-
class
UnresolvedException[TreeType <: TreeNode[_]] extends TreeNodeException[TreeType]
Thrown when an invalid attempt is made to access a property of a tree that has yet to be fully resolved.
-
case class
UnresolvedExtractValue(child: Expression, extraction: Expression) extends BinaryExpression with Unevaluable with Product with Serializable
Extracts a value or values from an Expression
- child
The expression to extract value from, can be Map, Array, Struct or array of Structs.
- extraction
The expression to describe the extraction, can be key of Map, index of Array, field name of Struct.
- case class UnresolvedFunction(name: FunctionIdentifier, arguments: Seq[Expression], isDistinct: Boolean, filter: Option[Expression] = None) extends Expression with Unevaluable with Product with Serializable
-
case class
UnresolvedGenerator(name: FunctionIdentifier, children: Seq[Expression]) extends Expression with Generator with Product with Serializable
Represents an unresolved generator, which will be created by the parser for the org.apache.spark.sql.catalyst.plans.logical.Generate operator. The analyzer will resolve this generator.
-
case class
UnresolvedHaving(havingCondition: Expression, child: LogicalPlan) extends UnaryNode with Product with Serializable
Represents an unresolved having clause; its child can be Aggregate, GroupingSets, Rollup or Cube. It is turned by the analyzer into a Filter.
-
case class
UnresolvedInlineTable(names: Seq[String], rows: Seq[Seq[Expression]]) extends LeafNode with Product with Serializable
An inline table that has not been resolved yet. Once resolved, it is turned by the analyzer into an org.apache.spark.sql.catalyst.plans.logical.LocalRelation.
- names
list of column names
- rows
expressions for the data
-
case class
UnresolvedNamespace(multipartIdentifier: Seq[String]) extends LeafNode with Product with Serializable
Holds the name of a namespace that has yet to be looked up in a catalog. It will be resolved to ResolvedNamespace during analysis.
-
case class
UnresolvedOrdinal(ordinal: Int) extends LeafExpression with Unevaluable with NonSQLExpression with Product with Serializable
Represents unresolved ordinal used in order by or group by.
For example:
select a from table order by 1
select a from table group by 1
- ordinal
ordinal starts from 1, instead of 0
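Since ordinals are 1-based, resolving one amounts to a bounds check followed by an index shift. The sketch below shows only that arithmetic; resolveOrdinal and its error message are hypothetical and do not reproduce the analyzer's own code or wording.

```scala
object OrdinalSketch {
  // Map a 1-based ordinal onto the select list, or report why it is invalid.
  def resolveOrdinal[A](ordinal: Int, selectList: Seq[A]): Either[String, A] =
    if (ordinal >= 1 && ordinal <= selectList.size) {
      // Ordinal 1 refers to the first select-list item, i.e. index 0.
      Right(selectList(ordinal - 1))
    } else {
      Left(s"position $ordinal is outside the select list (valid range is [1, ${selectList.size}])")
    }
}
```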
-
case class
UnresolvedRegex(regexPattern: String, table: Option[String], caseSensitive: Boolean) extends Star with Unevaluable with Product with Serializable
Represents all of the input attributes to a given relational operator, for example in "SELECT `(id)?+.+` FROM ...".
- table
an optional table that should be the target of the expansion. If omitted all tables' columns are produced.
-
case class
UnresolvedRelation(multipartIdentifier: Seq[String]) extends LeafNode with NamedRelation with Product with Serializable
Holds the name of a relation that has yet to be looked up in a catalog.
- multipartIdentifier
table name
-
case class
UnresolvedStar(target: Option[Seq[String]]) extends Star with Unevaluable with Product with Serializable
Represents all of the input attributes to a given relational operator, for example in "SELECT * FROM ...".
This is also used to expand structs. For example: "SELECT record.* from (SELECT struct(a,b,c) as record ...)"
- target
an optional name that should be the target of the expansion. If omitted all targets' columns are produced. This can either be a table name or struct name. This is a list of identifiers that is the path of the expansion.
-
case class
UnresolvedSubqueryColumnAliases(outputColumnNames: Seq[String], child: LogicalPlan) extends UnaryNode with Product with Serializable
Aliased column names resolved by positions for subquery. We could add alias names for output columns in the subquery:
// Assign alias names for output columns
SELECT col1, col2 FROM testData AS t(col1, col2);
- outputColumnNames
the alias names to assign, by position, to the output columns of the subquery.
- child
the logical plan of this subquery.
-
case class
UnresolvedTable(multipartIdentifier: Seq[String]) extends LeafNode with Product with Serializable
Holds the name of a table that has yet to be looked up in a catalog. It will be resolved to ResolvedTable during analysis.
-
case class
UnresolvedTableOrView(multipartIdentifier: Seq[String]) extends LeafNode with Product with Serializable
Holds the name of a table or view that has yet to be looked up in a catalog. It will be resolved to ResolvedTable or ResolvedView during analysis.
-
case class
UnresolvedTableValuedFunction(functionName: String, functionArgs: Seq[Expression], outputNames: Seq[String]) extends LeafNode with Product with Serializable
A table-valued function, e.g.
select id from range(10);
// Assign alias names
select t.a from range(10) t(a);
- functionName
name of this table-valued function
- functionArgs
list of function arguments
- outputNames
alias names of function output columns. If these names are given, the analyzer adds a Project to rename the output columns.
-
case class
UnresolvedV2Relation(originalNameParts: Seq[String], catalog: TableCatalog, tableName: Identifier) extends LeafNode with NamedRelation with Product with Serializable
A variant of UnresolvedRelation which can only be resolved to a v2 relation (DataSourceV2Relation), not v1 relation or temp view.
- originalNameParts
the original table identifier name parts before catalog is resolved.
- catalog
The catalog which the table should be looked up from.
- tableName
The name of the table to look up.
-
sealed
trait
ViewType extends AnyRef
ViewType is used to specify the expected view type when we want to create or replace a view in CreateViewStatement.
Value Members
- val caseInsensitiveResolution: (String, String) ⇒ Boolean
- val caseSensitiveResolution: (String, String) ⇒ Boolean
-
def
withPosition[A](t: TreeNode[_])(f: ⇒ A): A
Catches any AnalysisExceptions thrown by f and attaches t's position if any.
- object AnalysisContext extends Serializable
-
object
CTESubstitution extends Rule[LogicalPlan]
Analyze WITH nodes and substitute child plan with CTE definitions.
-
object
CleanupAliases extends Rule[LogicalPlan]
Cleans up unnecessary Aliases inside the plan. Basically we only need Alias as a top level expression in Project(project list) or Aggregate(aggregate expressions) or Window(window expressions). Notice that if an expression has other expression parameters which are not in its children, e.g. RuntimeReplaceable, the transformation for Aliases in this rule can't work for those parameters.
-
object
DecimalPrecision extends Rule[LogicalPlan] with TypeCoercionRule
Calculates and propagates precision for fixed-precision decimals. Hive has a number of rules for this based on the SQL standard and MS SQL:
https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_Precision_Scale_Support.pdf
https://msdn.microsoft.com/en-us/library/ms190476.aspx
In particular, if we have expressions e1 and e2 with precision/scale p1/s1 and p2/s2 respectively, then the following operations have the following precision / scale:
  Operation    Result Precision                        Result Scale
  ------------------------------------------------------------------------
  e1 + e2      max(s1, s2) + max(p1-s1, p2-s2) + 1     max(s1, s2)
  e1 - e2      max(s1, s2) + max(p1-s1, p2-s2) + 1     max(s1, s2)
  e1 * e2      p1 + p2 + 1                             s1 + s2
  e1 / e2      p1 - s1 + s2 + max(6, s1 + p2 + 1)      max(6, s1 + p2 + 1)
  e1 % e2      min(p1-s1, p2-s2) + max(s1, s2)         max(s1, s2)
  e1 union e2  max(s1, s2) + max(p1-s1, p2-s2)         max(s1, s2)
When spark.sql.decimalOperations.allowPrecisionLoss is set to true, if the precision / scale needed are out of the range of available values, the scale is reduced up to 6, in order to prevent the truncation of the integer part of the decimals.
To implement the rules for fixed-precision types, we introduce casts to turn them to unlimited precision, do the math on unlimited-precision numbers, then introduce casts back to the required fixed precision. This allows us to do all rounding and overflow handling in the cast-to-fixed-precision operator.
In addition, when mixing non-decimal types with decimals, we use the following rules:
  - BYTE gets turned into DECIMAL(3, 0)
  - SHORT gets turned into DECIMAL(5, 0)
  - INT gets turned into DECIMAL(10, 0)
  - LONG gets turned into DECIMAL(20, 0)
  - FLOAT and DOUBLE cause fixed-length decimals to turn into DOUBLE
  - Literals INT and LONG get turned into DECIMAL with the precision strictly needed by the value
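The precision and scale formulas listed for each operation can be checked with a small, self-contained calculator. This sketch only transcribes those formulas; the Dec case class and the function names are hypothetical, and it deliberately ignores any capping at a maximum precision and the allowPrecisionLoss adjustment.

```scala
object DecimalPrecisionSketch {
  // Hypothetical stand-in for a fixed-precision decimal type.
  final case class Dec(precision: Int, scale: Int)

  // e1 + e2 and e1 - e2 share the same result type.
  def add(a: Dec, b: Dec): Dec = {
    val s = math.max(a.scale, b.scale)
    Dec(s + math.max(a.precision - a.scale, b.precision - b.scale) + 1, s)
  }
  def subtract(a: Dec, b: Dec): Dec = add(a, b)

  // e1 * e2
  def multiply(a: Dec, b: Dec): Dec =
    Dec(a.precision + b.precision + 1, a.scale + b.scale)

  // e1 / e2
  def divide(a: Dec, b: Dec): Dec = {
    val s = math.max(6, a.scale + b.precision + 1)
    Dec(a.precision - a.scale + b.scale + s, s)
  }

  // e1 % e2
  def remainder(a: Dec, b: Dec): Dec = {
    val s = math.max(a.scale, b.scale)
    Dec(math.min(a.precision - a.scale, b.precision - b.scale) + s, s)
  }

  // e1 union e2
  def union(a: Dec, b: Dec): Dec = {
    val s = math.max(a.scale, b.scale)
    Dec(s + math.max(a.precision - a.scale, b.precision - b.scale), s)
  }
}
```

For example, with e1 of type DECIMAL(10, 2) and e2 of type DECIMAL(5, 3), the sum needs scale max(2, 3) = 3 and precision 3 + max(8, 2) + 1 = 12.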
-
object
EliminateEventTimeWatermark extends Rule[LogicalPlan]
Ignore event time watermark in batch query, which is only supported in Structured Streaming. TODO: add this rule into analyzer rule list.
-
object
EliminateSubqueryAliases extends Rule[LogicalPlan]
Removes SubqueryAlias operators from the plan. Subqueries are only required to provide scoping information for attributes and can be removed once analysis is complete.
-
object
EliminateUnions extends Rule[LogicalPlan]
Removes Union operators from the plan if it just has one child.
-
object
EliminateView extends Rule[LogicalPlan] with CastSupport
This rule has two goals:
1. Removes View operators from the plan. The operator is respected till the end of analysis stage because we want to see which part of an analyzed logical plan is generated from a view.
2. Make sure that a view's child plan produces the view's output attributes. We try to wrap the child by:
  1. Generate the queryOutput by:
    1.1. If the query column names are defined, map the column names to attributes in the child output by name (this is mostly for handling view queries like SELECT * FROM ...; the schema of the referenced table/view may change after the view has been created, so we have to save the output of the query to viewQueryColumnNames and restore them during view resolution. In this way, we are able to get the correct view column ordering and omit the extra columns that we don't require);
    1.2. Else set the child output attributes to queryOutput.
  2. Map the queryOutput to view output by index; if the corresponding attributes don't match, try to up cast and alias the attribute in queryOutput to the attribute in the view output.
  3. Add a Project over the child, with the new output generated by the previous steps.
Once the plan reaches this rule, it means CheckAnalysis did the necessary checks on the number of columns between the view output and the child output or the query column names. CheckAnalysis also checked that the cast from the view's child to the Project is an up-cast.
This should be only done after the batch of Resolution, because the view attributes are not completely resolved during the batch of Resolution.
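The second goal (mapping the child's output onto the view's output) can be modelled roughly as below. This is a toy sketch under stated assumptions: Col, queryOutput and project are hypothetical names, data types are plain strings, and the "upcast(...) AS ..." strings merely describe the projection that would be added rather than building real expressions.

```scala
object ViewOutputSketch {
  // Hypothetical column: a name and a data type rendered as a string.
  final case class Col(name: String, dataType: String)

  // Step 1: pick the query output. If query column names were saved, look the
  // columns up by name in the child output; otherwise use the child output as is.
  def queryOutput(childOutput: Seq[Col], queryColumnNames: Seq[String]): Seq[Col] =
    if (queryColumnNames.nonEmpty) {
      queryColumnNames.map { n =>
        childOutput.find(_.name == n).getOrElse(sys.error(s"column $n not found in child output"))
      }
    } else {
      childOutput
    }

  // Steps 2 and 3: pair query output with view output by index and describe the
  // projection; a mismatch is handled with an up-cast plus an alias.
  def project(query: Seq[Col], viewOutput: Seq[Col]): Seq[String] =
    query.zip(viewOutput).map { case (q, v) =>
      if (q == v) q.name
      else s"upcast(${q.name} AS ${v.dataType}) AS ${v.name}"
    }
}
```

Looking columns up by saved name is what lets the sketch recover the original column order and drop columns that were added to the underlying table after the view was created.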
-
object
EmptyFunctionRegistry extends FunctionRegistry
A trivial catalog that returns an error when a function is requested. Used for testing when all functions are already filled in and the analyzer needs only to resolve attribute references.
- object FakeV2SessionCatalog extends TableCatalog
- object FunctionRegistry
-
object
GlobalTempView extends ViewType
GlobalTempView means cross-session global temporary views. Its lifetime is the lifetime of the Spark application, i.e. it will be automatically dropped when the application terminates. It's tied to a system preserved database global_temp, and we must use the qualified name to refer to a global temp view, e.g. SELECT * FROM global_temp.view1.
-
object
HintErrorLogger extends HintErrorHandler with Logging
The hint error handler that logs warnings for each hint error.
-
object
LocalTempView extends ViewType
LocalTempView means session-scoped local temporary views. Its lifetime is the lifetime of the session that created it, i.e. it will be automatically dropped when the session terminates. It's not tied to any databases, i.e. we can't use db1.view1 to reference a local temporary view.
-
object
PersistedView extends ViewType
PersistedView means cross-session persisted views. Persisted views stay until they are explicitly dropped by user command. A persisted view is always tied to a database, defaulting to the current database if not specified.
Note that an existing persisted view with the same name is not visible to the current session while the local temporary view exists, unless the view name is qualified by database.
-
object
ResolveCreateNamedStruct extends Rule[LogicalPlan]
Resolve a CreateNamedStruct if it contains NamePlaceholders.
-
object
ResolveHints
Collection of rules related to hints. The only hint currently available is join strategy hint.
Note that this is separated into two rules because in the future we might introduce new hint rules that have different ordering requirements from join strategies.
-
object
ResolveTableValuedFunctions extends Rule[LogicalPlan]
Rule that resolves table-valued function references.
-
object
SimpleAnalyzer extends Analyzer
A trivial Analyzer with a dummy SessionCatalog and EmptyFunctionRegistry. Used for testing when all relations are already filled in and the analyzer needs only to resolve attribute references.
-
object
StreamingJoinHelper extends PredicateHelper with Logging
Helper object for stream joins. See StreamingSymmetricHashJoinExec in SQL for more details.
- object TableOutputResolver
-
object
TimeWindowing extends Rule[LogicalPlan]
Maps a time column to multiple time windows using the Expand operator. Since it's non-trivial to figure out how many windows a time column can map to, we over-estimate the number of windows and filter out the rows where the time column is not inside the time window.
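The "over-estimate, then filter" idea can be sketched on plain Long timestamps. This is an illustrative simplification rather than the rule's actual expression tree: windowsFor and its parameters are hypothetical, windows are half-open intervals [start, end), and the candidate windows are generated backwards from the latest window start that is not after the timestamp.

```scala
object TimeWindowSketch {
  // Returns the (start, end) pairs of all sliding windows that contain ts.
  def windowsFor(
      ts: Long,
      windowDuration: Long,
      slideDuration: Long,
      startTime: Long = 0L): Seq[(Long, Long)] = {
    // Upper bound on how many windows can contain a single timestamp:
    // ceil(windowDuration / slideDuration).
    val maxOverlapping = ((windowDuration + slideDuration - 1) / slideDuration).toInt
    // Latest window start that is <= ts.
    val lastStart = Math.floorDiv(ts - startTime, slideDuration) * slideDuration + startTime
    (0 until maxOverlapping)
      .map(i => lastStart - i * slideDuration)        // candidate window starts
      .map(start => (start, start + windowDuration))  // candidate windows
      .filter { case (start, end) => ts >= start && ts < end } // drop over-estimates
  }
}
```

With a window of 7 and a slide of 5, at most two windows can overlap a timestamp, but a timestamp of 12 only falls inside [10, 17); the candidate [5, 12) is generated and then filtered out.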
- object TypeCheckResult
-
object
TypeCoercion
A collection of Rule that can be used to coerce differing types that participate in operations into compatible ones.
Notes about type widening / tightest common types: Broadly, there are two cases when we need to widen data types (e.g. union, binary comparison). In case 1, we are looking for a common data type for two or more data types, and in this case no loss of precision is allowed. Examples include type inference in JSON (e.g. what's the column's data type if one row is an integer while the other row is a long?). In case 2, we are looking for a widened data type with some acceptable loss of precision (e.g. there is no common type for double and decimal because double's range is larger than decimal, and yet decimal is more precise than double, but in union we would cast the decimal into double).
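The two widening cases can be contrasted with a toy model. It covers only the type pairs named in the notes above and leaves every other pair undefined (None); the DataType hierarchy and both function names are hypothetical and much smaller than the real rule set.

```scala
object TypeWideningSketch {
  // Hypothetical, heavily reduced set of data types.
  sealed trait DataType
  case object IntType extends DataType
  case object LongType extends DataType
  case object DoubleType extends DataType
  case object DecimalType extends DataType

  // Case 1: a common type with no loss of precision allowed.
  def tightestCommonType(a: DataType, b: DataType): Option[DataType] = (a, b) match {
    case (x, y) if x == y => Some(x)
    case (IntType, LongType) | (LongType, IntType) => Some(LongType)
    case _ => None // e.g. double vs decimal: no common type in this case
  }

  // Case 2: a widened type where some loss of precision is acceptable.
  def widerType(a: DataType, b: DataType): Option[DataType] =
    tightestCommonType(a, b).orElse((a, b) match {
      case (DoubleType, DecimalType) | (DecimalType, DoubleType) => Some(DoubleType)
      case _ => None
    })
}
```

In this model an integer column and a long column always agree on long, while double and decimal only find a common type in the lossy case, where the decimal is cast to double.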
- object UnresolvedAttribute extends Serializable
- object UnresolvedFunction extends Serializable
- object UnresolvedRelation extends Serializable
-
object
UnsupportedOperationChecker extends Logging
Analyzes the presence of unsupported operations in a logical plan.
-
object
UpdateAttributeNullability extends Rule[LogicalPlan]
Updates nullability of Attributes in a resolved LogicalPlan by using the nullability of corresponding Attributes of its children output Attributes. This step is needed because users can use a resolved AttributeReference in the Dataset API and outer joins can change the nullability of an AttributeReference. Without this rule, a nullable column's nullable field can actually be set as non-nullable, which causes illegal optimizations (e.g., NULL propagation) and wrong answers. See SPARK-13484 and SPARK-13801 for the concrete queries of this case.
-
object
UpdateOuterReferences extends Rule[LogicalPlan]
The aggregate expressions from a subquery that reference the outer query block are pushed down to the outer query block for evaluation. This rule updates such outer references as AttributeReference referring to attributes from the parent/outer query block.
For example (SQL):
SELECT l.a FROM l GROUP BY 1 HAVING EXISTS (SELECT 1 FROM r WHERE r.d < min(l.b))
Plan before the rule.
   Project [a#226]
   +- Filter exists#245 [min(b#227)#249]
      :  +- Project [1 AS 1#247]
      :     +- Filter (d#238 < min(outer(b#227)))       <-----
      :        +- SubqueryAlias r
      :           +- Project [_1#234 AS c#237, _2#235 AS d#238]
      :              +- LocalRelation [_1#234, _2#235]
      +- Aggregate [a#226], [a#226, min(b#227) AS min(b#227)#249]
         +- SubqueryAlias l
            +- Project [_1#223 AS a#226, _2#224 AS b#227]
               +- LocalRelation [_1#223, _2#224]
Plan after the rule.
   Project [a#226]
   +- Filter exists#245 [min(b#227)#249]
      :  +- Project [1 AS 1#247]
      :     +- Filter (d#238 < outer(min(b#227)#249))   <-----
      :        +- SubqueryAlias r
      :           +- Project [_1#234 AS c#237, _2#235 AS d#238]
      :              +- LocalRelation [_1#234, _2#235]
      +- Aggregate [a#226], [a#226, min(b#227) AS min(b#227)#249]
         +- SubqueryAlias l
            +- Project [_1#223 AS a#226, _2#224 AS b#227]
               +- LocalRelation [_1#223, _2#224]