ModinIndex

Public API

class modin.core.dataframe.pandas.metadata.index.ModinIndex(value=None, axis=None, dtypes=None)

A class that hides the various implementations of the index needed for optimization.

Parameters:
  • value (sequence, PandasDataframe or callable() -> (pandas.Index, list of ints), optional) – If a sequence is passed, it is used as the index values. If a PandasDataframe is passed, it is used to lazily extract the indices when required; the axis parameter must be passed in this case. If a callable is passed, it is expected to return a pandas Index and a list of partition lengths along the index axis. If None is passed, the index is considered incomplete and raises a RuntimeError on an attempt to materialize it. To complete the index object, use the .maybe_specify_new_frame_ref() method.

  • axis (int, optional) – The axis the object represents; serves as an optional hint. This parameter must be passed if value is a PandasDataframe.

  • dtypes (pandas.Series, optional) – Materialized dtypes of index levels.
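The three kinds of value described above (eager sequence, lazy callable, and None) can be illustrated with a small stand-in class. This is a minimal sketch of the idea only, not Modin's implementation: LazyIndexSketch and its attributes are hypothetical names, plain lists replace pandas.Index, and the PandasDataframe case is omitted.

```python
from typing import Callable, List, Optional, Sequence, Tuple, Union

Labels = List[object]
# A producer returns the labels plus the partition lengths along the axis.
Producer = Callable[[], Tuple[Labels, List[int]]]


class LazyIndexSketch:
    """Toy stand-in for a lazily materialized index (hypothetical, not Modin's class)."""

    def __init__(self, value: Union[Sequence[object], Producer, None] = None):
        self._lengths: Optional[List[int]] = None
        if value is None:
            # Incomplete index: there is nothing to materialize from yet.
            self._value = None
            self._complete = False
        elif callable(value):
            # Lazy index: the callable is only invoked on the first get().
            self._value = value
            self._complete = True
        else:
            # Eager index: the sequence itself is the index values.
            self._value = list(value)
            self._complete = True

    @property
    def is_materialized(self) -> bool:
        return isinstance(self._value, list)

    def get(self) -> Labels:
        if not self._complete:
            raise RuntimeError("incomplete index: no source to materialize from")
        if not self.is_materialized:
            # Run the producer once and cache both labels and lengths.
            self._value, self._lengths = self._value()
        return self._value


eager = LazyIndexSketch(["a", "b", "c"])
lazy = LazyIndexSketch(lambda: (["a", "b", "c"], [2, 1]))
incomplete = LazyIndexSketch()
```

In this sketch, eager is materialized from the start, lazy becomes materialized on the first call to get(), and incomplete.get() raises a RuntimeError.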

compare_partition_lengths_if_possible(other: ModinIndex)

Compare the partition lengths cache for the index being stored if possible.

The ModinIndex object may sometimes store information about the partition lengths along the axis the index belongs to. If both self and other have this information, or it can be inferred from them, the method returns a boolean with the result of the comparison; otherwise it returns None to indicate that the comparison cannot be made.

Parameters:

other (ModinIndex) –

Returns:

The result of the comparison if both self and other contain the lengths data, None otherwise.

Return type:

bool or None
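The three-valued result (True, False, or None) matters for callers, because both False and None are falsy. The following is a minimal sketch of that contract, assuming the lengths are either a list of ints or None when unknown; compare_lengths_if_possible is a hypothetical helper, and the inference step mentioned above is not modeled.

```python
from typing import List, Optional


def compare_lengths_if_possible(
    mine: Optional[List[int]], theirs: Optional[List[int]]
) -> Optional[bool]:
    """Return True/False when both lengths are known, None otherwise."""
    if mine is None or theirs is None:
        # The comparison cannot be made without both sides.
        return None
    return mine == theirs


same = compare_lengths_if_possible([2, 2], [2, 2])
different = compare_lengths_if_possible([2, 2], [3, 1])
unknown = compare_lengths_if_possible([2, 2], None)

# Check for None explicitly: "not unknown" would treat "cannot compare"
# the same way as "compared and found different".
if unknown is None:
    decision = "fall back to a full check"
elif unknown:
    decision = "partitioning matches"
else:
    decision = "partitioning differs"
```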

copy(copy_lengths=False) -> ModinIndex

Copy an object without materializing the internal representation.

Parameters:

copy_lengths (bool, default: False) – Whether to copy the stored partition lengths to the new index object.

Return type:

ModinIndex

equals(other: ModinIndex) -> bool

Check equality of the index values.

Parameters:

other (ModinIndex) –

Returns:

The result of the comparison.

Return type:

bool

get(return_lengths=False) -> pandas.Index

Get the materialized internal representation.

Parameters:

return_lengths (bool, default: False) – In some cases, during the index calculation, it’s possible to get the lengths of the partitions. This flag allows this data to be used for optimization.

Return type:

pandas.Index
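The role of return_lengths can be sketched with a stand-in whose producer computes labels and partition lengths in one pass. This sketch assumes that the flag makes get() return the cached lengths alongside the labels; the description above does not spell out that return shape, so treat it as an assumption. LengthCachingIndex and produce are hypothetical names, and lists stand in for pandas.Index.

```python
from typing import Callable, List, Optional, Tuple, Union

Labels = List[str]


class LengthCachingIndex:
    """Hypothetical sketch of a lazy index that caches partition lengths."""

    def __init__(self, producer: Callable[[], Tuple[Labels, List[int]]]):
        self._producer = producer
        self._labels: Optional[Labels] = None
        self._lengths: Optional[List[int]] = None

    def get(
        self, return_lengths: bool = False
    ) -> Union[Labels, Tuple[Labels, Optional[List[int]]]]:
        if self._labels is None:
            # Materialize once; the lengths come for free from the same call.
            self._labels, self._lengths = self._producer()
        if return_lengths:
            # Assumed shape: labels together with the cached lengths.
            return self._labels, self._lengths
        return self._labels


calls = []


def produce() -> Tuple[Labels, List[int]]:
    calls.append(1)  # record each invocation to show the result is cached
    return ["a", "b", "c", "d"], [3, 1]


index = LengthCachingIndex(produce)
labels = index.get()
labels_again, lengths = index.get(return_lengths=True)
```

The second call reuses the cached result, so the producer runs only once and the caller obtains the partition lengths without recomputing them.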

property is_materialized: bool

Check if the internal representation is materialized.

Return type:

bool

classmethod is_materialized_index(index) -> bool

Check if the passed object represents a materialized index.

Parameters:

index (object) – An object to check.

Return type:

bool

maybe_get_dtypes()

Get index dtypes if available.

Return type:

pandas.Series or None

maybe_specify_new_frame_ref(value, axis) -> ModinIndex

Set a new reference for a frame used to lazily extract index labels if it’s needed.

The method sets a new reference only if the indices are not yet materialized and a PandasDataframe was originally passed to construct this index (so the ModinIndex object holds a reference to it). The reference is updated so that frames that are no longer needed are not kept in memory. Once the reference is updated, the old frame is garbage collected if no other references to it remain.

Parameters:
  • value (PandasDataframe) – New dataframe to reference.

  • axis (int) – Axis to extract labels from.

Returns:

New ModinIndex with the reference updated.

Return type:

ModinIndex
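The memory motivation above can be demonstrated with a toy model that uses weakref to observe when the old frame is released. This is a sketch under stated assumptions, not Modin's code: FrameStub and FrameBackedIndex are hypothetical, the axis argument is omitted, and returning self when no update applies is a simplification made here for brevity.

```python
import gc
import weakref
from typing import List, Optional


class FrameStub:
    """Hypothetical stand-in for a frame that can produce index labels."""

    def __init__(self, labels: List[str]):
        self.labels = labels


class FrameBackedIndex:
    """Hypothetical lazy index that holds a strong reference to its frame."""

    def __init__(
        self,
        frame: Optional[FrameStub] = None,
        labels: Optional[List[str]] = None,
    ):
        self._frame = frame
        self._labels = labels

    @property
    def is_materialized(self) -> bool:
        return self._labels is not None

    def maybe_specify_new_frame_ref(self, frame: FrameStub) -> "FrameBackedIndex":
        if self.is_materialized or self._frame is None:
            # Nothing to update: labels already exist or no frame was used.
            return self
        # New index object that references the new frame instead of the old one.
        return FrameBackedIndex(frame=frame)


old_frame = FrameStub(["a", "b"])
old_ref = weakref.ref(old_frame)

index = FrameBackedIndex(frame=old_frame)
index = index.maybe_specify_new_frame_ref(FrameStub(["a", "b"]))

# Drop the last strong reference held by this script, then collect.
del old_frame
gc.collect()
old_frame_released = old_ref() is None
```

Without the update, the index would keep the old frame alive for as long as the index itself exists, even though the frame is no longer used anywhere else.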