Modin Spreadsheets API#
Getting started#
Install Modin-spreadsheet using pip:
pip install "modin[spreadsheet]"
The following code snippet creates a spreadsheet using the FiveThirtyEight dataset on labor force information by college majors (licensed under CC BY 4.0):
import modin.pandas as pd
import modin.experimental.spreadsheet as mss
df = pd.read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/all-ages.csv')
spreadsheet = mss.from_dataframe(df)
spreadsheet
Basic Manipulations through User Interface#
The Spreadsheet API allows users to manipulate the DataFrame with simple graphical controls for sorting, filtering, and editing.
- Here are the instructions for each operation:
Sort: Click on the column header of the column to sort on.
Filter: Click on the filter button on the column header and apply the desired filter to the column. The filter dropdown changes depending on the type of the column. Multiple filters are automatically combined.
Edit Cell: Double click on a cell and enter the new value.
Add Rows: Click on the “Add Row” button in the toolbar to duplicate the last row in the DataFrame. The duplicated values provide a convenient default and can be edited as necessary.
Remove Rows: Select row(s) and click the “Remove Row” button. Select a single row by clicking on it. Multiple rows can be selected with Cmd+Click (Windows: Ctrl+Click) on the desired rows or with Shift+Click to specify a range of rows.
Some of these operations can also be done through the spreadsheet’s programmatic interface. Sorts and filters can be reset using the toolbar buttons. Edits and adding/removing rows can only be undone manually.
Virtual Rendering#
The spreadsheet will only render data based on the user’s viewport. This allows for quick rendering even on very large DataFrames because only a handful of rows are loaded at any given time. As a result, scrolling and viewing your data is smooth and responsive.
Transformation History and Exporting Code#
All operations on the spreadsheet are recorded and are easily exported as code for sharing or reproducibility. This history is automatically displayed in the history cell, which is generated below the spreadsheet whenever the spreadsheet widget is displayed. The history cell is displayed on default, but this can be turned off. Modin Spreadsheet API provides a few methods for interacting with the history:
SpreadsheetWidget.get_history(): Retrieves the transformation history in the form of reproducible code.
SpreadsheetWidget.filter_relevant_history(persist=True): Returns the transformation history that contains only code relevant to the final state of the spreadsheet. The persist parameter determines whether the internal state and the displayed history is also filtered.
SpreadsheetWidget.reset_history(): Clears the history of transformation.
Customizable Interface#
The spreadsheet widget provides a number of options that allows the user to change the appearance and the interactivity of the spreadsheet. Options include:
Row height/Column width
Preventing edits, sorts, or filters on the whole spreadsheet or on a per-column basis
Hiding the toolbar and history cell
Float precision
Highlighting of cells and rows
Viewport size
Converting Spreadsheets To and From Dataframes#
- modin.experimental.spreadsheet.general.from_dataframe(dataframe, show_toolbar=None, show_history=None, precision=None, grid_options=None, column_options=None, column_definitions=None, row_edit_callback=None)
Renders a DataFrame or Series as an interactive spreadsheet, represented by an instance of the
SpreadsheetWidget
class. TheSpreadsheetWidget
instance is constructed using the options passed in to this function. Thedataframe
argument to this function is used as thedf
kwarg in call to the SpreadsheetWidget constructor, and the rest of the parameters are passed through as is.If the
dataframe
argument is a Series, it will be converted to a DataFrame before being passed in to the SpreadsheetWidget constructor as thedf
kwarg.- Return type:
SpreadsheetWidget
- Parameters:
dataframe (DataFrame) – The DataFrame that will be displayed by this instance of SpreadsheetWidget.
grid_options (dict) – Options to use when creating the SlickGrid control (i.e. the interactive grid). See the Notes section below for more information on the available options, as well as the default options that this widget uses.
precision (integer) – The number of digits of precision to display for floating-point values. If unset, we use the value of pandas.get_option(‘display.precision’).
show_toolbar (bool) – Whether to show a toolbar with options for adding/removing rows. Adding/removing rows is an experimental feature which only works with DataFrames that have an integer index.
show_history (bool) – Whether to show the cell containing the spreadsheet transformation history.
column_options (dict) – Column options that are to be applied to every column. See the Notes section below for more information on the available options, as well as the default options that this widget uses.
column_definitions (dict) – Column options that are to be applied to individual columns. The keys of the dict should be the column names, and each value should be the column options for a particular column, represented as a dict. The available options for each column are the same options that are available to be set for all columns via the
column_options
parameter. See the Notes section below for more information on those options.row_edit_callback (callable) – A callable that is called to determine whether a particular row should be editable or not. Its signature should be
callable(row)
, whererow
is a dictionary which contains a particular row’s values, keyed by column name. The callback should return True if the provided row should be editable, and False otherwise.
Notes
The following dictionary is used for
grid_options
if none are provided explicitly:{ # SlickGrid options 'fullWidthRows': True, 'syncColumnCellResize': True, 'forceFitColumns': False, 'defaultColumnWidth': 150, 'rowHeight': 28, 'enableColumnReorder': False, 'enableTextSelectionOnCells': True, 'editable': True, 'autoEdit': False, 'explicitInitialization': True, # Modin-spreadsheet options 'maxVisibleRows': 15, 'minVisibleRows': 8, 'sortable': True, 'filterable': True, 'highlightSelectedCell': False, 'highlightSelectedRow': True }
The first group of options are SlickGrid “grid options” which are described in the SlickGrid documentation.
The second group of option are options that were added specifically for modin-spreadsheet and therefore are not documented in the SlickGrid documentation. The following bullet points describe these options.
maxVisibleRows The maximum number of rows that modin-spreadsheet will show.
minVisibleRows The minimum number of rows that modin-spreadsheet will show
sortable Whether the modin-spreadsheet instance will allow the user to sort columns by clicking the column headers. When this is set to
False
, nothing will happen when users click the column headers.filterable Whether the modin-spreadsheet instance will allow the user to filter the grid. When this is set to
False
the filter icons won’t be shown for any columns.highlightSelectedCell If you set this to True, the selected cell will be given a light blue border.
highlightSelectedRow If you set this to False, the light blue background that’s shown by default for selected rows will be hidden.
The following dictionary is used for
column_options
if none are provided explicitly:{ # SlickGrid column options 'defaultSortAsc': True, 'maxWidth': None, 'minWidth': 30, 'resizable': True, 'sortable': True, 'toolTip': "", 'width': None # Modin-spreadsheet column options 'editable': True, }
The first group of options are SlickGrid “column options” which are described in the SlickGrid documentation.
The
editable
option was added specifically for modin-spreadsheet and therefore is not documented in the SlickGrid documentation. This option specifies whether a column should be editable or not.See also
set_defaults
Permanently set global defaults for the parameters of
show_grid
, with the exception of thedataframe
andcolumn_definitions
parameters, since those depend on the particular set of data being shown by an instance, and therefore aren’t parameters we would want to set for all SpreadsheetWidget instances.set_grid_option
Permanently set global defaults for individual grid options. Does so by changing the defaults that the
show_grid
method uses for thegrid_options
parameter.SpreadsheetWidget
The widget class that is instantiated and returned by this method.
- modin.experimental.spreadsheet.general.to_dataframe(spreadsheet)
Get a copy of the DataFrame that reflects the current state of the
spreadsheet
SpreadsheetWidget instance UI. This includes any sorting or filtering changes, as well as edits that have been made by double clicking cells.- Return type:
- Parameters:
spreadsheet (SpreadsheetWidget) – The SpreadsheetWidget instance that DataFrame that will be displayed by this instance of SpreadsheetWidget.
Further API Documentation#
- class modin_spreadsheet.grid.SpreadsheetWidget(**kwargs: Any)
The widget class which is instantiated by the
show_grid
method. This class can be constructed directly but that’s not recommended because then default options have to be specified explicitly (since default options are normally provided by theshow_grid
method).The constructor for this class takes all the same parameters as
show_grid
, with one exception, which is that the requireddata_frame
parameter is replaced by an optional keyword argument calleddf
.See also
show_grid
The method that should be used to construct SpreadsheetWidget instances, because it provides reasonable defaults for all of the modin-spreadsheet options.
- df
Get/set the DataFrame that’s being displayed by the current instance. This DataFrame will NOT reflect any sorting/filtering/editing changes that are made via the UI. To get a copy of the DataFrame that does reflect sorting/filtering/editing changes, use the
get_changed_df()
method.- Type:
- grid_options
Get/set the grid options being used by the current instance.
- Type:
dict
- precision
Get/set the precision options being used by the current instance.
- Type:
integer
- show_toolbar
Get/set the show_toolbar option being used by the current instance.
- Type:
bool
- show_history
Get/set the show_history option being used by the current instance.
- Type:
bool
- column_options
Get/set the column options being used by the current instance.
- Type:
bool
- column_definitions
Get/set the column definitions (column-specific options) being used by the current instance.
- Type:
bool
- add_row(row=None)
Append a row at the end of the DataFrame. Values for the new row can be provided via the
row
argument, which is optional for DataFrames that have an integer index, and required otherwise. If therow
argument is not provided, the last row will be duplicated and the index of the new row will be the index of the last row plus one.- Parameters:
row (list (default: None)) – A list of 2-tuples of (column name, column value) that specifies the values for the new row.
See also
SpreadsheetWidget.remove_rows
The method for removing a row (or rows).
- change_grid_option(option_name, option_value)
Change a SlickGrid grid option without rebuilding the entire grid widget. Not all options are supported at this point so this method should be considered experimental.
- Parameters:
option_name (str) – The name of the grid option to be changed.
option_value (str) – The new value for the grid option.
- change_selection(rows=[])
Select a row (or rows) in the UI. The indices of the rows to select are provided via the optional
rows
argument.- Parameters:
rows (list (default: [])) – A list of indices of the rows to select. For a multi-indexed DataFrame, each index in the list should be a tuple, with each value in each tuple corresponding to a level of the MultiIndex. The default value of
[]
results in the no rows being selected (i.e. it clears the selection).
- edit_cell(index, column, value)
Edit a cell of the grid, given the index and column of the cell to edit, as well as the new value of the cell. Results in a
cell_edited
event being fired.- Parameters:
index (object) – The index of the row containing the cell that is to be edited.
column (str) – The name of the column containing the cell that is to be edited.
value (object) – The new value for the cell.
- get_changed_df()
Get a copy of the DataFrame that was used to create the current instance of SpreadsheetWidget which reflects the current state of the UI. This includes any sorting or filtering changes, as well as edits that have been made by double clicking cells.
- Return type:
- get_selected_df()
Get a DataFrame which reflects the current state of the UI and only includes the currently selected row(s). Internally it calls
get_changed_df()
and then filters down to the selected rows usingiloc
.- Return type:
- get_selected_rows()
Get the currently selected rows.
- Return type:
List of integers
- off(names, handler)
Remove a modin-spreadsheet event handler that was registered with the current instance’s
on
method.- Parameters:
names (list, str, All (default: All)) – The names of the events for which the specified handler should be uninstalled. If names is All, the specified handler is uninstalled from the list of notifiers corresponding to all events.
handler (callable) – A callable that was previously registered with the current instance’s
on
method.
See also
SpreadsheetWidget.on
The method for hooking up instance-level handlers that this
off
method can remove.
- on(names, handler)
Setup a handler to be called when a user interacts with the current instance.
- Parameters:
names (list, str, All) – If names is All, the handler will apply to all events. If a list of str, handler will apply to all events named in the list. If a str, the handler will apply just the event with that name.
handler (callable) – A callable that is called when the event occurs. Its signature should be
handler(event, spreadsheet_widget)
, whereevent
is a dictionary andspreadsheet_widget
is the SpreadsheetWidget instance that fired the event. Theevent
dictionary at least holds aname
key which specifies the name of the event that occurred.
Notes
Here’s the list of events that you can listen to on SpreadsheetWidget instances via the
on
method:[ 'cell_edited', 'selection_changed', 'viewport_changed', 'row_added', 'row_removed', 'filter_dropdown_shown', 'filter_changed', 'sort_changed', 'text_filter_viewport_changed', 'json_updated' ]
The following bullet points describe the events listed above in more detail. Each event bullet point is followed by sub-bullets which describe the keys that will be included in the
event
dictionary for each event.cell_edited The user changed the value of a cell in the grid.
index The index of the row that contains the edited cell.
column The name of the column that contains the edited cell.
old The previous value of the cell.
new The new value of the cell.
filter_changed The user changed the filter setting for a column.
column The name of the column for which the filter setting was changed.
filter_dropdown_shown The user showed the filter control for a column by clicking the filter icon in the column’s header.
column The name of the column for which the filter control was shown.
json_updated A user action causes SpreadsheetWidget to send rows of data (in json format) down to the browser. This happens as a side effect of certain actions such as scrolling, sorting, and filtering.
triggered_by The name of the event that resulted in rows of data being sent down to the browser. Possible values are
change_viewport
,change_filter
,change_sort
,add_row
,remove_row
, andedit_cell
.range A tuple specifying the range of rows that have been sent down to the browser.
row_added The user added a new row using the “Add Row” button in the grid toolbar.
index The index of the newly added row.
source The source of this event. Possible values are
api
(an api method call) andgui
(the grid interface).
row_removed The user added removed one or more rows using the “Remove Row” button in the grid toolbar.
indices The indices of the removed rows, specified as an array of integers.
source The source of this event. Possible values are
api
(an api method call) andgui
(the grid interface).
selection_changed The user changed which rows were highlighted in the grid.
old An array specifying the indices of the previously selected rows.
new The indices of the rows that are now selected, again specified as an array.
source The source of this event. Possible values are
api
(an api method call) andgui
(the grid interface).
sort_changed The user changed the sort setting for the grid.
old The previous sort setting for the grid, specified as a dict with the following keys:
column The name of the column that the grid was sorted by
ascending Boolean indicating ascending/descending order
new The new sort setting for the grid, specified as a dict with the following keys:
column The name of the column that the grid is currently sorted by
ascending Boolean indicating ascending/descending order
text_filter_viewport_changed The user scrolled the new rows into view in the filter dropdown for a text field.
column The name of the column whose filter dropdown is visible
old A tuple specifying the previous range of visible rows in the filter dropdown.
new A tuple specifying the range of rows that are now visible in the filter dropdown.
viewport_changed The user scrolled the new rows into view in the grid.
old A tuple specifying the previous range of visible rows.
new A tuple specifying the range of rows that are now visible.
The
event
dictionary for every type of event will contain aname
key specifying the name of the event that occurred. That key is excluded from the lists of keys above to avoid redundacy.See also
on
Same as the instance-level
on
method except it listens for events on all instances rather than on an individual SpreadsheetWidget instance.SpreadsheetWidget.off
Unhook a handler that was hooked up using the instance-level
on
method.
- remove_row(rows=None)
Alias for
remove_rows
, which is provided for convenience because this was the previous name of that method.
- remove_rows(rows=None)
Remove a row (or rows) from the DataFrame. The indices of the rows to remove can be provided via the optional
rows
argument. If therows
argument is not provided, the row (or rows) that are currently selected in the UI will be removed.- Parameters:
rows (list (default: None)) – A list of indices of the rows to remove from the DataFrame. For a multi-indexed DataFrame, each index in the list should be a tuple, with each value in each tuple corresponding to a level of the MultiIndex.
See also
SpreadsheetWidget.add_row
The method for adding a row.
SpreadsheetWidget.remove_row
Alias for this method.
- toggle_editable()
Change whether the grid is editable or not, without rebuilding the entire grid widget.