wellmap.load

wellmap.load(toml_path, *, data_loader=None, merge_cols=None, path_guess=None, path_required=False, extras=False, report_dependencies=False, on_alert=None)

Load a microplate layout from a TOML file.

Parse the given TOML file and return a pandas.DataFrame with a row for each well and a column for each experimental condition specified in that file. If the data_loader and merge_cols arguments are provided (the most typical use case), that data frame will also contain columns for any data associated with each well.

Parameters
  • toml_path (str, pathlib.Path) – The path to a file describing the layout of one or more plates. See the File format page for details about this file.

  • data_loader (callable) – Indicates that load() should attempt to load the actual data associated with the plate layout, in addition to loading the layout itself. The argument should be a function that takes a pathlib.Path to a data file, parses it, and returns a pandas.DataFrame containing the parsed data. The function may also take an argument named “extras”, in which case the extras return value (described below) will be provided. Note that specifying a data loader implies that path_required is True.

  • merge_cols (bool, dict) –

    Indicates whether load() should merge the data frames representing the plate layout and the actual data (provided by data_loader) and, if so, how. The argument can be either a boolean or a dictionary:

    If False (or falsy, e.g. None, {}, etc.), the data frames will be returned separately rather than merged. This is the default behavior.

    If True, the data frames will be merged using any columns that share the same name. For example, the layout will always have a column named well, so if the actual data also has a column named well, the merge would happen on those columns.

    If a dictionary, the data frames will be merged using the columns identified in each key-value pair of the dictionary. The keys should be column names from the data frame representing the plate layout (described below; see the layout return value), and the values should be column names from the data frame returned by data_loader. Below are some examples of this argument:

    • {'well0': 'Well'}: Indicates that the “Well” column in the data contains zero-padded well names, like “A01”, “A02”, etc.

    • {'row_i': 'Row', 'col_j': 'Col'}: Indicates that the “Row” and “Col” columns in the data contain 0-indexed coordinates (e.g. 0, 1, 2, …) identifying each row and column, respectively.

    Some details and caveats:

    • In order to successfully merge two columns, the values in those columns must correspond exactly. For example, a column that contains unpadded well names like “A1” cannot be merged with a column that contains padded well names like “A01”. This is why the layout data frame contains so many redundant columns: to increase the chance that one will correspond exactly with a column provided by the data. In some cases, though, it may be necessary for the data_loader function to construct an appropriate merge column.

    • The data frame returned by data_loader() must be “tidy”. Briefly, a data frame is tidy if each of its columns represents a single variable (e.g. time, fluorescence) and each of its rows represents a single observation.

    • The path column of the layout is automatically included in the merge and never has to be specified (although it is not an error to do so). This makes sense because load() itself knows which path each data frame was loaded from.

  • path_guess (str) – Where to look for a data file if none is specified in the given TOML file. In other words, this is the default value for meta.path. This path is interpreted relative to the TOML file itself (unless it’s an absolute path) and is formatted with a pathlib.Path representing said TOML file. In code, that would be: path_guess.format(Path(toml_path)). A typical value would be something like '{0.stem}.csv'.

  • path_required (bool) – Indicates whether or not the given TOML file must reference one or more data files. A ValueError will be raised if this condition is not met. Data files found via path_guess are acceptable for this purpose.

  • extras (bool) – If true, return a dictionary containing any key/value pairs present in the TOML file but not part of the layout. Typically, this would be used to get information pertaining to the whole analysis and not any wells in particular (e.g. instruments used, preferred algorithms, plotting parameters, etc.).

  • report_dependencies (bool) – If true, return a set of all the TOML files that were read in the process of loading the layout from the given toml_path. See the description of dependencies below for more details. You can use this information in analysis scripts (e.g. in conjunction with os.path.getmtime()) to avoid repeating expensive analyses if the underlying layout hasn’t changed.

  • on_alert (callable) – A callback to invoke if the given TOML file contains a warning for the user. The default behavior is to print the warning to the terminal via stderr. If a callback is provided, it must take two arguments: a pathlib.Path to the TOML file containing the alert, and the message itself. Note that this could be called more than once, e.g. if there are included or concatenated files.
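Two of the parameters above take values that you write yourself: a path_guess template and an on_alert callback. The sketch below shows how a path_guess template resolves, following the rule stated for that parameter (format with a pathlib.Path to the TOML file, then interpret relative to the TOML file's directory), and one way to write an on_alert callback that collects alerts instead of printing them. The helpers resolve_path_guess and make_alert_collector are hypothetical names for illustration; they are not part of wellmap.

```python
from pathlib import Path


def resolve_path_guess(toml_path, path_guess):
    """Resolve a path_guess template as described for that parameter.

    Hypothetical helper, not wellmap's code. The template is formatted
    with a Path to the TOML file, then joined onto the TOML file's
    directory. pathlib's "/" returns the right operand unchanged when it
    is absolute, which covers the absolute-path case.
    """
    toml_path = Path(toml_path)
    guess = path_guess.format(toml_path)  # e.g. '{0.stem}.csv' -> 'plate1.csv'
    return toml_path.parent / guess


def make_alert_collector():
    """Return a hypothetical on_alert callback and the list it appends to.

    The callback takes the two arguments described above: a Path to the
    TOML file containing the alert, and the message itself. It may be
    called several times (e.g. once per included file).
    """
    alerts = []

    def on_alert(toml_path, message):
        alerts.append((Path(toml_path), message))

    return on_alert, alerts
```

With these, a call such as wellmap.load('plate1.toml', path_guess='{0.stem}.csv', on_alert=on_alert) would look for plate1.csv next to plate1.toml when the file gives no meta.path, and would leave any alerts in the alerts list for the script to inspect.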

Returns

If neither data_loader nor merge_cols were provided:

  • layout (pandas.DataFrame) – Information about the plate layout parsed from the given TOML file. The data frame will have a row for each well and a column for each experimental condition. In addition, there will be several columns identifying each well:

    • plate: The name of the plate for this well. This column will not be present if there are no [plate] blocks in the TOML file.

    • path: The path to the data file associated with the plate for this well. This column will not be present if no data files were referenced by the TOML file.

    • well: The name of the well, e.g. “A1”.

    • well0: The zero-padded name of the well, e.g. “A01”.

    • row: The name of the row for this well, e.g. “A”.

    • col: The name of the column for this well, e.g. “1”.

    • row_i: The row-index of this well, counting from 0.

    • col_j: The column-index of this well, counting from 0.
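The identifier columns above are different spellings of the same position, which is why they help with merging: whichever spelling the data uses, one of them is likely to match exactly. The sketch below derives all six from a single well name. It is a hypothetical helper, not wellmap's implementation, and it assumes single-letter rows (A to Z) and that well0 pads the column number to two digits, as in the “A01” example.

```python
import re
import string


def well_ids(name):
    """Derive well, well0, row, col, row_i and col_j from a well name.

    Hypothetical helper. Assumes single-letter rows (A-Z) and two-digit
    zero padding for well0. Accepts padded ("A01") or unpadded ("A1")
    input, case-insensitively.
    """
    match = re.fullmatch(r"([A-Z])0*(\d+)", name.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized well name: {name!r}")
    row, col_num = match.group(1), int(match.group(2))
    if col_num < 1:
        raise ValueError(f"column numbers start at 1: {name!r}")
    return {
        "well": f"{row}{col_num}",          # unpadded, e.g. "A1"
        "well0": f"{row}{col_num:02d}",     # zero-padded, e.g. "A01"
        "row": row,                         # e.g. "A"
        "col": str(col_num),                # e.g. "1"
        "row_i": string.ascii_uppercase.index(row),  # 0-based row index
        "col_j": col_num - 1,               # 0-based column index
    }
```

A data_loader whose instrument reports wells in some other convention could apply a conversion like this to build a column that corresponds exactly to one of the layout's identifier columns.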

If data_loader was provided but merge_cols was not:

  • layout (pandas.DataFrame) – See above.

  • data (pandas.DataFrame) – The concatenated result of calling data_loader() on every path specified in the given TOML file. See pandas.concat() for more information on how the data from different paths are concatenated.

If data_loader and merge_cols were both provided:

  • merged (pandas.DataFrame) – The result of merging the layout and data data frames along the columns specified by merge_cols. See pandas.merge() for more details on the merge itself. The resulting data frame will have one or more rows for each well (more are possible if there are multiple data points per well, e.g. a time course), a column for each experimental condition described in the TOML file, and a column for each kind of data loaded from the data files.
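The data and merged return values can be pictured as two pandas steps: concatenate the frames returned by data_loader (one per data file), then merge them onto the layout using the merge_cols pairs plus the path column. The sketch below is a hypothetical helper, merge_like_load, written to mirror that description; it is not wellmap's implementation, and it assumes pandas' default inner join and a dictionary-valued merge_cols.

```python
import pandas as pd


def merge_like_load(layout, data_by_path, merge_cols):
    """Concatenate per-file data, then merge it onto the layout.

    Hypothetical helper, not wellmap's code:
    - data_by_path maps each data-file path to the frame data_loader
      returned for it;
    - merge_cols maps layout column names to data column names;
    - a "path" column is added to the data and always joined on.
    pandas' default inner join is assumed.
    """
    # Tag each frame with the file it came from, then stack them.
    data = pd.concat(
        [df.assign(path=path) for path, df in data_by_path.items()],
        ignore_index=True,
    )
    # "path" is always part of the join; skip it if the caller listed it too.
    pairs = {k: v for k, v in merge_cols.items() if k != "path"}
    left_on = ["path", *pairs.keys()]
    right_on = ["path", *pairs.values()]
    return pd.merge(layout, data, left_on=left_on, right_on=right_on)
```

Because the join requires exact equality, pairing the layout's well column ("A1") with a data column of padded names ("A01") matches nothing, while pairing well0 with it matches every well. A time course with several rows per well yields several merged rows per well.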

If extras was provided:

  • extras – A dictionary containing any key/value pairs present in the TOML file but not part of the layout. For example, consider the following TOML file:

    a = 1
    b = 2
    well.A1.c = 3
    

    If we were to load this file with extras=True, this return value would be {'a': 1, 'b': 2}.
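The split between layout and extras can be sketched on the dictionary that the TOML example parses to (the dotted key well.A1.c becomes a nested table under "well"). The set LAYOUT_KEYS below is an illustrative assumption covering only a few layout table names; it is not wellmap's actual list, and split_extras is a hypothetical helper.

```python
# Illustrative, partial set of top-level tables that describe the layout.
# Assumption for this sketch only; not wellmap's actual list.
LAYOUT_KEYS = {"meta", "plate", "row", "col", "block", "well"}


def split_extras(parsed_toml, layout_keys=LAYOUT_KEYS):
    """Return the top-level key/value pairs that are not layout tables."""
    return {k: v for k, v in parsed_toml.items() if k not in layout_keys}


# The TOML example above, as a TOML parser would represent it.
parsed = {"a": 1, "b": 2, "well": {"A1": {"c": 3}}}
extras = split_extras(parsed)
```

Here extras evaluates to {'a': 1, 'b': 2}, the same value given for extras=True above.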

If report_dependencies was provided:

  • dependencies – A set containing absolute paths to every layout file that was referenced by toml_path. This includes toml_path itself, and the paths to any included or concatenated layout files. It does not include paths to data files, as these are included already in the path column of the layout or merged data frames.
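The report_dependencies description suggests comparing modification times (e.g. with os.path.getmtime()) to skip expensive analyses when the layout has not changed. A minimal sketch of that check follows; is_stale is a hypothetical helper, and output_path stands for whatever cached result your analysis writes.

```python
import os


def is_stale(output_path, dependencies):
    """Return True if output_path is missing or older than any dependency.

    Hypothetical helper. dependencies is the set of layout files reported
    with report_dependencies=True; output_path is a cached analysis result.
    """
    if not os.path.exists(output_path):
        return True
    output_mtime = os.path.getmtime(output_path)
    return any(os.path.getmtime(dep) > output_mtime for dep in dependencies)
```

Since the reported set excludes data files, a script that also wants to react to new measurements could add the paths from the path column to the set before calling a check like this.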