Basic usage with python

The following steps show how to get started with wellmap in python:

  1. Install wellmap from PyPI. Note that python≥3.6 is required:

    $ pip install wellmap
    
  2. Write a TOML file describing the layout of an experiment. For example, the following layout might be used for a standard curve:

    # The [row] and [col] sections specify which conditions are being tested in 
    # which wells.  The fields within these sections (e.g. `dilution`, `replicate`) 
    # can be anything.  If your plates aren't organized by row and column, there 
    # are other ways to define the plate layout; see the "File format" section for 
    # more details.
    
    [col]
    1.dilution = 1e5
    2.dilution = 1e4
    3.dilution = 1e3
    4.dilution = 1e2
    5.dilution = 1e1
    6.dilution = 1e0
    
    [row]
    A.replicate = 1
    B.replicate = 2
    C.replicate = 3
    
  3. Confirm that the layout is correct by using the wellmap command-line program to produce a visualization of the layout. This is an important step, because it’s much easier to spot mistakes in the visualization than in the layout file itself.

    $ wellmap std_curve.toml
    

    This map shows that:

    • Each row is a different replicate.

    • Each column is a different dilution.

    _images/std_curve_map.svg

    It is also possible to create maps like this directly from python, which may be useful in interactive sessions such as Jupyter notebooks:

    >>> import wellmap
    >>> wellmap.show("std_curve.toml")
    <Figure size ... with 4 Axes>
    

    Note

    Jupyter will display the layout twice when you use the show function, unless you put a semicolon after it. See #19 for more information.

  4. Load the data from the experiment in question into a tidy data frame. Tidy data are easier to work with in general, and are required by wellmap in particular. If you aren’t familiar with the concept of tidy data, this article is a good introduction. The basic idea is to ensure that:

    • Each variable is represented by a single column.

    • Each observation is represented by a single row.

    If possible, it’s best to export data from the instrument that collected it directly to a tidy format. When this isn’t possible, though, you’ll need to tidy the data yourself. For example, consider the following data (which corresponds to the layout from above). This is qPCR data, where a higher \(C_q\) value indicates that less material is present. The data are shaped like the plate itself, e.g. a row in the data for every row on the plate, and a column in the data for every column on the plate. It’s not uncommon for microplate instruments to export data in this format.

    std_curve.csv

    Cq

    1

    2

    3

    4

    5

    6

    A

    24.180858612060547

    20.74011993408203

    17.183801651000977

    13.774299621582031

    10.29498291015625

    6.967061996459961

    B

    24.15711784362793

    20.77970314025879

    17.171794891357422

    13.768831253051758

    10.362966537475586

    6.870273113250732

    C

    24.238229751586914

    20.78700828552246

    17.147598266601563

    13.779314041137695

    10.292966842651367

    6.735703945159912

    Below is the code to load this data into a tidy pandas.DataFrame with the following columns:

    • row: A letter identifying a row on the microplate, e.g. A-H

    • col: A number identifying a column on the microplate, e.g. 1-12

    • Cq: The \(C_q\) value measured for the identified well.

    >>> import pandas as pd
    >>> def load_cq(path):
    ...     return (pd
    ...             .read_csv(path)
    ...             .rename(columns={'Cq': 'row'})
    ...             .melt(
    ...                     id_vars=['row'],
    ...                     var_name='col',
    ...                     value_name='Cq',
    ...             )
    ...     )
    >>> data = load_cq('std_curve.csv')
    >>> data
       row col         Cq
    0    A   1  24.180859
    1    B   1  24.157118
    2    C   1  24.238230
    3    A   2  20.740120
    4    B   2  20.779703
    5    C   2  20.787008
    6    A   3  17.183802
    7    B   3  17.171795
    8    C   3  17.147598
    9    A   4  13.774300
    10   B   4  13.768831
    11   C   4  13.779314
    12   A   5  10.294983
    13   B   5  10.362967
    14   C   5  10.292967
    15   A   6   6.967062
    16   B   6   6.870273
    17   C   6   6.735704
    
  5. Use wellmap.load() to associate the labels specified in the TOML file (e.g. the dilutions and replicates) with the experimental data (e.g. the \(C_q\) values). This process has three steps:

    • Load a data frame containing the data (see above).

    • Load another data frame containing the labels.

    • Merge the two data frames.

    For the sake of clarity and completeness, we will first show how to perform these steps manually. Practically, though, it’s easier to let wellmap perform them automatically.

    Manual merge

    Use the wellmap.load() function to create a pandas.DataFrame containing the information from the TOML file. This data frame will have columns for each label we specified: replicate, dilution. It will also have six columns identifying the wells in different ways: well, well0, row, col, row_i, col_j. These columns are redundant, but this redundancy makes it easier to merge the labels with the data. For example, if the wells are named “A1,A2,…” in the data, the well column can be used for the merge. If the wells are named “A01,A02,…”, the well0 column can be used instead. If the wells are named in some non-standard way, the row_i and col_j columns can be used to calculate an appropriate merge column.

    >>> import wellmap
    >>> labels = wellmap.load('std_curve.toml')
    >>> labels
       well well0 row col  row_i  col_j  replicate  dilution
    0    A1   A01   A   1      0      0          1  100000.0
    1    A2   A02   A   2      0      1          1   10000.0
    2    A3   A03   A   3      0      2          1    1000.0
    3    A4   A04   A   4      0      3          1     100.0
    4    A5   A05   A   5      0      4          1      10.0
    5    A6   A06   A   6      0      5          1       1.0
    6    B1   B01   B   1      1      0          2  100000.0
    7    B2   B02   B   2      1      1          2   10000.0
    8    B3   B03   B   3      1      2          2    1000.0
    9    B4   B04   B   4      1      3          2     100.0
    10   B5   B05   B   5      1      4          2      10.0
    11   B6   B06   B   6      1      5          2       1.0
    12   C1   C01   C   1      2      0          3  100000.0
    13   C2   C02   C   2      2      1          3   10000.0
    14   C3   C03   C   3      2      2          3    1000.0
    15   C4   C04   C   4      2      3          3     100.0
    16   C5   C05   C   5      2      4          3      10.0
    17   C6   C06   C   6      2      5          3       1.0
    

    Use the pandas.merge() function to associate the labels with the data. In this case, both data frames have columns named row and col, so pandas will automatically use those for the merge. It is also easy to merge using columns with different names; see the documentation on pandas.merge() for more information.

    >>> import pandas as pd
    >>> df = pd.merge(labels, data)
    >>> df
       well well0 row col  row_i  col_j  replicate  dilution         Cq
    0    A1   A01   A   1      0      0          1  100000.0  24.180859
    1    A2   A02   A   2      0      1          1   10000.0  20.740120
    2    A3   A03   A   3      0      2          1    1000.0  17.183802
    3    A4   A04   A   4      0      3          1     100.0  13.774300
    4    A5   A05   A   5      0      4          1      10.0  10.294983
    5    A6   A06   A   6      0      5          1       1.0   6.967062
    6    B1   B01   B   1      1      0          2  100000.0  24.157118
    7    B2   B02   B   2      1      1          2   10000.0  20.779703
    8    B3   B03   B   3      1      2          2    1000.0  17.171795
    9    B4   B04   B   4      1      3          2     100.0  13.768831
    10   B5   B05   B   5      1      4          2      10.0  10.362967
    11   B6   B06   B   6      1      5          2       1.0   6.870273
    12   C1   C01   C   1      2      0          3  100000.0  24.238230
    13   C2   C02   C   2      2      1          3   10000.0  20.787008
    14   C3   C03   C   3      2      2          3    1000.0  17.147598
    15   C4   C04   C   4      2      3          3     100.0  13.779314
    16   C5   C05   C   5      2      4          3      10.0  10.292967
    17   C6   C06   C   6      2      5          3       1.0   6.735704
    

    Automatic merge

    While it’s good to understand how the labels are merged with the data, it’s better to let wellmap perform the merge for you. Not only is this more succinct, it also handles some tricky corner cases behind the scenes, e.g. layouts with multiple data files.

    To load and merge the data using wellmap.load(), you need to provide the following arguments:

    • data_loader: A function that accepts a path to a file and returns a pandas.DataFrame containing the data from that file. Note that the function we wrote in the previous section fulfills these requirements. If the raw data are tidy to begin with, it is often possible to directly use pandas.read_csv() or similar for this argument.

    • merge_cols: An indication of which columns to merge. In the snippet below, True means to use any columns that are shared between the two data frames (e.g. that have the same name). You can also use a dictionary to be more explicit about which columns to merge on.

    Here we also provide the path_guess argument, which specifies that the experimental data can be found in a CSV file with the same base name as the layout. It also would’ve been possible to specify the path to the CSV directly from the TOML file (see meta.path), in which case this argument would’ve been unnecessary.

    >>> df = wellmap.load(
    ...         'std_curve.toml',
    ...         data_loader=load_cq,
    ...         merge_cols=True,
    ...         path_guess='{0.stem}.csv',
    ... )
    >>> df
       well well0 row  ... replicate  dilution         Cq
    0    A1   A01   A  ...         1  100000.0  24.180859
    1    A2   A02   A  ...         1   10000.0  20.740120
    2    A3   A03   A  ...         1    1000.0  17.183802
    3    A4   A04   A  ...         1     100.0  13.774300
    4    A5   A05   A  ...         1      10.0  10.294983
    5    A6   A06   A  ...         1       1.0   6.967062
    6    B1   B01   B  ...         2  100000.0  24.157118
    7    B2   B02   B  ...         2   10000.0  20.779703
    8    B3   B03   B  ...         2    1000.0  17.171795
    9    B4   B04   B  ...         2     100.0  13.768831
    10   B5   B05   B  ...         2      10.0  10.362967
    11   B6   B06   B  ...         2       1.0   6.870273
    12   C1   C01   C  ...         3  100000.0  24.238230
    13   C2   C02   C  ...         3   10000.0  20.787008
    14   C3   C03   C  ...         3    1000.0  17.147598
    15   C4   C04   C  ...         3     100.0  13.779314
    16   C5   C05   C  ...         3      10.0  10.292967
    17   C6   C06   C  ...         3       1.0   6.735704
    
    [18 rows x 10 columns]
    
  6. Analyze the data given the connection between the labels and the data. This step doesn’t involve wellmap, but is included here for completeness. The example below makes a linear regression of the data in log-space:

    #!/usr/bin/env python3
    
    import wellmap
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import linregress
    
    def load_cq(path):
        return (pd
                .read_csv(path)
                .rename(columns={'Cq': 'row'})
                .melt(
                        id_vars=['row'],
                        var_name='col',
                        value_name='Cq',
                )
        )
    
    df = wellmap.load(
          'std_curve.toml',
          data_loader=load_cq,
          merge_cols=True,
          path_guess='{0.stem}.csv',
    )
    
    x = df['dilution']
    y = df['Cq']
    m, b, r, p, err = linregress(np.log10(x), y)
    
    x_fit = np.logspace(0, 5)
    y_fit = np.polyval((m, b), np.log10(x_fit))
    
    r2 = r**2
    eff = 100 * (10**(1/m) - 1)
    label = 'R²={:.5f}\neff={:.2f}%'.format(r2, eff)
    
    plt.plot(x_fit, y_fit, '--', label=label)
    plt.plot(x, y, '+')
    plt.legend(loc='best')
    plt.xscale('log')
    plt.xlabel('dilution')
    plt.ylabel('Cq')
    plt.show()
    
    _images/std_curve_py.svg

    R² is a measure of how well the line fits the data. In this case, the fit is very good. Note that there are three data points for each dilution, but they are hard to tell apart because they are almost perfectly superimposed. Efficiency is a measure of how well the qPCR reaction worked, or more specifically, how close the amount of DNA came to doubling (as would be expected) on each cycle. 100% indicates perfect doubling; 94% is a little on the low side.