Indexed or unindexed columns
============================

One important feature of the ``Table`` container is the ability to
compute arithmetic operations with or without taking the index into
account. Whether to use the indexed or the unindexed operation is left
to the user. This notebook explains the differences between the two,
and the consequences of choosing each option.

We'll start by creating a sparse ``Table``.

.. code:: python

    from gtable import Table

    t = Table()
    t.add_column('a', [1, 2, 3, 4, 5, 6])
    t.add_column('b', [1, 2, 3], align="bottom")

If we access a column by attribute, we get an object of type
``Column``, which includes information about the index.

.. code:: python

    t.a

Accessing by key is a shortcut to the data stored within the ``Table``,
and carries no information about how the table is indexed.

.. code:: python

    t['a']

.. parsed-literal::

    array([1, 2, 3, 4, 5, 6])

The column ``a`` is not a particularly good example, but ``b`` is. The
data stored in the latter column has only three elements. Where those
elements are actually placed within the table is stored in the index.

.. code:: python

    t['b']

.. parsed-literal::

    array([1, 2, 3])

The easiest and safest way to operate with columns is to take the index
into account.

.. code:: python

    c = t.a + t.b

.. code:: python

    c.values

.. parsed-literal::

    array([ 2., 3., 4.])

Note that, since the ``b`` column has only three elements, the result of
adding it to the ``a`` column also has only three elements. There are no
NaNs or NAs.

However, using the index to perform arithmetic operations has a cost,
particularly for large dense columns. Assume that we want to scale the
``a`` column by the last element of ``b``. We can do that either by
accessing the full columns or by accessing the raw data.

.. code:: python

    c = t.a * t.b[-1]

.. code:: python

    c.values

.. parsed-literal::

    array([ 3, 6, 9, 12, 15, 18])

Using columns is more convenient, since in many cases arithmetic
operations do what they are supposed to do, but they have an important
caveat: performance. In this run, the column version takes about four
times as long as the raw-data version.

.. code:: python

    %%timeit
    t.a * t.b[-1]

.. parsed-literal::

    26.2 µs ± 224 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

.. code:: python

    %%timeit
    t['a'] * t['b'][-1]

.. parsed-literal::

    6.19 µs ± 88.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

However, since the raw data of each column can have a different length,
operating on the raw data and operating on the columns can give
different outcomes.

.. code:: python

    t.a + t.b

.. code:: python

    t['a'] + t['b']

::

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    ----> 1 t['a'] + t['b']

    ValueError: operands could not be broadcast together with shapes (6,) (3,)

Another caveat of columns is that they are designed for fast operations
on the column as a whole; as a consequence, accessing an individual item
of a column is O(N).

Another important difference is that we can create new columns by
attribute, but not by key.

.. code:: python

    t.c = t.a + t.b

.. code:: python

    t

.. parsed-literal::

    , b[3] , c[3] ] object at 0x7f5348736fd0>

.. code:: python

    t['d'] = t['a']

::

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    ----> 1 t['d'] = t['a']

    ~/projects/gtable/gtable/table.py in __setitem__(self, key, value)
        158
        159     def __setitem__(self, key, value):
    --> 160         self.data[self.keys.index(key)] = value
        161
        162     def __delitem__(self, key):

    ValueError: 'd' is not in list
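To make the difference between indexed and unindexed arithmetic concrete,
here is a minimal NumPy sketch of index-aware addition. It assumes, purely
for illustration, that a sparse column is stored as its values plus a
boolean mask marking which table rows hold them. ``indexed_add`` and this
storage layout are hypothetical; they are not gtable's API or its actual
internals. The indexed result keeps only the rows present in both columns,
while adding the raw arrays fails for the same reason ``t['a'] + t['b']``
does.

.. code:: python

    import numpy as np


    # Hypothetical layout (an assumption, not gtable's implementation): a sparse
    # column is a pair (values, mask), where mask has one boolean per table row
    # and True marks the rows that hold a value.
    def indexed_add(values_x, mask_x, values_y, mask_y):
        """Add two sparse columns, keeping only the rows present in both."""
        common = mask_x & mask_y
        # common[mask_x] tells which of x's stored values sit on shared rows
        x_common = values_x[common[mask_x]]
        y_common = values_y[common[mask_y]]
        return x_common + y_common, common


    x_vals = np.array([10, 20, 30, 40])
    x_mask = np.array([True, True, True, True])
    y_vals = np.array([1, 2])
    y_mask = np.array([False, True, False, True])

    total, total_mask = indexed_add(x_vals, x_mask, y_vals, y_mask)
    print(total)                       # [21 42]
    print(np.flatnonzero(total_mask))  # [1 3]

    # Unindexed: the raw arrays have different lengths and cannot be broadcast
    try:
        x_vals + y_vals
    except ValueError as err:
        print("raw addition failed:", err)

The mask bookkeeping (an ``&`` over every table row plus two boolean
selections) is extra work on top of the addition itself, which is one
plausible reason why indexed operations are slower than operating on the
raw arrays.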
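The O(N) cost of accessing a single item can be illustrated with the same
hypothetical layout. If the index is a boolean mask over the table rows,
finding the value stored at a given row means counting how many stored
values come before it, which scans the mask. ``get_row`` is a hypothetical
helper, and placing three values in the last three rows of a six-row table
is our assumption about what ``align="bottom"`` does.

.. code:: python

    import numpy as np


    # Hypothetical boolean-mask index (an assumption for illustration, not
    # gtable's documented internals).
    def get_row(values, mask, row):
        """Return the value a sparse column holds at table row `row`, or None."""
        if not mask[row]:
            return None
        # Counting the stored values in the rows before `row` scans the mask,
        # which is what makes single-item access O(N) rather than O(1).
        position = np.count_nonzero(mask[:row])
        return values[position]


    values = np.array([1, 2, 3])
    # Assumed bottom alignment: values live in rows 3, 4 and 5
    mask = np.array([False, False, False, True, True, True])
    print(get_row(values, mask, 5))  # 3
    print(get_row(values, mask, 0))  # None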