RackioEDA

class rackio_AI.RackioEDA(name='', description='')

Rackio Exploratory Data Analysis (RackioEDA for short) is an ETL framework based on the pipe-and-filter architectural style. It covers three stages: data extraction from homogeneous or heterogeneous sources; data transformation, which cleans the data and converts it into a storage format or structure suited to querying and analysis; and data loading into the final target database, such as an operational data store, a data mart, a data lake or a data warehouse.

This schematic process is shown in the following image:

ETL Process
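
The pipe-and-filter idea behind the description above can be sketched with plain Python generators. This is only an illustration of the architectural style, not RackioEDA's implementation: the function names (extract, drop_incomplete, to_float, load) and the sample CSV text are hypothetical and do not come from rackio_AI.

```python
import csv
import io

# Hypothetical in-memory "source"; a real pipeline would read files or databases.
RAW_CSV = "time,pressure\n0.0,101.3\n0.5,\n1.0,101.9\n"


def extract(text):
    """Extract: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def drop_incomplete(rows):
    """Filter: skip rows that have an empty field (data cleaning)."""
    for row in rows:
        if all(value != "" for value in row.values()):
            yield row


def to_float(rows):
    """Filter: convert every field from str to float (transformation)."""
    for row in rows:
        yield {key: float(value) for key, value in row.items()}


def load(rows, target):
    """Load: append the transformed rows to the target store (here, a list)."""
    target.extend(rows)
    return target


# Each stage consumes the previous stage's output, like pipes joining filters.
warehouse = load(to_float(drop_incomplete(extract(RAW_CSV))), target=[])
print(warehouse)
```

Because every filter only reads an iterable and yields rows, stages can be added, removed or reordered without touching the others.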

Parameters

  • :param name: (str) RackioEDA object's name
  • :param description: (str) RackioEDA object's description

returns

  • RackioEDA object
>>> from rackio_AI import RackioEDA
>>> EDA = RackioEDA(name='EDA core', description='Object Exploratory Data Analysis')
serialize(self)

Serialize RackioEDA object


Parameters

None

:return:

  • result: (dict) keys {"name", "description"}

Snippet code

>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> EDA.serialize()
{'name': 'EDA core', 'description': 'Object Exploratory Data Analysis'}
get_name(self)

Get RackioEDA object's name

returns

  • name: (str)

Snippet code

>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> EDA.get_name()
'EDA core'
description

Property that stores the RackioEDA model description


Parameters

  • :param value: (str) RackioEDA model description

:return:

  • description: (str) RackioEDA model description


Snippet code

>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> EDA.description
'Object Exploratory Data Analysis'
data

Property (getter and setter) for the data stored in the RackioEDA object


Parameters

  • :param value: (np.array, pd.DataFrame)

:return:

  • data: (np.array, pd.DataFrame)

Snippet code

>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['One', 'Two', 'Three'])
>>> EDA.data = df
>>> EDA.data
   One  Two  Three
0    1    2      3
1    4    5      6
2    7    8      9
insert_columns(self, df, data, column_names, locs=[], allow_duplicates=False)

Insert the columns in data into the dataframe df at the positions given by locs


Parameters

  • :param df: (pd.DataFrame) dataframe to insert the columns into
  • :param data: (np.ndarray, pd.DataFrame or pd.Series) column to insert
  • :param column_names: (list[str]) names of the columns to be added
  • :param locs: (list[int]) locations where the columns will be added (optional, default=last position)
  • :param allow_duplicates: (bool) (optional, default=False)

:return:

  • data: (pandas.DataFrame)

Snippet code

>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['One', 'Two', 'Three'])
>>> col = [10, 11, 12]
>>> EDA.insert_columns(df, col, ['Four'])
   One  Two  Three  Four
0    1    2      3    10
1    4    5      6    11
2    7    8      9    12
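
The effect of a position argument such as locs is easiest to see with plain pandas. The sketch below calls pandas.DataFrame.insert directly; whether insert_columns delegates to it is an assumption, not something this page states.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['One', 'Two', 'Three'])

# DataFrame.insert modifies df in place and returns None.
# loc=1 places the new column between 'One' and 'Two'.
df.insert(loc=1, column='Four', value=[10, 11, 12], allow_duplicates=False)

print(list(df.columns))
```

With allow_duplicates=False, pandas raises a ValueError if a column with the same name already exists.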
remove_columns(self, df, *args)

Remove columns in the data by their names


Parameters

  • :param args: (str) column name or column names to remove from the data

:return:

  • data: (pandas.DataFrame)

Snippet code

>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['One', 'Two', 'Three'])
>>> EDA.remove_columns(df, 'Two', 'Three')
   One
0    1
1    4
2    7
keep_columns(self, df, *args)

Keep columns in the data by their names


Parameters

  • :param args: (str) column name or column names to keep from the data

:return:

  • data: (pandas.DataFrame)

Snippet code

>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['One', 'Two', 'Three'])
>>> EDA.keep_columns(df, 'Two')
   Two
0    2
1    5
2    8
rename_columns(self, df, **kwargs)

Rename column names in the dataframe df


Parameters

  • :param df: (pd.DataFrame) dataframe whose columns are renamed
  • :param kwargs: (dict) mapping of current column names to new column names

:return:

  • data: (pandas.DataFrame)

Snippet code

>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['One', 'Two', 'Three'])
>>> columns_to_rename = {'One': 'one', 'Two': 'two'}
>>> EDA.rename_columns(df, **columns_to_rename)
   one  two  Three
0    1    2      3
1    4    5      6
2    7    8      9
>>> EDA.rename_columns(df, One='one',Three='three')
   one  Two  three
0    1    2      3
1    4    5      6
2    7    8      9
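
Both calls above start from the original column names, which suggests that rename_columns returns a renamed dataframe and leaves df itself unchanged. Plain pandas behaves the same way with DataFrame.rename, as this sketch shows; it uses pandas only and does not claim to be how RackioEDA is implemented.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['One', 'Two', 'Three'])

# rename returns a new DataFrame; the original keeps its column names.
renamed = df.rename(columns={'One': 'one', 'Two': 'two'})

print(list(renamed.columns))
print(list(df.columns))
```

To keep the new names, assign the returned dataframe to a variable.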
change_columns(self, df, data, column_names)

Replace columns in the dataframe df with the columns of the same name from the dataframe data


Parameters

  • :param df: (pandas.DataFrame)
  • :param data: (pandas.DataFrame) dataframe holding the replacement columns
  • :param column_names: (list) column or columns names to change

:return:

  • data: (pandas.DataFrame)

Snippet code

>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['One', 'Two', 'Three'])
>>> EDA.data = df
>>> data = pd.DataFrame([[10, 11], [13, 14], [16, 17]], columns=['Two','Three'])
>>> columns=['Two','Three']
>>> EDA.change_columns(df, data, columns)
   One  Two  Three
0    1   10     11
1    4   13     14
2    7   16     17
search_loc(self, column_name, *keys, **kwargs)

Logical indexing


Parameters

  • :param column_name: (str) column name in self.data
  • :param keys: (tuple(str)) Positional arguments
  • :param join_by: (str)
  • :param logic: (str)

:return:

  • data: (pandas.DataFrame)
set_datetime_index(self, df, label, index_name, start=datetime.datetime(2022, 1, 13, 11, 22, 11, 49655), format='%Y-%m-%d %H:%M:%S')

Set a datetime index on the dataframe df

Parameters

  • :param df: (pandas.DataFrame) Dataframe to set the index
  • :param label: (str) Column name that represents timeseries
  • :param index_name: (str) Index name
  • :param start: (str) datetime in string format "%Y-%m-%d %H:%M:%S"
  • :param format: (str) datetime format

returns

data: (pandas.DataFrame)


Snippet code

>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[0.5, 2, 3], [1.5, 5, 6], [3, 8, 9]], columns=['Time', 'Two', 'Three'])
>>> df = EDA.set_datetime_index(df, "Time", "Timestamp", start="2021-01-01 00:00:00")
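
The snippet above prints no result. The following standard-library sketch shows one plausible way to turn a numeric time column into timestamps, by adding each value to start. It assumes the 'Time' column holds elapsed seconds; this page does not state the unit, so treat that as an assumption rather than documented behaviour.

```python
from datetime import datetime, timedelta

FORMAT = "%Y-%m-%d %H:%M:%S"

start = datetime.strptime("2021-01-01 00:00:00", FORMAT)
elapsed = [0.5, 1.5, 3]  # values of the 'Time' column; assumed to be seconds

# Offset the start instant by each elapsed value to build the index entries.
timestamps = [start + timedelta(seconds=t) for t in elapsed]

for ts in timestamps:
    print(ts.isoformat(sep=" "))
```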
resample(self, df, sample_time, label=None, datetime_format='%Y-%m-%d %H:%M:%S.%f', set_index=False)

Resample the dataframe df according to its timeseries column

Parameters

  • :param df: (pandas.DataFrame)
  • :param sample_time: (float or int) new sample time in the dataframe
  • :param label: (str) column name that represents timeseries values

returns

data: (pandas.DataFrame)


Snippet code

>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[0.5, 2, 3], [1, 5, 6], [1.5, 8, 9], [2, 8, 9]], columns=['Time', 'Two', 'Three'])
>>> EDA.resample(df, 1, label="Time")
   Time  Two  Three
0   0.5    2      3
2   1.5    8      9
>>> import pandas as pd
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([["2021-03-24 17:27:11.0", 2, 3], ["2021-03-24 17:27:11.5", 5, 6], ["2021-03-24 17:27:12.0", 8, 9], ["2021-03-24 17:27:12.5", 8, 9]], columns=['Time', 'Two', 'Three'])
>>> EDA.resample(df, 1, label="Time")
                    Time  Two  Three
0  2021-03-24 17:27:11.0    2      3
2  2021-03-24 17:27:12.0    8      9
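
Both examples keep rows 0 and 2. One simple rule that reproduces this selection is: keep the first row, then keep the next row whose time is at least sample_time later than the last kept row. The helper decimate below is hypothetical and written in plain Python to illustrate that rule; it is not taken from rackio_AI, and the library's actual logic may differ.

```python
def decimate(times, sample_time):
    """Return the positions of the rows kept when thinning to sample_time."""
    kept = []
    next_time = None  # earliest time at which the next row may be kept
    for position, t in enumerate(times):
        if next_time is None or t >= next_time:
            kept.append(position)
            next_time = t + sample_time
    return kept


print(decimate([0.5, 1, 1.5, 2], 1))  # [0, 2]
```

The returned positions match the index labels 0 and 2 that appear in the outputs above.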
reset_index(self, df, drop=False)

Reset index in the dataframe df

Parameters

  • :param df: (pandas.DataFrame)
  • :param drop: (bool) drop index from the dataframe

returns

data: (pandas.DataFrame)


Snippet code

>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[0.5, 2, 3], [1, 5, 6], [1.5, 8, 9], [2, 8, 9]], columns=['Time', 'Two', 'Three'])
>>> EDA.reset_index(df, drop=False)
   index  Time  Two  Three
0      0   0.5    2      3
1      1   1.0    5      6
2      2   1.5    8      9
3      3   2.0    8      9
print_report(self, df, info=True, head=True, header=10)

Print a report of the DataFrame: its info summary and its first rows


Parameters

  • :param df: (pd.DataFrame) DataFrame to print report
  • :param info: (bool) get info from DataFrame
  • :param head: (bool) get head from DataFrame
  • :param header: (int) number of first rows to print

:return:

  • data: (pandas.DataFrame)

Snippet code

>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['One', 'Two', 'Three'])
>>> df = EDA.print_report(df, info=True, head=False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   One     3 non-null      int64
 1   Two     3 non-null      int64
 2   Three   3 non-null      int64
dtypes: int64(3)
memory usage: 200.0 bytes