RackioEDA
rackio_AI.RackioEDA(name='', description='')
Rackio Exploratory Data Analysis (RackioEDA for short) is an ETL framework based on the pipe-and-filter architectural style. It extracts data from homogeneous or heterogeneous sources; transforms it by cleaning it and converting it into a storage format or structure suited to querying and analysis; and finally loads it into the target store, such as an operational data store, a data mart, a data lake or a data warehouse.
This schematic process is shown in the following image:

Parameters
- :param name: (str) RackioEDA object's name
- :param description: (str) RackioEDA object's description
returns
- RackioEDA object
>>> from rackio_AI import RackioEDA
>>> EDA = RackioEDA(name='EDA core', description='Object Exploratory Data Analysis')
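The pipe-and-filter ETL flow described above can be illustrated in plain pandas. This sketch does not use RackioEDA: `extract`, `drop_missing`, `to_int`, `load` and `run_pipeline` are hypothetical names chosen for illustration, and a Python dict stands in for the target data store.

```python
import pandas as pd


def extract() -> pd.DataFrame:
    # Hypothetical source: an in-memory table standing in for a file or database.
    return pd.DataFrame({"One": [1, None, 7], "Two": [2, 5, 8], "Three": [3, 6, 9]})


def drop_missing(df: pd.DataFrame) -> pd.DataFrame:
    # Cleaning filter: discard rows that contain missing values.
    return df.dropna()


def to_int(df: pd.DataFrame) -> pd.DataFrame:
    # Transformation filter: cast every column to a compact storage type.
    return df.astype("int64")


def load(df: pd.DataFrame, target: dict) -> dict:
    # Hypothetical sink: a dict standing in for a data store or warehouse table.
    target["table"] = df.reset_index(drop=True)
    return target


def run_pipeline(source, filters, sink, target):
    data = source()
    for step in filters:  # each filter's output is piped into the next one
        data = step(data)
    return sink(data, target)


store = run_pipeline(extract, [drop_missing, to_int], load, {})
print(store["table"])
```

Each filter only sees a DataFrame in and a DataFrame out, so filters can be reordered, added or removed without touching the others.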
serialize(self)
Serialize the RackioEDA object.
Parameters
None
:return:
- result: (dict) keys {"name", "description"}
Snippet code
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> EDA.serialize()
{'name': 'EDA core', 'description': 'Object Exploratory Data Analysis'}
get_name(self)
Get the RackioEDA object's name.
returns
- name: (str)
Snippet code
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> EDA.get_name()
'EDA core'
description
Property that stores the RackioEDA model description.
Parameters
- :param value: (str) RackioEDA model description
:return:
- description: (str) RackioEDA model description
Snippet code
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> EDA.description
'Object Exploratory Data Analysis'
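A minimal sketch of the name/description behaviour documented above (serialize, get_name and the description property), assuming the object simply stores both strings. `EDASketch` is a hypothetical stand-in class, not RackioEDA itself.

```python
class EDASketch:
    """Hypothetical stand-in mirroring the documented name/description API."""

    def __init__(self, name: str = "", description: str = ""):
        self._name = name
        self._description = description

    @property
    def description(self) -> str:
        return self._description

    @description.setter
    def description(self, value: str) -> None:
        self._description = value

    def get_name(self) -> str:
        return self._name

    def serialize(self) -> dict:
        # Only the two documented keys are serialized.
        return {"name": self._name, "description": self._description}


eda = EDASketch(name="EDA core", description="Object Exploratory Data Analysis")
print(eda.serialize())
```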
data
Property with getter and setter methods for the data held by the RackioEDA object.
Parameters
- :param value: (np.array, pd.DataFrame)
:return:
- data: (np.array, pd.DataFrame)
Snippet code
>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['One', 'Two', 'Three'])
>>> EDA.data = df
>>> EDA.data
   One  Two  Three
0    1    2      3
1    4    5      6
2    7    8      9
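The data property pattern can be sketched as follows, assuming the setter accepts only the two documented types. `DataHolder` is hypothetical, and raising `TypeError` for other inputs is an assumption of this sketch rather than documented RackioEDA behaviour.

```python
import numpy as np
import pandas as pd


class DataHolder:
    """Hypothetical sketch of a `data` property restricted to ndarray/DataFrame."""

    def __init__(self):
        self._data = None

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        # Assumption: anything other than the two documented types is rejected.
        if not isinstance(value, (np.ndarray, pd.DataFrame)):
            raise TypeError("data must be a np.ndarray or a pd.DataFrame")
        self._data = value


holder = DataHolder()
holder.data = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["One", "Two", "Three"])
print(holder.data)
```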
insert_columns(self, df, data, column_names, locs=[], allow_duplicates=False)
Insert the columns in data into the DataFrame df at the locations locs.
Parameters
- :param df: (pandas.DataFrame) DataFrame that receives the new columns
- :param data: (np.ndarray, pd.DataFrame or pd.Series) column data to insert
- :param column_names: (list[str]) names of the columns to be added
- :param locs: (list[int]) locations where the columns will be added (optional, default=last position)
- :param allow_duplicates: (bool) whether a column name that already exists may be inserted again (optional, default=False)
:return:
- data: (pandas.DataFrame)
Snippet code
>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['One', 'Two', 'Three'])
>>> col = [10, 11, 12]
>>> EDA.insert_columns(df, col, ['Four'])
   One  Two  Three  Four
0    1    2      3    10
1    4    5      6    11
2    7    8      9    12
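A plain-pandas approximation of insert_columns, assuming it wraps `DataFrame.insert`. `insert_columns_sketch` is a hypothetical name. It uses `locs=None` instead of the mutable default `locs=[]` shown in the signature, which is the usual Python idiom for optional list arguments.

```python
import pandas as pd


def insert_columns_sketch(df, data, column_names, locs=None, allow_duplicates=False):
    """Hypothetical plain-pandas equivalent of insert_columns (not the library code)."""
    df = df.copy()  # leave the caller's DataFrame untouched
    values = pd.DataFrame(data)  # a list, ndarray, Series or DataFrame becomes 2-D
    if not locs:
        # Default: append the new columns after the existing ones.
        locs = [df.shape[1] + i for i in range(len(column_names))]
    for i, (loc, name) in enumerate(zip(locs, column_names)):
        # .to_numpy() inserts by position, ignoring the index of `data`.
        df.insert(loc, name, values.iloc[:, i].to_numpy(), allow_duplicates=allow_duplicates)
    return df


df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["One", "Two", "Three"])
appended = insert_columns_sketch(df, [10, 11, 12], ["Four"])
first = insert_columns_sketch(df, [10, 11, 12], ["Zero"], locs=[0])
print(appended)
```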
remove_columns(self, df, *args)
Remove columns from the DataFrame df by name.
Parameters
- :param df: (pandas.DataFrame) DataFrame to remove the columns from
- :param args: (str) name or names of the columns to remove
:return:
- data: (pandas.DataFrame)
Snippet code
>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['One', 'Two', 'Three'])
>>> EDA.remove_columns(df, 'Two', 'Three')
   One
0    1
1    4
2    7
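Assuming remove_columns behaves like pandas' `DataFrame.drop`, the same result can be obtained directly. `remove_columns_sketch` is a hypothetical helper.

```python
import pandas as pd


def remove_columns_sketch(df, *args):
    """Hypothetical equivalent of remove_columns built on DataFrame.drop."""
    return df.drop(columns=list(args))  # returns a new DataFrame


df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["One", "Two", "Three"])
result = remove_columns_sketch(df, "Two", "Three")
print(result)
```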
keep_columns(self, df, *args)
Keep only the named columns of the DataFrame df.
Parameters
- :param df: (pandas.DataFrame) DataFrame to select the columns from
- :param args: (str) name or names of the columns to keep
:return:
- data: (pandas.DataFrame)
Snippet code
>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['One', 'Two', 'Three'])
>>> EDA.keep_columns(df, 'Two')
   Two
0    2
1    5
2    8
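Likewise, keep_columns can be approximated by label-based column selection with `DataFrame.loc`. `keep_columns_sketch` is hypothetical and may differ from the library in how it handles names that are not present.

```python
import pandas as pd


def keep_columns_sketch(df, *args):
    """Hypothetical equivalent of keep_columns using label-based selection."""
    return df.loc[:, list(args)]  # raises KeyError if a name is missing


df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["One", "Two", "Three"])
kept = keep_columns_sketch(df, "Two")
print(kept)
```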
rename_columns(self, df, **kwargs)
Rename columns in the DataFrame df.
Parameters
- :param df: (pd.DataFrame) DataFrame whose columns are renamed
- :param kwargs: (dict) mapping of current column names to new column names
:return:
- data: (pandas.DataFrame)
Snippet code
>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['One', 'Two', 'Three'])
>>> columns_to_rename = {'One': 'one', 'Two': 'two'}
>>> EDA.rename_columns(df, **columns_to_rename)
   one  two  Three
0    1    2      3
1    4    5      6
2    7    8      9
>>> EDA.rename_columns(df, One='one', Three='three')
   one  Two  three
0    1    2      3
1    4    5      6
2    7    8      9
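Both calls above map naturally onto `DataFrame.rename(columns=...)`, whose dict argument plays the role of `**kwargs`. The sketch assumes rename_columns returns a new DataFrame instead of modifying df in place, which is consistent with the second example still showing `Two`. `rename_columns_sketch` is hypothetical.

```python
import pandas as pd


def rename_columns_sketch(df, **kwargs):
    """Hypothetical equivalent of rename_columns; kwargs maps old -> new names."""
    return df.rename(columns=kwargs)  # df itself is not modified


df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["One", "Two", "Three"])
first = rename_columns_sketch(df, **{"One": "one", "Two": "two"})
second = rename_columns_sketch(df, One="one", Three="three")
print(second)
```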
change_columns(self, df, data, column_names)
Replace columns in the DataFrame df with the same-named columns of the DataFrame data.
Parameters
- :param df: (pandas.DataFrame) DataFrame whose columns are replaced
- :param data: (pandas.DataFrame) DataFrame holding the replacement columns
- :param column_names: (list) name or names of the columns to replace
:return:
- data: (pandas.DataFrame)
Snippet code
>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['One', 'Two', 'Three'])
>>> EDA.data = df
>>> data = pd.DataFrame([[10, 11], [13, 14], [16, 17]], columns=['Two','Three'])
>>> columns=['Two','Three']
>>> EDA.change_columns(df, data, columns)
   One  Two  Three
0    1   10     11
1    4   13     14
2    7   16     17
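One way to reproduce the change_columns example in plain pandas, assuming columns are replaced by row position. `change_columns_sketch` is a hypothetical helper, not the library implementation.

```python
import pandas as pd


def change_columns_sketch(df, data, column_names):
    """Hypothetical equivalent of change_columns."""
    df = df.copy()
    # .to_numpy() copies the values by position, ignoring data's index.
    df[column_names] = data[column_names].to_numpy()
    return df


df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["One", "Two", "Three"])
data = pd.DataFrame([[10, 11], [13, 14], [16, 17]], columns=["Two", "Three"])
changed = change_columns_sketch(df, data, ["Two", "Three"])
print(changed)
```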
search_loc(self, column_name, *keys, **kwargs)
Logical indexing.
Parameters
- :param column_name: (str) name of the column in self.data to search
- :param keys: (tuple(str)) Positional arguments
- :param join_by: (str)
- :param logic: (str)
:return:
- data: (pandas.DataFrame)
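search_loc has no example here, and the meaning of `join_by` and `logic` is not documented, so the following only illustrates pandas logical (boolean) indexing in general. `logical_index` is a hypothetical helper that does not attempt to reproduce search_loc's keyword options, and the `Tag`/`Value` data is made up.

```python
import pandas as pd

# Made-up example data.
df = pd.DataFrame({"Tag": ["PT-01", "TT-02", "PT-03"], "Value": [1.2, 3.4, 5.6]})


def logical_index(df, column_name, *keys):
    """Hypothetical helper: keep rows whose `column_name` value is one of `keys`."""
    mask = df[column_name].isin(keys)  # boolean Series, True where the value matches
    return df.loc[mask]


by_tag = logical_index(df, "Tag", "PT-01", "PT-03")

# Masks can also be combined explicitly with & (and) or | (or).
combined = df.loc[df["Tag"].str.startswith("PT") & (df["Value"] > 2)]
print(by_tag)
print(combined)
```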
set_datetime_index(self, df, label, index_name, start=datetime.datetime(2022, 1, 13, 11, 22, 11, 49655), format='%Y-%m-%d %H:%M:%S')
Set the index of the DataFrame df in datetime format.
Parameters
- :param df: (pandas.DataFrame) DataFrame whose index is set
- :param label: (str) name of the column that holds the time series
- :param index_name: (str) name of the new index
- :param start: (str) start datetime as a string in the format "%Y-%m-%d %H:%M:%S"
- :param format: (str) datetime format
returns
- data: (pandas.DataFrame)
Snippet code
>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[0.5, 2, 3], [1.5, 5, 6], [3, 8, 9]], columns=['Time', 'Two', 'Three'])
>>> df = EDA.set_datetime_index(df, "Time", "Timestamp", start="2021-01-01 00:00:00")
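The example above does not show its output. Here is a sketch of one plausible behaviour, assuming the `Time` column holds elapsed seconds that are added to `start` to build the datetime index. `set_datetime_index_sketch` is hypothetical, and its `fmt` parameter mirrors the documented `format`.

```python
import pandas as pd


def set_datetime_index_sketch(df, label, index_name, start="2021-01-01 00:00:00",
                              fmt="%Y-%m-%d %H:%M:%S"):
    """Hypothetical sketch: `label` is assumed to hold seconds elapsed since `start`."""
    df = df.copy()
    origin = pd.to_datetime(start, format=fmt)
    # Offset each elapsed-seconds value from the start timestamp.
    df[index_name] = origin + pd.to_timedelta(df[label], unit="s")
    return df.set_index(index_name)


df = pd.DataFrame([[0.5, 2, 3], [1.5, 5, 6], [3, 8, 9]], columns=["Time", "Two", "Three"])
indexed = set_datetime_index_sketch(df, "Time", "Timestamp", start="2021-01-01 00:00:00")
print(indexed)
```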
resample(self, df, sample_time, label=None, datetime_format='%Y-%m-%d %H:%M:%S.%f', set_index=False)
Resample the time-series column of the DataFrame df.
Parameters
- :param df: (pandas.DataFrame) DataFrame to resample
- :param sample_time: (float or int) new sample time of the DataFrame
- :param label: (str) name of the column that holds the time-series values
- :param datetime_format: (str) format used to parse the label column when it holds datetime strings
returns
data: (pandas.DataFrame)
Snippet code
>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[0.5, 2, 3], [1, 5, 6], [1.5, 8, 9], [2, 8, 9]], columns=['Time', 'Two', 'Three'])
>>> EDA.resample(df, 1, label="Time")
   Time  Two  Three
0   0.5    2      3
2   1.5    8      9
>>> import pandas as pd
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([["2021-03-24 17:27:11.0", 2, 3], ["2021-03-24 17:27:11.5", 5, 6], ["2021-03-24 17:27:12.0", 8, 9], ["2021-03-24 17:27:12.5", 8, 9]], columns=['Time', 'Two', 'Three'])
>>> EDA.resample(df, 1, label="Time")
                    Time  Two  Three
0  2021-03-24 17:27:11.0    2      3
2  2021-03-24 17:27:12.0    8      9
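Both resample examples keep every second row, which is consistent with decimation: the original sample time is 0.5 s and the requested one is 1 s. The sketch below implements that reading under the assumption of a uniform sample time. `resample_sketch` is hypothetical, and the library may interpolate or aggregate instead.

```python
import pandas as pd


def resample_sketch(df, sample_time, label, datetime_format="%Y-%m-%d %H:%M:%S.%f"):
    """Hypothetical decimation-based resampling; not RackioEDA's implementation."""
    times = df[label]
    if not pd.api.types.is_numeric_dtype(times):
        # Datetime strings: convert to seconds elapsed since the first sample.
        parsed = pd.to_datetime(times, format=datetime_format)
        times = (parsed - parsed.iloc[0]).dt.total_seconds()
    current = times.diff().median()  # assumed (near-)uniform original sample time
    step = max(1, int(round(sample_time / current)))
    return df.iloc[::step]  # keep every `step`-th row


df_num = pd.DataFrame([[0.5, 2, 3], [1, 5, 6], [1.5, 8, 9], [2, 8, 9]],
                      columns=["Time", "Two", "Three"])
df_str = pd.DataFrame([["2021-03-24 17:27:11.0", 2, 3], ["2021-03-24 17:27:11.5", 5, 6],
                       ["2021-03-24 17:27:12.0", 8, 9], ["2021-03-24 17:27:12.5", 8, 9]],
                      columns=["Time", "Two", "Three"])
print(resample_sketch(df_num, 1, "Time"))
print(resample_sketch(df_str, 1, "Time"))
```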
reset_index(self, df, drop=False)
Reset the index of the DataFrame df.
Parameters
- :param df: (pandas.DataFrame)
- :param drop: (bool) if True, discard the old index instead of inserting it as a column
returns
data: (pandas.DataFrame)
Snippet code
>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[0.5, 2, 3], [1, 5, 6], [1.5, 8, 9], [2, 8, 9]], columns=['Time', 'Two', 'Three'])
>>> EDA.reset_index(df, drop=False)
   index  Time  Two  Three
0      0   0.5    2      3
1      1   1.0    5      6
2      2   1.5    8      9
3      3   2.0    8      9
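The output above matches pandas' own `DataFrame.reset_index`, so this sketch assumes the method delegates to it. Starting from a decimated frame (index 0, 2) makes the effect of `drop` visible.

```python
import pandas as pd

df = pd.DataFrame([[0.5, 2, 3], [1, 5, 6], [1.5, 8, 9], [2, 8, 9]],
                  columns=["Time", "Two", "Three"])
decimated = df.iloc[::2]                    # index is now 0, 2
kept = decimated.reset_index(drop=False)    # old labels move into an "index" column
dropped = decimated.reset_index(drop=True)  # old labels are discarded
print(kept)
print(dropped)
```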
print_report(self, df, info=True, head=True, header=10)
Print a DataFrame report: its info summary and its head.
Parameters
- :param df: (pd.DataFrame) DataFrame to print report
- :param info: (bool) get info from DataFrame
- :param head: (bool) get head from DataFrame
- :param header: (int) number of first rows to print
:return:
- data: (pandas.DataFrame)
Snippet code
>>> import pandas as pd
>>> from rackio_AI import RackioAI
>>> EDA = RackioAI.get(name="EDA core", _type='EDA')
>>> df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=['One', 'Two', 'Three'])
>>> df = EDA.print_report(df, info=True, head=False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   One     3 non-null      int64
 1   Two     3 non-null      int64
 2   Three   3 non-null      int64
dtypes: int64(3)
memory usage: 200.0 bytes
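A sketch of print_report built on `DataFrame.info` and `DataFrame.head`, assuming that is what the info and head flags control. `print_report_sketch` is hypothetical. Returning df unchanged matches the `df = EDA.print_report(...)` usage above.

```python
import pandas as pd


def print_report_sketch(df, info=True, head=True, header=10):
    """Hypothetical sketch of print_report using standard pandas reporting."""
    if info:
        df.info()               # prints dtypes, non-null counts and memory usage
    if head:
        print(df.head(header))  # prints the first `header` rows
    return df


df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["One", "Two", "Three"])
report = print_report_sketch(df, info=True, head=False)
```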