How can I use NEO Plasma to convert a single row into multiple rows?

Question

I have a file where the data provided is in a column instead of a row. A single row in the CSV needs to be written as multiple rows in DB. I need a python script that can "unpivot" the column data to create multiple rows of data.

For example, columns 2241, 2242, 2243, etc. denote the weeks in the year 2022, with the value below the week column being the quantity sold during that week.

Input CSV File

NUMBER	CODE	SUPPLIER_CODE	BUSINESS	TEST_CODE	2241	2242	2243	2244	2245
000001	8	1	02	H9	1	2	3	4	5

The final result should look something like this. We have 5 rows, with our week columns unpivoted so they are included as a row.

Expected Output CSV File

NUMBER	CODE	SUPPLIER_CODE	BUSINESS	TEST_CODE	week	Quantity
000001	8	1	02	H9	2241	1
000001	8	1	02	H9	2242	2
000001	8	1	02	H9	2243	3
000001	8	1	02	H9	2244	4
000001	8	1	02	H9	2245	5

jcourter · Accepted Answer

The python script below will perform this "unpivot" transformation. A simple breakdown of the script is below: create a new output file read the input file iterate over all rows iterate over all week columns create a new unpivoted_row append the non-week values (NUMBER, CODE, SUPPLIER_CODE, BUSINESS, TEST_CODE) to the unpivoted_row append the week value, which comes from the column header, to the unpivoted_row  append the quantity value to the unpivoted_row add the unpivoted_row to our cvs_writer to write the row to our output file return output file import csv # NUMBER,CODE,SUPPLIER_CODE,BUSINESS,TEST_CODE non_week_row_count = 5 def executeNode(inputs): out_file_path = "/tmp/unpivoted.csv" header = None with open(out_file_path, "w") as unpivoted_file: csv_writer = csv.writer(unpivoted_file, delimiter = ',') with open(inputs["PivotedFile"]) as pivoted_file: csv_reader = csv.reader(pivoted_file, delimiter=',') for row in csv_reader: if not header: header = row else: for i in range(non_week_row_count, non_week_row_count + 5): # 5 weeks unpivoted_row = [] for j in range(non_week_row_count): unpivoted_row.append(row[j]) # week number unpivoted_row.append(header[i]) # forecast unpivoted_row.append(row[i]) csv_writer.writerow(unpivoted_row) outputs = {} outputs["UnpivotedFile"] = out_file_path return outputs

jcourter · Answer

Below is a similar example to the script above, where we add the Header line to our output file and calculating the amount of week columns by taking the difference between the header week columns and the header non-week columns. This allows the script to handle the possibility of having more/less than 5 week columns.  import csv # NUMBER,CODE,SUPPLIER_CODE,BUSINESS,TEST header_array = ['#NUMBER','CODE','SUPPLIER_CODE','BUSINESS','TEST','WEEK','QUANTITY'] non_week_row_count = 5 def executeNode(inputs): out_file_path = "/tmp/unpivoted.csv" header = None with open(out_file_path, "w") as unpivoted_file: csv_writer = csv.writer(unpivoted_file, delimiter = ',') with open(inputs["PivotedFile"]) as pivoted_file: csv_reader = csv.reader(pivoted_file, delimiter=',') for row in csv_reader: if not header: csv_writer.writerow(header_array) header = row else: for i in range(non_week_row_count, non_week_row_count + (len(header) - non_week_row_count)): # Calculate week column count by taking Header length and subtracting non_week_row_count unpivoted_row = [] for j in range(non_week_row_count): unpivoted_row.append(row[j]) # week number unpivoted_row.append(header[i]) # forecast unpivoted_row.append(row[i]) csv_writer.writerow(unpivoted_row) outputs = {} outputs["UnpivotedFile"] = out_file_path return outputs

[Solved] How can I use NEO Plasma to convert a single row into multiple rows?