I have a file where the data provided is in a column instead of a row. A single row in the CSV needs to be written as multiple rows in DB. I need a python script that can "unpivot" the column data to create multiple rows of data.
For example, columns 2241, 2242, 2243, etc. denote the weeks in the year 2022, with the value below the week column being the quantity sold during that week.
Input CSV File
NUMBER |
CODE |
SUPPLIER_CODE |
BUSINESS |
TEST_CODE |
2241 |
2242 |
2243 |
2244 |
2245 |
000001 |
8 |
1 |
02 |
H9 |
1 |
2 |
3 |
4 |
5 |
The final result should look something like this. We have 5 rows, with our week columns unpivoted so they are included as a row.
Expected Output CSV File
NUMBER |
CODE |
SUPPLIER_CODE |
BUSINESS |
TEST_CODE |
week |
Quantity |
000001 |
8 |
1 |
02 |
H9 |
2241 |
1 |
000001 |
8 |
1 |
02 |
H9 |
2242 |
2 |
000001 |
8 |
1 |
02 |
H9 |
2243 |
3 |
000001 |
8 |
1 |
02 |
H9 |
2244 |
4 |
000001 |
8 |
1 |
02 |
H9 |
2245 |
5 |
The python script below will perform this "unpivot" transformation. A simple breakdown of the script is below:
import csv # NUMBER,CODE,SUPPLIER_CODE,BUSINESS,TEST_CODE non_week_row_count = 5 def executeNode(inputs): out_file_path = "/tmp/unpivoted.csv" header = None with open(out_file_path, "w") as unpivoted_file: csv_writer = csv.writer(unpivoted_file, delimiter = ',') with open(inputs["PivotedFile"]) as pivoted_file: csv_reader = csv.reader(pivoted_file, delimiter=',') for row in csv_reader: if not header: header = row else: for i in range(non_week_row_count, non_week_row_count + 5): # 5 weeks unpivoted_row = [] for j in range(non_week_row_count): unpivoted_row.append(row[j]) # week number unpivoted_row.append(header[i]) # forecast unpivoted_row.append(row[i]) csv_writer.writerow(unpivoted_row) outputs = {} outputs["UnpivotedFile"] = out_file_path return outputs
Below is a similar example to the script above, where we add the Header line to our output file and calculating the amount of week columns by taking the difference between the header week columns and the header non-week columns. This allows the script to handle the possibility of having more/less than 5 week columns.
import csv # NUMBER,CODE,SUPPLIER_CODE,BUSINESS,TEST header_array = ['#NUMBER','CODE','SUPPLIER_CODE','BUSINESS','TEST','WEEK','QUANTITY'] non_week_row_count = 5 def executeNode(inputs): out_file_path = "/tmp/unpivoted.csv" header = None with open(out_file_path, "w") as unpivoted_file: csv_writer = csv.writer(unpivoted_file, delimiter = ',') with open(inputs["PivotedFile"]) as pivoted_file: csv_reader = csv.reader(pivoted_file, delimiter=',') for row in csv_reader: if not header: csv_writer.writerow(header_array) header = row else: for i in range(non_week_row_count, non_week_row_count + (len(header) - non_week_row_count)): # Calculate week column count by taking Header length and subtracting non_week_row_count unpivoted_row = [] for j in range(non_week_row_count): unpivoted_row.append(row[j]) # week number unpivoted_row.append(header[i]) # forecast unpivoted_row.append(row[i]) csv_writer.writerow(unpivoted_row) outputs = {} outputs["UnpivotedFile"] = out_file_path return outputs