I have a file where the data provided is in a column instead of a row. A single row in the CSV needs to be written as multiple rows in DB. I need a python script that can "unpivot" the column data to create multiple rows of data.
For example, columns 2241, 2242, 2243, etc. denote the weeks in the year 2022, with the value below the week column being the quantity sold during that week.
Input CSV File
|
NUMBER |
CODE |
SUPPLIER_CODE |
BUSINESS |
TEST_CODE |
2241 |
2242 |
2243 |
2244 |
2245 |
|
000001 |
8 |
1 |
02 |
H9 |
1 |
2 |
3 |
4 |
5 |
The final result should look something like this. We have 5 rows, with our week columns unpivoted so they are included as a row.
Expected Output CSV File
|
NUMBER |
CODE |
SUPPLIER_CODE |
BUSINESS |
TEST_CODE |
week |
Quantity |
|
000001 |
8 |
1 |
02 |
H9 |
2241 |
1 |
|
000001 |
8 |
1 |
02 |
H9 |
2242 |
2 |
|
000001 |
8 |
1 |
02 |
H9 |
2243 |
3 |
|
000001 |
8 |
1 |
02 |
H9 |
2244 |
4 |
|
000001 |
8 |
1 |
02 |
H9 |
2245 |
5 |
The python script below will perform this "unpivot" transformation. A simple breakdown of the script is below:
import csv
# NUMBER,CODE,SUPPLIER_CODE,BUSINESS,TEST_CODE
non_week_row_count = 5
def executeNode(inputs):
out_file_path = "/tmp/unpivoted.csv"
header = None
with open(out_file_path, "w") as unpivoted_file:
csv_writer = csv.writer(unpivoted_file, delimiter = ',')
with open(inputs["PivotedFile"]) as pivoted_file:
csv_reader = csv.reader(pivoted_file, delimiter=',')
for row in csv_reader:
if not header:
header = row
else:
for i in range(non_week_row_count, non_week_row_count + 5): # 5 weeks
unpivoted_row = []
for j in range(non_week_row_count):
unpivoted_row.append(row[j])
# week number
unpivoted_row.append(header[i])
# forecast
unpivoted_row.append(row[i])
csv_writer.writerow(unpivoted_row)
outputs = {}
outputs["UnpivotedFile"] = out_file_path
return outputs
Below is a similar example to the script above, where we add the Header line to our output file and calculating the amount of week columns by taking the difference between the header week columns and the header non-week columns. This allows the script to handle the possibility of having more/less than 5 week columns.
import csv
# NUMBER,CODE,SUPPLIER_CODE,BUSINESS,TEST
header_array = ['#NUMBER','CODE','SUPPLIER_CODE','BUSINESS','TEST','WEEK','QUANTITY']
non_week_row_count = 5
def executeNode(inputs):
out_file_path = "/tmp/unpivoted.csv"
header = None
with open(out_file_path, "w") as unpivoted_file:
csv_writer = csv.writer(unpivoted_file, delimiter = ',')
with open(inputs["PivotedFile"]) as pivoted_file:
csv_reader = csv.reader(pivoted_file, delimiter=',')
for row in csv_reader:
if not header:
csv_writer.writerow(header_array)
header = row
else:
for i in range(non_week_row_count, non_week_row_count + (len(header) - non_week_row_count)): # Calculate week column count by taking Header length and subtracting non_week_row_count
unpivoted_row = []
for j in range(non_week_row_count):
unpivoted_row.append(row[j])
# week number
unpivoted_row.append(header[i])
# forecast
unpivoted_row.append(row[i])
csv_writer.writerow(unpivoted_row)
outputs = {}
outputs["UnpivotedFile"] = out_file_path
return outputs