Notifications
Clear all

[Solved] How can I use NEO Plasma to convert a single row into multiple rows?

2 Posts
1 Users
1 Likes
514 Views
0
Topic starter

I have a file where the data provided is in a column instead of a row. A single row in the CSV needs to be written as multiple rows in DB. I need a python script that can "unpivot" the column data to create multiple rows of data.

 

For example, columns 2241, 2242, 2243, etc. denote the weeks in the year 2022, with the value below the week column being the quantity sold during that week.

Input CSV File

NUMBER

CODE

SUPPLIER_CODE

BUSINESS

TEST_CODE

 2241

2242

2243

2244

2245

000001

8

1

02

H9

1

2

3

4

5

 

 

The final result should look something like this. We have 5 rows, with our week columns unpivoted so they are included as a row.

Expected Output CSV File

NUMBER

CODE

SUPPLIER_CODE

BUSINESS

TEST_CODE

week

Quantity

000001

8

1

02

H9

2241

1

000001

8

1

02

H9

2242

2

000001

8

1

02

H9

2243

3

000001

8

1

02

H9

2244

4

000001

8

1

02

H9

2245

5

2 Answers
1
Topic starter

The python script below will perform this "unpivot" transformation. A simple breakdown of the script is below:

  • create a new output file
  • read the input file
  • iterate over all rows
  • iterate over all week columns
  • create a new unpivoted_row
  • append the non-week values (NUMBER, CODE, SUPPLIER_CODE, BUSINESS, TEST_CODE) to the unpivoted_row
  • append the week value, which comes from the column header, to the unpivoted_row 
  • append the quantity value to the unpivoted_row
  • add the unpivoted_row to our cvs_writer to write the row to our output file
  • return output file
import csv

# NUMBER,CODE,SUPPLIER_CODE,BUSINESS,TEST_CODE
non_week_row_count = 5

def executeNode(inputs):
  out_file_path = "/tmp/unpivoted.csv"
  header = None
  with open(out_file_path, "w") as unpivoted_file:
    csv_writer = csv.writer(unpivoted_file, delimiter = ',')
    with open(inputs["PivotedFile"]) as pivoted_file:
      csv_reader = csv.reader(pivoted_file, delimiter=',')
      for row in csv_reader:
        if not header:
          header = row
        else:
          for i in range(non_week_row_count, non_week_row_count + 5):  # 5 weeks
            unpivoted_row = []
            for j in range(non_week_row_count):
              unpivoted_row.append(row[j])
            # week number
            unpivoted_row.append(header[i])
            # forecast
            unpivoted_row.append(row[i])
            csv_writer.writerow(unpivoted_row)

  outputs = {}
  outputs["UnpivotedFile"] = out_file_path
  
  return outputs
0
Topic starter

Below is a similar example to the script above, where we add the Header line to our output file and calculating the amount of week columns by taking the difference between the header week columns and the header non-week columns. This allows the script to handle the possibility of having more/less than 5 week columns. 

import csv

# NUMBER,CODE,SUPPLIER_CODE,BUSINESS,TEST
header_array = ['#NUMBER','CODE','SUPPLIER_CODE','BUSINESS','TEST','WEEK','QUANTITY']
non_week_row_count = 5

def executeNode(inputs):
  out_file_path = "/tmp/unpivoted.csv"
  header = None
  with open(out_file_path, "w") as unpivoted_file:
    csv_writer = csv.writer(unpivoted_file, delimiter = ',')
    with open(inputs["PivotedFile"]) as pivoted_file:
      csv_reader = csv.reader(pivoted_file, delimiter=',')
      for row in csv_reader:
        if not header:
          csv_writer.writerow(header_array)
          header = row
        else:
          for i in range(non_week_row_count, non_week_row_count + (len(header) - non_week_row_count)):  # Calculate week column count by taking Header length and subtracting non_week_row_count
            unpivoted_row = []
            for j in range(non_week_row_count):
              unpivoted_row.append(row[j])
            # week number
            unpivoted_row.append(header[i])
            # forecast
            unpivoted_row.append(row[i])
            csv_writer.writerow(unpivoted_row)

  outputs = {}
  outputs["UnpivotedFile"] = out_file_path
  
  return outputs