How to split files depending on column values

This post describes an easy way to split a file into separate files depending on the column values of your input file.
When you always expect the same values you can use the tFilterRow component, but when the values are different each time it's a lot easier to do it this way:

1. Create a new job

  • Name your new job “splitFiles”

2. Add the tFixedFlowInput component to your canvas

  • Edit the schema
  • Add the city column as a String
  • Add the value column as a String

Split csv files schema

  • Set the row and field separator
  • Add the content

Split csv files tfixedflowinput

Content:

Los Angeles;1
Amsterdam;1
Rome;1
Barcelona;1
London;1
Berlin;1
Detroit;1
Amsterdam;2
Amsterdam;3
Los Angeles;2
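To get a feel for what the job should produce, here is a minimal plain-Java sketch (not Talend code) that groups the sample content by city. The class name SplitPreview and the method groupByCity are hypothetical names made up for this illustration, and it assumes ";" is the field separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Talend: previews which values belong to which city.
class SplitPreview {

    // Groups "city<separator>value" rows by city, keeping the first-seen order of the cities.
    static Map<String, List<String>> groupByCity(List<String> rows, String fieldSeparator) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String row : rows) {
            // Split into at most two parts: the city and the value.
            String[] parts = row.split(Pattern.quote(fieldSeparator), 2);
            String city = parts[0];
            String value = parts.length > 1 ? parts[1] : "";
            groups.computeIfAbsent(city, k -> new ArrayList<>()).add(value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "Los Angeles;1", "Amsterdam;1", "Rome;1", "Barcelona;1", "London;1",
                "Berlin;1", "Detroit;1", "Amsterdam;2", "Amsterdam;3", "Los Angeles;2");
        // Prints one line per city with the values that belong to it.
        groupByCity(rows, ";").forEach((city, values) ->
                System.out.println(city + " -> " + values));
    }
}
```

Each key of the returned map corresponds to one output file, and each list holds the rows that end up in it.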

3. Add contexts

In this job we are going to pass the values to a subjob and write them to a file. To achieve this we need to temporarily store the data in context variables.

  • Go to the context tab of your job and click the + button
  • Add city as a string
  • Add value as a string

Split csv files context

4. Add the tJavaRow to your canvas

  • Connect the tFixedFlowInput to your tJavaRow component (main)
  • Go to the basic settings of your tJavaRow component and press “Sync columns”
  • Add the following code:

Split csv files java add to context

Code:

context.city = input_row.city;
context.value = input_row.value;

5. Create a new job

  • Right-click Job Designs
  • Select “create job” and name your new job “creator”

6. Add the tFixedFlowInput to the canvas of your “creator” job

  • Edit the schema

Split csv files schema

  • Add your contexts

Split csv files context subjob

  • Add the columns and values (when you enter context. in the value column and then press ctrl+spacebar you can select your contexts)

Split csv values tFixedFlowInput

7. Put the tFileOutputDelimited on the canvas of your creator job

  • Enter the filename (replace the YourProfile part)
  • Enter the row and field separator (if you use \n as a row separator the tFileOutputDelimited will create one row instead of several rows)
  • Check the append option
  • Press “Sync columns”
  • Connect the tFixedFlowInput component to the tFileOutputDelimited component (Main)

Split csv files tFileOutputDelimited
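The append option is what makes this work: the creator job runs once per row, so every run has to add a line to the city's file instead of overwriting it. Here is a rough plain-Java sketch of that idea. CreatorSketch and appendRow are hypothetical names, and the sketch assumes the file is named after the city with a .txt extension and that each line is written as city;value.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical stand-in for the "creator" job: one call writes one row.
class CreatorSketch {

    // Appends "city;value" to <outputDir>/<city>.txt and creates the file if it does not exist yet.
    static Path appendRow(Path outputDir, String city, String value) throws IOException {
        Path target = outputDir.resolve(city + ".txt");
        String line = city + ";" + value + System.lineSeparator();
        // CREATE + APPEND mirrors the "append" checkbox: existing rows are kept.
        Files.write(target, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Write to a temporary directory so the sketch does not touch real files.
        Path dir = Files.createTempDirectory("splitFiles");
        appendRow(dir, "Amsterdam", "1");
        appendRow(dir, "Amsterdam", "2");
        appendRow(dir, "Amsterdam", "3");
        System.out.println(Files.readAllLines(dir.resolve("Amsterdam.txt")));
    }
}
```

Without the append behaviour, each call would replace the file and only the last row per city would survive.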

8. Add the tRunJob component to the canvas of your splitFiles job

  • Connect the tJavaRow component to your tRunJob component (Main)
  • Press Sync columns in your tRunJob component
  • Select the creator job

Split csv files tRunJob
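If the handoff between the two jobs feels abstract, this plain-Java model may help. It is only a sketch: HandoffSketch, Context and runParent are hypothetical names and not Talend classes, and it assumes the child job sees the context values the parent job has just set for the current row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java model of the parent/child handoff; none of these types are Talend classes.
class HandoffSketch {

    // Stand-in for the context variables "city" and "value".
    static class Context {
        String city;
        String value;
    }

    // For each input row: copy the columns into the context (the tJavaRow step),
    // then call the child once with that context (the tRunJob step).
    static void runParent(List<String[]> rows, Consumer<Context> childJob) {
        Context context = new Context();
        for (String[] inputRow : rows) {
            context.city = inputRow[0];
            context.value = inputRow[1];
            childJob.accept(context);
        }
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"Amsterdam", "1"},
                new String[] {"Rome", "1"},
                new String[] {"Amsterdam", "2"});
        List<String> calls = new ArrayList<>();
        // The "child" here only records which file each row would be written to.
        runParent(rows, ctx -> calls.add(ctx.city + ".txt <- " + ctx.value));
        calls.forEach(System.out::println);
    }
}
```

The point of the model is the order of events: the context is overwritten for every row, and the child runs right after each overwrite, so it always works with the current row.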

9. Run your job

You will see that the following files are created:

Split csv files result

You will also see that the Amsterdam.txt file contains 3 rows:


10. splitFiles job

This is what your splitFiles job should look like:

Split csv files split row job

11. Creator job

This is what your creator job should look like:

Split csv files creator job

If you have any questions, just leave a comment!