
Enabling and disabling reject flows
Rejected data is closely coupled to schemas (Chapter 2, Metadata and Schemas), as many of the input and output components will validate data according to a schema definition and then pass any incorrect data to a reject flow.
Reject flows thus allow non-conforming data to be collected and handled as per the needs of a project.
In some cases, depending upon the business requirement, rejects are not acceptable. In these cases, reject flows should be disabled and the job allowed to fail.
Tip
Whether a job dies on the first incorrect record, collects rejects in a file, or completely ignores rejects is a design decision that should be based upon the requirements for the process. Where possible, designers and developers should attempt to define how errors and rejects are handled before coding begins.
Getting ready
Open the job jo_cook_ch03_0000_inputReject.
How to do it…
- Run the job and it will fail with an unparseable date error.
- Open the tFileInputDelimited component and in the Basic settings tab uncheck the Die on error box.
- Drag a new tLogRow to the canvas, open it and set the mode to Table.
- Right-click the tFileInputDelimited component, and select Row, then reject. Connect this row to the new tLogRow. Your job should look like the following:
- Run the job. You should see that two records have now been passed to the reject flow.
How it works…
When Talend reads an input data source, it attempts to parse the data into the schema. If it cannot parse the data, then it will fail with a Java error.
When the Die on error box is unchecked, Talend allows a reject flow to be attached to the component and changes the component's behavior: instead of killing the job, it passes invalid rows to the reject flow.
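The behavior described above can be sketched in plain Java. This is a minimal illustration only, not the code Talend generates: the class `RejectFlowSketch`, its `Result` holder, and the `read` method are hypothetical names, and the input is assumed to be a list of strings that should each contain an ISO date (`yyyy-MM-dd`).

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "die on error" versus a reject flow.
// It does not reproduce Talend's generated code.
class RejectFlowSketch {

    // The two outputs of the component: the main flow and the reject flow.
    static class Result {
        final List<LocalDate> rows = new ArrayList<>();
        final List<String> rejects = new ArrayList<>();
    }

    static Result read(List<String> lines, boolean dieOnError) {
        Result result = new Result();
        for (String line : lines) {
            try {
                // Parsing into the schema type; fails for non-ISO input.
                result.rows.add(LocalDate.parse(line));
            } catch (DateTimeParseException e) {
                if (dieOnError) {
                    // Die on error checked: the first bad record stops the run.
                    throw e;
                }
                // Die on error unchecked: keep the row and the reason, carry on.
                result.rejects.add(line + " -> " + e.getMessage());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "2013-01-31", "31/01/2013", "2013-02-28", "not a date");
        Result r = read(input, false);
        System.out.println("main flow: " + r.rows);
        System.out.println("reject flow: " + r.rejects);
    }
}
```

Calling `read(input, true)` on the same data throws on the second line, which mirrors the failure seen in the first step of the recipe.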
There's more...
You can, if required, ignore any rejects by not attaching a reject flow, but it is wise to double check first if this is a genuine requirement for the process. Most cases of rejects being ignored are down to programmers forgetting to check if there is a reject flow for the given component.
In the tFileInputDelimited component, there is an Advanced tab with options to validate data against the schema and to check dates. These options provide an added level of validation for the input data.
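To see why an explicit date check can matter, consider how Java's `SimpleDateFormat` treats an impossible date. The sketch below is an assumption about the kind of problem such a check guards against, not a description of how Talend implements the option; `DateCheckSketch` and `isValidDate` are hypothetical names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

// Hypothetical sketch: lenient versus strict date parsing in plain Java.
class DateCheckSketch {

    static boolean isValidDate(String value, String pattern, boolean strict) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        // Lenient parsing (the default) rolls 30 February over into March;
        // strict parsing rejects it.
        fmt.setLenient(!strict);
        try {
            fmt.parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidDate("30/02/2013", "dd/MM/yyyy", false)); // true
        System.out.println(isValidDate("30/02/2013", "dd/MM/yyyy", true));  // false
    }
}
```

Without a strict check, a value such as 30/02/2013 would be silently accepted as a different date rather than sent to the reject flow.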
Tip
It is always worth checking every input component to see whether a reject flow becomes available when Die on error is unchecked, and whether the component offers additional validation options.
In many cases, these validations will not be explicitly stated in a specification, so it is always worth checking with the customer to see if they require rejects and/or validation rules to be added.
See also
- Gathering all rejects from an input, in this chapter.