PhixFlow Help
Making "near" matches
By the end of this chapter you will be able to:
- Use Near Match pipes
Drag the following Streams onto a model:
- AM Address List
- AM Problem Names
Run Analysis on both Streams and review the data. AM Address List contains all names and addresses. AM Problem Names contains names that have not been successfully matched against that list.
You will create a Stream to find the best match for the problem names:
- Add a Stream with Name Find Near Match Names and Addresses
- Add a pull pipe from AM Problem Names into Find Near Match Names and Addresses
- Drag the attribute NAME from AM Problem Names into Find Near Match Names and Addresses
- Add a pipe from AM Address List into Find Near Match Names and Addresses
- Drag the attributes from AM Address List into Find Near Match Names and Addresses. The Name NAME has already been used, so create a new attribute in Find Near Match Names and Addresses called NearMatchName, and give it an expression that reads NAME from AM Address List.
- To complete configuration of the pipe from AM Address List into Find Near Match Names and Addresses:
- Make this a lookup pipe
- Set Index Type to Near Match
- Set the Maximum Number of Edits Expression to 3
This defines the maximum number of changes that may be made to the lookup value for it to match a value in the data being looked up. It can be any expression, but in this case we set a fixed value of 3 for all lookups.
- Add the Order/Index Attribute:
NAME = _out.NAME
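To illustrate what a maximum of 3 edits means in practice, here is a minimal Python sketch. It assumes that edits are counted as Levenshtein distance (single-character insertions, deletions and substitutions). This page does not confirm that assumption for PhixFlow. The function names and the sample names are hypothetical and are not part of PhixFlow.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def near_matches(lookup_value, candidates, max_edits=3):
    """Return every candidate within max_edits of lookup_value."""
    return [c for c in candidates if edit_distance(lookup_value, c) <= max_edits]


# Hypothetical stand-ins for NAME values in AM Address List
address_names = ["John Smith", "Jon Smyth", "Joan Smithers", "Mary Jones"]

# Hypothetical stand-in for one NAME value in AM Problem Names
matches = near_matches("Jhon Smith", address_names, max_edits=3)
print(matches)
```

With these sample values, more than one candidate falls within the limit, so a single lookup can return several rows.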
Run Analysis on Find Near Match Names and Addresses. You will get an error because, in some cases, the lookup returns more than one row. Update your attribute expressions to pick a single record from those returned by the lookup (you can arbitrarily choose any of them) and save the values in this record to the Stream.
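The idea of reducing several returned rows to one can be sketched in Python as follows. This is not PhixFlow expression syntax. The record layout, the ADDRESS field and the helper name are hypothetical, chosen only to show the logic of arbitrarily taking the first record.

```python
# Hypothetical rows, shaped like the records a near-match lookup might return
lookup_results = [
    {"NAME": "John Smith", "ADDRESS": "1 High Street"},
    {"NAME": "Jon Smyth", "ADDRESS": "22 Mill Lane"},
]


def pick_one(records):
    """Return the first record, or None when the lookup found nothing."""
    return records[0] if records else None


chosen = pick_one(lookup_results)

# Read each output value from the single chosen record
near_match_name = chosen["NAME"] if chosen else None
print(near_match_name)
```

Taking the first record is the simplest arbitrary choice. The empty case is handled separately because some lookups may find no near match at all.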