Anti-money laundering

Dedect relationship between A and E by tracing through payments with similar amounts and identifying payment chains.

For example：

`lag`

means `lag(daystamp,-1) over (partitin by accname, Cntpty_Acct_Name order by daystamp )`

accname | Event_Dt | Tx_Amt | Cntpty_Acct_Name | daystamp | id | lag |
---|---|---|---|---|---|---|

a | 2020-01-01 | 20.0 | b | 7305 | 5068 | 7306.0 |

a | 2020-01-02 | 300.0 | b | 7306 | 5069 | 7307.0 |

a | 2020-01-03 | 180.0 | b | 7307 | 5070 | Infinity |

b | 2020-01-03 | 40.0 | c | 7307 | 5071 | 7307.0 |

b | 2020-01-03 | 500.0 | c | 7307 | 5072 | 7308.0 |

b | 2020-01-04 | 10.0 | c | 7308 | 5073 | Infinity |

b | 2020-01-03 | 150.0 | d | 7307 | 5074 | Infinity |

c | 2020-01-04 | 50.0 | e | 7308 | 5075 | Infinity |

d | 2020-01-04 | 150.0 | e | 7308 | 5076 | Infinity |

You can run ` spark-submit aml.py`

to get the payment chains seem like money laundering. The same ‘batch_id’ indicates these transactions belong to a complete chain. The field ‘depth’ indicates the length of the chain.

id | batch_id | src | dst | amount_sum | depth | accname | Event_Dt | Tx_Amt | Cntpty_Acct_Name |
---|---|---|---|---|---|---|---|---|---|

5068 | 0 | a | e | 200.0 | 4 | a | 2020-01-01 | 20.0 | b |

5070 | 0 | a | e | 200.0 | 4 | a | 2020-01-03 | 180.0 | b |

5071 | 0 | a | e | 200.0 | 4 | b | 2020-01-03 | 40.0 | c |

5073 | 0 | a | e | 200.0 | 4 | b | 2020-01-04 | 10.0 | c |

5074 | 0 | a | e | 200.0 | 4 | b | 2020-01-03 | 150.0 | d |

5075 | 0 | a | e | 200.0 | 4 | c | 2020-01-04 | 50.0 | e |

5076 | 0 | a | e | 200.0 | 4 | d | 2020-01-04 | 150.0 | e |

5068 | 1 | a | c | 500.0 | 3 | a | 2020-01-01 | 20.0 | b |

5069 | 1 | a | c | 500.0 | 3 | a | 2020-01-02 | 300.0 | b |

5070 | 1 | a | c | 500.0 | 3 | a | 2020-01-03 | 180.0 | b |

5072 | 1 | a | c | 500.0 | 3 | b | 2020-01-03 | 500.0 | c |

Enjoy it !

?