I have an input DataFrame in which I want to collapse records of the same kind into a single record. For example, the input DataFrame contains many `procdata_*` records, and I want only one record for them in the output DataFrame, as shown below:
Input DataFrame:
+-------------------+----------+------+-------------------+-------------------+------------+------------+---------------+
|          File_name|Cycle_date|Status|        Source_time|        Target_time|Source_count|Target_count|Missing_Records|
+-------------------+----------+------+-------------------+-------------------+------------+------------+---------------+
|data_20171223_f.csv|  20180911|  PASS|2018-12-05 10:37:10|2018-12-05 10:37:12|           5|           5|              0|
|data_20180421_f.csv|  20180911|  PASS|2018-12-05 10:37:10|2018-12-05 10:37:12|           5|           4|              1|
|data_20171007_f.csv|  20180911|  PASS|2018-12-05 10:37:12|2018-12-05 10:37:12|           6|           4|              2|
|data_20160423_f.csv|  20180911|  PASS|2018-12-05 10:37:14|2018-12-05 10:37:15|           4|           4|              0|
|data_20180106_f.csv|  20180911|  PASS|2018-12-05 10:37:15|2018-12-05 10:37:15|          10|           9|              1|
| raw_20180120_f.csv|  20180911|  PASS|2018-12-05 10:37:16|2018-12-05 10:37:17|          10|          10|              0|
| raw_20171202_f.csv|  20180911|  PASS|2018-12-05 10:37:17|2018-12-05 10:37:18|           2|           2|              0|
| raw_20151219_f.csv|  20180911|  PASS|2018-12-05 10:37:17|2018-12-05 10:37:18|          10|          10|              0|
| raw_20151031_f.csv|  20180911|  PASS|2018-12-05 10:37:17|2018-12-05 10:37:18|           8|           8|              0|
| raw_20170204_f.csv|  20180911|  PASS|2018-12-05 10:37:18|2018-12-05 10:37:18|          12|          10|              2|
|         eeight.csv|  20180911|  FAIL|2018-12-05 10:37:18|2018-12-05 10:37:19|          10|          10|             10|
+-------------------+----------+------+-------------------+-------------------+------------+------------+---------------+
Output DataFrame:
+-------------------+----------+------+-------------------+-------------------+------------+------------+---------------+
|          File_name|Cycle_date|Status|        Source_time|        Target_time|Source_count|Target_count|Missing_Records|
+-------------------+----------+------+-------------------+-------------------+------------+------------+---------------+
|           data.csv|  20180911|  PASS|2018-12-05 10:37:10|2018-12-05 10:37:15|          30|          26|              4|
|            raw.csv|  20180911|  PASS|2018-12-05 10:37:16|2018-12-05 10:37:18|          42|          40|              2|
|         eeight.csv|  20180911|  FAIL|2018-12-05 10:37:18|2018-12-05 10:37:19|          10|          10|              0|
+-------------------+----------+------+-------------------+-------------------+------------+------------+---------------+
How can this be achieved in Spark?