Shell script merge two list and remove duplicates

rajeshkumar created the topic: shell script merge two list and remove duplicates

You want all the records from list_A supplemented by all the records from list_B for which there is not already a matching name in list A. Mathematically this is:

A + B - {w in B | (w,value) in A }

\

There are many ways of accomplishing this, depending on access and needed efficiencies.

* If you can modify DB1 (with A), then download table B from DB2, upload it to DB1, then extract your data with the appropriate join
* If you can’t modify DB1, then download both A and B and concatenate them to the same stream, with A followed by B. Then sort by the first field. Then process the stream one record at time. Duplicate names will be side-by-side. If the same name appears more than one time, print the first and ignore subsequent records with the same name.

Here is a sample solution to your problem (starting with two lists of names/values)

#!/bin/bash

A="Smith value1
Jones value2
Wilson value3"

B="Smith value10
Wilson value11
Fox value12
Brown value13"

PrevName="Not a valid name"
echo "$A
$B" | sort -k1 |
while read Name Value
do
if [ "$Name" != "$PrevName" ]; then
echo $Name $Value
fi
PrevName="$Name"
done > outfile

You want all the records from list_A supplemented by all the records from list_B for which there is not already a matching name in list A. Mathematically this is:

A + B – {w in B | (w,value) in A }

There are many ways of accomplishing this, depending on access and needed efficiencies.

* If you can modify DB1 (with A), then download table B from DB2, upload it to DB1, then extract your data with the appropriate join
* If you can’t modify DB1, then download both A and B and concatenate them to the same stream, with A followed by B. Then sort by the first field. Then process the stream one record at time. Duplicate names will be side-by-side. If the same name appears more than one time, print the first and ignore subsequent records with the same name.

Here is a sample solution to your problem (starting with two lists of names/values):

#!/bin/bash

A=”Smith value1
Jones value2
Wilson value3″

B=”Smith value10
Wilson value11
Fox value12
Brown value13″

PrevName=”Not a valid name”
echo “$A
$B” | sort -k1 |
while read Name Value
do
if [ “$Name” != “$PrevName” ]; then
echo $Name $Value
fi
PrevName=”$Name”
done > outfile

Here is the output:
Brown value13
Fox value12
Jones value2
Smith value1
Wilson value11
Regards,
Rajesh Kumar
Twitt me @ twitter.com/RajeshKumarIn

Rajesh Kumar
Follow me
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments
0
Would love your thoughts, please comment.x
()
x