{"id":2397,"date":"2017-12-08T09:22:58","date_gmt":"2017-12-08T09:22:58","guid":{"rendered":"http:\/\/www.scmgalaxy.com\/tutorials\/?p=2397"},"modified":"2020-01-09T10:00:18","modified_gmt":"2020-01-09T10:00:18","slug":"shell-script-merge-two-list-and-remove-duplicates","status":"publish","type":"post","link":"https:\/\/www.devopsschool.com\/blog\/shell-script-merge-two-list-and-remove-duplicates\/","title":{"rendered":"Shell script merge two list and remove duplicates"},"content":{"rendered":"<p><strong>rajeshkumar created the topic: shell script merge two list and remove duplicates<\/strong><\/p>\n<p>You want all the records from list_A, supplemented by all the records from list_B for which there is not already a matching name in list_A. Mathematically this is:<\/p>\n<p><code>A + B - {(w, v) in B | (w, value) in A for some value}<\/code><\/p>\n<p>There are many ways of accomplishing this, depending on what access you have and how efficient the solution needs to be.<\/p>\n<p>* If you can modify DB1 (with A), then download table B from DB2, upload it to DB1, then extract your data with the appropriate join.<br \/>\n* If you can&#8217;t modify DB1, then download both A and B and concatenate them into the same stream, with A followed by B. Then sort stably by the first field only, so that for each name the record from A stays ahead of the record from B. Then process the stream one record at a time. Duplicate names will be side-by-side. 
If the same name appears more than once, print the first record and ignore subsequent records with the same name.<\/p>\n<p>Here is a sample solution to your problem (starting with two lists of names\/values):<\/p>\n<pre><code>#!\/bin\/bash\n\n# Records from list A, one \"name value\" pair per line\nA=\"Smith value1\nJones value2\nWilson value3\"\n\n# Records from list B\nB=\"Smith value10\nWilson value11\nFox value12\nBrown value13\"\n\nPrevName=\"Not a valid name\"\n# Stable sort (-s) on the name field only (-k1,1), so that for each\n# name the record from A stays ahead of the record from B\necho \"$A\n$B\" | sort -s -k1,1 |\nwhile read -r Name Value\ndo\n   # Print only the first record seen for each name\n   if [ \"$Name\" != \"$PrevName\" ]; then\n      echo \"$Name $Value\"\n   fi\n   PrevName=\"$Name\"\ndone &gt; outfile<\/code><\/pre>\n<p>Here is the output, written to outfile:<br \/>\nBrown value13<br \/>\nFox value12<br \/>\nJones value2<br \/>\nSmith value1<br \/>\nWilson value3<\/p>\n<p>Regards,<br \/>\nRajesh Kumar<br \/>\nTweet me @<a href=\"http:\/\/twitter.com\/RajeshKumarIn\" target=\"_blank\" rel=\"noopener\"> twitter.com\/RajeshKumarIn<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>rajeshkumar created the topic: shell script merge two list and remove duplicates You want all the records from list_A supplemented by all the records from list_B for which there is not already a matching name in list A. 
Mathematically this is: A + B &#8211; {w in B | (w,value) in A } \\ There&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","_joinchat":[],"footnotes":""},"categories":[454],"tags":[138],"class_list":["post-2397","post","type-post","status-publish","format-standard","hentry","category-shell-script","tag-shell"],"_links":{"self":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/2397","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/comments?post=2397"}],"version-history":[{"count":1,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/2397\/revisions"}],"predecessor-version":[{"id":2398,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/posts\/2397\/revisions\/2398"}],"wp:attachment":[{"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/media?parent=2397"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/categories?post=2397"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devopsschool.com\/blog\/wp-json\/wp\/v2\/tags?post=2397"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}