4 votes

Comment joindre 3 fichiers en utilisant awk ?

Ci-dessous, trois fichiers -

emp.txt
7839|BLAKE|PRESIDENT||17-NOV-81|5000||10
7698|KING|MANAGER|7839|01-MAY-81|2850||10
7782|CLARK|MANAGER|7839|09-JUN-81|2450||10
7566|JONES|MANAGER|7839|02-APR-81|2975||40
7788|SCOTT|ANALYST|7566|19-APR-87|3000||50
7902|FORD|ANALYST|7566|03-DEC-81|3000||20

dept.txt
10|ACCOUNTING|NEW YORK
20|RESEARCH|DALLAS
30|SALES|CHICAGO
40|OPERATIONS|BOSTON

citycode.txt
1123|NEW YORK
1124|DALLAS
1125|CHICAGO
1126|BOSTON
1127|WASHINGTON

résultat attendu : $0(from emp.txt where $6 >= 2850)+$2(from dept.txt)+$1(from citycode.txt)

7839|BLAKE|PRESIDENT||17-NOV-81|5000||10|ACCOUNTING|1123
7698|KING|MANAGER|7839|01-MAY-81|2850||10|ACCOUNTING|1123
7566|JONES|MANAGER|7839|02-APR-81|2975||40|OPERATIONS|1126
7788|SCOTT|ANALYST|7566|19-APR-87|3000||50|NULL|NULL
7902|FORD|ANALYST|7566|03-DEC-81|3000||20|RESEARCH|1124

Explication - joindre ces fichiers en utilisant la clé commune de emp.txt($NF : outer join on emp.txt--record no 50 doesn't have common that is why it will print NULL) and dept.txt($1) où emp($6 >= 2850) et utiliser la colonne commune de dept.txt($NF) et citycode.txt($1). joindre deux fichiers est une tâche facile en utilisant la clé commune, mais comment joindre trois fichiers ensemble.

4voto

jas Points 7141

Utiliser la fonction intégrée FILENAME pour construire les tableaux associatifs appropriés lorsque les codes de la ville et du département sont traités.

Ensuite, lors du traitement des salariés, les données souhaitées sont éditées :

BEGIN {FS=OFS="|"}
function chknull(s) { return s ? s : "NULL"}
FILENAME == "citycode.txt" {citycode[$2]=$1}
FILENAME == "dept.txt" {depname[$1]=$2; depcity[$1]=$3}
FILENAME == "emp.txt" && $6 >= 2850 {
    print $0, chknull(depname[$NF]), chknull(citycode[depcity[$NF]])
}

$ awk -f j3.awk citycode.txt dept.txt emp.txt 
7839|BLAKE|PRESIDENT||17-NOV-81|5000||10|ACCOUNTING|1123
7698|KING|MANAGER|7839|01-MAY-81|2850||10|ACCOUNTING|1123
7566|JONES|MANAGER|7839|02-APR-81|2975||40|OPERATIONS|1126
7788|SCOTT|ANALYST|7566|19-APR-87|3000||50|NULL|NULL
7902|FORD|ANALYST|7566|03-DEC-81|3000||20|RESEARCH|1124

(Notez que j'ai également ajouté le chknull pour imprimer "NULL" au lieu de la chaîne vide afin de correspondre à la sortie souhaitée dans la question, mais ce n'est pas important pour le cœur de la question qui est de savoir comment faire des jointures avec plus de deux fichiers).

3voto

Akshay Hegde Points 12930
awk 'BEGIN{ FS=OFS="|" }
     FNR==1{ f++ }
     f==1{ d[$1]=$2; c[$NF]=$1; next }
     f==2 && $NF in c{ c[c[$NF]]=$1; next }
     $6 >=2850{
       print $0, ($NF in d ? d[$NF] : "NULL" ),( $NF in c ? c[$NF] : "NULL")
   }' dept.txt citycode.txt emp.txt

Explication

awk '                               # Call awk
      BEGIN{          
              FS=OFS="|"            # Set input and output separator
      }
      FNR==1{                       # FNR will be 1 for each file when awk reads first line
            f++                     # File counter
      }

      # use can also do FILENAME == "dept.txt" here but f==1 is easy

      f==1{                         # for first file dept.txt
            d[$1]=$2;               # set array d where index is $1 and value is $2 of file dept.txt 
            c[$NF]=$1;              # set array c where index is last field and value is first field of file dept.txt
            next                    # stop processing go to next line
      }

      # when f equal 2 that is when we read second file citycode.txt 
      # and last field of current file exists in array c
      # there is no point in reading extra line exists in citycode.txt
      # we just need whichever city exists in dept.txt
      # so $NF in c

      f==2 && $NF in c{             

           # modify array c by empid 
           # that is index will be the value of array c 
           # corresponding to last field of current file and 
           # array c value will be citycode
           # Its because to make it easy to access array c value by empid while
           # reading emp.txt file

            c[c[$NF]]=$1;           

           # gawk user can delete array element here like below
           # if you have gawk uncomment below line
           # delete c[$NF]

            next                   # stop processing go to next line
      }

      $6 >=2850{                   # here we read emp.txt if 6th field is greater than or equal to 2850

            # Print current record/row/line from emp.txt

            # if last field of current file that is 
            # empid exists in array d then print department else NULL

            # if last field of current file that is
            # empid exists in array c then print citycode else NULL

            print $0,($NF in d?d[$NF]:"NULL"),($NF in c?c[$NF]:"NULL")
      }
   ' dept.txt citycode.txt emp.txt

Entrée

$ cat emp.txt 
7839|BLAKE|PRESIDENT||17-NOV-81|5000||10
7698|KING|MANAGER|7839|01-MAY-81|2850||10
7782|CLARK|MANAGER|7839|09-JUN-81|2450||10
7566|JONES|MANAGER|7839|02-APR-81|2975||40
7788|SCOTT|ANALYST|7566|19-APR-87|3000||50
7902|FORD|ANALYST|7566|03-DEC-81|3000||20

$ cat dept.txt 
10|ACCOUNTING|NEW YORK
20|RESEARCH|DALLAS
30|SALES|CHICAGO
40|OPERATIONS|BOSTON

$ cat citycode.txt 
1123|NEW YORK
1124|DALLAS
1125|CHICAGO
1126|BOSTON
1127|WASHINGTON

Sortie

$ awk 'BEGIN{FS=OFS="|"}FNR==1{f++}f==1{d[$1]=$2;c[$NF]=$1;next}f==2 && $NF in c{c[c[$NF]]=$1;next}$6 >=2850{print $0,($NF in d?d[$NF]:"NULL"),($NF in c?c[$NF]:"NULL")}' dept.txt citycode.txt emp.txt
7839|BLAKE|PRESIDENT||17-NOV-81|5000||10|ACCOUNTING|1123
7698|KING|MANAGER|7839|01-MAY-81|2850||10|ACCOUNTING|1123
7566|JONES|MANAGER|7839|02-APR-81|2975||40|OPERATIONS|1126
7788|SCOTT|ANALYST|7566|19-APR-87|3000||50|NULL|NULL
7902|FORD|ANALYST|7566|03-DEC-81|3000||20|RESEARCH|1124

1voto

Olivier Dulac Points 2203

C'est simple en awk :

  • lisez les 2 fichiers que vous devez ajouter à la fin de la ligne (dept.txt et citycode.txt)
  • et les ajouter lors de l'analyse du fichier principal (emp.txt)

le code est le suivant :

awk -F'|'  -v dptfile="dept.txt" -v citycodefile="citycode.txt" -v from="2850" '
   BEGIN { OFS=FS; 
           rem="build 2 arrays, dpt[] associates number with department,";
           rem="and dptcity[] associate same number with department city";
           while ((getline line<dptfile) > 0) {
              split(line,a,OFS);dpt[a[1]]=a[2]; dptcity[a[1]]=a[3]
           }
           close(dptfile)
           rem="build 3rd array, city[cityname] associates a city name to its number";
          while ((getline line<citycodefile)>0) {
              split(line,a,OFS); city[a[2]]=a[1] ; 
           }
           close(citycodefile); 
         }

  ( $6>=from ) { print $0 OFS ($8 in dpt? dpt[$8]:"NULL") OFS (dptcity[$8] in city? city[dptcity[$8]]:"NULL") ;}
  ' emp.txt

Compte tenu des données que vous avez citées (et en empruntant la bonne présentation de @akshay-hegde) :

Entrée

$ cat emp.txt
7839|BLAKE|PRESIDENT||17-NOV-81|5000||10
7698|KING|MANAGER|7839|01-MAY-81|2850||10
7782|CLARK|MANAGER|7839|09-JUN-81|2450||10
7566|JONES|MANAGER|7839|02-APR-81|2975||40
7788|SCOTT|ANALYST|7566|19-APR-87|3000||50
7902|FORD|ANALYST|7566|03-DEC-81|3000||20

$ cat dept.txt 
10|ACCOUNTING|NEW YORK
20|RESEARCH|DALLAS
30|SALES|CHICAGO
40|OPERATIONS|BOSTON

$ cat citycode.txt
1123|NEW YORK
1124|DALLAS
1125|CHICAGO
1126|BOSTON
1127|WASHINGTON

Sortie

$ awk -F'|' -v dptfile="dept.txt" -v citycodefile="citycode.txt" -v from="2850" 'BEGIN { OFS=FS; rem="build 2 arrays, dpt[] associates number with department,";rem="and dptcity[] associate same number with department city"; while ((getline line<dptfile) > 0) { split(line,a,OFS);dpt[a[1]]=a[2]; dptcity[a[1]]=a[3];} ; close(dptfile) ; rem="build 3rd array, city[cityname] associates a city name to its number"; while ((getline line<citycodefile)>0) { split(line,a,OFS); city[a[2]]=a[1] ; }; close(citycodefile); } ( $6>=from ) { print $0 OFS ($8 in dpt? dpt[$8]:"NULL") OFS (dptcity[$8] in city? city[dptcity[$8]]:"NULL") ;}' emp.txt
7839|BLAKE|PRESIDENT||17-NOV-81|5000||10|ACCOUNTING|1123
7698|KING|MANAGER|7839|01-MAY-81|2850||10|ACCOUNTING|1123
7566|JONES|MANAGER|7839|02-APR-81|2975||40|OPERATIONS|1126
7788|SCOTT|ANALYST|7566|19-APR-87|3000||50|NULL|NULL
7902|FORD|ANALYST|7566|03-DEC-81|3000||20|RESEARCH|1124

Prograide.com

Prograide est une communauté de développeurs qui cherche à élargir la connaissance de la programmation au-delà de l'anglais.
Pour cela nous avons les plus grands doutes résolus en français et vous pouvez aussi poser vos propres questions ou résoudre celles des autres.

Powered by:

X