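Requirement scenario:
A large number of articles were deleted from one of the company's sites, but Baidu had already indexed them. Users who follow those search results now get a 404, which is a poor experience. The URLs of the deleted articles therefore need to be written out as XML files, 5000 entries per file, and submitted to Baidu so that the indexed pages are removed.
Plain text file: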
https://www.abc.com/html/ys/13003183/20191115/123456.html
https://www.abc.com/html/ys/13003183/20191115/123765.html
https://www.abc.com/html/ys/13003183/20191115/567567.html
https://www.abc.com/html/ys/13003183/20191115/456456.html
https://www.abc.com/html/ys/13003183/20191115/374456.html
https://www.abc.com/html/ys/13003183/20191115/37456645.html
XML format file:
<urlset>
<url> <loc> https://www.abc.com/html/ys/13003183/20191115/37404973.html </loc> </url>
<url> <loc> https://www.abc.com/html/jb/13003184/20191115/37404988.html </loc> </url>
<url> <loc> https://www.abc.com/html/jb/13003184/20191115/37404968.html </loc> </url>
<url> <loc> https://www.abc.com/ylaq/13003182/20191115/37404860.html </loc> </url>
<url> <loc> https://www.abc.com/ylaq/13003182/20191115/37404861.html </loc> </url>
</urlset>
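The mapping from one plain line to one XML entry can be sketched as a small filter. This is a minimal illustration, not the script itself: `wrap_url` is a hypothetical helper name, and `|` is used as the sed delimiter only because the closing tags contain `/`.

```shell
# wrap_url is a hypothetical helper: it reads URLs on stdin, one per line,
# and writes each one wrapped in <url> <loc> ... </loc> </url>.
wrap_url() {
    # "|" is the delimiter, so the "/" in </loc> and </url> needs no escaping.
    sed -e 's|^|<url> <loc> |' -e 's|$| </loc> </url>|'
}

printf '%s\n' 'https://www.abc.com/html/ys/13003183/20191115/123456.html' | wrap_url
# prints: <url> <loc> https://www.abc.com/html/ys/13003183/20191115/123456.html </loc> </url>
```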
Script:
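cat xml.sh
#!/bin/bash
# Wrap every URL in <url> <loc> ... </loc> </url>.
# "|" is the sed delimiter because the closing tags contain "/".
sed -i 's|^|<url> <loc> |' "$1"
sed -i 's|$| </loc> </url>|' "$1"
# Strip the file extension to get the prefix for the split output
name=`echo "$1" | awk -F"." '{print $1}'`
echo "$name"
# Split the file into pieces of 5000 lines each
split -l 5000 "$1" "${name}_xml"
# Add the <urlset> header and footer to each piece, then rename it to .xml
for filename in `find ./ -name "${name}_xml*"`
do
    sed -i '1 i\<urlset>' "$filename"
    echo "</urlset>" >> "$filename"
    mv "$filename" "${filename}.xml"
done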
Run:
sh xml.sh filename
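After a run, one quick sanity check is to count the `<loc>` entries across the generated files and compare that number with the line count of the input list. The helper below is only a sketch; `count_entries` is a hypothetical name and not part of the script.

```shell
# count_entries is a hypothetical helper: it prints the total number of
# lines containing <loc> across all files given as arguments.
count_entries() {
    cat "$@" | grep -c '<loc>'
}
```

For an input named `urls.txt` (an assumed name), `count_entries urls_xml*.xml` should print the same number as `wc -l < urls.txt`, since the script wraps lines in place without adding or removing any.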
Script walkthrough:
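- The script uses sed to add the tag fields at the start and end of every line;
- It defines a variable that holds the file name with the extension removed;
- It uses split to divide the file into pieces;
- It uses a for loop to add the XML header and footer to each piece, then renames the piece;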
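The same four steps can also be sketched as a function that leaves the input file untouched, which makes reruns safer. This is a variant written under stated assumptions rather than the script above: `make_url_xml` is a hypothetical name, the chunk size is an optional second argument defaulting to 5000, and split is assumed to use its default two-letter suffixes (`aa`, `ab`, ...), which covers up to 676 pieces.

```shell
#!/bin/bash
# make_url_xml INPUT [LINES] -- hypothetical helper, not the original script.
# Writes INPUT-without-extension + "_xml" + suffix + ".xml" files of at most
# LINES entries each (default 5000) and never modifies INPUT.
make_url_xml() {
    local input="$1"
    local lines="${2:-5000}"
    local prefix="${input%.*}_xml"   # e.g. urls.txt -> urls_xml
    local part

    # Wrap each URL, then let split read the wrapped lines from stdin ("-").
    sed -e 's|^|<url> <loc> |' -e 's|$| </loc> </url>|' "$input" |
        split -l "$lines" - "$prefix"

    # Match only the raw two-letter pieces, not .xml files from earlier runs.
    for part in "$prefix"??; do
        [ -e "$part" ] || continue
        { echo '<urlset>'; cat "$part"; echo '</urlset>'; } > "$part.xml"
        rm -f "$part"
    done
}
```

Calling `make_url_xml urls.txt` (with `urls.txt` as an assumed input name) would produce `urls_xmlaa.xml`, `urls_xmlab.xml`, and so on in the same directory.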