How to Grab a Web Page's Title - 歡迎來到3WA問題解決專家工作室

訓練家的佈弱格-Patch1.2

The BLOG of trainer

Edited: 2009-07-31 10:16

Category: Programming
Author: 羽山
Posted: 2009-07-31 10:16:24
Views: 6278
Title: How to Grab a Web Page's Title
URL: http://59-126-75-42.hinet-ip.hinet.net/blog/blog.php?id=767
Content:

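The other day I was chatting in the #linuxtw channel on irc.giga.net.tw

and noticed a bot there that was pretty fun.

You just paste a URL, and it fetches the page and parses out its <title> for you.

For my own 59-126-75-42.hinet-ip.hinet.net, for example, it answers "歡迎來訓練家的工作室" (Welcome to the Trainer's Studio).

So my friend Rickz started hacking on a shell script and wrote his own parser: fetch the page with curl, then cut the title out.

I implemented the same feature in PHP~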

time php -r "\$array=explode('</title',file_get_contents(\$argv[1])); \$array=explode('<title>',\$array[0]);echo \$array[1];" "http://59-126-75-42.hinet-ip.hinet.net"

歡迎來到3WA問題解決專家工作室
real 0m0.996s
user 0m0.022s
sys 0m0.006s

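The time the script itself actually uses is user + sys; real is wall-clock time and also includes waiting for the network.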
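To make the difference between real and user + sys concrete, here is a minimal Python sketch. It assumes a sleep is a fair stand-in for waiting on a slow HTTP response; `measure` and `fake_network_wait` are hypothetical names made up for this illustration, not part of the original one-liners.

```python
import time

def measure(func):
    """Run func() and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock, like `real`
    cpu_start = time.process_time()    # CPU time of this process, like user + sys
    func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

def fake_network_wait():
    # Hypothetical stand-in for a slow request: sleeping burns
    # wall-clock time but almost no CPU time.
    time.sleep(0.2)

wall, cpu = measure(fake_network_wait)
print("real %.3fs, user+sys %.3fs" % (wall, cpu))
```

The wall-clock figure should come out around 0.2s while the CPU figure stays close to zero, which is the same pattern as the `time` output above: most of `real` is spent waiting, not computing.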

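My approach is pretty simple: it just uses file_get_contents to fetch the URL, then splits the result on the title tags.

Then... I wrote a Python version too: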

time python -c "print __import__('urllib2').build_opener().open(__import__('sys').argv[1]).read().split('<title>')[1].split('</title>')[0]" "http://59-126-75-42.hinet-ip.hinet.net"

歡迎來到3WA問題解決專家工作室
real 0m0.916s
user 0m0.053s
sys 0m0.013s
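The one-liner above is Python 2 (`urllib2` and the `print` statement). As a rough sketch of the same split-based idea for Python 3, where that functionality lives in `urllib.request`, something like the following should work. The function names `title_by_split` and `fetch_title` are my own, and decoding the page as UTF-8 is an assumption, not something the original code does.

```python
from urllib.request import urlopen

def title_by_split(html):
    """Return the text between the first <title> and </title>, or None."""
    parts = html.split('<title>', 1)
    if len(parts) < 2:
        # No literal <title> tag in the page.
        return None
    return parts[1].split('</title>', 1)[0]

def fetch_title(url):
    """Download url and return its title using title_by_split."""
    # Assumption: the page is UTF-8; undecodable bytes are replaced.
    with urlopen(url) as response:
        html = response.read().decode('utf-8', errors='replace')
    return title_by_split(html)
```

Calling `fetch_title("http://59-126-75-42.hinet-ip.hinet.net")` would be the equivalent of the command above. Like the original, the split only matches a lowercase `<title>` with no attributes.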

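It's not hard to see which one is faster and which is slower~ Going by user + sys, PHP took 0.028s and Python 0.066s.

Someone on the channel chimed in with an even faster method, so I got to learn another trick!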

time php -r '$str=file_get_contents($argv[1]);if(preg_match("/<title>(.*?)<\/title>/msi", $str, $m ) ) {echo $m[1] ; }' "http://59-126-75-42.hinet-ip.hinet.net"
歡迎來到3WA問題解決專家工作室
real    0m1.008s
user    0m0.023s
sys    0m0.008s
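For comparison, the same regular expression can be sketched in Python with the `re` module. The PHP modifiers `/msi` correspond to `re.MULTILINE`, `re.DOTALL` and `re.IGNORECASE`; `title_by_regex` is a name I made up for this example.

```python
import re

# Same pattern as the PHP version: non-greedy match between the tags,
# case-insensitive, with "." also matching newlines.
TITLE_RE = re.compile(r'<title>(.*?)</title>',
                      re.MULTILINE | re.DOTALL | re.IGNORECASE)

def title_by_regex(html):
    """Return the contents of the first <title> element, or None."""
    match = TITLE_RE.search(html)
    return match.group(1) if match else None

print(title_by_regex('<HEAD><TITLE>Example</TITLE></HEAD>'))  # prints: Example
```

Unlike the split-based versions, this also handles an uppercase `<TITLE>` and a title that spans several lines, and it returns None instead of raising an error when the page has no title.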
