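An earlier post covered a Python crawler that scrapes images from Douban groups and submits them to a Chevereto image host through its API. Later, with some spare time on my hands, I wrote a Golang script that scrapes Douban group images in the same way.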
Related post: Uploading images through the API in the free version of Chevereto (illustrated tutorial)
Image host: http://788to.com
Before using the script, set up a Golang environment and install the one required package:
go get github.com/PuerkitoBio/goquery
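The script accepts three flags:
-u the group's listing URL, ending in start=, for example: https://www.douban.com/group/meituikong/discussion?start=
-e the start= value of the last page to crawl; the loop steps by 25 and stops before this value (default 100)
-k the Chevereto API key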
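How -u and -e combine into the list of pages to fetch can be sketched roughly as follows. The function name pageURLs and the constant pageSize are illustrative and not part of the script; the step of 25 is taken from the loop in the script's main function.

```go
package main

import (
	"fmt"
	"strconv"
)

// pageSize is the step between listing pages; 25 is the value used by the script's loop.
const pageSize = 25

// pageURLs (illustrative helper) returns base+"0", base+"25", ... for every
// offset strictly below end, mirroring the loop condition i < end.
func pageURLs(base string, end int) []string {
	var urls []string
	for start := 0; start < end; start += pageSize {
		urls = append(urls, base+strconv.Itoa(start))
	}
	return urls
}

func main() {
	for _, u := range pageURLs("https://www.douban.com/group/265201/discussion?start=", 100) {
		fmt.Println(u)
	}
}
```

Because the bound is exclusive, -e=100 covers the offsets 0, 25, 50 and 75, and -e=700 ends at start=675.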
A complete example invocation:
go run get-douban-image.go -u="https://www.douban.com/group/265201/discussion?start=" -e="700" -k="laoji.org"
Git repository: https://github.com/qsbaq/doubanImage
The source is shown below. It is for demonstration only; the code in the Git repository is authoritative:
package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
	"regexp"
	"strconv"
	"sync"
	"time"

	"github.com/PuerkitoBio/goquery"
)

var wg sync.WaitGroup

// imagePattern matches large group-topic images. The dots are escaped and the
// character classes stop at quotes and whitespace, so a single match cannot
// span several URLs on the same line.
var imagePattern = regexp.MustCompile(`https://[a-z0-9]+\.doubanio\.com/view/group_topic/large/public/[^"'\s]+?\.jpg`)

// GetUrl fetches a URL and returns the response body.
func GetUrl(target string) ([]byte, error) {
	resp, err := http.Get(target)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

// getImage scans one topic page for image URLs and submits each one to Chevereto.
func getImage(imageURL string, k string) {
	data, err := GetUrl(imageURL)
	if err != nil {
		log.Println(imageURL, err)
		return
	}
	for _, value := range imagePattern.FindAllString(string(data), -1) {
		submitURL := "http://788to.com/api/1/upload/?key=" + url.QueryEscape(k) + "&source=" + url.QueryEscape(value)
		fmt.Println(submitURL)
		returnJSON, err := GetUrl(submitURL)
		if err != nil {
			log.Println(submitURL, err)
			continue
		}
		res := make(map[string]interface{})
		if err := json.Unmarshal(returnJSON, &res); err != nil {
			log.Println(value, err)
			continue
		}
		log.Printf("%s -> %v \n", value, res["status_code"])
	}
}

// getGroupList fetches one listing page and processes every topic linked from it.
func getGroupList(targetURL string, k string) {
	defer wg.Done()
	fmt.Printf("Begin Url : %s\n", targetURL)
	doc, err := goquery.NewDocument(targetURL)
	if err != nil {
		log.Fatal(err)
	}
	// Each topic title in the listing is a link inside td.title.
	doc.Find("td.title a").Each(func(i int, s *goquery.Selection) {
		if href, ok := s.Attr("href"); ok {
			getImage(href, k)
		}
	})
}

func main() {
	k := flag.String("k", "laoji.org", "Chevereto Key")
	endStartInt := flag.Int("e", 100, "End Start Int Value")
	defaultUrl := flag.String("u", "https://www.douban.com/group/meituikong/discussion?start=", "Group Url")
	flag.Parse()
	// Start one goroutine per listing page, three seconds apart.
	for i := 0; i < *endStartInt; i += 25 {
		wg.Add(1)
		go getGroupList(*defaultUrl+strconv.Itoa(i), *k)
		time.Sleep(3 * time.Second)
	}
	wg.Wait()
}
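Two string-handling steps in getImage are easy to get subtly wrong: pulling the image URLs out of the page, and building the upload URL. The following is a minimal sketch of both in isolation, not the repository's code. The helper names extractImageURLs and buildUploadURL are illustrative, and the sample HTML is made up; the endpoint and the key and source parameters are the ones the script uses.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// imagePattern matches large group-topic images on any doubanio.com subdomain.
// Escaped dots and bounded character classes keep each match to a single URL.
var imagePattern = regexp.MustCompile(`https://[a-z0-9]+\.doubanio\.com/view/group_topic/large/public/[^"'\s]+?\.jpg`)

// extractImageURLs (illustrative helper) returns every image URL found in html.
func extractImageURLs(html string) []string {
	return imagePattern.FindAllString(html, -1)
}

// buildUploadURL (illustrative helper) assembles the upload request URL.
// url.Values escapes both parameters, so special characters in the key or
// in the image URL cannot break the query string.
func buildUploadURL(endpoint, key, source string) string {
	q := url.Values{}
	q.Set("key", key)
	q.Set("source", source)
	return endpoint + "?" + q.Encode()
}

func main() {
	// Made-up sample markup with two images on one line.
	page := `<img src="https://img3.doubanio.com/view/group_topic/large/public/p61541380.jpg"> <img src="https://img1.doubanio.com/view/group_topic/large/public/p44724327.jpg">`
	for _, img := range extractImageURLs(page) {
		fmt.Println(buildUploadURL("http://788to.com/api/1/upload/", "laoji.org", img))
	}
}
```

A pattern with unescaped dots and greedy (.*) groups can swallow everything between the first and the last image URL on a line and return it as one match, which is why the sketch uses bounded character classes instead.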
Output:
2017/02/10 08:18:10 https://img3.doubanio.com/view/group_topic/large/public/p61541380.jpg -> 200
2017/02/10 08:18:10 https://img3.doubanio.com/view/group_topic/large/public/p44724331.jpg -> 200
2017/02/10 08:18:10 https://img3.doubanio.com/view/group_topic/large/public/p65569545.jpg -> 200
2017/02/10 08:18:10 https://img1.doubanio.com/view/group_topic/large/public/p44724327.jpg -> 200
Begin Url : https://www.douban.com/group/265201/discussion?start=500
2017/02/10 08:18:10 https://img3.doubanio.com/view/group_topic/large/public/p47029205.jpg -> 200
2017/02/10 08:18:10 https://img5.doubanio.com/view/group_topic/large/public/p33682186.jpg -> 200
2017/02/10 08:18:11 https://img3.doubanio.com/view/group_topic/large/public/p64979344.jpg -> 200
2017/02/10 08:18:11 https://img5.doubanio.com/view/group_topic/large/public/p47029206.jpg -> 200
2017/02/10 08:18:11 https://img3.doubanio.com/view/group_topic/large/public/p64979345.jpg -> 200
2017/02/10 08:18:11 https://img3.doubanio.com/view/group_topic/large/public/p48717685.jpg -> 200
2017/02/10 08:18:11 https://img3.doubanio.com/view/group_topic/large/public/p50772901.jpg -> 200
2017/02/10 08:18:11 https://img1.doubanio.com/view/group_topic/large/public/p45223799.jpg -> 200
2017/02/10 08:18:11 https://img1.doubanio.com/view/group_topic/large/public/p47758309.jpg -> 200
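The 200 at the end of each log line is the status_code field of the JSON returned by the upload request. Decoding into a small struct rather than a generic map makes a malformed response show up as an error. This is a rough sketch only: the names uploadResult and parseStatus are illustrative, the sample body is invented, and status_code is the only field assumed, because it is the only one the script reads.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// uploadResult holds the single response field the script reads.
// encoding/json ignores any other fields present in the response.
type uploadResult struct {
	StatusCode int `json:"status_code"`
}

// parseStatus (illustrative helper) extracts status_code from a response body
// and reports an error when the body is not valid JSON.
func parseStatus(body []byte) (int, error) {
	var r uploadResult
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, fmt.Errorf("decode upload response: %w", err)
	}
	return r.StatusCode, nil
}

func main() {
	// Invented sample body; a real response carries more fields.
	code, err := parseStatus([]byte(`{"status_code": 200}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(code)
}
```

If the server answers with an HTML error page instead of JSON, parseStatus returns an error instead of silently reporting an empty status.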