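Selenium currently offers no way to set the Referer header on a request, and I am not sure why.

If you have had a similar requirement, how did you solve it?

Let the client see our referer.

The main idea is to route Selenium's requests through a proxy and add the referer in the proxy. The proxy server used here is mitmproxy.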
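
A minimal sketch of the simplest form of this idea: a mitmproxy script that sets the Referer header on every request passing through the proxy. The file name add_referer.py and the REFERER value are assumptions for illustration; `request(flow)` is the hook mitmproxy calls for each client request. Note that this changes only the header the server receives. A JavaScript tracker that reads `document.referrer` inside the browser would not see it, which is presumably why the implementation described here fakes a referer page instead.

```python
# add_referer.py (hypothetical file name); run with: mitmdump -s add_referer.py

# Assumed value for illustration: the page the target site should see as referer.
REFERER = "http://www.website.com/redirect.html"


def request(flow):
    """mitmproxy request hook: overwrite the Referer header before forwarding."""
    flow.request.headers["Referer"] = REFERER
```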

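Implementation:

1) Selenium first visits the referer page, which is identified by a marker in its request path.

2) The proxy receives the request and looks for the marker. If the marker is present, the proxy returns a response itself; otherwise it forwards the request as usual.

mitmproxy is used as the proxy server; it needs Python 2.7.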

yum -y groupinstall "Development tools"
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel
cd ~
wget https://www.python.org/ftp/python/2.7.9/Python-2.7.9.tgz
tar zxvf Python-2.7.9.tgz 
cd Python-2.7.9
./configure --prefix=/usr/local
make && make altinstall
mv /usr/bin/python /usr/bin/python2.6.6.old
ln -s /usr/local/bin/python2.7 /usr/bin/python
vi /usr/bin/yum
# In /usr/bin/yum, change #!/usr/bin/python to #!/usr/bin/python2.6, because yum needs Python 2.6.
wget --no-check-certificate https://bootstrap.pypa.io/ez_setup.py
python2.7 ez_setup.py
easy_install-2.7 pip
# Note: from here on use pip2.7, not pip.
pip2.7 install netlib pyopenssl pyasn1 urwid lxml flask
pip2.7 install pil --allow-external PIL --allow-unverified PIL
pip2.7 install pyamf protobuf
pip2.7 install nose pathod countershape
pip2.7 install mitmproxy

After installation it turned out that Python 2.7 still does not work; Python 3.x is needed:

wget https://www.python.org/ftp/python/3.6.0/Python-3.6.0a1.tar.xz
tar xvf Python-3.6.0a1.tar.xz
cd Python-3.6.0a1
./configure
make && make install
wget --no-check-certificate https://bootstrap.pypa.io/ez_setup.py
python3 ez_setup.py
easy_install-* pip
pip3 install netlib pyopenssl pyasn1 urwid lxml flask
pip3 install pil --allow-external PIL --allow-unverified PIL
pip3 install pyamf protobuf
pip3 install nose pathod countershape
pip3 install mitmproxy

Configure the upstream proxy

 mitmproxy -b 192.168.109.135 -p 443 -U http://192.168.109.130:8080

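To change the upstream proxy dynamically, see https://github.com/mitmproxy/mitmproxy/blob/master/examples/complex/change_upstream_proxy.py

To return custom content from the proxy:

https://github.com/mitmproxy/mitmproxy/blob/master/examples/simple/send_reply_from_proxy.py

For example, to return a canned response when the cookie contains a specific string, use the following change.py: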

from mitmproxy import http


def request(flow: http.HTTPFlow) -> None:
    # Short-circuit requests whose cookie carries the marker "key=website80".
    cookie = flow.request.headers.get("cookie", "")
    if "key=website80" in cookie:
        print(cookie)
        # Alternative trigger: flow.request.pretty_url == "http://example.com/path"
        # In mitmproxy 7 and later this class is named http.Response.
        flow.response = http.HTTPResponse.make(
            200,  # (optional) status code
            b"<html><body><website80></website80></body></html>",  # (optional) content
            {"Content-Type": "text/html"},  # (optional) headers
        )
    else:
        print("no marker cookie, forwarding request")

Run the script above: mitmdump -s change.py
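
To check the script without starting a browser, a request carrying the marker cookie can be sent through the proxy. The following is a sketch using only the standard library; the helper name build_probe is hypothetical, and the proxy address assumes mitmdump runs on the host used in this post and listens on its default port 8080.

```python
import urllib.request

# Assumption: mitmdump runs on this host and listens on its default port 8080.
PROXY = "http://192.168.109.135:8080"


def build_probe(url, cookie):
    """Return an opener that goes through the proxy and a request carrying the cookie."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": PROXY})
    )
    req = urllib.request.Request(url, headers={"Cookie": cookie})
    return opener, req
```

With `opener, req = build_probe("http://example.com/", "key=website80")`, calling `opener.open(req, timeout=10).read()` while change.py is loaded should return the canned `<website80>` page rather than the real site.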

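To sum up, the feature needs two scripts. The first is the mitmdump script below: a request whose path contains redirect.html does not need to be sent on, so the proxy returns a response directly: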

# This scripts demonstrates how mitmproxy can switch to a second/different upstream proxy
# in upstream proxy mode.
#
# Usage: mitmdump -U http://default-upstream-proxy.local:8080/ -s change_upstream_proxy.py
#
# If you want to change the target server, you should modify flow.request.host and flow.request.port
from mitmproxy import http

def proxy_address(flow):
    # Poor man's load balancing: route every second domain through the alternative proxy.
    # Both branches return the same upstream here; change one address to actually split traffic.
    if hash(flow.request.host) % 2 == 1:
        return ("192.168.109.130", 8080)
    else:
        return ("192.168.109.130", 8080)


def request(flow):
    if "redirect.html" in flow.request.pretty_url:
        # The marker page is answered by the proxy itself; nothing is sent upstream.
        flow.response = http.HTTPResponse.make(
            200,
            b"<html><body><website80></website80></body></html>",
            {"Content-Type": "text/html"},
        )
        return
    print("go to now")
    if flow.request.method == "CONNECT":
        # If the decision is done by domain, one could also modify the server address here.
        # We do it after CONNECT here to have the request data available as well.
        return
    address = proxy_address(flow)
    if flow.live:
        flow.live.change_upstream_proxy_server(address)

The second is the Selenium script:

# -*- coding: utf-8 -*-
import json
import random
import sys
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import os
from datetime import datetime

__author__ = 'Administrator'
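'''
Requirement: a redirect page we control that can jump to any URL: website.com/redirect.html?host={host}
      a list of domains that run the webmaster-statistics tracker
      for each domain in the list, visit website.com/redirect.html?host=domain.com, which then jumps to domain.com
      the code of redirect.html is given below
Purpose: generate webmaster-statistics hits that carry a referer
Implementation: 1. Run a web server locally
      2. Collect the hosts of all referers and point them at the local machine, see http://www.xker.com/page/e2015/05/188691.html
      3. Read the list one domain per line and open the corresponding referer
      4. Once the page has loaded, inject the redirect JavaScript and run it

      Build a static page that uses JS to create an <a> tag with id=redirecta
      <a href="http://www.website80.com/Index" target="_blank">go</a>
      <script type="text/javascript">
      var a = document.createElement("a");
      var node = document.createTextNode("link");
      a.appendChild(node);
      a.setAttribute("href","http://www.website.com/Index");
      //a.setAttribute("target","_blank"); //do not open a new tab, otherwise it cannot be closed
      a.setAttribute("id","redirecta");
      document.body.appendChild(a);
      </script>

Cookie handling in Selenium:
http://www.cnblogs.com/fnng/p/3269450.html
'''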


# Use the default Firefox profile so that no login step is needed. The profile path differs between operating systems.

#newFirefox = webdriver.FirefoxProfile(r'C:\Users\Administrator\AppData\Roaming\Mozilla\Firefox\Profiles\wed8a4gm.default')
newFirefox = webdriver.FirefoxProfile()
newFirefox.set_preference("network.proxy.type", 1)
newFirefox.set_preference("network.proxy.http",'192.168.109.135')
newFirefox.set_preference("network.proxy.http_port",int(8080))
#newFirefox.set_preference("general.useragent.override","whater_useragent")
newFirefox.update_preferences()

#firefox_profile=newFirefox
browser = webdriver.Firefox(firefox_profile=newFirefox)
# Must match the <website80> tag returned by the proxy and the "key=website80" cookie marker
key = 'website80'
fd = open(r'51.la.txt')
count = 0
c = {}
#c['domain'] = 'website80.com'
c['name'] = 'key'
c['value'] = key
c['path'] = '/'

print datetime.now()
for line in fd:
    line = line.strip()
    if line:
        count = count + 1
        url = 'http://website.com/redirect.html?host=' + line

        try:
            if url.find("website.com") > -1:
                browser.delete_all_cookies()
                #browser.add_cookie(c)
                #print browser.get_cookies()
            browser.get(url)
            bfind = True 

            success = WebDriverWait(browser, 200).until(lambda browser: browser.find_element_by_tag_name("html"))
            if success:

                addjs = browser.find_element_by_tag_name(key)
                if addjs:
                    destUrl = 'http://%s?from=www.website.com' % line

                    js = 'var a = document.createElement("a");'
                    js = js + 'var node = document.createTextNode("link");'
                    js = js + 'a.appendChild(node);'
                    js = js + 'a.setAttribute("id","redirecta");'
                    js = js + 'a.setAttribute("href","%s");' % destUrl
                    #js = js + 'a.setAttribute("target","_blank");'
                    js = js + 'document.body.appendChild(a);'
                    #print js
                    browser.execute_script(js)
                    a = browser.find_element_by_id("redirecta")


                    if a:
                        if(a.get_attribute("href") !="www.website.com"):
                            print a.get_attribute("href")
                            #pass
                        a.click()
                    else:
                        links = browser.find_elements_by_tag_name("a")
                        for link in links:
                            print link.get_attribute("href")
                    if bfind:
                        time.sleep(10)
                        #close the other windows by switch the windows
                        nowhandle = browser.current_window_handle
                        allhandles = browser.window_handles
                        if len(allhandles) > 1:
                            for handle in allhandles:
                                if handle != nowhandle:
                                    browser.switch_to.window(handle)
                                    try:
                                        browser.close()
                                    except Exception,e:
                                        print e
                            browser.switch_to.window(nowhandle)

                    try:
                        # switch_to.alert is a property, not a method; if no alert is open
                        # an exception is raised and ignored below
                        alert = browser.switch_to.alert
                        alert.accept()
                    except Exception:
                        pass

        except Exception,e:
            print str(e)
        if count % 1000 == 0:
            print count
            print datetime.now()
browser.close()
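
The JavaScript that the script above assembles by string concatenation can also be built by a small helper. This is only a sketch: the name build_redirect_js is hypothetical, and it uses json.dumps to quote the id and URL so that a domain containing a quote character cannot break the injected code.

```python
import json


def build_redirect_js(dest_url, link_id="redirecta"):
    """Return JavaScript that appends <a id=link_id href=dest_url> to the page body."""
    # json.dumps yields a quoted, escaped string that is also a valid JavaScript literal.
    return (
        'var a = document.createElement("a");'
        'a.appendChild(document.createTextNode("link"));'
        "a.setAttribute('id', %s);"
        "a.setAttribute('href', %s);"
        "document.body.appendChild(a);"
    ) % (json.dumps(link_id), json.dumps(dest_url))
```

In the loop above it would be used as `browser.execute_script(build_redirect_js(destUrl))`, followed by the same lookup of the element with id redirecta.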