tedeyang
Learning Ruby: a small domain-lookup script for practice


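A while ago my company paid for books, and I ordered <Programming Ruby>. I took it home and have been reading a little every day; after half a month I have picked up a thing or two.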

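Below is a small program I wrote at home on Sunday. It started because I wanted to buy a .com domain and, on a whim, decided to see which short pinyin domains had not been registered yet. Maybe I could snag a bargain, haha.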

 

The key ingredients are a phrase dictionary and its pinyin, and luckily someone has already built one: http://open-phrase.googlecode.com/files/phrase_pinyin_freq_sc_20090402.txt.bz2

This is also the dictionary that ibus uses.

It contains the Chinese phrases, their pinyin, and their frequencies, which is exactly what is needed.
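As a minimal sketch of how one dictionary line could be turned into a domain candidate: this assumes each line has the form "phrase pinyin frequency", separated by whitespace, with syllables joined by apostrophes, which is what the script's parsing code implies. The helper name `parse_phrase_line` and the sample line are made up for illustration, and the syllable check here is stricter than the script's own regex.

```ruby
# Hypothetical helper: parse one line of the assumed form
# "<phrase> <pinyin> <freq>", where syllables are joined by apostrophes.
# Returns nil for lines that cannot become a short domain candidate.
def parse_phrase_line(line, max_letters = 8, max_syllables = 3)
  phrase, pinyin, freq = line.split
  return nil if pinyin.nil? || freq.nil?            # blank or malformed line
  syllables = pinyin.split("'")
  return nil if syllables.empty? || syllables.size > max_syllables
  return nil unless syllables.all? { |s| s =~ /\A[a-z]+\z/ }
  letters = syllables.join                          # "zhong'guo" becomes "zhongguo"
  return nil if letters.length > max_letters
  { pinyin: letters, phrase: phrase, freq: freq.to_i }
end

# The sample line is invented; real frequencies come from the dictionary file.
p parse_phrase_line("中国 zhong'guo 12345")
```

Returning nil for unusable lines lets the caller skip them with a simple `next unless entry`.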

Runtime environment: linux, unix, freebsd, macos (the script calls the whois command)

 

#!/usr/bin/ruby
# author tedeyang 
# 2010-8-2
#
# Runs the whois command to find which pinyin spellings are still free as ".com" domains (e.g. zhongguo.com); the phrases are parsed from the ibus phrase library.
#
class ReadPhrase
  attr_reader :all
	def initialize(file,lengthRange)
	  @file = file
	  @range = lengthRange
	  @count = 0
	  @runcount = 0
	  @askcount = 0
	  @all = Array.new()
	  @work = File.new("work-#{@range}.log","w")
	  @get = File.new("domain-#{@range}.log","w")
	  @nowPid = 0
	  @ask = ""
	end
	def initFromFile
	  phraseFile = File.new(@file,'r')
	  keystore = Hash.new
	  phraseFile.each{|line|
	    phrase,pinyin,freq = line.split
	    # true when the pinyin has one to three apostrophe-separated syllables
	    two = (pinyin =~ /^\w+('\w+){0,2}$/) != nil
	    pinyin.gsub!("'","")
	    if pinyin.length <=8 && two
        # p "#{pinyin}   #{phrase} (freq: #{freq})"
        previousPinyin = @all[keystore[pinyin]] if keystore.has_key?(pinyin)
	      if previousPinyin == nil
	        @all << [pinyin,freq.to_i,phrase]
	        keystore[pinyin] = @count
	        @count = @count + 1
        else
          @all[keystore[pinyin]] = [pinyin,freq.to_i+previousPinyin[1],previousPinyin[2]+","+phrase]
        end
      end
	  }
	  puts "read done!"
	  puts "there are #{@count} phrases in file"
	  @all.sort!{|p1,p2|
	    p1[0].length - p2[0].length
	  }
	  puts "sort done by pinyin length!"
  end
  
  def unregisted?(phrasePy)
    @askcount += 1
    domain = phrasePy + ".com"
    @work.puts "#{@askcount}-----run whois #{domain}---------------------------begin with #{Time.new}"
    @ask = "whois #{domain}"
    whois = IO.popen(@ask,"r") do |pipe|
      @nowPid = pipe.pid
      result = pipe.read
      @nowPid = 0
      good = (/\s*(No match for).+/ =~ result) != nil
      @work.puts "#{@askcount}-----run whois #{domain}------------#{good}------------end with #{Time.new}"
      @work.flush
      good
    end
    return whois
  end
  
  # start a watchdog thread that kills a whois process still running after two checks 3 seconds apart
  def startTimeoutKiller
    th = Thread.start {
      while 1
        pid1 = @nowPid.to_i
        sleep(3) #wait for n seconds
        pid2 = @nowPid.to_i
        if pid1==pid2 && pid1>0
          Process.kill 'TERM',pid1 
          p "kill a timeout whois , pid=#{pid1},is '#{@ask}'"
        end
      end
    }
  end
  
  def tryall
    @get.puts "begin "+ Time.new().to_s
    @work.puts "begin "+ Time.new().to_s
    
    @all.each{|p|
     
      if @range===p[0].length && self.unregisted?(p[0])
        puts "Now get domain : "+p.join(' ')
        @get.puts "#{p[0]} #{p[2]} #{p[1]} order[#{@askcount}]"
        @get.flush
        @runcount += 1
      end
      # sleep(3) if @askcount % 10 ==9
      # break if @runcount >1
    }
    puts "done with #{@runcount}!(#{Time.new()}),see domain-#{@range}.log"
  end
end

parser = ReadPhrase.new("phrase_pinyin_freq_sc_20090402.txt",6..6)
parser.initFromFile
parser.startTimeoutKiller
# p parser.unregisted?('qidian')
parser.tryall

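The code is messy. I am not comfortable with textmate yet and felt cramped the whole time, nothing like the effortless flow of writing Java in eclipse. There is no way around it: writing code takes practice, like driving, and the feel comes the more you do it.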

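The lookups are not multithreaded; the script only starts one extra thread to check for timeouts, because I was afraid that too many whois requests would get my IP blocked. The machine ran all night to get through the 10,000-plus 5-letter pinyin domains.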
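The watchdog thread in the script samples the current pid every 3 seconds, so a stuck whois can run for anywhere between roughly 3 and 6 seconds before it is killed. One alternative, sketched here under assumptions, is to wait on the child process with a deadline. The helper name `run_with_timeout` is made up; `Open3.popen2e` is from Ruby's standard library, and the sketch assumes a Unix-like system.

```ruby
require "open3"

# Hypothetical helper: run a command (given as an array of words) and return
# its combined stdout/stderr, or nil if it is still running after `limit` seconds.
def run_with_timeout(cmd, limit)
  Open3.popen2e(*cmd) do |stdin, out, wait_thr|
    stdin.close
    reader = Thread.new { out.read }   # drain output so the child never blocks on a full pipe
    unless wait_thr.join(limit)        # join returns nil when the deadline passes first
      begin
        Process.kill("TERM", wait_thr.pid)
      rescue Errno::ESRCH
        # the child exited on its own just after the deadline
      end
      reader.join                      # the pipe closes once the child is gone
      return nil
    end
    reader.value
  end
end

# Possible use in place of IO.popen (not executed here, it needs network access):
# output = run_with_timeout(["whois", "example.com"], 3)
```

Passing the command as an array avoids going through a shell, so the pid that receives TERM is the whois process itself.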

Three to four hundred pinyin names are still unregistered, though few of them have any value.

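Take "xieca.com": xieca sounds the same as "鞋擦" (a shoe-cleaning product), so it might have a tiny bit of commercial value for the small-commodity markets in Zhejiang, haha. There are far more free 6-letter names, but not many of them are two-syllable.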

 

Later I could write a multithreaded version that queries godaddy or net.cn instead, which should be considerably more efficient.
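A multithreaded version could look roughly like the sketch below. It is not tied to godaddy or net.cn: how those services are queried is not covered here, so the actual lookup is passed in as a block and is replaced by a made-up stub in the usage lines. The name `check_in_parallel` and the worker count are assumptions for illustration.

```ruby
# Hypothetical sketch: check names with a small pool of worker threads.
# The block receives a name and returns true when the name looks available.
def check_in_parallel(names, workers = 4)
  queue = Queue.new
  names.each { |name| queue << name }
  workers.times { queue << :done }       # one stop marker per worker
  free = []
  lock = Mutex.new
  threads = Array.new(workers) do
    Thread.new do
      while (name = queue.pop) != :done
        available = yield(name)          # the caller-supplied lookup
        lock.synchronize { free << name } if available
      end
    end
  end
  threads.each(&:join)
  free.sort                              # sorted so the result does not depend on thread timing
end

# Stub lookup with an invented list of taken names; a real version would query a registry.
taken = %w[baidu taobao]
puts check_in_parallel(%w[baidu xieca taobao], 2) { |name| !taken.include?(name) }.inspect
```

Keeping the pool small is one way to limit the risk of being blocked that motivated the single-threaded version.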

 

 

 

Comments
#2 tedeyang 2010-09-25  
fireflyman wrote:
There is an error in the code, on line 16:
 @work = File.new("work-#{@range.}.log","w")

There should be no dot after @range.

Correct:
 @work = File.new("work-#{@range}.log","w")

Thanks for reading so carefully.
#1 fireflyman 2010-08-27  
There is an error in the code, on line 16:
 @work = File.new("work-#{@range.}.log","w")

There should be no dot after @range.

Correct:
 @work = File.new("work-#{@range}.log","w")
