码迷,mamicode.com
首页 > 编程语言 > 详细

社交网络分析与挖掘 第六课:网页排序

时间:2019-10-08 09:23:38      阅读:211      评论:0      收藏:0      [点我收藏+]

标签:ges   map   strong   display   ext   hyper   link   mos   span   

The History of PageRank

  • PageRank is a link analysis algorithm which assigns a numerical weighting to each Web page,with the purpose of "measuring"relative importance.
  • Based on the hyperlinks map
  • An excellent way to prioritize the results of web keyword searches
  • PageRank was proposed by Larry Page(hencethe name Page-Rank)and Sergey Brin in 1998.
  • Shortly after,Page and Brin founded Google.

PageRank:the intuitive idea

技术图片

  • Although page E hasmore"point to"links than page C, page C is more important than page E.
  • One reason is that page C is linked by page B whose is the most important page in this network.

A Simple Version of PageRank

\[ R(u)=c \sum_{v \in B_{u}} \frac{R(v)}{N_{v}} \]

  • u:a web page
  • B(u):the set of u‘s backlinks(point-to-links)
  • N(v):the number of forward links of page v
  • c:the normalization factor to make
    \[ \|\mathrm{R}\|_{\mathrm{L} 1}=\left|\mathrm{R}_{1}+\ldots+\mathrm{R}_{\mathrm{n}}\right| \]

An example of Simplified PageRank

技术图片

A Problem with Simplified PageRank

A loop:
技术图片
During each iteration,the loop accumulates rank but never distributes rank to other pages!

Fix the problem:what PageRank does

  • Randomly walking and also randomly jumping
    • the“random surfer”simply keeps clicking successive links at random,but periodically “gets bored”and jumps to a random page based on the distribution
      技术图片

The actual PageRank

\[ R^{\prime}(u)=c_{1} \sum_{v \in B_{u}} \frac{R^{\prime}(v)}{N_{v}}+\mathrm{c}_{2} E(u) \]

  • E(u):a distribution of ranks of web pages that "users"jump to when they"gets bored"after successive links at random.

An example of PageRank

技术图片

Convergence Experiments

  • PR(322 Million Links):52 iterations
  • PR(161Million Links):45 iterations
  • Scaling factor is roughly linear in logn
    技术图片

Conclusion

  • PageRank rank web pages based on their locations in the web graph structure
  • PageRank uses information which is external to the web pages-backlinks
  • Backlinks from important pages are more significant than backlinks from common pages
  • The structure of the web graph is very useful for information retrieval tasks.

以上内容出自小象学院

社交网络分析与挖掘 第六课:网页排序

标签:ges   map   strong   display   ext   hyper   link   mos   span   

原文地址:https://www.cnblogs.com/KisInfinite/p/11633540.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!