Jun 05, 2016

PageRank doesn't really get "trained" per se, since it's not going to be used to make predictions about other, unseen data.

However, it is learnt from the data: links from one document to another are recorded in a huge adjacency matrix (sort of), and the PageRanks are the entries of the dominant eigenvector of that matrix.
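To make the "dominant eigenvector" part concrete, here is a minimal sketch using power iteration on a toy link graph. The function name `pagerank`, the `toy_web` data, and the tolerance are made up for illustration. The damping factor of 0.85 is the commonly quoted value, and spreading the rank of dead-end pages evenly is just one common convention, so treat this as a sketch under those assumptions rather than what Google actually ran.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Approximate PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to
    (a hypothetical input format chosen for this sketch).
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(max_iter):
        # Assumption: rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes its rank on, split equally among its outgoing links.
        for p in pages:
            targets = links.get(p, [])
            for q in targets:
                new_rank[q] += damping * rank[p] / len(targets)

        # Stop once the ranks barely change between iterations.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy web: A links to B and C, B links to C, C links to A, D links to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
```

In this toy graph, page C collects the most incoming links and ends up with the highest rank, while D, which nothing links to, ends up with the lowest. Nothing here is "trained" in the predictive sense: the ranks are simply computed from the link structure.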

The original paper never mentions AOL or Yahoo (except as an example of a popular website), but it does describe how the authors built their own crawler (see here: http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf). I suppose it's possible they got data from those companies later.