How antifraud works. The story of Russian bots, auction thieves and stolen billions.
To determine the most likely status of each vertex, a belief propagation algorithm was used. First, each vertex computes its state from the propagation matrix. Then the vertices inform each other about their changed states. When a vertex receives new data about its neighbors, it re-evaluates its own status. This starts the next stage of computation, followed by a new chain of messages, and the process continues until the system reaches equilibrium.

[Image: https://xakep.ru/wp-content/uploads/2016/01/1453177870_bdf4_netprobe.png — In this illustration, vertices with an undefined state are marked in gray, fraudsters in red, and their accomplices in yellow.]

To test the effectiveness of this method, the researchers set up a home-made bot on eBay that collected information about users and the transactions between them. From the resulting data set they constructed a graph of 66,130 vertices and 795,320 arcs. Ten vertices in this graph belonged to scammers who had already been caught and reported in the news. The algorithm correctly identified each of them and flagged their possible accomplices. There is another sign that the idea is sound: the reputation of the accounts the algorithm suspected of fraud was several times worse than that of the others.

Interestingly, the algorithm does not need to know in advance who is a fraudster and who is an accomplice for all of this to work. It does not even need the users' reputations; it is enough to analyze the relationships between them. Everything is determined by the topology of the graph.
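The message-passing loop described above can be sketched in a few dozen lines. This is only an illustration of the idea: the toy transaction graph, the numbers in the propagation matrix, and the single "known fraudster" seed are all invented, and the real system differs in scale and in the exact update rule.

[CODE=python]
import numpy as np

STATES = ["fraudster", "accomplice", "honest"]

# Propagation matrix: row = state of a vertex, column = how compatible each
# state of a neighbour is with it. Fraudsters prefer to deal with accomplices,
# honest users mostly deal with other honest users.
PROP = np.array([
    [0.05, 0.80, 0.15],   # fraudster
    [0.40, 0.20, 0.40],   # accomplice
    [0.05, 0.15, 0.80],   # honest
])

# A toy, undirected transaction graph (who traded with whom).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 4), (4, 5)]
n = 6
neighbours = {v: [] for v in range(n)}
for a, b in edges:
    neighbours[a].append(b)
    neighbours[b].append(a)

# Initial state of every vertex is "undefined" (uniform), except vertex 0,
# which we pretend is a fraudster who has already been caught.
prior = np.full((n, 3), 1.0 / 3)
prior[0] = [0.90, 0.05, 0.05]

belief = prior.copy()
for _ in range(100):                        # iterate until equilibrium
    new = prior.copy()
    for v in range(n):
        for u in neighbours[v]:
            new[v] *= belief[u] @ PROP      # message from neighbour u
        new[v] /= new[v].sum()              # renormalise to probabilities
    if np.allclose(new, belief, atol=1e-6):
        break
    belief = new

for v in range(n):
    print(v, STATES[int(belief[v].argmax())], belief[v].round(2))
[/CODE]

Once the loop settles, each vertex carries a probability for each of the three states, and the most likely one is read off directly from the graph structure, exactly as in the paragraph above.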
Wrong friendship of Russian bots

In 1881, the American mathematician Simon Newcomb noticed something very strange: for some reason, the first pages of books of logarithmic tables are always more frayed than the last. And it is not that nobody reads them to the end. A book of logarithmic tables is not an ordinary book meant to be read in order; it is a tool that significantly speeds up the multiplication and division of large numbers.

Pre-calculated logarithms of a range of numbers are collected into logarithmic tables. To multiply two numbers, it is enough to find their logarithms in the table, add them up, and then look up in the same table which number corresponds to that sum. This is much easier and faster than the long multiplication taught at school.
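As a concrete illustration of the table trick (the factors 347 and 29 are arbitrary, and math.log10 stands in for the printed table):

[CODE=python]
# How a logarithmic table replaces multiplication: look up the two logs,
# add them, then look the sum back up in the same table.
import math

a, b = 347, 29
log_sum = math.log10(a) + math.log10(b)    # two table look-ups + one addition
product = 10 ** log_sum                    # reverse look-up in the same table
print(round(log_sum, 4), round(product))   # 4.0027 and 10063 (= 347 * 29)
[/CODE]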
[QUOTE="Carders, post: 641, member: 17"] To determine the most likely status of each vertex, the confidence propagation algorithm was used. First, each vertex counts its state from the propagation matrix. Then the vertices inform each other about the changed state. When they receive new data about their neighbors, they check their status. This starts the next stage of computation, followed by a new message chain. This continues until the system reaches equilibrium. [IMG alt="In this illustration, vertices with an undefined state are marked in gray, fraudsters are marked in red, and their accomplices are marked in yellow"]https://xakep.ru/wp-content/uploads/2016/01/1453177870_bdf4_netprobe.png[/IMG] [I]In this illustration, vertices with an undefined state are marked in gray, fraudsters are marked in red, and their accomplices are marked in yellow[/I] To test the effectiveness of this method, the researchers set up a self-made robot on eBay that collected information about users and transactions between them. Based on the resulting data set, they constructed a graph consisting of 66,130 vertices and 795,320 arcs. Ten vertices in this graph belonged to scammers who had already been caught and reported in the news. The algorithm correctly identified each of them and marked possible accomplices. There is another sign that the idea is correct: the reputation of accounts that the algorithm suspected of fraud was several times worse than that of others. Interestingly, in order for everything to work, the algorithm does not need to know in advance who is the accomplice and who is the fraudster. You don't even need a user's reputation. Only the relationships between them can be analyzed. Everything is determined by the graph topology. [HEADING=3]Wrong friendship of Russian robots[/HEADING] In 1881, the American mathematician Simon Newcomb noticed something very strange: for some reason, the first pages of books with logarithmic tables are always more frayed than the last. And it's not that no one reads them to the end. Logarithmic tables are not an ordinary book that is supposed to be read in order. This is a tool that significantly speeds up multiplication and division of large numbers. Pre-calculated logarithms of a set of numbers are combined into logarithmic tables. To multiply two numbers, it is enough to find the corresponding logarithms in the table, add them up, and then determine from the same table which result corresponds to the sum. This is much easier and faster than column multiplication, which is taught in school. At the beginning of the logarithmic table, the logarithms of numbers with one in the highest digit are listed, then the logarithms of numbers starting with two, and so on to nine. If a book is more worn out at the beginning than at the end, then people need multipliers that start with one more often than numbers that start with two, let alone nine. [IMG alt="Simon Newcomb"]https://xakep.ru/wp-content/uploads/2016/01/1453177920_1cbe_simon_newcomb_01.jpg[/IMG] [I]Simon Newcomb[/I] Newcomb suggested that the lower the value of the highest digit of a number, the more often it occurs. According to the formula that the scientist deduced, the probability of encountering a number with one at the beginning is about 30%. The probability decreases with each digit until it reaches 4.6% — this value corresponds to nine. Common sense protests against this idea, but you can't argue with the facts. 
Common sense protests against this idea, but you cannot argue with the facts. In 1938, the physicist Frank Benford, who had stumbled upon the same pattern independently, tested the conclusion on tens of thousands of measurements. He calculated how often each digit appears in the leading position of dozens of physical constants, and the results matched the formula's predictions. The areas of river basins? The molecular weights of hundreds of chemical compounds? The populations of randomly selected towns? Stock prices on the exchange? Benford checked one data set after another and could not find a counterexample. The distribution of leading digits obeyed the law that today bears his name: Benford's law.

In the early seventies, the economist Hal Varian proposed using Benford's law to distinguish falsified data from genuine data. Values pulled out of thin air may look perfectly plausible, but they do not survive a check against Benford's law. By the end of the twentieth century the method had been adopted by forensic accountants, who check whether the figures in financial statements fit the expected distribution. If Benford's law is not followed, someone has probably massaged the numbers.

Benford's law easily finds traces of human interference in the natural order of things. Do I need to explain how valuable that quality is when hunting for anomalies in data? An algorithm built on it is simple and efficient. True, it is not suitable for analyzing data that is unnatural to begin with, but that is a limitation every method has in one form or another.
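A minimal sketch of such a check, assuming nothing more than leading-digit counts and a rough chi-square comparison; the amounts and the decision threshold below are invented for illustration:

[CODE=python]
# Count the leading digits of reported amounts and compare them with the
# frequencies expected under Benford's law.
import math
from collections import Counter

def leading_digit(x: float) -> int:
    s = f"{abs(x):e}"          # scientific notation, e.g. "3.470000e+02"
    return int(s[0])

def benford_chi2(values):
    digits = [leading_digit(v) for v in values if v != 0]
    observed = Counter(digits)
    n = len(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        chi2 += (observed.get(d, 0) - expected) ** 2 / expected
    return chi2

amounts = [1280.50, 1740.00, 912.30, 2300.00, 118.75, 1460.00,
           3100.00, 197.20, 2650.00, 1020.00, 890.00, 1530.00]
stat = benford_chi2(amounts)
# With 8 degrees of freedom, a chi-square value far above ~15.5
# (the 5% critical value) suggests the digits do not follow Benford's law.
print(round(stat, 2))
[/CODE]

In practice such a test needs far more than a dozen values to be meaningful; the point of the sketch is only to show how little machinery the check requires.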