

#21




Quote:
Quote:
First, note that breadth of study beats depth of study every time. The sort of packrat behavior that physicists and engineers have, where they collect little pieces of math and use them in a patchwork for applications, is an example of this. When you're empirically grounded, working with any single model eventually runs into a brick wall, because models that incorporate all aspects of the observed system become intractable relatively quickly. Studying a system through a variety of tractable models (including intractable models made tractable through approximations or simplifying assumptions) yields more progress.

If you start to notice that physics and engineering have successes with this kind of approach, I don't think "I should study physics, they learn this stuff more quickly" is the most fruitful conclusion to draw. Learning the contents of a particular packrat's collection of techniques is mostly going to teach you about the quirks of that packrat's field. That can be interesting and useful in its own right, but if you want to do work in some other field, it's at best a case study. What you really want to learn is how to be a different packrat, one adapted to whatever your native field of study is.

My perspective is framed around a background heavy in computer science. Methods that work on one particular problem are, to varying degrees, portable to other problems. At a higher level of abstraction (category theory), all branches of mathematics have concepts defined in other branches of mathematics embedded in them. As you study more, rather than just learning more techniques, you should start to see that all of the diverse techniques can be understood as applications of a smaller set of computational strategies. Breadth beats depth because breadth increases the rate at which a personal understanding of this is revealed.
#22




Here’s a bit of material related to this:
https://arxiv.org/pdf/math/0303352.pdf
Chaitin can be a bit insufferable, but he lays out some good sketches of why a fixed model has bounds on the range of phenomena it can explain. There is also an article critical of the traits that make Chaitin a bit insufferable, which also covers some key concepts here:
http://www.ams.org/notices/200109/revpanu.pdf
I’ll note that the criticism leveled there, that a theory’s complexity is unrelated to its explanatory power, seems to be badly mistaken. The example offered of a large but weak theory is highly compressible, meaning its equivalence to a simpler theory is readily apparent. On the other hand, naked set theory is not useful for expressing all of mathematics without defining mappings between set theoretic concepts and any particular theory (i.e. you have to define how to encode your terms into set theory, such as the many constructions for defining the natural numbers as a sequence of sets constructed from only the empty set). Those definitions need to be regarded as part of the theory.
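To make the point about encodings concrete, here is a minimal sketch of two standard ways to build the natural numbers from only the empty set: the von Neumann construction (n+1 is n together with {n}) and the Zermelo construction (n+1 is {n}). The function names are my own, and Python frozensets are just a stand-in for sets. The two encodings agree on 0 and 1 and disagree from 2 onward, which is why the choice of encoding has to be counted as part of the theory.

```python
EMPTY = frozenset()

def von_neumann(n):
    """von Neumann encoding: 0 = {}, n+1 = n U {n}."""
    s = EMPTY
    for _ in range(n):
        s = s | frozenset([s])
    return s

def zermelo(n):
    """Zermelo encoding: 0 = {}, n+1 = {n}."""
    s = EMPTY
    for _ in range(n):
        s = frozenset([s])
    return s

# Under von Neumann, the set encoding n has exactly n elements.
print(len(von_neumann(4)))
# Under Zermelo, every nonzero number is a one-element set.
print(len(zermelo(4)))
# Same "number", different sets: the mapping is extra information.
print(von_neumann(2) == zermelo(2))
```

Neither encoding is "the" natural numbers. Each one only works once you also write down how successor, order, and arithmetic are read off the sets.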
#23




__________________
If you want to add me on LinkedIn, PM me. Why I hate Microsoft Access. Studying/Reading: Machine Learning and Pattern Recognition 
#24




#25




Quote:
There are multiple theories that try to capture notions of complexity or information content. However, Shannon entropy (a measure of the number of symbols needed to efficiently encode the outcome of a recurring statistical process with a known distribution) and Kolmogorov complexity (the length of the shortest program in a given computer language which outputs a given object) play a central role in this family of theories, and they both relate to bounds on how much data can be compressed. If you sample results over and over again from a known distribution, then with probability approaching one the Kolmogorov complexity per sample of the data set converges to the Shannon entropy per sample, with the cost of expressing the known distribution as a program contributing only a constant. These are both measurements of essentially the same concept, albeit with different flaws.

Shannon entropy gives you the expected information content of a sample based on a hypothesized distribution. It’s essentially probabilistic in nature and carries with it the weaknesses associated with Bayesian or Frequentist probability theory, depending on how it’s used. Kolmogorov complexity is essentially deterministic, but it is uncomputable in general, so for nontrivial data sets it can at best be approximated by upper bounds.

Another related theory from computer science is time complexity theory, which classifies problems into complexity classes based on how the number of calculation steps scales as the problem gets bigger. Comparison-based sorting of a list of n items takes on the order of n log n comparisons in both the average and the worst case, and there are algorithms (merge sort, for example) that achieve this bound. This is, in many ways, a much more practical definition of complexity, and it seems to raise an issue with Kolmogorov complexity in that two objects with the same Kolmogorov complexity may require vastly different times for their shortest programs to run. This raises the question of why this more practical definition shouldn’t be regarded as more fundamental.
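A rough way to see the entropy/compression relationship numerically is to draw samples from a known distribution and compare the total Shannon entropy with the size of a compressed copy of the data. This is only a sketch under assumptions of my own: zlib is a general-purpose compressor, so its output size is just a crude upper-bound proxy for Kolmogorov complexity, and the distribution, sample size, and function names are picked purely for illustration.

```python
import math
import random
import zlib

def shannon_entropy_bits(probs):
    """Entropy in bits per symbol of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def sample_bytes(probs, n, seed=0):
    """Draw n i.i.d. symbols from probs, stored as byte values 0, 1, 2, ..."""
    rng = random.Random(seed)
    return bytes(rng.choices(range(len(probs)), weights=probs, k=n))

def compressed_bits(data):
    """Bits used by zlib at maximum compression (an upper-bound proxy only)."""
    return 8 * len(zlib.compress(data, 9))

probs = [0.5, 0.25, 0.125, 0.125]   # entropy works out to 1.75 bits per symbol
n = 20000
data = sample_bytes(probs, n)

print("raw size in bits:      ", 8 * n)
print("total Shannon entropy: ", n * shannon_entropy_bits(probs))
print("zlib size in bits:     ", compressed_bits(data))
```

The compressed size should land well below the raw size but not below the entropy total by any meaningful margin. A better compressor narrows the gap, and nothing can systematically get under it.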
I believe a reasonable answer to this is to point out that if an algorithm is efficient in its use of time, it should not perform calculations that can easily be predicted from analysis of earlier calculations. In this sense, efficient algorithms need to generate as much new information as possible (in the Shannon entropy/Kolmogorov complexity sense) at each step. Kolmogorov complexity theory can in fact be used to prove results in time complexity theory.

Computer science is one of the branches of math that is essentially universal, in the sense that you could take a particular Turing complete computer as a given and define all of the constructible results of mathematics in computer programs for that one computer (noting that nonconstructive existence theorems are themselves constructible results of basic axioms even if the object posited by the theorem is not constructible). You could in fact write a small program that encodes set theory, derives all possible theories that can be defined in set theory up to a given size limit, and then waits for user input and answers the user’s questions about a particular theory by looking up the theory that matches the user’s input.

Initially this looks like a way of saying that a small set theory program can answer all of your questions, but very simple modifications to definitions wipe out this result. To actually produce results, set theory has to be unpacked to a “depth” that covers the questions you want to answer. You can only bypass the need to input your definitions and derive their conclusions if you have a vast amount of computer memory and you can wait a vast amount of time for a computer to crank through a lot of irrelevant conclusions of set theory. What this shows you is that trial and error can be substituted for knowledge. Returning to the original topic, many machine/human learning methods do in fact work on a principle of acquiring knowledge through iterative trial and error.
The set theory program above uses an exhaustive search, whereas machine/human learning methods use heuristic search methods. The extent to which a heuristic search can be better than an exhaustive search comes down to complexity issues.
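As a toy illustration of the gap between the two kinds of search (my own example, not anything from the set theory setting), suppose the thing to be found is a hidden bit string. An exhaustive search that only learns "right" or "wrong" per guess may need up to 2^n trials. A hill climber that is told how many bits currently match needs only n+1. The function names and the scoring rule here are assumptions made for the sketch.

```python
from itertools import product

def score(candidate, target):
    """Graded feedback: number of positions where candidate matches target."""
    return sum(c == t for c, t in zip(candidate, target))

def exhaustive_search(target):
    """Try every bit string in order, using only yes/no feedback."""
    trials = 0
    for candidate in product((0, 1), repeat=len(target)):
        trials += 1
        if candidate == target:
            return candidate, trials

def hill_climb(target):
    """Flip one bit at a time and keep the flip only if the score improves."""
    current = [0] * len(target)
    best = score(current, target)
    trials = 1
    for i in range(len(target)):
        current[i] ^= 1
        trials += 1
        new = score(current, target)
        if new > best:
            best = new
        else:
            current[i] ^= 1   # undo a flip that did not help
    return tuple(current), trials

target = (1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1)
print(exhaustive_search(target)[1], "trials, exhaustive")
print(hill_climb(target)[1], "trials, hill climbing")
```

The heuristic only wins because the score carries usable information about where the answer is. Take that structure away (make the feedback yes/no) and you are back to exhaustive search.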
#26




I'm in section 4.4 of Theodoridis.
Sections 4.1-4.3 covered very standard OLS estimation and its geometric interpretation. 4.4 extends this to complex random variables and goes into optimizing over them, defining the Wirtinger derivative. Weird stuff.
#27




I found a paper by Theodoridis that uses Wirtinger calculus on reproducing kernel Hilbert spaces. As near as I can tell, the motivation is to extend standard machine learning techniques that work on real-valued signals so that they can also work with complex-valued signals, which is apparently something you have to do in communications systems. This seems like a very electrical engineering-oriented approach and not something you'd find in the statistics or computer science literatures, but I don't think there's any real harm in studying it. Just don't expect it to ever come up outside of a few very specific applications.
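For a feel of what the Wirtinger derivative buys you, here is a minimal sketch (my own toy setup, not taken from the book or the paper) of a complex-valued least-mean-squares update for a single complex gain w. The cost is J(w) = |d - w*x|^2. Treating w and its conjugate as independent variables, the derivative of J with respect to conj(w) is -(d - w*x)*conj(x), so gradient descent steps in the direction e*conj(x). The signal, the step size, and the "true" gain are all assumptions chosen for illustration.

```python
import cmath

def wirtinger_lms(inputs, desired, step=0.5):
    """Estimate a complex gain w so that w * x tracks d, one sample at a time."""
    w = 0j
    for x, d in zip(inputs, desired):
        e = d - w * x                    # complex error for this sample
        w += step * e * x.conjugate()    # descent step along -dJ/d(conj w)
    return w

# Hypothetical channel: unit-modulus, QPSK-like symbols scaled by an unknown gain.
true_gain = 0.8 - 0.6j
symbols = [cmath.exp(1j * cmath.pi * (2 * k + 1) / 4) for k in range(64)]
received = [true_gain * x for x in symbols]

w = wirtinger_lms(symbols, received)
print("estimated gain:", w)
print("error:", abs(w - true_gain))
```

With unit-modulus inputs and no noise, each update shrinks the error by the factor (1 - step), so the estimate settles on the true gain quickly. The real-valued LMS update is the same formula with the conjugate dropped.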

#28




Quote:
To me, the core concept that I would want from a machine learning course would be the relationship between a hypothesis that a particular class of processes gave rise to a dataset and the algorithms that locate a particular process within that class.
If you don’t have any candidate solutions scratched out at the start, you have a problem that can only be solved by a quantity of trial and error up to some notion of the “depth” of this particular problem instance. If you have some information already that helps locate the process, you get a reduction in the amount of trial and error needed. The computational efficiency of the particular techniques you know for particular hypotheses may in fact dictate the approach you take as well, but there are limits on this determined by computational complexity. Methods used for efficiency refinements shouldn’t be taught before their larger context is understood.
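The simplest instance of "hypothesis class plus an algorithm that locates a member of it" that I can think of is fitting a straight line. The hypothesis is that the data came from some process y = slope*x + intercept plus noise, and ordinary least squares is the algorithm that picks out one particular (slope, intercept) pair. This is a minimal sketch with made-up data and a function name of my own.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the class of lines y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data generated by the process y = 2x + 1, with no noise.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
print(fit_line(xs, ys))
```

Here the structure of the class (linear in its parameters, squared-error loss) is what lets a closed-form calculation replace trial and error over candidate lines. A class without that structure pushes you back toward search.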
#29




I'll be getting back to this in the summer most likely.
This class I'm taking has been brutal.

