Actuarial Outpost Machine Learning and Pattern Recognition Thread - starting 12/15/17
#21
12-30-2017, 04:42 PM
 DiscreteAndDiscreet Member AAA Join Date: May 2016 Posts: 478

Quote:
 Originally Posted by clarinetist I came to a similar conclusion after talking with a former Physics Ph.D. student. I've bought a physics book as a result of this (Mathematical Methods in the Physical Sciences by Boas) and will be going through that text alongside this study. Math degree programs, I've realized, go through topics too slowly for applications.
Quote:
 Originally Posted by 764dak No. You just hate waiting and appreciating the beauty and ideas of mathematics.
You're both missing something a little more interesting.

First, note that breadth of study beats depth of study every time. The packrat behavior of physicists and engineers, who collect little pieces of math and use them in a patchwork for applications, is an example of this. When you're empirically grounded, working with any single model eventually runs into a brick wall, because models that incorporate all aspects of the observed system become intractable relatively quickly. Studying a system through a variety of tractable models (including intractable models made tractable through approximations or simplifying assumptions) yields more progress.

If you notice that physics and engineering have successes with this kind of approach, I don't think "I should study physics, they learn this stuff more quickly" is the most fruitful conclusion to draw. Learning the contents of a particular packrat's collection of techniques will mostly teach you about the quirks of that packrat's field. That can be interesting and useful in its own right, but if you want to do work in some other field, it's at best a case study. What you really want to learn is how to be a different packrat, one that adapts to whatever your native field of study is.

My perspective is framed around a background heavy in computer science. Methods that work on one particular problem are to varying degrees portable to other problems. At a higher level of abstraction (category theory), all branches of mathematics have concepts defined in other branches of mathematics embedded in them. As you study more, rather than learning more techniques, you should actually start to learn that all of the diverse techniques can really be understood as applications of a smaller set of computational strategies. Breadth beats depth because breadth increases the rate at which a personal understanding of this is revealed.
#22
12-31-2017, 01:12 PM
 DiscreteAndDiscreet Member AAA Join Date: May 2016 Posts: 478

Here’s a bit of material related to this:

https://arxiv.org/pdf/math/0303352.pdf

Chaitin can be a bit insufferable but he lays out some good sketches of why a fixed model has bounds on the range of phenomena it can explain.

There is also an article, critical of the very traits that make Chaitin a bit insufferable, that covers some key concepts here:

http://www.ams.org/notices/200109/rev-panu.pdf

I’ll note that the criticism that a theory’s complexity is unrelated to its explanatory power seems badly mistaken. The example offered of a large but weak theory is highly compressible, meaning its equivalence to a simpler theory is readily apparent. On the other hand, naked set theory is not useful for expressing all of mathematics without defining mappings between set-theoretic concepts and any particular theory (i.e., you have to define how to encode your terms into set theory, as in the many constructions that define the natural numbers as a sequence of sets built up from only the empty set). Those definitions need to be regarded as part of the theory.
#23
01-09-2018, 10:29 PM
 clarinetist Member Non-Actuary Join Date: Aug 2011 Studying for Rcpp, Git Posts: 6,870

Review of Chapter 3:

https://ymmathstat.blogspot.com/2018...s-machine.html
__________________
If you want to add me on LinkedIn, PM me.

Why I hate Microsoft Access.

Studying/Reading: Linear Optimization/Programming, Machine Learning and Pattern Recognition
#24
01-10-2018, 09:17 AM
 Neutral Omen Member SOA Join Date: Oct 2011 Favorite beer: Kwak Posts: 6,990

Quote:
 Originally Posted by DiscreteAndDiscreet I’ll note that the criticism levied that a theory’s complexity is unrelated to its explanatory power seems to be badly mistaken. The example offered of a large but weak theory is highly compressible, meaning its equivalence to a simpler theory is readily apparent.
Out of my depth here (ANACS), but could the definition of complexity be different from the definition of information? Does compressibility increase complexity but not information?
__________________

Quote:
 Originally Posted by mathmajor you are a visionary
#25
01-10-2018, 12:36 PM
 DiscreteAndDiscreet Member AAA Join Date: May 2016 Posts: 478

Quote:
 Originally Posted by Neutral Omen Out of my depth here (ANACS), but could the definition of complexity be different from the definition of information? Does compressibility increase complexity but not information?
You can finagle the answer to this question a bit by discussing other types of complexity measures, but not by that much.

There are multiple theories that try to capture notions of complexity or information content. However, Shannon entropy (the expected number of symbols needed to efficiently encode the outcome of a recurring statistical process with a known distribution) and Kolmogorov complexity (the length of the shortest program in a given computer language that outputs a given object) play a central role in this family of theories, and both relate to bounds on how much data can be compressed. If you sample results over and over again from a known distribution, it’s statistically certain that the Kolmogorov complexity of the data set will equal the total Shannon entropy plus a constant related to expressing the known distribution as a program.

These are both measurements of essentially the same concept, albeit with different flaws. Shannon entropy gives you the expected information content of a sample based on a hypothesized distribution; it’s essentially probabilistic in nature and carries the weaknesses associated with Bayesian or frequentist probability theory, depending on how it’s used. Kolmogorov complexity is essentially deterministic, but it can’t be calculated for non-trivial data sets and can at best be approximated by upper bounds.
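A rough illustration of the relationship (my sketch; a general-purpose compressor's output size is only a crude upper bound on Kolmogorov complexity, with the decompressor playing the role of the additive constant):

```python
import math
import random
import zlib

# Known distribution: a biased coin with p(heads) = 0.9.
p = 0.9
shannon_bits = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # ~0.469 bits/symbol

# Sample repeatedly from the distribution; the compressed size of the
# sample is an upper bound on its Kolmogorov complexity.
random.seed(1)
n = 100_000
sample = "".join("H" if random.random() < p else "T" for _ in range(n))
compressed_bits = 8 * len(zlib.compress(sample.encode(), 9))

print(f"Shannon entropy: {shannon_bits:.3f} bits/symbol")
print(f"compressed size: {compressed_bits / n:.3f} bits/symbol")
```

The compressed rate sits above the entropy rate, as the source coding theorem says it must; a compressor tailored to the known distribution (e.g. arithmetic coding) would get closer to the bound than zlib does.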

Another related theory from computer science is time complexity theory, which classifies problems into complexity classes based on how the number of calculation steps scales as the problem gets bigger. The average-case performance for sorting a list scales as O(n log n), and there are algorithms that achieve this bound. This is, in many ways, a much more practical definition of complexity, and it seems to raise an issue with Kolmogorov complexity: two objects with the same Kolmogorov complexity may require vastly different times for their shortest programs to run. This raises the question of why this more practical definition shouldn’t be regarded as more fundamental. I believe a reasonable answer is to point out that if an algorithm is efficient in its use of time, it should not perform calculations that can easily be predicted from analysis of earlier calculations; in this sense, efficient algorithms need to generate as much new information (in the Shannon entropy/Kolmogorov complexity sense) as possible at each step. Kolmogorov complexity theory can in fact be used to prove results in time complexity theory.
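The O(n log n) claim for sorting can be checked empirically by counting comparisons in a merge sort (an illustrative sketch only):

```python
import math
import random

def merge_sort(xs, counter):
    """Standard top-down merge sort; counter[0] accumulates comparisons."""
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left = merge_sort(xs[:mid], counter)
    right = merge_sort(xs[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1  # one comparison per merge step
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

random.seed(0)
for n in (1_000, 10_000, 100_000):
    counter = [0]
    merge_sort([random.random() for _ in range(n)], counter)
    print(f"n={n:>7}: {counter[0]:>10} comparisons, n log2 n = {n * math.log2(n):.0f}")
```

The comparison count tracks n log2 n closely at each size, which is what it means for the algorithm to achieve the bound.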

Computer science is one of the branches of math that is essentially universal in the sense that you could take a particular Turing complete computer as a given and define all of the constructible results of mathematics in computer programs for that one computer (noting that non-constructive existence theorems are themselves constructible results of basic axioms even if the object posited by the theorem is not constructible). You could in fact write a small program that encodes set theory and then derives all possible theories that can be defined in set theory, up to a given size limit, and then wait for user input and answer the user’s questions about a particular theory by looking up the theory that matches with the user’s input. Initially this looks like a way of saying that a small set theory program can answer all of your questions, but very simple modifications to definitions wipe out this result. To actually produce results, set theory has to be unpacked to a “depth” that covers the questions you want to answer. You can only bypass the need to input your definitions and derive their conclusions if you have a vast amount of computer memory and you can wait a vast amount of time for a computer to crank through a lot of irrelevant conclusions of set theory. What this shows you is that trial and error can be substituted for knowledge.

Returning to the original topic, many machine/human learning methods do in fact work on a principle of acquiring knowledge through iterative trial and error. The set theory program above uses an exhaustive search and machine/human learning methods use heuristic search methods. The extent to which a heuristic search can be better than an exhaustive search comes down to complexity issues.
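A toy version of the exhaustive-versus-heuristic contrast (hypothetical setup of my own: the "knowledge" is that the score function is separable per bit, which is exactly what lets the heuristic beat brute force):

```python
import random

# Recover a hidden 20-bit string given only a score oracle
# (number of matching bits). Exhaustive search needs up to 2^20
# evaluations; a greedy bit-flip search needs only ~20.
random.seed(0)
n = 20
target = [random.randint(0, 1) for _ in range(n)]
evals = [0]

def score(candidate):
    evals[0] += 1
    return sum(a == b for a, b in zip(candidate, target))

# Greedy search: flip each bit in turn, keep the flip if the score improves.
current = [0] * n
best = score(current)
for i in range(n):
    current[i] ^= 1
    s = score(current)
    if s > best:
        best = s
    else:
        current[i] ^= 1  # revert the flip

print(f"heuristic found target: {current == target} in {evals[0]} evaluations")
print(f"exhaustive worst case: {2**n} evaluations")
```

If the score were an arbitrary black box (no separability hypothesis), the greedy search would have no guarantee and exhaustive search would be the only certain recourse, which is the trade-off described above.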
#26
01-17-2018, 09:15 PM
 clarinetist Member Non-Actuary Join Date: Aug 2011 Studying for Rcpp, Git Posts: 6,870

I'm in section 4.4 of Theodoridis.

Sections 4.1-4.3 covered very standard OLS estimation and its geometric interpretation.
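The geometric interpretation amounts to: the fitted values are the orthogonal projection of y onto the column space of X, so the residual is orthogonal to every column of X. A minimal numpy sketch (my own, not Theodoridis's notation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])  # design matrix
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=50)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # normal equations
residual = y - X @ beta_hat

# Orthogonality of the residual to the column space: X^T r = 0
# up to floating-point error.
print(np.abs(X.T @ residual).max())
```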

4.4 extends this to $\mathbb{C}$ and goes into optimization of complex random variables, defining the Wirtinger derivative.

Weird stuff.
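For reference, the definition as I understand it (my notation, not necessarily the book's): writing $z = x + iy$ and treating $f$ as a function of $x$ and $y$, the Wirtinger derivatives are

```latex
\frac{\partial f}{\partial z} = \frac{1}{2}\left(\frac{\partial f}{\partial x} - i\,\frac{\partial f}{\partial y}\right),
\qquad
\frac{\partial f}{\partial \bar{z}} = \frac{1}{2}\left(\frac{\partial f}{\partial x} + i\,\frac{\partial f}{\partial y}\right).
```

The payoff for optimization is that a real-valued cost $J(z)$ is never holomorphic, so the ordinary complex derivative doesn't exist, but the Wirtinger derivatives always do, and the steepest-ascent direction is given by $\partial J / \partial \bar{z}$.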
#27
01-17-2018, 10:30 PM
 ultrafilter Member Join Date: Nov 2006 Posts: 374

I found a paper by Theodoridis that uses Wirtinger calculus on reproducing kernel Hilbert spaces. As near as I can tell, the motivation is to extend standard machine learning techniques that work on real-valued signals so that they can also work with complex-valued signals, which is apparently something you have to do in communications systems. This seems like very electrical engineering-oriented approach and not something you'd find in the statistics or computer science literatures, but I don't think there's any real harm in studying it. Just don't expect it to ever come up outside of a few very specific applications.
#28
01-22-2018, 08:58 AM
 DiscreteAndDiscreet Member AAA Join Date: May 2016 Posts: 478

Quote:
 Originally Posted by ultrafilter I found a paper by Theodoridis that uses Wirtinger calculus on reproducing kernel Hilbert spaces. As near as I can tell, the motivation is to extend standard machine learning techniques that work on real-valued signals so that they can also work with complex-valued signals, which is apparently something you have to do in communications systems. This seems like very electrical engineering-oriented approach and not something you'd find in the statistics or computer science literatures, but I don't think there's any real harm in studying it. Just don't expect it to ever come up outside of a few very specific applications.
From the computer science perspective, putting that in chapter 4 sounds like premature optimization. It’s a performance refinement over naive approaches, but he hasn’t gotten the reader to the point of following the naive approach.

To me, the core concept I would want from a machine learning course is the relationship between a hypothesis that a particular class of processes gave rise to a dataset and the algorithms that locate a particular process within that class.
• For exponential families, the most efficient way to locate the particular process is to collect the minimum sufficient statistics and discard the rest of the information.
• For periodic processes, you have a sampling theorem that says that all frequencies of the signal up to half the sampling frequency can be recovered from evenly spaced samples and the values in between are unimportant.
• For some GLMs and more complicated models, the most efficient way to locate the process is essentially a randomized search.
• At the most extreme end, if you truly have no hypothesis of where your data comes from, you have no recourse but some form of exhaustive search, which is intractable in general.
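The first bullet can be made concrete for a normal sample (my sketch, with "normal" standing in for any exponential family):

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=3.0, scale=2.0, size=10_000)

# Keep only the minimal sufficient statistics and discard the sample.
n, s1, s2 = len(data), data.sum(), (data ** 2).sum()

mu_hat = s1 / n                   # MLE of the mean
var_hat = s2 / n - mu_hat ** 2    # MLE of the variance (biased form)

# The estimates from the two retained numbers match the estimates
# computed from the full dataset.
assert np.isclose(mu_hat, data.mean())
assert np.isclose(var_hat, data.var())
print(mu_hat, var_hat)
```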

If you don’t have any candidate solutions scratched out at the start, you have a problem that can only be solved by a quantity of trial and error up to some notion of the “depth” of this particular problem instance. If you have some information already that helps locate the process, you get a reduction in the amount of trial and error needed. The computational efficiency of the particular techniques you know for particular hypotheses may in fact dictate the approach you take as well, but there are limits on this determined by computational complexity.

Methods used for efficiency refinements shouldn’t be taught before understanding of their larger context.
#29
03-11-2018, 11:58 AM
 clarinetist Member Non-Actuary Join Date: Aug 2011 Studying for Rcpp, Git Posts: 6,870

I'll be getting back to this in the summer most likely.

This class I'm taking has been brutal.
#30
03-28-2018, 11:03 PM
 Brizzle Member SOA Join Date: Apr 2013 Posts: 238

Are you guys elite actuaries? I don't know a single actuary at my company (a health insurance plan) who I think could contribute to this conversation. Does having this kind of grasp on statistics help in your career?
