Actuarial Outpost
 
  #21  
Old 12-30-2017, 05:42 PM
DiscreteAndDiscreet is offline
Member
AAA

Join Date: May 2016
Posts: 478

Quote:
Originally Posted by clarinetist View Post
I came to a similar conclusion after talking with a former Physics Ph.D. student. I've bought a physics book as a result of this (Mathematical Methods in the Physical Sciences by Boas) and will be going through that text alongside this study.

Math degree programs, I've realized, go through topics too slowly for applications.
Quote:
Originally Posted by 764dak View Post
No. You just hate waiting and appreciating the beauty and ideas of mathematics.
You're both missing something a little more interesting.

First, note that breadth of study beats depth of study every time. The sort of packrat behavior that physicists and engineers have, where they collect little pieces of math and use them in a patchwork for applications, is an example of this. When you're empirically grounded, working with any single model eventually runs into a brick wall, because models that incorporate all aspects of the observed system become intractable relatively quickly. Studying a system through a variety of tractable models (including intractable models modified to be tractable based on approximations or simplifying assumptions) yields more progress.

If you start to notice that physics and engineering have successes with this kind of approach, I don't think "I should study physics, they learn this stuff more quickly" is the most fruitful conclusion to draw. Learning the contents of a particular packrat's collection of techniques is going to mostly teach you about the quirks of that packrat's field. That can be interesting and useful in its own right, but if you want to do work in some other field, it's at best a case study. What you really want to learn is how to be a different packrat that adapts to whatever your native field of study is.

My perspective is framed around a background heavy in computer science. Methods that work on one particular problem are to varying degrees portable to other problems. At a higher level of abstraction (category theory), all branches of mathematics have concepts defined in other branches of mathematics embedded in them. As you study more, rather than learning more techniques, you should actually start to learn that all of the diverse techniques can really be understood as applications of a smaller set of computational strategies. Breadth beats depth because breadth increases the rate at which a personal understanding of this is revealed.
  #22  
Old 12-31-2017, 02:12 PM
DiscreteAndDiscreet is offline
Member
AAA

Join Date: May 2016
Posts: 478

Here’s a bit of material related to this:

https://arxiv.org/pdf/math/0303352.pdf

Chaitin can be a bit insufferable but he lays out some good sketches of why a fixed model has bounds on the range of phenomena it can explain.

There is also an article critical of the traits that make Chaitin a bit insufferable, and it covers some key concepts as well:

http://www.ams.org/notices/200109/rev-panu.pdf

I’ll note that the criticism levied that a theory’s complexity is unrelated to its explanatory power seems to be badly mistaken. The example offered of a large but weak theory is highly compressible, meaning its equivalence to a simpler theory is readily apparent. On the other hand, naked set theory is not useful for expressing all of mathematics without defining mappings between set theoretic concepts and any particular theory (i.e. you have to define how to encode your terms into set theory, such as the many constructions for defining the natural numbers as a sequence of sets constructed from only the empty set). Those definitions need to be regarded as part of the theory.
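To make that encoding overhead concrete, here are two of the standard set-theoretic constructions of the natural numbers (textbook material, not anything from the papers above). A theory phrased in one encoding needs an explicit translation before it can use results phrased in the other, and that translation counts toward the theory's size:

Code:
\[
\begin{aligned}
\text{von Neumann:}\quad & 0 = \varnothing, \quad n+1 = n \cup \{n\}, \quad\text{so } 2 = \{\varnothing, \{\varnothing\}\} \\
\text{Zermelo:}\quad & 0 = \varnothing, \quad n+1 = \{n\}, \quad\text{so } 2 = \{\{\varnothing\}\}
\end{aligned}
\]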
  #23  
Old 01-09-2018, 11:29 PM
clarinetist is offline
Member
Non-Actuary

Join Date: Aug 2011
Studying for Rcpp, Git
Posts: 6,872

Review of Chapter 3:

https://ymmathstat.blogspot.com/2018...s-machine.html
__________________
If you want to add me on LinkedIn, PM me.

Why I hate Microsoft Access.

Studying/Reading: GLMs, Bayesian Stats, Time Series
  #24  
Old 01-10-2018, 10:17 AM
Neutral Omen is offline
Member
SOA

Join Date: Oct 2011
Favorite beer: Kwak
Posts: 6,992

Quote:
Originally Posted by DiscreteAndDiscreet View Post
I’ll note that the criticism levied that a theory’s complexity is unrelated to its explanatory power seems to be badly mistaken. The example offered of a large but weak theory is highly compressible, meaning its equivalence to a simpler theory is readily apparent.
Out of my depth here (ANACS), but could the definition of complexity be different from the definition of information? Does compressibility increase complexity but not information?
__________________
Quote:
Originally Posted by mathmajor View Post
you are a visionary
  #25  
Old 01-10-2018, 01:36 PM
DiscreteAndDiscreet is offline
Member
AAA

Join Date: May 2016
Posts: 478

Quote:
Originally Posted by Neutral Omen View Post
Out of my depth here (ANACS), but could the definition of complexity be different from the definition of information? Does compressibility increase complexity but not information?
You can finagle the answer to this question a bit by discussing other types of complexity measures, but not by that much.

There are multiple theories that try to capture notions of complexity or information content. However, Shannon entropy (a measure of the symbols needed to efficiently encode the outcome of a recurring statistical process with a known distribution) and Kolmogorov complexity (the length of the shortest program in a given computer language that outputs a given object) play a central role in this family of theories, and both relate to bounds on how much data can be compressed. If you sample results over and over again from a known distribution, it's statistically certain that the Kolmogorov complexity of the data set will equal the total Shannon entropy plus a constant related to expressing the known distribution as a program.

These are both measurements of essentially the same concept, albeit with different flaws. Shannon entropy gives you the expected information content of a sample based on a hypothesized distribution; it's essentially probabilistic in nature and carries the weaknesses associated with Bayesian or frequentist probability theory, depending on how it's used. Kolmogorov complexity is essentially deterministic, but it can't be calculated for non-trivial data sets and can at best only be approximated by upper bounds.
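Here's a toy sketch of that relationship in Python. zlib stands in for the uncomputable shortest program, so the compressed size is only a crude upper bound on the Kolmogorov complexity of the sample; the distribution and sample size are made up for illustration:

Code:
import math
import random
import zlib

def bernoulli_entropy(p):
    """Shannon entropy, in bits per symbol, of a Bernoulli(p) source."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

random.seed(0)
p, n = 0.1, 100_000
data = bytes(int(random.random() < p) for _ in range(n))

shannon_bits = bernoulli_entropy(p) * n      # ~ total entropy of the sample
zlib_bits = 8 * len(zlib.compress(data, 9))  # crude upper bound on K(data)

print(f"Shannon entropy of the sample: ~{shannon_bits:,.0f} bits")
print(f"zlib upper bound on K(data):   {zlib_bits:,} bits")

The zlib figure sits above the entropy figure; a better compressor closes the gap, and the theorem says the (uncomputable) optimum differs from the entropy total by roughly a constant.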

Another related theory from computer science is time complexity theory, which classifies problems into complexity classes based on how the number of calculation steps scales as the problem gets bigger. The average-case cost of sorting a list scales as O(n log n), and there are algorithms that achieve this bound. This is, in many ways, a much more practical definition of complexity, and it seems to raise an issue with Kolmogorov complexity in that two objects with the same Kolmogorov complexity may require vastly different times for their shortest programs to run. This raises the question of why this more practical definition shouldn't be regarded as more fundamental. I believe a reasonable answer is to point out that if an algorithm is efficient in its use of time, it should not perform calculations that can easily be predicted from analysis of earlier calculations; in this sense, efficient algorithms need to generate as much new information (in the Shannon entropy/Kolmogorov complexity sense) as possible at each step. Kolmogorov complexity theory can in fact be used to prove results in time complexity theory.
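For a concrete instance of that O(n log n) bound, here's a minimal merge sort; there are O(log n) levels of recursion and each level does O(n) work in the merge:

Code:
def merge_sort(xs):
    # O(n log n): halve the list, sort the halves, merge in linear time.
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]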

Computer science is one of the branches of math that is essentially universal, in the sense that you could take a particular Turing-complete computer as a given and define all of the constructible results of mathematics as programs for that one computer (noting that non-constructive existence theorems are themselves constructible results of basic axioms, even if the object posited by the theorem is not constructible). You could in fact write a small program that encodes set theory, derives all possible theories that can be defined in set theory up to a given size limit, and then waits for user input and answers the user's questions about a particular theory by looking up the theory that matches the input. Initially this looks like a way of saying that a small set theory program can answer all of your questions, but very simple modifications to the definitions wipe out this result. To actually produce results, set theory has to be unpacked to a "depth" that covers the questions you want to answer. You can only bypass the need to input your definitions and derive their conclusions if you have a vast amount of computer memory and can wait a vast amount of time for a computer to crank through a lot of irrelevant conclusions of set theory. What this shows you is that trial and error can be substituted for knowledge.
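A cartoon of that "derive everything, then look it up" program, with bit strings standing in for derivations (a made-up toy, not a real theorem prover). The point is just that with no knowledge, you pay for every candidate up to the depth of the answer:

Code:
from itertools import product

def exhaustive_search(is_answer, max_len, alphabet="01"):
    """Try every string up to max_len until the predicate is satisfied."""
    tries = 0
    for n in range(1, max_len + 1):
        for chars in product(alphabet, repeat=n):
            tries += 1
            s = "".join(chars)
            if is_answer(s):
                return s, tries
    return None, tries

target = "10110"
print(exhaustive_search(lambda s: s == target, 6))  # ('10110', 53)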

Returning to the original topic, many machine/human learning methods do in fact work on a principle of acquiring knowledge through iterative trial and error. The set theory program above uses an exhaustive search, while machine/human learning methods use heuristic searches. The extent to which a heuristic search can be better than an exhaustive search comes down to complexity issues.
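And a matching sketch of a heuristic search on the same toy problem. The per-bit score is the hypothesis doing the work here; if the score were uninformative, you'd be back to the exhaustive case:

Code:
import random

def hill_climb(score, n_bits, seed=0):
    """Greedily flip single bits, keeping any flip that improves the score."""
    rng = random.Random(seed)
    state = [rng.choice("01") for _ in range(n_bits)]
    tries, improved = 0, True
    while improved:
        improved = False
        for i in range(n_bits):
            candidate = state[:]
            candidate[i] = "1" if state[i] == "0" else "0"
            tries += 1
            if score(candidate) > score(state):
                state, improved = candidate, True
    return "".join(state), tries

target = "10110"
score = lambda s: sum(a == b for a, b in zip(s, target))
print(hill_climb(score, len(target)))  # reaches the target in far fewer tries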
  #26  
Old 01-17-2018, 10:15 PM
clarinetist is offline
Member
Non-Actuary

Join Date: Aug 2011
Studying for Rcpp, Git
Posts: 6,872

I'm in section 4.4 of Theodoridis.

Sections 4.1-4.3 covered very standard OLS estimation and its geometric interpretation.
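Roughly, those sections boil down to this numpy sketch (mine, not the book's code); the geometric point is that the fitted values are the orthogonal projection of y onto the column space of X:

Code:
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# OLS solves the normal equations (X'X) beta = X'y; lstsq is the stable route.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Geometric view: y_hat = P y, with P the projection onto col(X).
P = X @ np.linalg.solve(X.T @ X, X.T)
assert np.allclose(P @ y, X @ beta_hat)
print(beta_hat)  # close to beta_true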

4.4 extends this to complex-valued random variables and goes into their optimization, defining the Wirtinger derivative.
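For anyone else hitting this cold, the standard definition (not specific to Theodoridis): for z = x + iy,

Code:
\[
\frac{\partial}{\partial z} = \frac{1}{2}\left(\frac{\partial}{\partial x} - i\,\frac{\partial}{\partial y}\right),
\qquad
\frac{\partial}{\partial \bar{z}} = \frac{1}{2}\left(\frac{\partial}{\partial x} + i\,\frac{\partial}{\partial y}\right)
\]

These let you take gradient-style steps on a real-valued cost J(z) even though such a J is never complex-differentiable; the steepest-ascent direction turns out to be proportional to the conjugate derivative.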

Weird stuff.
  #27  
Old 01-17-2018, 11:30 PM
ultrafilter is offline
Member

Join Date: Nov 2006
Posts: 374

I found a paper by Theodoridis that uses Wirtinger calculus on reproducing kernel Hilbert spaces. As near as I can tell, the motivation is to extend standard machine learning techniques that work on real-valued signals so that they can also work with complex-valued signals, which is apparently something you have to do in communications systems. This seems like a very electrical-engineering-oriented approach and not something you'd find in the statistics or computer science literatures, but I don't think there's any real harm in studying it. Just don't expect it to ever come up outside of a few very specific applications.
  #28  
Old 01-22-2018, 09:58 AM
DiscreteAndDiscreet is offline
Member
AAA

Join Date: May 2016
Posts: 478

Quote:
Originally Posted by ultrafilter View Post
I found a paper by Theodoridis that uses Wirtinger calculus on reproducing kernel Hilbert spaces. As near as I can tell, the motivation is to extend standard machine learning techniques that work on real-valued signals so that they can also work with complex-valued signals, which is apparently something you have to do in communications systems. This seems like a very electrical-engineering-oriented approach and not something you'd find in the statistics or computer science literatures, but I don't think there's any real harm in studying it. Just don't expect it to ever come up outside of a few very specific applications.
From the computer science perspective, putting that in chapter 4 sounds like premature optimization. It's a performance refinement over naive approaches, but he hasn't gotten the reader to the point of following the naive approach.

To me, the core concept that I would want from a machine learning course is the relationship between a hypothesis that a particular class of processes gave rise to a dataset and the algorithms that locate a particular process within that class. For example:
  • For exponential families, the most efficient way to locate the particular process is to collect the minimal sufficient statistics and discard the rest of the information (see the sketch after this list).
  • For periodic processes, a sampling theorem says that all frequencies of the signal up to half the sampling frequency can be recovered from evenly spaced samples, and the values in between are unimportant.
  • For some GLMs and more complicated models, the most efficient way to locate the process is essentially a randomized search.
  • At the most extreme end, if you truly have no hypothesis about where your data comes from, you have no recourse but some form of exhaustive search, which is intractable in general.
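A minimal illustration of the sufficient-statistic bullet, for the Bernoulli family (my own toy, with made-up numbers): the fitted process depends on the sample only through the count of successes, so everything else in the data can be discarded:

Code:
import random

random.seed(1)
n = 1_000
sample = [random.random() < 0.3 for _ in range(n)]

# For Bernoulli(p), the success count is a minimal sufficient statistic:
# the MLE p_hat = count / n ignores every other feature of the sample.
count = sum(sample)
p_hat = count / n
print(count, p_hat)  # any dataset with the same count yields the same p_hat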

If you don't have any candidate solutions scratched out at the start, you have a problem that can only be solved by an amount of trial and error up to some notion of the "depth" of this particular problem instance. If you have some information already that helps locate the process, you get a reduction in the amount of trial and error needed. The computational efficiency of the particular techniques you know for particular hypotheses may in fact dictate the approach you take as well, but there are limits on this determined by computational complexity.

Methods used for efficiency refinements shouldn't be taught before the reader understands their larger context.
  #29  
Old 03-11-2018, 12:58 PM
clarinetist is offline
Member
Non-Actuary

Join Date: Aug 2011
Studying for Rcpp, Git
Posts: 6,872

I'll be getting back to this in the summer most likely.

This class I'm taking has been brutal.
  #30  
Old 03-29-2018, 12:03 AM
Brizzle is offline
Member
SOA

Join Date: Apr 2013
Posts: 275

Are you guys elite actuaries? I don't know a single actuary at my company (a health insurance plan) who I think could contribute to this conversation. Does having this kind of grasp on statistics help in your career?