Actuarial Outpost
 
Go Back   Actuarial Outpost > Actuarial Discussion Forum > Software & Technology
FlashChat Actuarial Discussion Preliminary Exams CAS/SOA Exams Cyberchat Around the World Suggestions

DW Simpson International Actuarial Jobs
Canada  Asia  Australia  Bermuda  Latin America  Europe


Reply
 
Thread Tools Display Modes
  #1  
Old 11-16-2017, 02:38 PM
GwenAnderson GwenAnderson is offline
CAS
 
Join Date: Nov 2017
Posts: 14
Default R Multiple Merge

Hi. Is there an elegant way to merge multiple dataframes in R by multiple columns? Has there been a replacement for merge_recurse which seems to be faulty? Would you concatentate the columns in the merge language (eg. by.x = paste... ? Is Reduce the best way?
Reply With Quote
  #2  
Old 11-16-2017, 04:44 PM
kevinykuo kevinykuo is offline
CAS
 
Join Date: Nov 2017
Posts: 12
Default

In dplyr `*_join` functions, pass a vector to `by`.

It'd help if you have some reproducible code, see https://stackoverflow.com/questions/...ucible-example
Reply With Quote
  #3  
Old 11-16-2017, 07:50 PM
AMedActuary AMedActuary is offline
Member
SOA
 
Join Date: May 2007
College: UCLA Alumni
Posts: 374
Default

I believe dplyr only does two dataframes at a time. To get around this, you can just do multiple joins (join Table A and B then join it to Table C, etc.). You can just pipe the steps together.

Code:
new_tab<-tab_A %>%
 left_join(tab_B, by=c("var1", "var2", "var3")) %>% 
 left_join(tab_C, by=c("var1", "var2", "var3"))
Reply With Quote
  #4  
Old 11-16-2017, 11:31 PM
kevinykuo kevinykuo is offline
CAS
 
Join Date: Nov 2017
Posts: 12
Default

Ah, I missed the multiple data frames part. The idiomatic way would be defining `join <- function(x, y) left_join(x, y, by = cols)` then calling `Reduce(f, df_list)` or maybe `purrr::reduce(df_list, left_join, by = cols)` where `cols` are the columns to group by.

Just don't write a loop.
Reply With Quote
  #5  
Old 11-17-2017, 02:16 PM
GwenAnderson GwenAnderson is offline
CAS
 
Join Date: Nov 2017
Posts: 14
Default

Quote:
Originally Posted by AMedActuary View Post
I believe dplyr only does two dataframes at a time. To get around this, you can just do multiple joins (join Table A and B then join it to Table C, etc.). You can just pipe the steps together.

Code:
new_tab<-tab_A %>%
 left_join(tab_B, by=c("var1", "var2", "var3")) %>% 
 left_join(tab_C, by=c("var1", "var2", "var3"))

That looks elegant ... very orderly! I think that it is more intuitive than Reduce.

For the data example, I have year and month columns and then one of the ENSO indices per file. They are actually wide files but I use reshape2 melt(). The indices are in separate files and I may want to join them with the actual weather data by year and month, perhaps fie or more columns of indices. I think that piping the steps together as shown above is the most straightforward approach and easily interpreted.
Reply With Quote
  #6  
Old 11-17-2017, 02:35 PM
kevinykuo kevinykuo is offline
CAS
 
Join Date: Nov 2017
Posts: 12
Default

Quote:
Originally Posted by GwenAnderson View Post
That looks elegant ... very orderly! I think that it is more intuitive than Reduce.

For the data example, I have year and month columns and then one of the ENSO indices per file. They are actually wide files but I use reshape2 melt(). The indices are in separate files and I may want to join them with the actual weather data by year and month, perhaps fie or more columns of indices. I think that piping the steps together as shown above is the most straightforward approach and easily interpreted.
You should use tidyr instead of reshape2 to have a consistent approach in your data wrangling.
Reply With Quote
  #7  
Old 11-17-2017, 03:05 PM
AMedActuary AMedActuary is offline
Member
SOA
 
Join Date: May 2007
College: UCLA Alumni
Posts: 374
Default

Yeah as Kevin said, look at spread() and gather() instead of melt.

This cheatsheet will help.

Last edited by AMedActuary; 11-17-2017 at 03:42 PM..
Reply With Quote
  #8  
Old 11-18-2017, 04:30 PM
GwenAnderson GwenAnderson is offline
CAS
 
Join Date: Nov 2017
Posts: 14
Default

I don't see that there is anything inconsistent in using reshape2 in a program. However I am taking it that you believe in staying within the tidyverse whenever possible? Mainly for actuarial applications? Is the main advantage that the person running the code may already have tidyverse loaded? (Or else, just tidyr?) In any case I do not see that reshape2 has any particular advantages. Aside from working just fine for the purposes. So I will bring up the last discussion on data.table that was misplaced othe P&C board ... are all of the functions in data.table so strongly preferred in the actuarial community that it would seem incoherent to use a base R command in a short program? Is it one of those 'back of the hand' kind of things, so that other commands become unfamiliar and awkward?
Reply With Quote
Reply

Tags
merge, r code, r language

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off


All times are GMT -4. The time now is 06:48 AM.


Powered by vBulletin®
Copyright ©2000 - 2018, Jelsoft Enterprises Ltd.
*PLEASE NOTE: Posts are not checked for accuracy, and do not
represent the views of the Actuarial Outpost or its sponsors.
Page generated in 0.37217 seconds with 11 queries