Using gsub to create uniform time formats

I am trying to reformat times that are currently stored as character strings, so that they all share one format. Right now they look like this:

 [1] "1:00PM"      "1:10 PM"     "1:10PM"      "1:20 PM"     "1:30 PM"     "1:30PM"     
 [7] "1:40 PM"     "10:00AM"     "10:10 AM"    "10:10AM"     "10:30 AM"    "10:30AM"    
 [13] "10:45 AM"    "10:45AM"     "10:50 AM"    "10:50AM"     "10AM"        "11:00AM"    
 [19] "11:10 AM"    "11:10AM"     "11:40 AM"    "11:40AM"     "11AM"        "12:00PM"    
 [25] "12:05 PM"    "12:10 PM"    "12:10PM"     "12:25PM"     "12:30 PM"    "12:30PM"    
 [31] "12:45 PM"    "12:45:30 PM" "12:45PM"     "12:50 PM"    "12PM"        "1PM"        
 [37] "2:00PM"      "2:10 PM"     "2:10PM"      "2:20PM"      "2:30 PM"     "2:30PM"     
 [43] "2:35 PM"     "2:45 PM"     "2:45PM"      "2:55 PM"     "2PM"         "3:00PM"     
 [49] "3:05 PM"     "3:10 PM"     "3:10PM"      "3:20 PM"     "3:20PM"      "3:25 PM"    
 [55] "3:25PM"      "3:30 PM"     "3:35 PM"     "3:35PM"      "3:45 PM"     "3:45PM"     
 [61] "3PM"         "4:00PM"      "4:10 PM"     "4:10PM"      "4:30 PM"     "4:30PM"     
 [67] "4:35 PM"     "4:35PM"      "4PM"         "5:00PM"      "5:10 PM"     "5:10PM"     
 [73] "5:20 PM"     "5:30 PM"     "5:30PM"      "5:35 PM"     "5:35PM"      "5:40 PM"    
 [79] "5:40PM"      "5:45 PM"     "5:50 PM"     "5:50PM"      "6:00PM"      "6:05PM"     
 [85] "6:10 PM"     "6:10PM"      "6:15PM"      "6:30 PM"     "6:30PM"      "6PM"        
 [91] "7:00PM"      "7:10 AM"     "7:10 PM"     "7:10AM"      "7:10PM"      "7:30PM"     
 [97] "7:35 PM"     "7:35PM"      "7:45 PM"     "7:45PM"      "7AM"         "7PM"        
[103] "8:00AM"      "8:10 AM"     "8:10AM"      "8:25 PM"     "8:25PM"      "8:50 PM"    
[109] "8AM"         "9:00AM"      "9:10 AM"     "9:10AM"      "9:15 AM"     "9:15AM"     
[115] "9:20 AM"     "9:30 AM"     "9:30AM"      "9:35AM"      "9:40 AM"     "9:40AM"     
[121] "9:45 AM"     "9:45AM"      "9AM"

I want all the times to be in this format:

1:00PM instead of 1PM
12:45PM instead of 12:45:30 PM

So basically HH:MM followed by AM or PM.

Eventually I would like to convert the times from character format to POSIXct, but that is only possible once the strings share a uniform format. To be more specific: how would you use gsub to change "3PM" to "3:00PM" and, similarly, "12:45:30 PM" to "12:45PM"?

I find the regular expression syntax in gsub hard to understand, especially how to refer to a specific position, say position 4, in a character string.


asked by WykoW, 07.06.2015
comment
stackoverflow.com/questions/1803627/  - jmargolisvt, 07.06.2015


Answers (1)


We create an index ('indx') for the elements that contain no ':' (i.e. 10AM, 11AM etc.) and, using sub, change their format to 10:00AM, 11:00AM etc. In the next sub we match the leading hour digits followed by ':' and two digits (\\d{2}) and capture that as the first group with parentheses. We then match one or more characters that are not A, M or P ([^AMP]+), which covers the seconds and any space, followed by the first letter of AM/PM, captured as the second group. The replacement keeps only the first and second captured groups (\\1\\2); the remaining letter is outside the match and stays in place. Finally, we use strsplit/sprintf to pad a leading 0 onto the hours that do not have two digits.

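# Flag the entries that lack a colon (e.g. "10AM")
indx <- !grepl(':', str1)
# Insert ":00" after the hour digits for those entries
str1[indx] <- sub('(\\d+)(.*)', '\\1:00\\2', str1[indx])
# Drop seconds and/or spaces between "H:MM" and AM/PM
str1 <- sub('(^\\d+:\\d{2})[^AMP]+([AMP])', '\\1\\2', str1)
# Zero-pad the hour to two digits
sapply(strsplit(str1, ':'), function(x) paste(sprintf('%02d',
          as.numeric(x[1])), x[2], sep=":"))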
#[1] "01:00PM" "01:10PM" "01:10PM" "01:20PM" "01:30PM" "01:30PM" "01:40PM"
#[8] "10:00AM" "10:10AM" "10:10AM" "10:30AM" "10:30AM" "10:45AM" "10:45AM"
#[15] "10:50AM" "10:50AM" "10:00AM" "11:00AM" "11:10AM" "11:10AM" "11:40AM"
#[22] "11:40AM" "11:00AM" "12:00PM" "12:05PM" "12:10PM" "12:10PM" "12:25PM"
#[29] "12:30PM" "12:30PM" "12:45PM" "12:45PM" "12:45PM" "12:50PM" "12:00PM"
#[36] "01:00PM" "02:00PM" "02:10PM" "02:10PM" "02:20PM" "02:30PM" "02:30PM"
#[43] "02:35PM" "02:45PM" "02:45PM" "02:55PM" "02:00PM" "03:00PM" "03:05PM"
#[50] "03:10PM" "03:10PM" "03:20PM" "03:20PM" "03:25PM" "03:25PM" "03:30PM"
#[57] "03:35PM" "03:35PM" "03:45PM" "03:45PM" "03:00PM" "04:00PM" "04:10PM"
#[64] "04:10PM" "04:30PM" "04:30PM" "04:35PM" "04:35PM" "04:00PM" "05:00PM"
#[71] "05:10PM" "05:10PM" "05:20PM" "05:30PM" "05:30PM" "05:35PM" "05:35PM"
#[78] "05:40PM" "05:40PM" "05:45PM" "05:50PM" "05:50PM" "06:00PM" "06:05PM"
#[85] "06:10PM" "06:10PM" "06:15PM" "06:30PM" "06:30PM" "06:00PM" "07:00PM"
#[92] "07:10AM" "07:10PM" "07:10AM" "07:10PM" "07:30PM" "07:35PM" "07:35PM"
#[99] "07:45PM" "07:45PM" "07:00AM" "07:00PM" "08:00AM" "08:10AM" "08:10AM"
#[106] "08:25PM" "08:25PM" "08:50PM" "08:00AM" "09:00AM" "09:10AM" "09:10AM"
#[113] "09:15AM" "09:15AM" "09:20AM" "09:30AM" "09:30AM" "09:35AM" "09:40AM"
#[120] "09:40AM" "09:45AM" "09:45AM" "09:00AM"
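
To see what each substitution contributes, the same steps can be traced on a handful of inputs. This is only a minimal sketch: the vector `x` and the intermediate names `no_colon`, `step1` and `step2` are made up for illustration. The last two calls touch on the question about referring to a fixed position such as position 4: `substr` addresses it directly, while a regular expression can only reach it by counting characters from the start of the string.

```r
# Hypothetical sample covering the input shapes found in the data
x <- c("3PM", "1:10 PM", "12:45:30 PM", "9:45AM")

# Step 1: entries without ':' get ':00' inserted after the hour digits
no_colon <- !grepl(":", x)
step1 <- x
step1[no_colon] <- sub("(\\d+)(.*)", "\\1:00\\2", x[no_colon])
step1
# "3:00PM" "1:10 PM" "12:45:30 PM" "9:45AM"

# Step 2: keep 'H:MM' (group 1), drop everything up to the first A/M/P,
# keep that letter (group 2); the rest of the string is left untouched
step2 <- sub("(^\\d+:\\d{2})[^AMP]+([AMP])", "\\1\\2", step1)
step2
# "3:00PM" "1:10PM" "12:45PM" "9:45AM"

# Referring to a position: substr() addresses character 4 directly ...
substr("12:45:30 PM", 4, 4)
# "4"

# ... while in a regex, position 4 is "whatever follows the first 3 characters"
sub("^(.{3})(.)", "\\1#", "12:45:30 PM")
# "12:#5:30 PM"
```

Strings that are already in the target shape, such as "9:45AM", do not match the second pattern at all because `[^AMP]+` needs at least one character to consume, so `sub` returns them unchanged.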

Or, instead of strsplit and sapply, we can use format

 sub('^ ', '0', format(str1, justify='right'))

Or

 library(stringr)
 str_pad(str1, pad='0', width=7)

Or we can use the lubridate package, which accepts several candidate format strings.

 library(lubridate)
 paste0(format(parse_date_time(str1, orders=guess_formats(gsub('[APM]', 
   '', str1), c('hm', 'hms', 'h'))), '%H:%M'), sub('[^AMP]+', '', str1))
 #[1] "01:00PM" "01:10PM" "01:10PM" "01:20PM" "01:30PM" "01:30PM" "01:40PM"
 #[8] "10:00AM" "10:10AM" "10:10AM" "10:30AM" "10:30AM" "10:45AM" "10:45AM"
 #[15] "10:50AM" "10:50AM" "10:00AM" "11:00AM" "11:10AM" "11:10AM" "11:40AM"
 #[22] "11:40AM" "11:00AM" "12:00PM" "12:05PM" "12:10PM" "12:10PM" "12:25PM"
 #[29] "12:30PM" "12:30PM" "12:45PM" "12:45PM" "12:45PM" "12:50PM" "12:00PM"
 #[36] "01:00PM" "02:00PM" "02:10PM" "02:10PM" "02:20PM" "02:30PM" "02:30PM"
 #[43] "02:35PM" "02:45PM" "02:45PM" "02:55PM" "02:00PM" "03:00PM" "03:05PM"
 #[50] "03:10PM" "03:10PM" "03:20PM" "03:20PM" "03:25PM" "03:25PM" "03:30PM"
 #[57] "03:35PM" "03:35PM" "03:45PM" "03:45PM" "03:00PM" "04:00PM" "04:10PM"
 #[64] "04:10PM" "04:30PM" "04:30PM" "04:35PM" "04:35PM" "04:00PM" "05:00PM"
 #[71] "05:10PM" "05:10PM" "05:20PM" "05:30PM" "05:30PM" "05:35PM" "05:35PM"
 #[78] "05:40PM" "05:40PM" "05:45PM" "05:50PM" "05:50PM" "06:00PM" "06:05PM"
 #[85] "06:10PM" "06:10PM" "06:15PM" "06:30PM" "06:30PM" "06:00PM" "07:00PM"
 #[92] "07:10AM" "07:10PM" "07:10AM" "07:10PM" "07:30PM" "07:35PM" "07:35PM"
 #[99] "07:45PM" "07:45PM" "07:00AM" "07:00PM" "08:00AM" "08:10AM" "08:10AM"
 #[106] "08:25PM" "08:25PM" "08:50PM" "08:00AM" "09:00AM" "09:10AM" "09:10AM"
 #[113] "09:15AM" "09:15AM" "09:20AM" "09:30AM" "09:30AM" "09:35AM" "09:40AM"
 #[120] "09:40AM" "09:45AM" "09:45AM" "09:00AM"
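
Since the question's end goal is a POSIXct value, the cleaned strings could then be parsed along the following lines. This is a hedged sketch rather than part of the original answer: the vector `times` is a made-up sample of already normalised strings, `tz = "UTC"` is an arbitrary choice, and `%p` depends on the locale, so the AM/PM marker may fail to parse (giving NA) in locales that do not define it.

```r
# Assumed input: strings already normalised to 'HH:MMAM' / 'HH:MMPM'
times <- c("01:00PM", "12:45PM", "09:45AM", "12:00PM")

# %I = hour on the 12-hour clock, %M = minutes, %p = AM/PM indicator.
# No date is supplied, so strptime() fills in the current date.
parsed <- as.POSIXct(strptime(times, format = "%I:%M%p", tz = "UTC"))

# 24-hour view of the parsed values
format(parsed, "%H:%M")
# In an English or C locale: "13:00" "12:45" "09:45" "12:00"
```

If only the time of day matters, the date component that `strptime` adds can simply be ignored or formatted away as shown in the last call.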

data

str1 <- c("1:00PM", "1:10 PM", "1:10PM", "1:20 PM", "1:30 PM", "1:30PM", 
"1:40 PM", "10:00AM", "10:10 AM", "10:10AM", "10:30 AM", "10:30AM", 
"10:45 AM", "10:45AM", "10:50 AM", "10:50AM", "10AM", "11:00AM", 
"11:10 AM", "11:10AM", "11:40 AM", "11:40AM", "11AM", "12:00PM", 
"12:05 PM", "12:10 PM", "12:10PM", "12:25PM", "12:30 PM", "12:30PM", 
"12:45 PM", "12:45:30 PM", "12:45PM", "12:50 PM", "12PM", "1PM", 
"2:00PM", "2:10 PM", "2:10PM", "2:20PM", "2:30 PM", "2:30PM", 
"2:35 PM", "2:45 PM", "2:45PM", "2:55 PM", "2PM", "3:00PM", "3:05 PM", 
"3:10 PM", "3:10PM", "3:20 PM", "3:20PM", "3:25 PM", "3:25PM", 
"3:30 PM", "3:35 PM", "3:35PM", "3:45 PM", "3:45PM", "3PM", "4:00PM", 
"4:10 PM", "4:10PM", "4:30 PM", "4:30PM", "4:35 PM", "4:35PM", 
"4PM", "5:00PM", "5:10 PM", "5:10PM", "5:20 PM", "5:30 PM", "5:30PM", 
"5:35 PM", "5:35PM", "5:40 PM", "5:40PM", "5:45 PM", "5:50 PM", 
"5:50PM", "6:00PM", "6:05PM", "6:10 PM", "6:10PM", "6:15PM", 
"6:30 PM", "6:30PM", "6PM", "7:00PM", "7:10 AM", "7:10 PM", "7:10AM", 
"7:10PM", "7:30PM", "7:35 PM", "7:35PM", "7:45 PM", "7:45PM", 
"7AM", "7PM", "8:00AM", "8:10 AM", "8:10AM", "8:25 PM", "8:25PM", 
"8:50 PM", "8AM", "9:00AM", "9:10 AM", "9:10AM", "9:15 AM", "9:15AM", 
"9:20 AM", "9:30 AM", "9:30AM", "9:35AM", "9:40 AM", "9:40AM", 
"9:45 AM", "9:45AM", "9AM")
answered by akrun, 07.06.2015
comment
The first solution, using strsplit and sapply, worked perfectly. Many thanks for the elegant solution. - WykoW, 07.06.2015