Aquileo | MiG-NJU/OmniVideo-100K · Datasets at Hugging Face
Dataset Viewer
Auto-converted to Parquet
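Columns (type; observed length range, value range, or number of distinct values):
video_id: string, 11 characters
question_id: string, 24 to 44 characters
search_tag: string class, 195 values
language: string class, 2 values
duration: int64, 60 to 180 (seconds)
metadata: null
video_path: string, 22 characters
resolution: string class, 46 values
task: string class, 10 values
subtask: string class, 4 values (Vision-Guided, Audio-Guided, Audio Context, Visual Context in the rows shown)
question: string, 3 to 335 characters
options: list, 4 items
answer: string class, 4 values
analysis: unknown type
events: list, 2 to 5 items
question_indexed: string, 138 to 706 characters
options_indexed: list, 2 to 4 items
question_textual: string, 55 to 301 characters
options_textual: list, 2 to 4 items

In the preview rows below, each record lists these 19 fields in this order, one value per line, with null marking an empty field. The event_sequence_ordering records leave question and options null and fill events, question_indexed, options_indexed, question_textual, and options_textual instead; the other records do the reverse.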
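The preview rows suggest a few naming conventions. video_path is always videos/<video_id>.mp4, which accounts for its fixed length of 22 characters (7 + 11 + 4). question_id is <video_id>_<task>_<n>, which is consistent with the 24 to 44 character range: "comparison" gives 24 and "scene_transformation_detection" gives 44. The following is a minimal Python sketch for sanity-checking rows against those conventions. It assumes each row is available as a plain dict keyed by the column names above. check_row is a hypothetical helper, and the conventions are inferred from this preview rather than from any documented specification of the dataset.

```python
import re

# Conventions inferred from the preview rows on this page; they are an
# assumption, not a documented specification of the dataset.
VIDEO_ID_LEN = 11
ANSWER_LETTERS = {"A", "B", "C", "D"}


def check_row(row: dict) -> list[str]:
    """Return a list of problems found in one preview row (empty if none)."""
    problems = []
    vid = row["video_id"]
    if len(vid) != VIDEO_ID_LEN:
        problems.append(f"video_id {vid!r} is not {VIDEO_ID_LEN} characters")
    # video_path appears to be derived directly from video_id.
    if row["video_path"] != f"videos/{vid}.mp4":
        problems.append("video_path does not match videos/<video_id>.mp4")
    # question_id appears to be <video_id>_<task>_<index>.
    pattern = rf"{re.escape(vid)}_{re.escape(row['task'])}_\d+"
    if not re.fullmatch(pattern, row["question_id"]):
        problems.append("question_id does not match <video_id>_<task>_<n>")
    if row["answer"] not in ANSWER_LETTERS:
        problems.append(f"unexpected answer {row['answer']!r}")
    # options is null for event_sequence_ordering rows, so only check it when present.
    options = row.get("options")
    if options is not None and len(options) != 4:
        problems.append("options should contain exactly 4 choices")
    return problems
```

For the first record below (video_id tZ-QcbrBjaw, task fine_grained_perception, answer A), check_row returns an empty list.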
tZ-QcbrBjaw
tZ-QcbrBjaw_fine_grained_perception_1
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
fine_grained_perception
Vision-Guided
As a pair of pink scissors is seen cutting a sheet of white paper, what specific process is the reporter explaining?
[ "The reporter is describing the procedure for lodging formal complaints with state or territory regulators regarding childcare services.", "The reporter is suggesting that families begin by checking the national register of approved providers and their quality ratings.", "The reporter is emphasizing the importa...
A
{ "connections": "This question uses a distinct visual event (cutting paper with pink scissors) as the temporal anchor. To answer correctly, one must disregard the visual action's literal meaning and instead attend to the concurrent voiceover to extract the specific administrative advice (making complaints) being del...
null
null
null
null
null
tZ-QcbrBjaw
tZ-QcbrBjaw_fine_grained_perception_0
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
fine_grained_perception
Audio-Guided
When the speaker mentions the specific phrase "fancy chef," what natural object is visible on the ground next to the toy monster truck?
[ "A plastic grey mountain toy with painted vegetation positioned beside a curved wooden track.", "A red plastic tub filled with muddy water containing a green watering can.", "A pair of blue and yellow bucket stilts standing upright on a patch of green grass.", "A large, reddish-brown scallop shell located on ...
D
{ "connections": "This question relies on identifying a unique and idiosyncratic audio cue (\"fancy chef\") within the stream of speech. The viewer must correlate this specific phrase with the simultaneous visual footage to identify a subtle background detail (the scallop shell) situated among rocks and dirt, rather ...
null
null
null
null
null
tZ-QcbrBjaw
tZ-QcbrBjaw_scene_transformation_detection_0
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
scene_transformation_detection
null
What visual changes take place while Ellen Coulter advises families to check the national register of approved providers?
[ "The scene shifts to a close-up of a wooden play surface where a child's hand pushes a white train along a curved track, eventually connecting it to a blue engine near a grey plastic mountain toy.", "The scene moves from an interview setting to a high-angle view of a child reaching into a red tub of muddy water, ...
C
{ "connections": "The defining audio event is Ellen Coulter's spoken advice regarding the national register. This serves as a temporal marker that frames a sequence of visual cuts—from the feet on the platform to the children on wheeled toys and finally the bucket stilts—requiring the user to listen to the specific a...
null
null
null
null
null
tZ-QcbrBjaw
tZ-QcbrBjaw_scene_transformation_detection_1
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
scene_transformation_detection
null
While Ellen Coulter explains how to lodge complaints with regulators or contact emergency services, what significant shift occurs in the visual setting?
[ "The footage changes from a rear view of children riding scooters along a paved outdoor path, to a ground-level close-up of a child's foot stepping onto blue and yellow bucket stilts.", "The view shifts from a close-up of hands inspecting the sole of a black shoe, to an indoor shot of a toddler sitting on the flo...
C
{ "connections": "The audio event is the reporter's explanation of the complaints process and the mention of calling 000. This narration marks the timeframe where the visual perspective shifts from a focused arts-and-crafts activity to a general view of children moving in the play area, necessitating cross-modal atte...
null
null
null
null
null
tZ-QcbrBjaw
tZ-QcbrBjaw_context_understanding_1
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
context_understanding
Audio Context
While the video presents a shot of a well-organized play table with dolls in a yellow basin, what negative sign concerning the staff does the speaker advise parents to notice?
[ "High turnover rates indicating an unsupportive workplace.", "Children actively running away from the educators.", "Staff members appearing to be looking visibly stressed.", "A lack of formal qualifications among the providers." ]
C
{ "connections": "The visual establishes the physical environment of a childcare center that parents might inspect, while the audio provides a specific, behavioral criterion (staff stress) to evaluate within that setting which cannot be seen in the static shot itself." }
null
null
null
null
null
tZ-QcbrBjaw
tZ-QcbrBjaw_context_understanding_0
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
context_understanding
Visual Context
When the speaker mentions superficial amenities like a "fancy chef" or a "fancy app," what specific toy is a child seen maneuvering in the dirt?
[ "A green plastic watering can being dipped into a red tub of muddy water near a wooden ladder", "A white wooden train engine with red wheels being pushed along a curved wooden track", "A blue and yellow bucket stilt being stepped on by a child wearing a black patterned sneaker", "A green and black monster tru...
D
{ "connections": "The visual provides a thematic contrast to the audio; while the speaker lists high-end, marketable features, the video grounds the reality of childcare in simple, gritty play, illustrating the \"relationship\" aspect mentioned subsequently." }
null
null
null
null
null
tZ-QcbrBjaw
tZ-QcbrBjaw_comparison_1
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
comparison
null
How do the audio and visual elements distinguish between a positive environment fostering attachment and a potentially negative environment indicating staff stress?
[ "Outdoor physical development versus indoor artistic expression", "Active sensory engagement versus static, unoccupied environments", "Structured educational programs versus unstructured free time", "Enthusiastic staff interaction versus visible employee fatigue" ]
B
{ "designated_segments": "[00:50 - 01:02]\n[01:16 - 01:26]", "connections": "These segments are related as they provide the viewer with signals to look for regarding the state of the childcare environment, distinguishing between a \"green flag\" (positive attachment) and a \"red flag\" (staff stress) through the sy...
null
null
null
null
null
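In the multi-segment tasks, the analysis field carries a designated_segments string of MM:SS ranges. Some records write these ranges with brackets ("[00:50 - 01:02]"), others without ("00:11 - 00:24"), and the connections text sometimes omits the spaces ("[00:11-00:24]"). The following is a minimal sketch for converting those ranges to seconds and checking them against the duration column. It assumes the analysis value has already been decoded so that designated_segments is a plain string. parse_segments and segments_within are illustrative names and are not part of any dataset tooling.

```python
import re

# Matches "MM:SS - MM:SS", with or without surrounding brackets or spaces.
SEGMENT_RE = re.compile(r"(\d{1,2}):(\d{2})\s*-\s*(\d{1,2}):(\d{2})")


def parse_segments(designated: str) -> list[tuple[int, int]]:
    """Convert a designated_segments string into (start, end) pairs in seconds."""
    segments = []
    for match in SEGMENT_RE.finditer(designated):
        m1, s1, m2, s2 = (int(g) for g in match.groups())
        segments.append((m1 * 60 + s1, m2 * 60 + s2))
    return segments


def segments_within(segments: list[tuple[int, int]], duration: int) -> bool:
    """True if every segment is well ordered and ends no later than the video."""
    return all(0 <= start < end <= duration for start, end in segments)
```

For the record above, parse_segments("[00:50 - 01:02]\n[01:16 - 01:26]") gives [(50, 62), (76, 86)], and both pairs fall within its 129-second duration.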
tZ-QcbrBjaw
tZ-QcbrBjaw_comparison_0
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
comparison
null
How does the primary criterion for evaluating childcare quality distinguish itself between the discussion of provider databases and the later evaluation of service amenities?
[ "From administrative compliance to interpersonal connection", "From physical safety to nutritional standards", "From facility maintenance to digital integration", "From academic programming to behavioral regulation" ]
A
{ "designated_segments": "[00:00 - 00:11]\n[01:02 - 01:16]", "connections": "These segments are conceptually related as they both offer criteria for defining \"quality\" in childcare, but they differ significantly in their focus—structural versus interpersonal—which is revealed by combining audio advice with the ac...
null
null
null
null
null
tZ-QcbrBjaw
tZ-QcbrBjaw_sentiment_analysis_0
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
sentiment_analysis
null
What underlying sentiment does Georgie Dent reveal through her demeanor when contrasting systemic metrics with the human element of childcare?
[ "Critical skepticism of regulatory oversight", "Optimistic faith in interpersonal bonds", "Dismissive attitude toward material resources", "Resigned frustration with workplace instability" ]
B
{ "designated_segments": "00:11 - 00:24\n01:02 - 01:16\n01:26 - 01:36", "connections": "Georgie Dent’s character trait of optimistic resilience is revealed by contrasting her critical audio in early and late segments with her visual demeanor in the middle segment. In the audio of **[00:11 - 00:24]** and **[01:26 - ...
null
null
null
null
null
tZ-QcbrBjaw
tZ-QcbrBjaw_sentiment_analysis_1
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
sentiment_analysis
null
What underlying sentiment regarding the care environment is conveyed by linking the commentary on staff demeanor with the imagery of the unattended art supplies?
[ "A tone of creative engagement regarding skill building", "An attitude of rigid discipline regarding safety rules", "A sense of passive neglect stemming from burnout", "A mood of chaotic panic surrounding immediate hazards" ]
C
{ "designated_segments": "01:16 - 01:26\n01:36 - 01:48", "connections": "The sentiment of *latent danger* and *neglect* is constructed by linking a specific warning in the audio with a specific outcome in the visuals of a later segment. In the audio of **[01:16 - 01:26]**, Anne-Marie Morrissey verbally identifies \...
null
null
null
null
null
tZ-QcbrBjaw
tZ-QcbrBjaw_summarization_0
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
summarization
null
What three-step course of action is outlined for parents regarding childcare services in the provided segments?
[ "Consulting the national quality register, visiting to observe the atmosphere, and reporting issues to authorities", "Interviewing the head educators, reviewing the daily learning program, and checking the nutritional value of meals", "Inspecting the outdoor facilities, ensuring strict behavior management, and ...
A
{ "designated_segments": "[00:00 - 00:11]\n[00:41 - 00:50]\n[01:36 - 01:48]", "connections": "These non-consecutive segments collectively outline the specific actions parents must take throughout the lifecycle of childcare engagement. The topic begins in the first segment **[00:00 - 00:11]**, where the audio establ...
null
null
null
null
null
tZ-QcbrBjaw
tZ-QcbrBjaw_summarization_1
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
summarization
null
What dual warning does the video provide regarding the reliance on official data and the assessment of the childcare environment?
[ "Official ratings prioritize safety, requiring parents to separately investigate staff qualifications.", "Official ratings are often delayed, indicating a need to monitor for high staff turnover rates.", "Official ratings may be infrequent, necessitating personal checks for signs of staff stress.", "Official ...
C
{ "designated_segments": "[00:11 - 00:24]\n[01:16 - 01:26]", "connections": "These segments work together to warn parents about different types of risks—one bureaucratic and one behavioral. The first segment **[00:11 - 00:24]** introduces a systemic red flag: the audio explains that quality ratings may not be \"ass...
null
null
null
null
null
tZ-QcbrBjaw
tZ-QcbrBjaw_causal_reasoning_1
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
causal_reasoning
null
According to the expert analysis, what is the fundamental determinant of quality that supersedes the presence of high-end amenities or resources?
[ "The regularity of national quality standard assessments", "The provision of upscale amenities and modern technology", "The nature of relationships between staff and children", "The level of formal qualifications possessed by the staff" ]
C
{ "designated_segments": "[00:11 - 00:24]\n[01:02 - 01:16]", "connections": "This group reveals that the visual cues presented early in the video are potential \"red herrings\" regarding quality, a fact only revealed by later audio.\n1. **Visual Setup:** The segment **[00:11-00:24]** visually showcases pristine, h...
null
null
null
null
null
tZ-QcbrBjaw
tZ-QcbrBjaw_causal_reasoning_0
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
causal_reasoning
null
What specific physical behavior demonstrated by the educator is presented as the foundation for creating an environment where children voluntarily approach and share with staff?
[ "Adjusting the wooden toy tracks", "Lowering herself to the floor level", "Discussing behavior management plans", "Maintaining a standing observation role" ]
B
{ "designated_segments": "[00:24 - 00:34]\n[00:41 - 00:50]\n[00:50 - 01:02]", "connections": "A complete causal chain regarding \"quality care\" is established only by linking the visuals of one segment with the expert audio of two later segments.\n1. **Visual Evidence:** In the segment **[00:24-00:34]**, the visu...
null
null
null
null
null
tZ-QcbrBjaw
tZ-QcbrBjaw_future_prediction_0
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
future_prediction
null
If prospective parents apply the reporter's specific advice to the observed behavior of the staff member kneeling on the floor, what assessment are they most likely to make?
[ "They will judge the staff positively for demonstrating active engagement.", "They will prioritize verifying the staff's written academic qualifications.", "They will perceive the staff's floor-level position as unprofessional behavior.", "They will critique the facility for lacking a strictly silent environm...
A
{ "designated_segments": "[00:24 - 00:34]\n[00:41 - 00:50]", "connections": "This prediction synthesizes the expert criteria provided in the audio with the observational evidence provided in the visuals.\n* **Audio Clue:** In the segment **[00:41 - 00:50]**, reporter Ellen Coulter conveys expert advice stating th...
null
null
null
null
null
tZ-QcbrBjaw
tZ-QcbrBjaw_future_prediction_1
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
future_prediction
null
If a parent recognizes the cluttered environment as a warning sign of staff stress, what is the most likely subsequent action they will take?
[ "Check the national register for the provider's rating", "Inquire directly about the staff's formal qualifications", "Call emergency services to report immediate danger", "Lodge a complaint with the state or territory regulator" ]
D
{ "designated_segments": "[01:16 - 01:26]\n[01:36 - 01:48]", "connections": "This prediction connects the identification of warning signs with the prescribed administrative solution.\n* **Audio/Visual Clue:** In segment **[01:16 - 01:26]**, expert Anne-Marie Morrissey identifies a key red flag: \"staff who are lo...
null
null
null
null
null
tZ-QcbrBjaw
tZ-QcbrBjaw_hypothetical_reasoning_0
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
hypothetical_reasoning
null
If a childcare facility featured the high-quality play equipment shown in the wooden train sequence but also exhibited the specific 'red flag' regarding staff demeanor described by the experts, what would be the logical conclusion regarding its quality?
[ "The facility would be deemed inadequate, as positive educator-child relationships are explicitly stated to matter more than material resources.", "The facility would be considered high-quality, as the presence of expensive educational toys suggests the center is well-funded and safe.", "The facility would be r...
A
{ "designated_segments": "[00:11 - 00:24]\n[01:02 - 01:16]\n[01:16 - 01:26]", "connections": "This scenario establishes a hierarchy where emotional safety overrides material wealth, using evidence separated by significant time gaps.\n- **Visual Evidence:** Segment [00:11 - 00:24] establishes the presence of materia...
null
null
null
null
null
tZ-QcbrBjaw
tZ-QcbrBjaw_hypothetical_reasoning_1
childcare
en
129
null
videos/tZ-QcbrBjaw.mp4
854x480
hypothetical_reasoning
null
If the child playing with the monster truck were to sustain a severe injury from the scallop shell, placing them in immediate danger, which action does the reporter explicitly advise?
[ "Call 000 immediately", "Contact the state regulator", "Notify the center director", "Check the national register" ]
A
{ "designated_segments": "[01:02 - 01:16]\n[01:36 - 01:48]", "connections": "This scenario links a specific environmental hazard to a procedural remedy that appears much later in the video.\n- **Visual Evidence:** Segment [01:02 - 01:16] depicts a child playing in an environment containing potential hazards: a \"Mo...
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_fine_grained_perception_0
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
fine_grained_perception
Audio-Guided
What specific visual content is displayed on the screen at the exact moment the sound of a camera shutter clicking is heard?
[ "A mirror reflection shows the woman holding a pink smartphone in front of her face to take a picture of herself while wearing the voluminous One Million Dollar dress.", "A montage of still photographs appears over the footage, including an image of the woman in a formal gown inside an elevator and another of a c...
B
{ "connections": "This question requires identifying a brief, non-verbal sound effect (the camera shutter) and strictly correlating it with the simultaneous visual editing technique (the photo montage) that interrupts the continuous video flow." }
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_fine_grained_perception_1
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
fine_grained_perception
Vision-Guided
As a store attendant is seen lifting a floor-length sheer veil and draping it over the woman's head and shoulders, what reasoning does the woman provide for adding this accessory?
[ "She mentions that she realized she gravitates towards dresses with a bolero after wearing one for her civil wedding and wants to see if this accessory creates a similar effect.", "She explains that she decided to try the dress with the veil to see if she would love the look just a bit more.", "She notes that t...
B
{ "connections": "This question uses the unique visual action of the veil being placed on the woman as a cue to pinpoint the specific line of dialogue where she explains the motivation behind that styling choice." }
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_scene_transformation_detection_1
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
scene_transformation_detection
null
How does the camera perspective change as the woman realizes that wearing the dress has steered her direction toward big ball gowns?
[ "The view changes from a medium shot of her describing the pink floral accents to a mirror reflection where she obscures her face with a smartphone to record the fitting.", "The perspective transitions from a side angle of her admiring the lace sleeves to a shot following her as she walks toward the large wall mi...
C
{ "connections": "The defining audio event is the woman's spoken realization about her preference for ball gowns, which acts as the temporal clue for the cinematic transition from a static high-angle shot to a dynamic movement shot." }
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_scene_transformation_detection_0
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
scene_transformation_detection
null
What location change takes place while the woman invites the audience to join her for her first ever wedding dress shopping?
[ "The setting shifts from the interior of the bridal salon's showroom to the sidewalk outside as she exits the building carrying shopping bags.", "The camera cuts from her standing in the spacious reception area to a view of her examining the fabric of a lace gown.", "The footage moves from the woman exiting a v...
D
{ "connections": "The defining audio event is the woman's introductory invitation (\"Come with me...\"), which serves as the temporal marker for the visual shift from the exterior establishing shot to the inside of the store.\n" }
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_context_understanding_1
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
context_understanding
Audio Context
What motivation does the woman articulate for having the store attendant add a long veil to the fitted Thelma dress after she has already modeled it?
[ "She believed the camera missed the glycerin effect and hoped the extra layer would enhance the visual texture.", "She found the dress personally underwhelming and wanted to see if the accessory would make her love it more.", "She felt the silhouette was too casual for a wedding and tried to elevate it from a r...
B
{ "connections": "The visual stream depicts the physical action of modifying the look with an accessory, while the audio reveals the internal critique and specific reasoning (trying to salvage an \"underwhelming\" dress) that prompted the change." }
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_context_understanding_0
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
context_understanding
Visual Context
When the narrator references her "civil wedding" attire as a benchmark for her current dress expectations, what specific visual imagery is shown to illustrate that past event?
[ "A montage of still photos appears over the footage, depicting the woman in a lace bolero jacket standing outside a courthouse with her husband.", "A montage of still photos appears over the footage, featuring the woman in a formal gown inside an elevator alongside a man in a tuxedo.", "A continuous sequence sh...
B
{ "connections": "The audio introduces a past event (the civil wedding) to establish a standard for the current shopping trip, while the visual context provides the necessary historical evidence (the specific photo of the couple) to concretize what that standard looks like." }
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_comparison_1
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
comparison
null
How does the woman's physical presentation during her final departure from the storefront differ from her initial arrival?
[ "She wears a bridal veil, whereas she arrived in a casual jacket", "She carries shopping bags, whereas she arrived with only a handbag", "She holds a microphone, whereas she arrived holding a smartphone", "She walks with a partner, whereas she arrived walking alone" ]
B
{ "designated_segments": "00:00 - 00:11\n01:48 - 02:02", "connections": "This group distinguishes the beginning and end states of the event, utilizing identical main entities (the woman in the brown jacket) to highlight a change in status. In the initial segment (00:00-00:11), the visual shows the woman entering th...
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_comparison_0
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
comparison
null
How does the subject's stated emotional response to the voluminous ballgown compare to her assessment of the fitted mermaid silhouette?
[ "The ballgown evokes a regal, queen-like feeling, while the fitted silhouette is deemed personally underwhelming.", "The ballgown is admired for its pink accents, whereas the fitted silhouette is favored for its superior photogenic qualities.", "The ballgown feels like a princess fantasy, while the fitted silho...
A
{ "designated_segments": "01:07 - 01:19\n01:19 - 01:23\n01:34 - 01:48", "connections": "These segments contrast two distinct categories of dress construction and the subject's resulting emotional state. In the segment 01:07-01:19, the visual evidence emphasizes the massive width and volume of the \"One Million Doll...
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_sentiment_analysis_0
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
sentiment_analysis
null
What sentiment characterizes the protagonist's internal reaction to the style of the first dress, the 'Majestic'?
[ "Astonishment at her unexpected affinity for lace.", "Validation of her specific pre-existing preference.", "Nostalgia triggered by the salon's familiar atmosphere.", "Uncertainty about the fit of the structured bodice." ]
A
{ "designated_segments": "[00:11 - 00:24]\n[00:29 - 00:41]", "connections": "The sentiment here is a genuine surprise at one's own changing tastes. In the audio of segment [00:29 - 00:41], the woman verbally confesses, \"I had no idea I was going to gravitate towards a whole lace dress,\" indicating a mental discon...
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_sentiment_analysis_1
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
sentiment_analysis
null
What specific internal state motivated the protagonist to sit on the floor while modeling the second gown?
[ "A newfound sense of regal majesty", "An intense feeling of physical fatigue", "A mood of casual, playful whimsy", "A state of deep romantic nostalgia" ]
A
{ "designated_segments": "[00:55 - 01:07]\n[01:07 - 01:19]", "connections": "The sentiment is the transition from simply wearing a dress to inhabiting a persona. The visual in segment [01:07 - 01:19] shows the woman engaging in the unconventional behavior of sitting on the floor of the bridal salon, allowing the dr...
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_event_sequence_ordering_1
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
event_sequence_ordering
null
null
null
C
{ "designated_segments": "[01:19 - 01:23]\n[01:34 - 01:48]\n[01:48 - 02:02]", "connections": "This group clarifies the timeline of the \"Thelma\" dress fitting, distinguishing between the initial look, the critique, and the modified look with an accessory. The visual similarity of the dress in all clips requires au...
[ "Decides to add accessory due to underwhelming feel", "Models fitted mermaid dress with sheer sleeves, no headpiece", "Stands with long sheer veil draped over head" ]
What is the correct chronological order of the following events during the fitting of the "Thelma" dress? (1) Decides to add accessory due to underwhelming feel (2) Models fitted mermaid dress with sheer sleeves, no headpiece (3) Stands with long sheer veil draped over head
[ "(1) → (2) → (3)", "(3) → (2) → (1)", "(2) → (1) → (3)", "(2) → (3) → (1)" ]
What is the correct chronological order of the events during the fitting of the "Thelma" dress?
[ "Decides to add accessory due to underwhelming feel → Models fitted mermaid dress with sheer sleeves, no headpiece → Stands with long sheer veil draped over head", "Stands with long sheer veil draped over head → Models fitted mermaid dress with sheer sleeves, no headpiece → Decides to add accessory due to underwh...
UQNo9LX24lE
UQNo9LX24lE_event_sequence_ordering_0
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
event_sequence_ordering
null
null
null
B
{ "designated_segments": "[00:11 - 00:24]\n[00:24 - 00:29]\n[00:29 - 00:41]", "connections": "This sequence reconstructs the progression from selecting a dress on the rack to physically preparing for the fitting, which involves disparate shots of the woman in street clothes, a robe, and finally the gown.\nCorrect C...
[ "Woman stands on pedestal wearing 'Bride' robe", "Attendants fasten bodice and collar of Majestic dress", "Woman in brown jacket points to first dress" ]
What is the correct chronological sequence for the following events based on the video? (1) Woman stands on pedestal wearing 'Bride' robe (2) Attendants fasten bodice and collar of Majestic dress (3) Woman in brown jacket points to first dress
[ "(1) → (3) → (2)", "(3) → (1) → (2)", "(2) → (3) → (1)", "(1) → (2) → (3)" ]
What is the correct chronological sequence for the events based on the video?
[ "Woman stands on pedestal wearing 'Bride' robe → Woman in brown jacket points to first dress → Attendants fasten bodice and collar of Majestic dress", "Woman in brown jacket points to first dress → Woman stands on pedestal wearing 'Bride' robe → Attendants fasten bodice and collar of Majestic dress", "Attendant...
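The two event_sequence_ordering records above store the same choices twice. options_indexed refers to the numbered events, and options_textual spells those events out. The following is a small sketch that recovers the textual form and the correct order from events, options_indexed, and the answer letter. It assumes that the letters A to D index the options in list order, which matches the rows shown: answer C in the Thelma fitting corresponds to "(2) → (1) → (3)". The helper names are hypothetical.

```python
import re

INDEX_RE = re.compile(r"\((\d+)\)")


def indexed_to_textual(option: str, events: list[str]) -> str:
    """Rewrite an order such as "(2) → (1) → (3)" using the event descriptions."""
    indices = [int(i) for i in INDEX_RE.findall(option)]
    # Event numbers in the options are 1-based.
    return " → ".join(events[i - 1] for i in indices)


def answer_order(answer: str, options_indexed: list[str]) -> list[int]:
    """Return the event numbers of the correct option, e.g. "C" -> [2, 1, 3]."""
    option = options_indexed["ABCD".index(answer)]
    return [int(i) for i in INDEX_RE.findall(option)]
```

Applied to the first option of the Thelma record, indexed_to_textual reproduces the first entry of its options_textual list.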
UQNo9LX24lE
UQNo9LX24lE_summarization_1
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
summarization
null
What is the primary purpose of the woman's visit as established by the video's introduction and conclusion?
[ "Shopping for a formal wedding dress after having a civil ceremony", "Selecting a gown specifically for her upcoming civil wedding ceremony", "Modeling various dresses for the brand's promotional marketing campaign", "Returning a dress she previously purchased for her civil wedding" ]
A
{ "designated_segments": "[00:00 - 00:11]\n[01:48 - 02:02]", "connections": "These segments function as the narrative bookends for the shopping trip, linked by consistent visual elements and progressing audio context. The first segment [00:00 - 00:11] visually establishes the \"before\" state, showing the woman in ...
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_summarization_0
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
summarization
null
What factors contribute to the woman's strong preference for the first dress, the 'Majestic', after trying it on?
[ "She prefers the dress because it offers a stark stylistic departure from her civil wedding attire, ensuring a completely unique look for this ceremony.", "She discovers an unexpected appreciation for the lace fabric and recognizes a familiar preference for the bolero style used in her civil wedding.", "She foc...
B
{ "designated_segments": "[00:00 - 00:11]\n[00:29 - 00:41]\n[00:41 - 00:55]", "connections": "The development of this topic spans the initial browsing phase to the final fitting review. In the first segment [00:00 - 00:11], the topic is introduced visually as the woman reaches out to specifically touch and examine ...
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_causal_reasoning_0
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
causal_reasoning
null
What specific experience prompted the woman to identify her desired wedding dress style?
[ "Wearing the expansive \"One Million Dollar\" ballgown", "Modeling the \"Majestic\" lace dress with a bolero", "Noting the studio's similarity to a famous TV show", "Pairing the fitted \"Thelma\" dress with a long veil" ]
A
{ "designated_segments": "00:11 - 00:24\n00:55 - 01:07\n01:07 - 01:19", "connections": "The narrative arc of \"finding direction\" is established and resolved through a specific visual catalyst that spans non-consecutive segments.\n1. In the first segment **[00:11 - 00:24]**, the audio identifies the root cause of...
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_causal_reasoning_1
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
causal_reasoning
null
What is the primary reason the woman found the third, fitted dress "underwhelming" despite acknowledging its beauty?
[ "The camera failed to capture the dress's true glistening effect.", "The fitted silhouette felt too restrictive compared to the previous dress.", "She discovered a preference for the grandeur of voluminous ball gowns.", "She considered the sheer vertical panels to be visually unappealing." ]
C
{ "designated_segments": "01:07 - 01:19\n01:19 - 01:23\n01:34 - 01:48", "connections": "The reason for the third dress being rejected is not inherent to the dress itself but is caused by the high emotional standard set by the previous dress.\n1. In the first segment **[01:07 - 01:19]**, the audio establishes a ben...
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_future_prediction_0
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
future_prediction
null
Based on the protagonist's stated realization regarding dress silhouettes and her specific reactions to the gowns tried on, which style of wedding dress is she most likely to choose for her ceremony?
[ "A form-fitting mermaid silhouette", "A voluminous ballgown silhouette", "A simple civil ceremony style", "A sleek straight-cut column style" ]
B
{ "designated_segments": "[01:07 - 01:19]\n[01:34 - 01:48]\n[01:48 - 02:02]", "connections": "This prediction is built on the synthesis of the woman's contrasting reactions to two distinct dress shapes. In the first segment [01:07 - 01:19], the visual of her wearing the voluminous One Million Dollar dress is paired...
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_future_prediction_1
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
future_prediction
null
Given the contrast between the woman's initial stated intent and her visual status upon departure, what is the most likely subsequent scenario?
[ "She continues her search at other salons without having made a purchase.", "She contacts the boutique to finalize an order for the second dress.", "She opens the packages she was seen carrying while leaving the store.", "She attends her civil wedding ceremony in the third dress she tried on." ]
C
{ "designated_segments": "[00:11 - 00:24]\n[01:48 - 02:02]", "connections": "This prediction relies on detecting a contradiction between the subject's stated intent and the visual outcome. In the first segment [00:11 - 00:24], the woman's audio frames the visit as exploratory, stating the goal is to \"try some dres...
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_hypothetical_reasoning_0
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
hypothetical_reasoning
null
If the 'One Million Dollar' dress had visually presented a fitted silhouette rather than a voluminous ballgown, how would the woman's subsequent evaluation of the 'Thelma' dress likely differ?
[ "She would still find the Thelma dress underwhelming solely because the camera visuals failed to properly capture the dress's glycerin effect.", "Her realization preferring expansive volume would not have occurred, removing the specific comparative basis for finding the fitted Thelma dress underwhelming.", "Her...
B
{ "designated_segments": "[00:55 - 01:07]\n[01:07 - 01:19]\n[01:19 - 01:23]\n[01:34 - 01:48]", "connections": "This scenario links the visual attributes of the second dress to the audio rejection of the third dress. In the segments [00:55 - 01:07] and [01:07 - 01:19], the visual evidence establishes that the \"One ...
null
null
null
null
null
UQNo9LX24lE
UQNo9LX24lE_hypothetical_reasoning_1
shopping
en
122
null
videos/UQNo9LX24lE.mp4
480x854
hypothetical_reasoning
null
If the 'Majestic' dress had been designed as a fixed strapless gown without the separate overlay components added during the fitting, which of the woman's observations would be rendered factually inconsistent?
[ "Her admission that she was surprised to fall in love with a \"whole lace dress\"", "Her comment that the studio atmosphere reminded her of \"Say Yes to the Dress\"", "Her recognition of a pattern of choosing \"boleros\" similar to her civil wedding look", "Her observation that the dress felt \"regal\" and \"...
C
{ "designated_segments": "[00:00 - 00:11]\n[00:29 - 00:41]\n[00:41 - 00:55]", "connections": "This scenario connects the physical assembly of a specific dress to the subject's stated history. Segment [00:29 - 00:41] provides crucial visual evidence where store attendants are seen adding separate components—specific...
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_fine_grained_perception_1
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
fine_grained_perception
Vision-Guided
What distinctive sound effect is audible at the precise moment the man unfolds the sheer orange-red dupatta and extends his arms to display it?
[ "A short \"beep\" sound effect marks the moment, similar to the alert heard earlier when the lehenga skirt's flare was showcased.", "A brief clip of vocal music plays in the background, echoing the soundtrack used when the skirt was first removed from the package.", "A distinctive \"whoosh\" sound effect coinci...
C
{ "connections": "This question targets a subtle editing detail where a sound effect is synchronized with a visual movement; answering correctly requires linking the specific gesture of opening the dupatta to the concurrent non-speech audio event." }
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_fine_grained_perception_0
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
fine_grained_perception
Audio-Guided
What specific visual action is the man performing with the red lehenga skirt during the brief moment when a snippet of vocal music is heard?
[ "He lifts the hem of the skirt high, turning the bottom portion upward to reveal layers of stiff, pinkish-white netting and lining underneath.", "He grips the plastic-wrapped package with both hands and tears open the clear covering to uncover the rust-red fabric and its dense embroidery.", "He lowers the skirt...
C
{ "connections": "This question relies on identifying a unique, transient audio cue (the short clip of vocal music) and correlating it with the specific visual climax of the presentation where the garment is fully fanned out on the floor." }
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_scene_transformation_detection_1
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
scene_transformation_detection
null
What change in camera perspective coincides with the whooshing sound?
[ "The camera zooms in significantly to focus on the heavy metallic embroidery of the blouse while the man displays it against the skirt.", "The camera pans slowly downwards to highlight the vertical floral patterns and sequin details covering the surface of the red fabric.", "The camera tilts downwards to frame ...
D
{ "connections": "The whooshing sound acts as an audio cue for a significant camera movement, specifically zooming out from a detail-oriented shot to a broader view as the presenter transitions to showing the dupatta." }
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_scene_transformation_detection_0
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
scene_transformation_detection
null
What visual transition occurs when the beep sound is heard?
[ "The scene cuts from the man walking down the aisle with the sack to him standing centrally in the aisle preparing to cut the knot.", "The scene cuts from the man spreading the skirt out on the floor to him standing upright and holding the skirt by its waistband.", "The scene cuts from the man tearing the clear...
B
{ "connections": "The beep sound serves as a distinct temporal marker indicating a jump cut between two different shots: one where the presenter is bending down to arrange the garment on the ground, and the next where he is standing up to display it.\n" }
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_context_understanding_0
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
context_understanding
Visual Context
When the presenter verbally encourages the audience to "check out its flares" and "check out its grace," what specific visual demonstration is he performing with the Red Lehenga Skirt?
[ "He lifts the hem high to turn the bottom upward, revealing the layers of stiff netting and lining underneath.", "He lowers the heavy skirt to the floor and uses a sweeping motion to spread the fabric into a wide circle.", "He holds the skirt by the waistband and lifts it vertically to briefly display its volum...
B
{ "connections": "The audio uses abstract terms like \"grace\" and \"flares\" to describe the garment, while the visual provides the concrete evidence of these attributes through the specific action of fanning the skirt out on the floor to display its full volume." }
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_context_understanding_1
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
context_understanding
Audio Context
As the man stands holding the Scissors and cutting the stitching of the large White Sack, how does he characterize the future popularity of the item inside to build anticipation?
[ "Urges viewers to hold their breath and immediately call to book the unveiled item.", "Guarantees the dress is specially commissioned and offers the lowest price in the market.", "Describes the item as a beautiful rust-colored lehenga with full flares and grace.", "Claims the dress will be the biggest hit on ...
D
{ "connections": "The visual action depicts the mundane opening of a rough wholesale package, but the audio recontextualizes this action as the premiere of a high-value, viral fashion item, explaining the presenter's excitement." }
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_comparison_0
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
comparison
null
How does the method of opening the package differ when the narrator highlights the dress's online popularity compared to when he mentions it was specially commissioned?
[ "Cutting the gathered knot versus slicing the horizontal seam", "Slicing the horizontal seam versus cutting the gathered knot", "Tearing the fabric opening versus slicing the vertical edge", "Removing the adhesive tape versus breaking the thread seal" ]
B
{ "designated_segments": "00:00 - 00:14\n00:26 - 00:39", "connections": "Both segments depict the event of the Man utilizing the Scissors to breach the White Sack, but they illustrate distinct mechanical steps in the unpacking process. In the first segment (00:00 - 00:14), the Visual evidence shows the Man slicing ...
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_comparison_1
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
comparison
null
How do the material characteristics of the lehenga skirt differ from those of the dupatta, as indicated by the presenter's visual demonstration?
[ "Fluid, draping softness vs. Rigid, heavy structural weight", "Heavy, opaque volume vs. Lightweight, sheer transparency", "Stiff, reinforced lining vs. Dense, opaque metallic embroidery", "Matte, solid coloration vs. Glossy, reflective surface texture" ]
B
{ "designated_segments": "00:39 - 00:51\n01:05 - 01:18", "connections": "These segments contrast two distinct components of the same \"Red Lehenga\" ensemble based on their material properties. In the segment 00:39 - 00:51, the Audio highlights the skirt's \"flares\" and \"grace,\" synthesized with Visuals of the M...
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_sentiment_analysis_1
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
sentiment_analysis
null
What tone does the protagonist adopt to frame the unboxing of the garment throughout the sequence?
[ "A pragmatic tone focused on the efficiency of inventory management.", "A dramatic tone characterized by calculated suspense and theatrical flair.", "A chaotic tone reflecting the urgency of a high-pressure retail environment.", "A skeptical tone suggesting uncertainty about the product's quality." ]
B
{ "designated_segments": "00:00 - 00:14\n00:14 - 00:26\n00:39 - 00:51", "connections": "The Man creates a sentiment of *dramatic suspense* that relies on a buildup-and-payoff structure across three segments. The audio in [00:00 - 00:14] and [00:14 - 00:26] sets a high bar, declaring the item a \"biggest hit\" and c...
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_sentiment_analysis_0
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
sentiment_analysis
null
What specific sentiment drives the man's assurance when demonstrating the exclusivity of the garment?
[ "Excitement limited to the visual appeal of the embroidery", "Assertiveness regarding the competitive pricing guarantees", "Confidence substantiated by the garment's internal construction", "Anticipation focused on the initial unboxing process" ]
C
{ "designated_segments": "00:26 - 00:39\n00:51 - 01:05", "connections": "The Man's sentiment of absolute assurance regarding the product's value is established non-linearly. In the audio of segment [00:26 - 00:39], he makes abstract verbal claims, stating the dress was \"specially commissioned\" and that the \"pric...
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_event_sequence_ordering_0
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
event_sequence_ordering
null
null
null
A
{ "designated_segments": "[00:00 - 00:14]\n[00:14 - 00:26]\n[00:26 - 00:39]\n[00:39 - 00:51]", "connections": "Correct Chronological Order:\n1. The Man introduces the video and cuts the top seam of the White Sack at the front counter.\n2. The Man picks up the sack and walks from the front counter to the rear aisle....
[ "Man cuts gathered plastic knot in store aisle", "Presenter says \"Open up\" before extracting wrapped bundle", "Man cuts horizontal stitching near glass door" ]
What is the correct chronological sequence for the following events as the man opens the package? (1) Man cuts gathered plastic knot in store aisle (2) Presenter says "Open up" before extracting wrapped bundle (3) Man cuts horizontal stitching near glass door
[ "(3) → (1) → (2)", "(1) → (3) → (2)", "(2) → (3) → (1)", "(1) → (2) → (3)" ]
What is the correct chronological sequence for the events as the man opens the package?
[ "Man cuts horizontal stitching near glass door → Man cuts gathered plastic knot in store aisle → Presenter says \"Open up\" before extracting wrapped bundle", "Man cuts gathered plastic knot in store aisle → Man cuts horizontal stitching near glass door → Presenter says \"Open up\" before extracting wrapped bundl...
D084SlWm7Wk
D084SlWm7Wk_event_sequence_ordering_1
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
event_sequence_ordering
null
null
null
B
{ "designated_segments": "[00:39 - 00:51]\n[00:51 - 01:05]\n[01:05 - 01:18]\n[01:18 - 01:24]", "connections": "Correct Chronological Order:\n1. The Man reveals the Red Lehenga Skirt and spreads it on the floor to show its flare.\n2. The Man displays the Skirt's lining and then the front of the Choli.\n3. The Man di...
[ "Man displays front design of choli", "Man holds skirt and dupatta providing purchase instructions", "Man spreads lehenga skirt on floor", "Man unfolds sheer dupatta" ]
What is the correct chronological sequence for the following events? (1) Man displays front design of choli (2) Man holds skirt and dupatta providing purchase instructions (3) Man spreads lehenga skirt on floor (4) Man unfolds sheer dupatta
[ "(3) → (4) → (1) → (2)", "(3) → (1) → (4) → (2)", "(1) → (3) → (4) → (2)", "(3) → (1) → (2) → (4)" ]
What is the correct chronological sequence for the events?
[ "Man spreads lehenga skirt on floor → Man unfolds sheer dupatta → Man displays front design of choli → Man holds skirt and dupatta providing purchase instructions", "Man spreads lehenga skirt on floor → Man displays front design of choli → Man unfolds sheer dupatta → Man holds skirt and dupatta providing purchase...
D084SlWm7Wk
D084SlWm7Wk_summarization_0
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
summarization
null
What sequence of events describes the presenter's interaction with the merchandise in the video?
[ "Unpacking a rust-colored lehenga by cutting a sealed sack and removing inner plastic", "Tailoring a custom garment by cutting excess fabric and measuring the hemline", "Packaging a sold wedding dress into a secure bundle for shipment", "Retrieving a pink saree from a hanging rack to display embroidery detail...
A
{ "designated_segments": "[00:00 - 00:14]\n[00:26 - 00:39]\n[00:39 - 00:51]", "connections": "These non-consecutive segments collectively depict the complete chronological sequence of accessing the product, which is physically interrupted by the presenter moving through the store.\n- In the segment **[00:00 - 00:14...
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_summarization_1
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
summarization
null
Which summary best describes the purchasing guidelines and value proposition presented by the speaker across the video segments?
[ "He limits the sale exclusively to online auctions on social media platforms, explicitly discouraging customers from visiting the physical showroom.", "He urges viewers to book the specially commissioned lehenga via WhatsApp or by visiting the Rishikesh store, citing a guaranteed price.", "He targets wholesale ...
B
{ "designated_segments": "[00:14 - 00:26]\n[00:26 - 00:39]\n[01:18 - 01:24]", "connections": "This group compiles the fragmented purchasing information and sales rhetoric dispersed throughout the video to form a complete commercial guide.\n- In segment **[00:14 - 00:26]**, the audio creates urgency (\"immediately c...
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_causal_reasoning_0
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
causal_reasoning
null
What introductory claim serves as the premise for the onscreen text 'Sab se viral lengha' shown during the reveal?
[ "The guarantee that the item is a specially commissioned designer piece with a fixed price", "The instruction for viewers to hold their breath and book the item immediately", "The description of the dress having a unique rust color and heavy embroidery", "The assertion that the dress will be a top hit on Inst...
D
{ "designated_segments": "[00:00 - 00:14]\n[00:51 - 01:05]", "connections": "The causal chain begins in the audio of the first segment [00:00 - 00:14], where the man confidently predicts that the dress will be \"Instagram's biggest hit, YouTube's biggest hit.\" This auditory prediction sets up an expectation that i...
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_causal_reasoning_1
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
causal_reasoning
null
What motivates the specific camera focus on the intricate embroidery details during the later display of the garment?
[ "Substantiating the earlier assertion of the garment being a commissioned designer piece", "Displaying the price tag to validate the cost guarantee mentioned initially", "Revealing the brand logo to confirm the viral status claimed by the host", "Demonstrating the lightweight weaving to support a summer wear ...
A
{ "designated_segments": "[00:26 - 00:39]\n[01:05 - 01:18]", "connections": "In segment [00:26 - 00:39], the audio provides the \"cause\" or the premise of value: the man claims the dress was \"specially commissioned for you\" and implies it is a high-end \"designer\" piece with a guaranteed price. However, the vis...
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_future_prediction_1
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
future_prediction
null
What specific action will interested customers likely take regarding the showcased lehenga?
[ "They will attempt to purchase the garment at a Louis Philippe retail outlet.", "They will utilize an automated e-commerce link to order the item online.", "They will physically visit the Ambika Saree shop on Railway Road in Rishikesh.", "They will attend a scheduled auction to bid on the commissioned piece."...
C
{ "designated_segments": "[00:26 - 00:39]\n[01:18 - 01:24]", "connections": "The prediction that customers will physically visit the shop is derived from combining visual context with specific audio logistical details. The visual evidence in the segment [00:26 - 00:39] establishes the existence and specific atmosph...
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_future_prediction_0
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
future_prediction
null
Based on the presenter's closing instructions and the on-screen information, what is the most likely action interested viewers will take after the video ends?
[ "Visit the store's website to purchase the item through a cart system", "Wait for the presenter to unveil the remaining items in the sacks", "Contact the seller via the displayed number to secure a booking", "Subscribe to the channel to be notified when the item is in stock" ]
C
{ "designated_segments": "[00:14 - 00:26]\n[01:18 - 01:24]", "connections": "The prediction that viewers will contact the seller to initiate remote transactions is supported by synthesizing specific audio instructions with persistent visual cues across non-consecutive segments. In the segment from [00:14 - 00:26], ...
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_hypothetical_reasoning_1
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
hypothetical_reasoning
null
If the scissors utilized in the opening sequence were unavailable, how would the subsequent presentation of the 'rust-colored lehenga' most likely be affected?
[ "The unveiling would proceed as scheduled, since the package was merely knotted and could be opened manually without the tool.", "The audio commentary would automatically pause to accommodate the extra time needed to breach the packaging by hand.", "The stitched outer sack would act as a barrier, delaying the v...
C
{ "designated_segments": "[00:00 - 00:14]\n[00:14 - 00:26]\n[00:39 - 00:51]", "connections": "In the first two segments, the audio creates an expectation of an imminent event, stating the dress is \"about to be unveiled\" [00:14 - 00:26]. Visually, the obstacle to this event is established as a heavy \"White Sack\"...
null
null
null
null
null
D084SlWm7Wk
D084SlWm7Wk_hypothetical_reasoning_0
shopping
en-US
84
null
videos/D084SlWm7Wk.mp4
480x854
hypothetical_reasoning
null
If the yellow graphic containing the phone number were removed from the video, which instruction given by the speaker in the final segment would become impossible to follow?
[ "Visiting the store in person", "Booking the item via WhatsApp", "Confirming the guaranteed price", "Identifying the item as a viral hit" ]
B
{ "designated_segments": "[00:00 - 00:14]\n[01:18 - 01:24]", "connections": "The audio in the final segment [01:18 - 01:24] explicitly instructs the viewer to \"book this by whatsapping me.\" However, the audio track never verbally articulates the specific phone number required to execute this command. The logical ...
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_fine_grained_perception_0
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
fine_grained_perception
Audio-Guided
When the girl is heard saying that she told her brother the names of all the subjects, what specific gesture is she performing visually?
[ "She stands at a glass counter flipping through a stack of colorful notebooks to examine their pages closely.", "She walks away from the camera towards a service counter surrounded by tall stacks of books and notebooks.", "She faces the camera and counts on her fingers to visually list the subjects she is discu...
D
{ "connections": "This question identifies a unique and subtle line of narration (\"told my brother the names of all the subjects\") and asks for the concurrent, non-salient visual detail (touching her face), requiring precise synchronization of the audio track with the visual body language." }
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_fine_grained_perception_1
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
fine_grained_perception
Vision-Guided
What distinct non-verbal sounds are audible while the video shows a low-angle shot of the girl's feet executing a sliding dance move in her new black shoes?
[ "An isolated squeal of excitement is heard over the music, lacking any subsequent clicking or mechanical noise.", "A high-pitched squeal followed immediately by a distinct clicking sound is audible over the upbeat background music.", "The squeaking sound of rubber soles sliding against the tiled floor is audibl...
B
{ "connections": "The question hinges on a fleeting, unique visual action (the sliding dance move) that is not a main event. Answering it correctly requires detecting the specific sound effects (squeal and click) that occur exactly during this brief visual moment." }
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_scene_transformation_detection_0
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
scene_transformation_detection
null
What visual setting change occurs while the girl says, "First, we went to a clothing store and bought my school uniform, which I thoroughly enjoyed wearing"?
[ "The view switches from a moving shot of a busy road with heavy traffic to the interior of a clothing store lined with hanging items.", "The scene moves from a bedroom where the girl cleans an award to the entrance of a bank where she withdraws cash.", "The setting shifts from the clothing store aisle to a shop...
A
{ "connections": "The audio event is the specific sentence spoken by the girl describing the first stop of her trip. This narration serves as the temporal context for the visual transition from the travel sequence on the road to the interior of the clothing store.\n" }
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_scene_transformation_detection_1
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
scene_transformation_detection
null
How does the visual scene transform as the girl states, "I liked this bag, paid for it, and went straight to the stationery shop"?
[ "The video cuts from a clothing store where the girl spins around modeling a plaid school uniform to a shoe store where she sits on a bench inspecting boxes on the wall.", "The scene shifts from a general store where the girl displays a plastic-wrapped backpack to a stationery shop where she approaches a service ...
B
{ "connections": "The defining audio event is the girl's explanation of completing her bag purchase and moving to the next shop. This speech segment acts as a temporal marker for the visual cut between the store selling bags and the stationery shop." }
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_context_understanding_1
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
context_understanding
Audio Context
While the visuals depict the girl hurriedly visiting an ATM to withdraw cash immediately after cleaning a silver award plaque, what explanatory details does the audio provide for this sequence of events?
[ "She explains that the specific uniform shop she intended to visit did not accept card payments, requiring her to withdraw physical cash beforehand.", "She mentions that while cleaning the play button, she remembered schools were opening tomorrow and withdrew her YouTube earnings to buy supplies.", "She states ...
B
{ "connections": "The audio provides the crucial narrative link between the disparate visuals of the award and the bank by identifying the silver plaque as the source of her income (YouTube) and establishing the school opening as the urgent reason for the withdrawal." }
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_context_understanding_0
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
context_understanding
Visual Context
When the narrator remarks that the demanding study requirements made her feel "dizzy," what visual scene and gesture accompany this sentiment?
[ "She leans over the counter to flip through a stack of colorful notebooks, examining the pages while a man packs items.", "She stands in a general store holding a wrapped black backpack, smiling at the camera while a man stands behind the counter.", "She stands surrounded by hanging stationery and book stacks, ...
C
{ "connections": "The audio expresses an internal emotional state of being overwhelmed (\"dizzy\"), which is specifically grounded in the visual modality by the physical evidence of the large book stacks and the girl's bewildered gesture." }
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_comparison_0
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
comparison
null
How does the financial mechanism utilized in the opening sequence differ from the payment method depicted at the shoe store?
[ "Digital deposit of savings versus contactless authorization via a mobile device", "Electronic balance verification versus receipt of a monetary gift from a relative", "Automated retrieval of cash earnings versus manual handover of currency notes", "Initial card activation procedure versus negotiation of a la...
C
{ "designated_segments": "00:00 - 00:10\n00:23 - 00:37", "connections": "The video presents two distinct phases of the financial process required for shopping. In the first segment [00:00 - 00:10], the audio explicitly states the source of the funds (\"earning money from YouTube,\" \"withdraw money\") combined with...
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_comparison_1
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
comparison
null
How does the girl's physical demeanor contrast between the trip to the bank and her final return home?
[ "Anxious urgency versus peaceful relaxation", "Chaotic haste versus silent disappointment", "Energetic mobility versus collapsed fatigue", "Social enthusiasm versus verbal silence" ]
C
{ "designated_segments": "00:00 - 00:10\n00:51 - 01:01", "connections": "The video frames the shopping trip with contrasting physical states of the Girl. In the opening segment [00:00 - 00:10], the visual shows high-energy actions, such as running up a flight of steps and displaying exaggerated excitement in the AT...
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_sentiment_analysis_0
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
sentiment_analysis
null
What sentiment primarily defines the girl's internal state when she collapses on the bed at the end of the video?
[ "Physical fatigue stemming solely from carrying heavy shopping bags.", "Financial stress related to withdrawing her own earnings for supplies.", "Euphoria derived from purchasing her favorite school uniform and shoes.", "Dread caused by the anticipation of a more demanding academic year." ]
D
{ "designated_segments": "[00:00 - 00:10]\n[00:37 - 00:51]\n[00:51 - 01:01]", "connections": "The true nature of the Girl's exhaustion is not merely physical but rooted in psychological anxiety about the upcoming academic workload. This sentiment is constructed by linking the **visuals** of the first and last segme...
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_sentiment_analysis_1
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
sentiment_analysis
null
What underlying sentiment characterizes the protagonist's attitude during the payment exchange at the store?
[ "Elation stemming strictly from price negotiation", "Relief associated with staying within budget", "Gratitude directed at the shopkeeper's generosity", "Pride rooted in financial self-sufficiency" ]
D
{ "designated_segments": "[00:00 - 00:10]\n[00:23 - 00:37]", "connections": "The sentiment of pride displayed during the shopping trip is specifically tied to self-sufficiency, which is only revealed by connecting the **audio** of the first segment with the **visuals** and **audio** of a later, non-consecutive segm...
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_event_sequence_ordering_1
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
event_sequence_ordering
null
null
null
A
{ "designated_segments": "[00:00 - 00:10]\n[00:23 - 00:37]\n[00:51 - 01:01]", "connections": "This group disentangles the multiple scenes involving the Girl riding on the scooter. Without analysis, these transit shots could be confused as a single event or placed in random order. This sequence establishes the start...
[ "Passing playing dogs and waving children", "Eating snack at roadside stall", "Withdrawing money at bank" ]
What is the correct chronological sequence for the following events depicted in the video? (1) Passing playing dogs and waving children (2) Eating snack at roadside stall (3) Withdrawing money at bank
[ "(3) → (2) → (1)", "(1) → (2) → (3)", "(2) → (3) → (1)", "(3) → (1) → (2)" ]
What is the correct chronological sequence for the events depicted in the video?
[ "Withdrawing money at bank → Eating snack at roadside stall → Passing playing dogs and waving children", "Passing playing dogs and waving children → Eating snack at roadside stall → Withdrawing money at bank", "Eating snack at roadside stall → Withdrawing money at bank → Passing playing dogs and waving children...
M1nuGREjhKw
M1nuGREjhKw_event_sequence_ordering_0
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
event_sequence_ordering
null
null
null
D
{ "designated_segments": "[00:10 - 00:23]\n[00:23 - 00:37]\n[00:37 - 00:51]", "connections": "This group resolves the specific order of the shopping trip, which is presented across three consecutive segments but requires synthesis of narration and visual bag accumulation to order correctly. The timeline tracks the ...
[ "Purchases black backpack", "Eats roadside Pani Puri", "Purchases books and notebooks", "Purchases shiny black shoes", "Buys school uniform" ]
What is the correct chronological sequence for the following events during the girl's shopping trip? (1) Purchases black backpack (2) Eats roadside Pani Puri (3) Purchases books and notebooks (4) Purchases shiny black shoes (5) Buys school uniform
[ "(4) → (5) → (2) → (1) → (3)", "(5) → (4) → (1) → (2) → (3)", "(5) → (4) → (2) → (3) → (1)", "(5) → (4) → (2) → (1) → (3)" ]
What is the correct chronological sequence for the events during the girl's shopping trip?
[ "Purchases shiny black shoes → Buys school uniform → Eats roadside Pani Puri → Purchases black backpack → Purchases books and notebooks", "Buys school uniform → Purchases shiny black shoes → Purchases black backpack → Eats roadside Pani Puri → Purchases books and notebooks", "Buys school uniform → Purchases shi...
M1nuGREjhKw
M1nuGREjhKw_summarization_0
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
summarization
null
How is the financing of the school supply purchases depicted throughout the video?
[ "She uses self-earned cash withdrawn from an ATM and negotiates a discount during a purchase.", "She secures a personal loan advertised at the bank to cover the rising costs of her education.", "She relies on her brother to make the payments while she selects the necessary items.", "She pays for her major pur...
A
{ "designated_segments": "00:00 - 00:10\n00:23 - 00:37\n00:37 - 00:51", "connections": "This group traces the complete economic cycle of the event, from sourcing funds to executing transactions.\n* In the **first segment (00:00-00:10)**, the audio establishes the source of the budget (\"earning money from YouTube...
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_summarization_1
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
summarization
null
Who accompanies the narrator throughout the day, and what specific role do they play in the events?
[ "Her father, who transports her by motorcycle and handles the financial payments", "A private driver, who takes her to various shops and waits while she eats", "Her brother, who drives her on a scooter and assists in retrieving notebooks", "A store staff member, who helps her find school supplies and carries ...
C
{ "designated_segments": "00:00 - 00:10\n00:23 - 00:37\n00:37 - 00:51\n00:51 - 01:01", "connections": "These segments collectively define the protagonist's travel method and identify her companion, who acts as both driver and assistant.\n* The **first segment (00:00-00:10)** visually introduces the \"Man\" in the...
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_causal_reasoning_0
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
causal_reasoning
null
What provided the financial means for the girl to purchase the items shown in the stationery and bag stores?
[ "Revenue generated from her YouTube content", "A student loan secured at the bank vestibule", "Financial assistance from her accompanying brother", "A cash gift from the shopkeeper relative" ]
A
{ "designated_segments": "[00:00 - 00:10]\n[00:23 - 00:37]\n[00:37 - 00:51]", "connections": "The root cause of the specific activities seen later in the video is established entirely in the Audio of the first segment [00:00-00:10], where the Girl states, \"schools were opening tomorrow\" and \"Since I was earning ...
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_causal_reasoning_1
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
causal_reasoning
null
What is the primary cause of the girl's physical exhaustion at the conclusion of the video?
[ "The overwhelming mental stress and dizziness triggered by the anticipation of a difficult academic year", "The physical exertion required to walk the long distance home past the dogs without vehicle assistance", "The cumulative physical burden of carrying the heavy stack of notebooks and school attire acquired...
C
{ "designated_segments": "[00:10 - 00:23]\n[00:37 - 00:51]\n[00:51 - 01:01]", "connections": "The final outcome of the video—the Girl lying exhausted on her bed—is the effect of a cumulative physical burden that is only understood by aggregating visuals from previous, non-consecutive segments. The Visuals in [00:10...
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_future_prediction_1
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
future_prediction
null
Considering the stated urgency of the timeline and the girl's enthusiastic reaction to the footwear, what is the most likely event to occur the following day?
[ "The girl records a dance performance for her YouTube channel", "The girl wears the shiny black shoes to attend school", "The girl returns the footwear to the store for a refund", "The girl visits the bank again to withdraw more cash" ]
B
{ "designated_segments": "[00:00 - 00:10]\n[00:23 - 00:37]", "connections": "This prediction synthesizes the timeline with specific aesthetic choices made during the narrative to forecast the Girl's visual appearance in the near future.\n* **Audio Context (Segment 00:00 - 00:10):** The urgency is established with...
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_future_prediction_0
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
future_prediction
null
Given the immediate deadline mentioned at the start and the specific reaction to the volume of educational materials purchased, what is the most likely scenario the girl will face the following day?
[ "She will prioritize filming a celebratory YouTube video about her recent withdrawals.", "She will spend the day casually arranging her new shoes and uniform for display.", "She will encounter a challenging academic environment with a heavy study workload.", "She will attend a relaxed school orientation with ...
C
{ "designated_segments": "[00:00 - 00:10]\n[00:37 - 00:51]", "connections": "This prediction forecasts the tone and activity of the days following the video, moving beyond simple school attendance to a specific increase in workload.\n* **Audio Clue (Segment 00:00 - 00:10):** The Girl states clearly that \"schools...
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_hypothetical_reasoning_0
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
hypothetical_reasoning
null
If the girl did not possess the silver award shown at the beginning, how would the narrative context regarding her market transactions be primarily affected?
[ "She would be unable to validly negotiate the discount offered by the shopkeeper on the footwear.", "She would lack the established source of funds required to pay for the shoes and street food.", "She would have been denied entry to the bank vestibule to withdraw the necessary cash.", "She would have been fo...
B
{ "designated_segments": "[00:00 - 00:10]\n[00:23 - 00:37]", "connections": "In the first segment [00:00 - 00:10], the **Audio** explicitly states, \"Since I was earning money from YouTube... I had to withdraw money,\" directly linked to the **Visual** of the girl cleaning a YouTube Play Button and subsequently wit...
null
null
null
null
null
M1nuGREjhKw
M1nuGREjhKw_hypothetical_reasoning_1
shopping
en-US
61
null
videos/M1nuGREjhKw.mp4
480x854
hypothetical_reasoning
null
If the girl had undertaken the shopping trip without her brother's assistance, which logistical challenge would she most likely face?
[ "Determining the names of the subjects for the semester.", "Operating the ATM to withdraw cash for the supplies.", "Locating the stationery shop within the busy market.", "Transporting the heavy volume of notebooks and books." ]
D
{ "designated_segments": "[00:00 - 00:10]\n[00:37 - 00:51]", "connections": "The **Visual** in the first segment [00:00 - 00:10] establishes the mode of transport: a scooter driven by a Man. In the fourth segment [00:37 - 00:51], the **Audio** identifies the male helper as her \"brother\" and notes that \"He quickl...
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_fine_grained_perception_1
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
fine_grained_perception
Vision-Guided
When the camera provides a close-up view of the mother's hand holding and rotating a small, colorful box with a leopard print design, what does the mother say about the item?
[ "She states that she honestly doubts the item will fit on the doll, but decides to purchase it regardless since her daughter made contact with it.", "She exclaims that the box is incredibly large and verbally worries about how she will manage to hide such a big item from her daughter.", "She remarks that the it...
A
{ "connections": "Answering this question requires correlating the specific visual detail of the patterned box—distinct from the plain white shoe boxes seen elsewhere—with the mother's simultaneous voiceover commentary regarding the fit of the item and her purchasing rule." }
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_fine_grained_perception_0
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
fine_grained_perception
Audio-Guided
A distinct rattling sound is heard briefly over the background music; describe the specific physical actions the daughter is performing at this exact moment.
[ "She is standing near a wall display, holding a grey mesh shopping basket in her left hand, while reaching up with her right hand to touch a hanging item on the rack.", "She is standing beside a clothing rack, clutching a large brown doll against her chest, while reaching out with her right hand to remove a pink ...
A
{ "connections": "This question hinges on the unique audio cue of \"rattling\" (as opposed to the earlier \"rustling\"), which corresponds directly to the physical manipulation of the shopping basket and the interaction with the retail display." }
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_scene_transformation_detection_0
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
scene_transformation_detection
null
What significant camera movement occurs while the mother asks if the daughter touched all the items to put them in the basket?
[ "The camera tilts downward from the daughter to provide a close-up view of the grey mesh shopping basket filled with accessories she is holding.", "The camera pans horizontally to the right to focus on the wall display of hanging keychains and pink purses that the daughter is touching.", "The camera transitions...
A
{ "connections": "The defining audio event is the mother's question, \"Did you touch all of this to put it in the basket?\" which serves as a temporal marker. During this specific utterance, the visual perspective shifts via a camera tilt to reveal the contents of the basket referenced in the audio.\n" }
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_scene_transformation_detection_1
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
scene_transformation_detection
null
How does the visual setting change when the mother remarks that the doll is crazy expensive but decides to get it because it is cool?
[ "The view shifts from a wall of display shoes to a close-up of an open blue box where a pair of white sneakers is being arranged in tissue paper.", "The scene transitions from the retail shelf containing the large brown doll to a white countertop where the doll is being lowered into a custom-fit white box.", "T...
B
{ "connections": "The defining audio event is the mother's statement, \"This is crazy expensive, but it is kind of cool, so I guess I'll get it.\" This commentary marks the transition from the shopping activity in the aisle to the purchasing and packaging phase at the counter." }
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_context_understanding_1
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
context_understanding
Audio Context
While the video shows the mother picking out the white sneakers with black stripes and arranging them into a shoebox, what justification does she provide for this purchase?
[ "She states that she decided to get them because her daughter always needs a good pair of tennis shoes.", "She mentions that despite spending two hundred dollars on the doll, the shoes are too cute to pass up.", "She explains that she got them in the girl's size because they will be perfect for school.", "She...
C
{ "connections": "The visual stream depicts the specific action of selecting and packing the footwear, whereas the audio stream provides the unobservable context and reasoning (suitability for school) that explains why the purchase is being made despite earlier refusal." }
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_context_understanding_0
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
context_understanding
Visual Context
When the narrator comments that she doesn't think the item will fit on the large doll but decides to buy it anyway since the daughter touched it, what specific object is shown in her hand?
[ "A white doll-sized t-shirt featuring a leopard-print star design and a matching skirt on a white plastic hanger.", "A bright pink dress with a ruffled tulle skirt that the daughter selected and pulled from the display rack.", "A pair of white Adidas sneakers with black stripes that the mother had previously ar...
D
{ "connections": "The audio provides the narrator's internal monologue regarding the decision to buy an unspecified \"it\" due to the challenge's rules, while the visual stream is required to identify the specific object as the leopard-print patterned box." }
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_comparison_0
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
comparison
null
What distinguishes the mother's stated reason for initially refusing the large plush doll from her hesitation regarding the pink sneakers?
[ "Excessive expense versus product redundancy", "Storage difficulty versus incorrect sizing", "Poor durability versus aesthetic dislike", "Age inappropriateness versus physical discomfort" ]
A
{ "designated_segments": "[00:00 - 00:12]\n[01:20 - 01:30]", "connections": "In both segments, the mother initially attempts to deny a purchase, but the audio-visual synthesis reveals distinct justifications for each refusal. In the first segment [00:00 - 00:12], the refusal is based on **price value**; the visual ...
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_comparison_1
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
comparison
null
How does the mother's assessment regarding the physical dimensions of the purchases differ between the sneakers and the doll outfit?
[ "Exact current fit versus intentional oversizing for growth", "Reliance on standardized charts versus reliance on custom measurements", "Guaranteed anatomical support versus speculative structural failure", "Verified sizing compatibility versus recognized dimensional mismatch" ]
D
{ "designated_segments": "[00:24 - 00:38]\n[00:47 - 01:01]", "connections": "These segments contrast the mother's purchasing logic regarding the physical dimensions of the items. In segment [00:24 - 00:38], the audio confirms the purchase is based on **verified compatibility** (\"I actually got them in her size\"),...
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_sentiment_analysis_0
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
sentiment_analysis
null
Which phrase best characterizes the mother's attitude toward the daughter's request for the large plush doll?
[ "Firm financial discipline regarding expensive non-essentials", "Reluctant capitulation in response to persistent begging", "Overt and immediate indulgence of the child's whims", "Feigned refusal masking a secret intention to purchase" ]
D
{ "designated_segments": "[00:00 - 00:12]\n[00:24 - 00:38]", "connections": "In the first segment [00:00 - 00:12], the mother's audio explicitly denies the daughter's request for the Zemomo doll, stating, \"That's way too expensive. Not today.\" This establishes an initial sentiment of parental strictness and finan...
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_sentiment_analysis_1
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
sentiment_analysis
null
What underlying motivation drives the mother's decision to purchase the accumulated items despite her verbalized practical concerns?
[ "Dutiful adherence to a self-imposed rule", "Unconditional indulgence of the child's desires", "Reluctant submission to the child's persuasion", "Pragmatic preparation for future school needs" ]
A
{ "designated_segments": "[00:00 - 00:12]\n[00:47 - 01:01]\n[01:09 - 01:20]", "connections": "The mother's attitude shifts from enthusiastic initiator to reluctant participant, a sentiment only visible by tracking the \"rules\" of the video across separated segments. In [00:00 - 00:12], the audio establishes the pr...
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_event_sequence_ordering_1
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
event_sequence_ordering
null
null
null
D
{ "designated_segments": "**\n[00:12 - 00:24]\n[00:24 - 00:38]\n\n**", "connections": "**\nThe interaction with the Adidas shoes is split across the transition of two segments. The discovery of the item occurs at the very end of one segment, while the discussion, denial, and ultimate secret purchase occur in the ne...
[ "Mother admits buying shoes for school while arranging them in box", "Mother says \"not today\" due to Zemomo doll cost", "Hand takes white striped sneaker from display shelf" ]
What is the correct chronological sequence of the interactions involving the white Adidas shoes as depicted in the video? (1) Mother admits buying shoes for school while arranging them in box (2) Mother says "not today" due to Zemomo doll cost (3) Hand takes white striped sneaker from display shelf
[ "(2) → (3) → (1)", "(3) → (1) → (2)", "(1) → (3) → (2)", "(3) → (2) → (1)" ]
What is the correct chronological sequence of the interactions involving the white Adidas shoes as depicted in the video?
[ "Mother says \"not today\" due to Zemomo doll cost → Hand takes white striped sneaker from display shelf → Mother admits buying shoes for school while arranging them in box", "Hand takes white striped sneaker from display shelf → Mother admits buying shoes for school while arranging them in box → Mother says \"no...
NaxSd_kUaLE
NaxSd_kUaLE_event_sequence_ordering_0
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
event_sequence_ordering
null
null
null
B
{ "designated_segments": "**\n[00:00 - 00:12]\n[00:12 - 00:24]\n[00:24 - 00:38]\n[00:38 - 00:47]\n[00:47 - 01:01]\n\n**", "connections": "**\nThe video editing presents the events surrounding the Zemomo doll non-linearly. The daughter is seen holding the unboxed doll in later segments (00:38, 00:47) and receiving i...
[ "Mother lowers Zemomo doll into white protective box", "Mother mentions spending $200 while touching white Adidas shoes", "Daughter selects pink dress while holding Zemomo doll" ]
What is the correct chronological sequence for the following events? (1) Mother lowers Zemomo doll into white protective box (2) Mother mentions spending $200 while touching white Adidas shoes (3) Daughter selects pink dress while holding Zemomo doll
[ "(1) → (2) → (3)", "(3) → (1) → (2)", "(2) → (3) → (1)", "(1) → (3) → (2)" ]
What is the correct chronological sequence for the events?
[ "Mother lowers Zemomo doll into white protective box → Mother mentions spending $200 while touching white Adidas shoes → Daughter selects pink dress while holding Zemomo doll", "Daughter selects pink dress while holding Zemomo doll → Mother lowers Zemomo doll into white protective box → Mother mentions spending $...
NaxSd_kUaLE
NaxSd_kUaLE_summarization_0
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
summarization
null
Which summary best describes the progression of events concerning the large "Zemomo" doll throughout the video?
[ "The daughter finds the doll and immediately places it in her basket, ignoring the mother's concerns about the price until they reach the checkout counter.", "The mother initially refuses the expensive doll but later reveals she purchased it, and the daughter subsequently carries it while shopping for other items...
B
{ "designated_segments": "00:00 - 00:12\n00:24 - 00:38\n00:47 - 01:01", "connections": "These non-consecutive segments collectively trace the complete storyline of the \"Zemomo doll,\" from initial discovery to acquisition and continued presence.\n* **00:00 - 00:12:** The topic is introduced visually with the **d...
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_summarization_1
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
summarization
null
What determines which items the mother purchases during the shopping trip described in the video?
[ "The strict requirement to buy anything the daughter touches.", "The storage capacity of the shopping basket the daughter carries.", "The practical utility of the items for the daughter's school.", "The compatibility of the accessories with the large plush doll." ]
A
{ "designated_segments": "00:00 - 00:12\n00:47 - 01:01\n01:09 - 01:20", "connections": "These segments collectively explain the mechanics and strict adherence to the video's central challenge: the mother must buy whatever the child touches.\n* **00:00 - 00:12:** The audio establishes the premise (\"buying everyth...
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_causal_reasoning_0
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
causal_reasoning
null
What specific action compelled the mother to purchase the expensive doll she initially refused?
[ "The daughter said she loved it", "The daughter hugged it tightly", "The daughter physically touched it", "The daughter needed it for school" ]
C
{ "designated_segments": "[00:00 - 00:12]\n[00:24 - 00:38]", "connections": "This causal chain explains two distinct outcomes in the later segment based on information established in the first.\n1. **Possession of the Doll:** In the segment **[00:24 - 00:38]**, the visual shows the daughter hugging the Zemomo doll...
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_causal_reasoning_1
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
causal_reasoning
null
What compels the mother to purchase the pink sneakers despite her initial objection that the daughter already has enough shoes?
[ "The daughter's verbal insistence on the purchase", "The mother's recognition of a practical necessity", "The item's discounted pricing on the display", "The daughter's act of touching the merchandise" ]
D
{ "designated_segments": "[00:00 - 00:12]\n[01:20 - 01:30]", "connections": "In the final segment **[01:20 - 01:30]**, the mother initially declines the pink sneakers (\"you have enough shoes\") but then immediately states, \"I actually am going to get these for her.\" The root cause of this contradictory decision ...
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_future_prediction_1
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
future_prediction
null
Considering the specific items purchased and the mother's concerns expressed during the shopping trip, what is the daughter most likely to do next?
[ "Attempt to squeeze the large doll into the small outfit to test if it fits.", "Place the large doll entirely inside the miniature suitcase for transport.", "Put the new white sneakers on the doll's feet to complete its school outfit.", "Return the clothing to the shelf after the mother refuses to buy it." ]
A
{ "designated_segments": "[00:12 - 00:24]\n[00:38 - 00:47]\n[00:55 - 01:01]\n[01:14 - 01:18]", "connections": "This prediction arises from a specific conflict regarding the compatibility of purchased items. Segment [00:12-00:24] visually establishes the \"large\" and \"gigantic\" size of the Zemomo doll. In segment...
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_future_prediction_0
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
future_prediction
null
Given the recurring contrast between the mother's verbal refusals to her daughter and her concealed actions with the merchandise, what is the most likely event to occur after the video ends?
[ "The daughter leaves the shopping center empty-handed due to the refusals.", "The mother reveals the secretly purchased items to surprise the daughter.", "The mother returns the accumulated merchandise to the shelves to save money.", "The daughter receives a lecture on financial responsibility and budgeting."...
B
{ "designated_segments": "[00:05 - 00:09]\n[00:17 - 00:21]\n[00:30 - 00:31]\n[01:09 - 01:12]", "connections": "This prediction is based on the synthesis of a consistent audio-visual contradiction maintained throughout the video. In segments [00:05-00:09], [00:30-00:31], and [01:09-01:12], the audio establishes a pa...
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_hypothetical_reasoning_1
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
hypothetical_reasoning
null
If the daughter had stated she had not physically handled the items in the shopping basket when questioned, how would the mother's final course of action regarding those items most likely have changed?
[ "She would have enforced her initial command to abandon the basket and exit the store.", "She would have purchased the items regardless to reward the daughter for her honesty.", "She would have bought only the grey purse after personally verifying its quality.", "She would have been obligated to buy the items...
A
{ "designated_segments": "[00:00 - 00:12]\n[01:01 - 01:09]\n[01:09 - 01:20]", "connections": "This scenario demonstrates how verbal confirmation of a visual state dictates whether the mother aborts the shopping trip or commits to a large purchase.\n* **Context & Conflict:** In segment **[01:09 - 01:20]**, the mot...
null
null
null
null
null
NaxSd_kUaLE
NaxSd_kUaLE_hypothetical_reasoning_0
shopping
en-US
90
null
videos/NaxSd_kUaLE.mp4
480x854
hypothetical_reasoning
null
If the daughter had merely gestured toward the pink dress instead of physically removing it from the rack, how would the mother's decision regarding the item most likely change?
[ "She would decline the purchase due to the outfit's incompatibility with the doll.", "She would proceed with the purchase to acknowledge the daughter's clear interest.", "She would substitute the item with the leopard print set she pointed out.", "She would buy the dress but complain about the high price tag....
A
{ "designated_segments": "[00:00 - 00:12]\n[00:38 - 00:47]\n[00:47 - 01:01]", "connections": "This scenario proves that the pre-established rule forces the purchase of functionally useless items, overriding the mother's logical judgment.\n* **Context & Conflict:** In **[00:47 - 01:01]**, the **Audio** presents a ...
null
null
null
null
null
1tiV3rVO9S8
1tiV3rVO9S8_fine_grained_perception_0
shopping
en-US
81
null
videos/1tiV3rVO9S8.mp4
480x854
fine_grained_perception
Audio-Guided
What specific visual overlay appears on the screen at the exact moment a distinct chime sound is heard during the woman's introduction?
[ "A digital product image on a smartphone screen is held up to the camera lens to compare with the garment.", "A red animated 'Subscribe' graphic with a ringing bell icon appears in the lower right corner of the video.", "An orange rectangular box displaying the \"TEMU\" logo and white shopping icons briefly app...
C
{ "connections": "This question relies on the auditory cue of the \"chime\" to pinpoint a precise moment in the timeline. The viewer must correlate this sound with the simultaneous, fleeting appearance of the on-screen graphic, which is a subtle editing detail distinct from the continuous live-action footage of the n...
null
null
null
null
null
1tiV3rVO9S8
1tiV3rVO9S8_fine_grained_perception_1
shopping
en-US
81
null
videos/1tiV3rVO9S8.mp4
480x854
fine_grained_perception
Vision-Guided
What distinct sound effect is audible when the scene cuts to the woman using her fingers to handle and open a small transparent plastic packet containing jewelry?
[ "A digital chime sound is heard, marking the editorial transition to the close-up view of the jewelry packet.", "A metallic jingling noise is prominent, caused by the loose earrings striking against one another inside the packet.", "A loud ripping sound is produced, simulating the tearing of a paper seal rather...
D
{ "connections": "This question uses the specific visual context of opening the jewelry packet—which is distinct from the earlier opening of the large grey shipping bag—to direct attention to the audio track. Correctly answering requires identifying the specific \"rustling\" noise that synchronizes with this fine mot...
null
null
null
null
null

OmniVideo-100K



Official repository for OmniVideo-100K, an instruction-tuning dataset introduced in our paper: "OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains".

This repository includes:

  • videos.tar.part_xx: Raw video files.
  • train_oe_70k.jsonl: Original Open-Ended (OE) training samples.
  • train_mcq_30k.jsonl: Original Multiple-Choice (MCQ) training samples.
  • train_oe_70k_formatted.jsonl: Instruction-formatted OE samples (ready for fine-tuning).
  • train_mcq_30k_formatted.jsonl: Instruction-formatted MCQ samples (ready for fine-tuning).
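The raw videos ship as split archive parts. Below is a minimal sketch for restoring them. It assumes the parts were produced by byte-level splitting (as with the Unix `split` tool) and that their suffixes are fixed-width, so lexicographic order matches the original order. The function name `reassemble_and_extract` is hypothetical.

```python
import glob
import shutil
import tarfile
from pathlib import Path


def reassemble_and_extract(data_dir: str, out_dir: str) -> Path:
    """Concatenate videos.tar.part_* in sorted order, then extract the tar.

    Assumption: plain byte concatenation of the parts restores the original
    archive (true for parts created with `split`).
    """
    parts = sorted(glob.glob(str(Path(data_dir) / "videos.tar.part_*")))
    if not parts:
        raise FileNotFoundError(f"no videos.tar.part_* files in {data_dir}")
    tar_path = Path(data_dir) / "videos.tar"
    with open(tar_path, "wb") as dst:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)  # stream; parts can be large
    with tarfile.open(tar_path) as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(out_dir, filter="data")  # safe extraction where supported
        else:
            tar.extractall(out_dir)
    return tar_path
```

If the archive stores files under `videos/`, the extracted paths should line up with the `video_path` values in the preview (e.g. `videos/tZ-QcbrBjaw.mp4`). The card does not confirm this layout, so check it after extraction.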

Note: The corresponding human-verified test set is released separately as OmniVideo-Test.


🔨 Data Generation Pipeline

[Figure: Pipeline overview]

OmniVideo-100K is generated by an automated two-stage pipeline powered by Gemini-series models:

  1. Entity-Anchored Video Scripting: Transforms raw videos into structured scripts comprising a video summary, a main entity list, and timestamped, segment-wise audio-visual descriptions. This global entity list keeps references consistent across all segments and links speech to the corresponding visual entities.
  2. Clue-Guided QA Generation: Rather than processing the entire dense script at once, this step first prompts the model to mine cross-segment, multimodal clues (evidence chains) from the script. QA pairs are then generated from these high-value clues, so that answering them requires evidence spread over long time spans and across both modalities.
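The pipeline performs clue mining by prompting Gemini, not with hand-written rules. Still, the selection criterion ("cross-segment, multimodal") can be made concrete with a small rule-based sketch. `Evidence` and `is_high_value_clue` are hypothetical names, and the `min_segments` threshold is an illustrative assumption, not a value from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    segment: int   # index of the timestamped script segment the clue comes from
    modality: str  # "audio" or "visual"
    text: str      # supporting description copied from the script


def is_high_value_clue(chain: list[Evidence], min_segments: int = 2) -> bool:
    """Hypothetical filter: keep an evidence chain only if it spans at least
    `min_segments` distinct segments and combines audio and visual evidence."""
    segments = {e.segment for e in chain}
    modalities = {e.modality for e in chain}
    return len(segments) >= min_segments and {"audio", "visual"} <= modalities
```

The preview's `M1nuGREjhKw` causal-reasoning sample fits this shape. The audio in the first segment states that the girl earns money from YouTube, and later segments show the purchases visually. A chain that uses only one modality, or only one segment, would be rejected.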

🩷 About the Dataset

Dataset Statistics

OmniVideo-100K contains 100K QA pairs derived from 5,214 videos, split into two formats:

  • 70K Open-Ended (OE) QA pairs
  • 30K Multiple-Choice Questions (MCQ)

[Figure: Dataset statistics]

Task Taxonomy

We define 10 audio-visual QA tasks organized into a three-level cognitive hierarchy:

  • 🔍 Alignment (Basic perception and synchronization)
    • Fine-Grained Perception, Scene Transformation Detection.
  • 🧠 Understanding (Cross-modal semantic understanding)
    • Context Understanding, Comparison, Sentiment Analysis, Event Sequence Ordering, Summarization.
  • ⚙️ Reasoning (Advanced logical inference)
    • Causal Reasoning, Future Prediction, Hypothetical Reasoning.
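The taxonomy above can be mapped onto the `task` identifiers that appear in the raw data (the preview shows all ten, e.g. `fine_grained_perception`, `event_sequence_ordering`). Below is a sketch that tallies samples per cognitive level. `TASK_LEVEL` and `count_by_level` are illustrative names, and the code assumes one JSON object per line.

```python
import json
from collections import Counter

# Cognitive level of each task, following the three-level hierarchy above.
TASK_LEVEL = {
    "fine_grained_perception": "Alignment",
    "scene_transformation_detection": "Alignment",
    "context_understanding": "Understanding",
    "comparison": "Understanding",
    "sentiment_analysis": "Understanding",
    "event_sequence_ordering": "Understanding",
    "summarization": "Understanding",
    "causal_reasoning": "Reasoning",
    "future_prediction": "Reasoning",
    "hypothetical_reasoning": "Reasoning",
}


def count_by_level(jsonl_path: str) -> Counter:
    """Count samples per cognitive level in a raw .jsonl file."""
    counts = Counter()
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines
                counts[TASK_LEVEL[json.loads(line)["task"]]] += 1
    return counts
```

For example, `count_by_level("train_mcq_30k.jsonl")` would show how the 30K MCQ samples are distributed across Alignment, Understanding, and Reasoning.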

Video Curation

Videos are sourced from YouTube. To ensure high-quality data, we apply strict filtering criteria:

  • ✅ Resolution ≥ 480p
  • ✅ English speech
  • ✅ Sufficient visual dynamics & word density (ensuring rich audio-visual information)
  • ❌ Hard-coded (burned-in) subtitles (detected with OCR-based tools and filtered out, so that models cannot rely on on-screen text)
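The criteria above can be expressed as a single predicate. This is a hypothetical sketch, not the authors' tooling. It assumes the language check, word count, and subtitle detection have already been computed upstream, omits the visual-dynamics check, and uses an illustrative `min_words_per_min` threshold. It reads "≥ 480p" as a short side of at least 480 pixels, which also covers the portrait `480x854` videos in the preview.

```python
def parse_resolution(res: str) -> tuple[int, int]:
    """Parse a "WIDTHxHEIGHT" string such as "854x480" into integers."""
    width, height = (int(v) for v in res.lower().split("x"))
    return width, height


def passes_filters(
    resolution: str,
    transcript_words: int,
    duration_s: float,
    is_english: bool,
    has_burned_in_subtitles: bool,
    min_words_per_min: float = 60.0,  # illustrative threshold, not the authors' value
) -> bool:
    """Hypothetical version of the curation checks above (visual dynamics omitted)."""
    width, height = parse_resolution(resolution)
    if min(width, height) < 480:  # ">= 480p"; the short side also covers portrait video
        return False
    if duration_s <= 0 or not is_english or has_burned_in_subtitles:
        return False
    words_per_min = transcript_words / (duration_s / 60)  # speech density
    return words_per_min >= min_words_per_min
```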

Comparison with Existing Datasets

[Figure: Comparison with existing datasets]

Unlike existing datasets, OmniVideo-100K combines complex temporal tasks, evidence-based QA, and structured narratives for open-domain videos, with the aim of strengthening the joint audio-visual reasoning of Multimodal Large Language Models (MLLMs).


🚀 Performance

To verify the effectiveness of OmniVideo-100K, we performed full-parameter fine-tuning on three open-source MLLMs: VITA-1.5, Qwen2.5-Omni-7B, and Qwen3-Omni-30B-A3B-Instruct.

Performance on OmniVideo-Test

Fine-tuning on OmniVideo-100K leads to substantial performance gains on the OmniVideo-Test benchmark.

[Figure: Results on OmniVideo-Test]

Generalization on Existing Benchmarks

Models trained on our dataset also generalize beyond it, achieving improved performance on other audio-visual benchmarks such as Daily-Omni and FutureOmni.

[Figure: Results on existing benchmarks]


📋 Data Format (Raw Data Structure)

We provide both raw .jsonl files and instruction-formatted .jsonl files. Below is the structure for the raw data.

Common Fields

Every sample shares these foundational metadata fields:

{
  "video_id": "...",       // Unique video identifier
  "question_id": "...",    // Unique QA identifier
  "search_tag": "...",     // Retrieval tag (e.g., vlog, news)
  "language": "en-US",     // Video language code (e.g., en, en-US)
  "duration": 103,         // Video duration in seconds
  "metadata": {},          // Original video metadata
  "video_path": "...",     // Video path relative to the dataset root (e.g., videos/<video_id>.mp4)
  "resolution": "1280x720",// Video resolution
  "task": "...",           // Major task category (e.g., causal_reasoning)
  "subtask": "..."         // Subtype (only for Fine-Grained Perception and Context Understanding)
}
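The // comments above are annotations only. Each line of a .jsonl file holds one plain JSON object, since comments are not valid JSON. A minimal loading sketch using only the Python standard library follows; the helper names are hypothetical.

```python
import json
from collections import Counter


def parse_jsonl_lines(lines):
    """Parse an iterable of JSON Lines strings, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def load_jsonl(path):
    """Load a raw .jsonl file into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return parse_jsonl_lines(f)


def summarize(samples):
    """Count samples per (task, subtask); subtask is None when absent or null."""
    return Counter((s["task"], s.get("subtask")) for s in samples)
```

For example, summarize(load_jsonl(path)).most_common() lists per-task counts. The actual file names in the repository are not given here, so path is left to the reader.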

General Tasks

Open-Ended:

{
  "question": "Why is the Brunette Girl happy with her blind box?",
  "answer": "She is holding a bottle instead of a can.",
  "analysis": {
    "connections": "...",         // Describes the evidence chains
    "designated_segments": "..."  // Supporting evidence timestamps for cross-segment tasks
  }
}

Multiple-Choice:

{
  "question": "Why is the Brunette Girl happy with her blind box?",
  "options": [
    "She found a rare item.",
    "She is holding a bottle instead of a can.",
    "The blind box was on sale.",
    "She received two figures instead of one."
  ],
  "answer": "B",
  "analysis": {
    "connections": "...",
    "designated_segments": "...",
    "explanation": "..."          // Explains exactly why the correct option is valid
  }
}
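For multiple-choice samples, the answer letter indexes into options, with A as the first option. The sketch below recovers the answer text and scores a model reply. letter_matches is a hypothetical, deliberately naive scorer that only checks the reply's first character.

```python
def answer_text(sample: dict) -> str:
    """Return the option string selected by the letter in sample['answer']."""
    index = ord(sample["answer"].strip().upper()) - ord("A")
    if not 0 <= index < len(sample["options"]):
        raise ValueError(f"answer {sample['answer']!r} does not match any option")
    return sample["options"][index]


def letter_matches(sample: dict, reply: str) -> bool:
    """Naive scorer: the reply's first non-space character must be the answer letter."""
    reply = reply.strip()
    return bool(reply) and reply[0].upper() == sample["answer"].strip().upper()


example = {
    "options": [
        "She found a rare item.",
        "She is holding a bottle instead of a can.",
        "The blind box was on sale.",
        "She received two figures instead of one.",
    ],
    "answer": "B",
}
print(answer_text(example))  # She is holding a bottle instead of a can.
```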

Special Task: Event Sequence Ordering

Open-Ended:

{
  "question": "What is the correct chronological sequence for the following events?",
  "events": [                     // Unordered events
    "Event A description...",
    "Event B description...",
    "Event C description..."
  ],
  "answer": ["B", "C", "A"]       // Correct chronological order as event labels (A = first listed event), i.e. positions 2, 3, 1
}
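Assuming the letters in answer label events by their position in the events list (A = first, as in the example), the open-ended answer can be turned back into an ordered list of events. This mapping is inferred from the example and from the formatted target 2,3,1 in the instruction-formatted section, not from a documented specification; the function names are hypothetical.

```python
def labels_to_positions(order):
    """Map letter labels to 1-based list positions, e.g. ['B', 'C', 'A'] -> [2, 3, 1].
    Assumes 'A' labels the first entry of `events`."""
    return [ord(label.strip().upper()) - ord("A") + 1 for label in order]


def chronological_events(sample):
    """Return the event descriptions in the annotated chronological order."""
    return [sample["events"][pos - 1] for pos in labels_to_positions(sample["answer"])]


sample = {
    "events": ["Event A description...", "Event B description...", "Event C description..."],
    "answer": ["B", "C", "A"],
}
print(labels_to_positions(sample["answer"]))  # [2, 3, 1]
print(chronological_events(sample)[0])         # Event B description...
```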

Multiple-Choice:

(To maintain data diversity, the raw data provides both indexed (1) (2) (3) and textual representations of the event sequences.)

{
  "events": [
    "Event A description...",
    "Event B description...",
    "Event C description..."
  ],
  
  // --- Indexed Style ---
  "question_indexed": "What is the correct chronological sequence for the following events?\n(1) Event A\n(2) Event B\n(3) Event C",
  "options_indexed": [
    "(1) → (2) → (3)", 
    "(3) → (2) → (1)", 
    "(2) → (1) → (3)", 
    "(2) → (3) → (1)"
  ],
  
  // --- Textual Style ---
  "question_textual": "What is the correct chronological sequence for the events?",
  "options_textual": [
    "<Event A> → <Event B> → <Event C>", 
    "<Event C> → <Event B> → <Event A>",
    "<Event B> → <Event A> → <Event C>",
    "<Event B> → <Event C> → <Event A>"
  ],
  
  "answer": "D"
}
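The indexed options can be parsed back into position lists, which makes it easy to check that the answer letter agrees with the open-ended order. In the example, the order B, C, A corresponds to positions [2, 3, 1], which is option D. A sketch, assuming the → separator and (n) markers shown above; the helper names are hypothetical.

```python
def parse_indexed_option(option: str) -> list:
    """Parse an indexed option such as '(2) → (3) → (1)' into [2, 3, 1]."""
    return [int(token.strip().strip("()")) for token in option.split("→")]


def option_letter(options, positions):
    """Return the letter of the option whose sequence equals `positions`."""
    for i, option in enumerate(options):
        if parse_indexed_option(option) == list(positions):
            return chr(ord("A") + i)
    raise ValueError("no option matches the given order")


options_indexed = [
    "(1) → (2) → (3)",
    "(3) → (2) → (1)",
    "(2) → (1) → (3)",
    "(2) → (3) → (1)",
]
print(option_letter(options_indexed, [2, 3, 1]))  # D
```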

🤖 Instruction-Formatted Data

For direct supervised fine-tuning, we provide _formatted.jsonl files in which inputs and outputs are already assembled into two fields:

  • "question": The final constructed prompt fed to the model.
  • "answer": The expected target response.

Formatting for General Tasks

For Open-Ended (OE):

// Input
<question>

// Output
<answer>

For Multiple-Choice, we randomly add one of several instruction prompts, placed either before or after the options:

// Input
<question>
<prompt_prefix>
A. ...
B. ...
C. ...
D. ...
<prompt_suffix>

// Output
<answer> // e.g., "B"

We randomly select one pair of [prefix, suffix] from the following list:

  1. ["\nSelect from the following choices.", ""]
  2. ["\nChoose between the following options.", ""]
  3. ["\nAnswer with the option's letter from the given choices directly.", ""]
  4. ["", "\nPlease select the correct answer from the options above."]
  5. ["", "\nAnswer with the option's letter from the given choices directly."]
  6. ["", "\nAnswer with the option's letter directly (e.g., A, B, C, or D)."]
  7. ["", "\nAnswer with the option's letter (A, B, C, or D) from the given choices directly."]
  8. ["", "\nRespond with only the letter (A, B, C, or D) of the correct option."]
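The MCQ assembly can be sketched as follows. PROMPT_PAIRS copies the eight pairs listed above, but the exact way the question, prefix, options, and suffix are joined (one newline between parts, "A. " style labels) is an assumption inferred from the template, not the authors' code; build_mcq_prompt is a hypothetical helper.

```python
import random

# The eight [prefix, suffix] pairs listed above.
PROMPT_PAIRS = [
    ["\nSelect from the following choices.", ""],
    ["\nChoose between the following options.", ""],
    ["\nAnswer with the option's letter from the given choices directly.", ""],
    ["", "\nPlease select the correct answer from the options above."],
    ["", "\nAnswer with the option's letter from the given choices directly."],
    ["", "\nAnswer with the option's letter directly (e.g., A, B, C, or D)."],
    ["", "\nAnswer with the option's letter (A, B, C, or D) from the given choices directly."],
    ["", "\nRespond with only the letter (A, B, C, or D) of the correct option."],
]


def build_mcq_prompt(question, options, rng=random):
    """Assemble question + prefix + lettered options + suffix (joining scheme assumed)."""
    prefix, suffix = rng.choice(PROMPT_PAIRS)
    lettered = "\n".join(f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(options))
    return question + prefix + "\n" + lettered + suffix
```

Passing a seeded random.Random instance as rng makes the prompt choice reproducible across runs.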

Formatting for Event Sequence Ordering

For Open-Ended:

// Input
<question>
(1) ...
(2) ...
(3) ...
Please directly answer with the correct order of all events' indices, separated by commas.

// Output
2,3,1
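A sketch of how an open-ended ordering sample could be assembled into this prompt and target, plus a parser for a comma-separated reply. The function names are hypothetical, and converting answer letters to 1-based positions assumes A labels the first listed event.

```python
ORDERING_INSTRUCTION = ("Please directly answer with the correct order of all events' "
                        "indices, separated by commas.")


def build_ordering_example(question, events, order):
    """Return (prompt, target) for an open-ended ordering sample.
    `order` holds letter labels such as ['B', 'C', 'A']; 'A' is assumed to be events[0]."""
    numbered = "\n".join(f"({i}) {event}" for i, event in enumerate(events, start=1))
    prompt = f"{question}\n{numbered}\n{ORDERING_INSTRUCTION}"
    target = ",".join(str(ord(label) - ord("A") + 1) for label in order)
    return prompt, target


def parse_order_reply(reply):
    """Parse a reply such as '2, 3, 1' into [2, 3, 1]; tolerates spaces."""
    return [int(tok) for tok in reply.replace(" ", "").split(",") if tok]


prompt, target = build_ordering_example(
    "What is the correct chronological sequence for the following events?",
    ["Event A description...", "Event B description...", "Event C description..."],
    ["B", "C", "A"],
)
print(target)  # 2,3,1
```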

For Multiple-Choice:

During formatting, we randomly choose between the Indexed Style and the Textual Style provided in the raw data, so that models learn to handle both representations.
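A minimal sketch of this random style choice. It assumes, as in the raw example, that options_indexed and options_textual list the sequences in the same order, so one answer letter is valid for either style; pick_ordering_style and to_mcq_prompt are hypothetical helpers, and the option layout is assumed.

```python
import random


def pick_ordering_style(sample, rng=random):
    """Randomly return (question, options) in the indexed or the textual style.
    Assumes both option lists share one order, so sample['answer'] fits either."""
    style = rng.choice(("indexed", "textual"))
    return sample[f"question_{style}"], sample[f"options_{style}"]


def to_mcq_prompt(question, options):
    """Letter the options A, B, C, ... below the question (layout assumed)."""
    lettered = "\n".join(f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(options))
    return f"{question}\n{lettered}"
```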


📑 Citation

If you find this dataset or pipeline useful in your research, please cite our paper:

@article{cai2026omnivideo100k,
  title={OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains},
  author={Cai, Xinyue and Fu, Chaoyou and Zhang, Yi-Fan and He, Ran and Shan, Caifeng},
  journal={arXiv preprint arXiv:2606.14702}, 
  year={2026}
}