Tips On Empirical Research Slides

post by James Chua (james-chua), John Hughes (john-hughes), Ethan Perez (ethan-perez), Owain_Evans · 2025-01-08T05:06:44.942Z · LW · GW · 4 comments

  Summary slide sets the frame
  Include an agenda
  Simple charts to describe experiments
  Backup slides - Be ready for questions
  End with concrete discussion points
  Keep one slide deck per project
  Get consistent feedback on the story of your paper
  Ask your friends
  Investing time is worth it

Our work centers on empirical research with LLMs. If you are doing something similar, these tips on slide-based communication may be helpful!

Background:

James Chua and John Hughes are researchers working under Owain Evans and Ethan Perez, respectively. Both of us (James and John) used to be MATS mentees. We weren't good at making research slides at first -- here are some principles we've found useful for understandable slides for our weekly research meetings.

We show some good example slides. We also show examples of confusing slides we've made — marked in the caption with “❌ Negative example”.
Below we use the study of sycophancy as an example. Sycophancy occurs when a model's responses match the user's beliefs rather than the truth.

Summary slide sets the frame

**Possible summary slide.** I summarise my experimental results for the week and write what I want to be discussed. If possible, I fit in a simple plot on the right.

Your mentor manages multiple projects and people. They need to be reminded of what you are up to. The first slide should recap the key takeaways from the last meeting to motivate what you have worked on and provide a clear summary of your progress.

There are two main messages to convey, which set the frame for what your mentor should think about:

Key takeaways from the last meeting: a reminder of the next steps and what was discussed.
Experiment outcome:
- My experiments worked! Your mentor will focus on sanity checks, control experiments, and extensions to other setups.
- My experiments didn't work. Your mentor will focus on debugging why. E.g., train with better data, or improve the prompts.

A summary helps your mentor save time. For example, on seeing the slide, your mentor may say “Oh I can remember what the data augmentation was, let’s skip that”. Or maybe your mentor already read your results and wants to discuss something else.

Include an agenda

Often in meetings with your mentor, there is very little time to cover everything the team has done in the week.

Include the main sections of what you will present in order of priority. This lets you cover the most important things first.
Highlight how many slides there are on that topic and how long you'd like to allocate for that section. This allows your mentor to calibrate themselves on whether it is ok for them to drill down into the details if there is still lots of high-priority content to cover.
If you are in a group meeting with other mentees, discuss with them how to allocate the time, so that everyone gets enough time for feedback.

Sometimes, you'll need to remind your mentor about the takeaways from the previous meeting before moving on to results:

**A summary recap.** Your mentor may manage many people, so you'll need to remind them of what state the project is in.

Simple charts to describe experiments

After the summary, describe the experimental setups.

Always include your prompt. The prompt should describe how you are measuring your metric in the chart. Prompts are often long, so you can truncate the prompt and put the full version at the back of your slides.

**Example showing prompt used to plot charts.** A prompt should be beside your plot to explain how you got your results.

Show error bars. Your mentor wants to know whether you ruled out simple things like getting lucky with 10 samples. We use the standard error as a fast heuristic for proportion metrics. The formula is SE = sqrt(p(1-p)/N), where p is a metric like accuracy or success rate and N is the sample size. To obtain 95% confidence interval error bars, use ±1.96 × SE around p. This is just a heuristic because there may be other sources of variance, e.g. random seeds and prompt variations. See this post for better ways of calculating errors.
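As a rough illustration of the heuristic above, here is a minimal Python sketch. The function name `proportion_error_bar` and the example numbers are our own assumptions for illustration. It uses the normal approximation, so it is least trustworthy for very small N or for p close to 0 or 1.

```python
import math

Z_95 = 1.96  # normal-approximation multiplier for a 95% interval


def proportion_error_bar(p: float, n: int) -> tuple[float, float]:
    """Return (standard_error, ci_half_width) for a proportion p measured on n samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    se = math.sqrt(p * (1.0 - p) / n)
    return se, Z_95 * se


if __name__ == "__main__":
    # Hypothetical numbers: the same 60% success rate measured on 10 vs 1000 samples.
    for n in (10, 1000):
        se, half_width = proportion_error_bar(0.6, n)
        print(f"N={n}: 60.0% +/- {100 * half_width:.1f} points (SE={se:.4f})")
```

With these made-up inputs the error bar shrinks from roughly ±30 percentage points at N=10 to roughly ±3 at N=1000, which is the kind of "did you just get lucky?" check the error bars are meant to answer.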

Part of the reason for showing the prompt and error bars is that you want people in the meeting to critique the experiment. So you want to have the “raw ingredients” in the slides, not just the high-level conclusions you are drawing (which might be wrong).

Label your axes. Indicate what your metric is, and what you want to see. Is it e.g. accuracy (higher is better)? Or is it cross-entropy loss (lower is better)?

Include the values on the bar chart. E.g. for the chart above “51.4%” and “41.6%”. This saves the audience the effort of reading values off the y-axis.

Rule of thumb — 3-5 colors max on a bar chart. These bars typically represent "model before your intervention", "a control baseline", and "model after your intervention".

I recommend having at most 3 types of models on a single slide.

Make the plot large. Having the plot as large as possible on the slide is important so everyone can read the results easily. If you are sharing the slides by video call, sometimes the video quality is not good, so making the plots bigger helps. Takeaways can be included if they do not compromise the readability of the plot.

Start with the most important message first. Even if you worked hard to try 10 different experimental setups, don’t show all of them. Show your best setup first, or what your mentor would think is the most interesting. Discussing many experimental setups takes time, and often you don’t need to discuss them because they didn’t work and you have something better. You can put other setups in your backup slides (more on that later).

❌ **Negative example:** Showing too many bar charts in one go makes it hard to know what the takeaway message is. It is also hard to read the axis once the labels become diagonal. This slide is a better fit for your backup slides at the end of your deck, in case your mentor asks.

Avoid too many words on a single slide. It means that you are discussing too many ideas at once.

Use simple charts. Mentors who have multiple meetings a day don’t have the energy to understand a complicated chart! Stick to easy charts e.g. bar charts.

❌ **Negative example.** Heatmaps of values are hard to read. Heatmaps require the audience to stare at the y-axis and x-axis to find a particular value. This is tiring. Often you can condense the most important results into a bar chart.

Backup slides - Be ready for questions

While you should keep your main slides simple, prepare "backup slides" for questions your mentor might ask. You may also have results from experiments that just finished running, or plots that you haven't had time to clean up. Stick these plots in the backup slides and flick to them if the conversation naturally goes there.
These slides may be more wordy. Some common things:

Explain what you are measuring. Remind your mentor exactly how you are measuring a term! This is especially helpful if a new collaborator sits in on the call and gives feedback.

For example, if I am talking about sycophancy, I show a good example. Example from Sharma et al. 2023.

Detailed prompts. Draw arrows / highlight text. Drawing arrows and highlighting text helps draw attention to the particular parts of the prompt to look at.

I explain to my mentor that models are affected by a user’s wrong reasoning. I draw a red box to highlight the relevant parts. I use an arrow to summarise the takeaway.

Scaling curves. Suppose you try to intervene on a model by training on a dataset, and your training does not seem to help. One common question is “have you tried… more data?” You should be ready to answer that! Below is a full scaling plot, but to start you can just have a bar plot with e.g. “1k vs 20k”. Use arrows to point out specific things.

Data-scaling plot to show that more data does not help.

Try log-log plots. Not always relevant, so use your judgment, but always keep an eye out for scaling law behavior (if using accuracy, try plotting -log(acc) on the y-axis). Finding predictable scaling trends is helpful for forecasting.
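One way to check for scaling-law behavior numerically, alongside the plot, is to fit a straight line in log-log space. The sketch below is only an illustration: the helper `fit_log_log` and all data points are made up, not results from our experiments. It uses -log(acc) as the y-value, as suggested above, and then takes the log of both axes.

```python
import math


def fit_log_log(xs, ys):
    """Least-squares fit of log(y) = slope * log(x) + intercept.

    Returns (slope, intercept). Points that lie close to this line in
    log-log space suggest power-law behavior y ~ exp(intercept) * x**slope.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    if any(v <= 0 for v in list(xs) + list(ys)):
        raise ValueError("log-log fits need strictly positive values")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxx = sum((a - mean_x) ** 2 for a in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Made-up illustrative numbers, not results from the post.
    train_sizes = [1_000, 2_000, 5_000, 10_000, 20_000]
    accuracies = [0.52, 0.58, 0.66, 0.71, 0.76]
    # -log(acc) is positive for acc < 1 and shrinks toward 0 as accuracy improves.
    neg_log_acc = [-math.log(a) for a in accuracies]
    slope, intercept = fit_log_log(train_sizes, neg_log_acc)
    print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

Note that an accuracy of exactly 1.0 gives -log(acc) = 0, which cannot be placed on a log axis, so the helper rejects it. A fitted slope is only meaningful if the points actually look linear on the log-log plot, so still look at the plot before extrapolating.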

Proposed baselines. What are some simple ways that would invalidate your results? You should think of some and include slides that discuss them.

Training details. E.g. what are the prompts and responses used for training? What are the hyperparameters and datasets used? If what you tried did not work, what does the loss curve look like?

End with concrete discussion points

At the end, list what you think your next steps should be.

Seek feedback from your mentor about whether these experimental priorities are correct. Include any resource requests, such as if you're bottlenecked on compute access.

Keep one slide deck per project

It is useful to keep one slide deck for a few reasons:

Your mentor and collaborators only have to keep track of one shared link to Google Slides or similar.
You can quickly refer back to slides from previous meetings.
It provides a consistent story for how your research progressed (we recommend you add the most recent slides to the start of the deck, instead of the end).

Get consistent feedback on the story of your paper

Working out the paper's story and how to frame certain elements is often left too late. We recommend including slides in your weekly meeting that describe the current story you want to tell so you can get feedback. Then as that story changes in light of new results, present the new story and get feedback again. Following this will make it much easier to write a paper that everyone is aligned on from the start.

Ask your friends

At the start, I (James) benefitted from having a friend review my slides and provide feedback. It is especially helpful if they share your mentor, since they will be able to model your mentor's questions better.

Ask your friend to point out any confusing parts like "What do you mean by this term?". These questions highlight where you may need additional slides.

Investing time is worth it

When I first started, I had to invest a lot of time in making slides, e.g. 1-2 days. This was a big time investment! I was not used to spending so much time on communication. But it is worth it: doing great experiments is only half the journey, and they only matter if people understand them!

The 1-2 days of improving slides helped me to iterate on experimental improvements. E.g. “It seems like my error bars are big here, I need more samples.” or “I’m missing a control setup here. I need to make one.” Now I'm better at it so it takes only half a day. And communicating my ideas is much easier!

4 comments

comment by Ted Sanders (ted-sanders) · 2025-01-09T07:28:20.641Z · LW(p) · GW(p)

Additional thoughts:

More than 3 bars/colors is fine
I recommend using horizontal bars on some of those slides, so the labels are written in the same direction as the bars - lets you fill space more efficiently
Put sentences / verbs in titles; noun titles like "Summary" or "Discussion" are low value
If you're measuring deltas between two things, compute the error bar on the delta, don't compute the error bars on the two things; consider coloring by statistical significance (e.g., continuous color scale over range of standard errors of differences of the mean)
In addition to agenda, it can be helpful to start with objectives - why are you here and what are you hoping to get from them? are you trying to inform them? get advice on something specific? get advice on something broad?
Can help to include real data / real prompts / real model outputs - harder to fool yourself when you look at real data instead of relying on abstract metrics and intentions
It's fine to have crummy slides - don't waste 1 hour of your time to save 5 minutes of your audience's time - the slides should serve you, not the other way around
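To make the point about error bars on deltas concrete, here is a minimal sketch for two proportion metrics. It assumes the two results come from independent samples, and the function name `delta_error_bar` and the sample sizes are made up for illustration. If both models are evaluated on the same questions, a paired analysis of per-question differences is often tighter than this.

```python
import math


def delta_error_bar(p_a: float, n_a: int, p_b: float, n_b: int, z: float = 1.96):
    """Return (delta, ci_half_width) for p_b - p_a, assuming independent samples."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("sample sizes must be positive")
    var_a = p_a * (1.0 - p_a) / n_a
    var_b = p_b * (1.0 - p_b) / n_b
    # Variances of independent estimates add, so the SE of the difference
    # is the square root of their sum.
    se_delta = math.sqrt(var_a + var_b)
    return p_b - p_a, z * se_delta


if __name__ == "__main__":
    # The two rates echo the post's example chart; the sample sizes are made up.
    delta, half_width = delta_error_bar(0.416, 500, 0.514, 500)
    print(f"delta = {100 * delta:.1f} points +/- {100 * half_width:.1f}")
    print("interval excludes zero:", abs(delta) > half_width)
```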

comment by TrudosKudos (cade-trudo) · 2025-01-08T08:45:34.639Z · LW(p) · GW(p)

This was incredibly informative. I really appreciate you all taking the time to share. I'm going to be using a lot of the information here immediately! I'd love to read any additional insights on slide design or thoughts you all have on other communication styles as well.

↑ comment by Ted Sanders (ted-sanders) · 2025-01-09T07:33:03.512Z · LW(p) · GW(p)

Management consulting firms have lots of great ideas on slide design: https://www.theanalystacademy.com/consulting-presentations/

Some things they do well:

They treat slides as documents that can be understood standalone (this is even useful when presenting, as not everyone is following every word)
They employ a lot of hierarchy to help make the content skimmable (helpful for efficiency)
They put conclusions / summaries / action items up front, details behind (helpful for efficiency, especially in high-trust environments)

comment by mattmacdermott · 2025-01-13T08:19:47.966Z · LW(p) · GW(p)

I found this really useful, thanks! I especially appreciate details like how much time you spent on slides at first, and how much you do now.
