Shared with colleagues in the English department. Other words have been jumping out this semester and last, such as "profound." I teach at a community college where the word "profound" was _never_ used by my students before. For one thing, we teach students to reject subjective language in literary analysis (like "awesome" and "great"). But "profound" suggests a degree of knowledge and analytical expertise that privileges it as an adjective over those other words, so it didn't jump out at me at first -- until I remembered: my students have no idea what is profound and what isn't. They don't have the experience as readers, so the word choice is completely inauthentic. It's driving me crazy, and I have to think about how to teach them to use it. This is an imperative coming from the top, and I can't stand it -- I feel like a hostage to big tech.
"appearance of" is another one.
You may also be interested in our study of ChatGPT's and Llama's grammatical and rhetorical structure, which differs from human writing just as strongly as their vocabulary does: https://arxiv.org/abs/2410.16107
Our suspicion is that this is driven by the instruction-tuning process. The Llama base models, which do pure text completion, are pretty similar to humans. The instruction-tuned variants, which are post-trained to follow instructions and complete tasks, are quite different from humans. Possibly something about the instruction-tuning tasks and the feedback from human raters makes this happen.
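For anyone who wants to poke at this themselves, here's a rough sketch of one way to compare marker-word frequency between a base model and its instruction-tuned variant. This is not the pipeline from our paper; the model names, prompt, and word list below are placeholder assumptions for illustration.

```python
# Sketch: count a few "LLM marker" words in samples from a base model
# vs. an instruction-tuned variant. All names below are placeholders.
import re
from collections import Counter

from transformers import pipeline

PROMPT = "Write a short reflection on a novel you read recently."
MARKER_WORDS = {"delve", "profound", "tapestry", "showcase"}

def marker_counts(model_name: str, n_samples: int = 20) -> Counter:
    """Sample completions from a model and count marker-word occurrences."""
    generator = pipeline("text-generation", model=model_name)
    counts = Counter()
    for _ in range(n_samples):
        text = generator(PROMPT, max_new_tokens=200, do_sample=True)[0]["generated_text"]
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w in MARKER_WORDS)
    return counts

if __name__ == "__main__":
    # Hypothetical base/instruct pairing; swap in whatever checkpoints you can access.
    for name in ("meta-llama/Llama-3.1-8B", "meta-llama/Llama-3.1-8B-Instruct"):
        print(name, marker_counts(name))
```

With enough samples, a skew toward words like "delve" in the instruct model but not the base model would be consistent with the instruction-tuning hypothesis.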
I love this post, and oh my gosh, yes, "delve" has stuck out to me like a sore thumb from the very beginning! I would so love to know the history of how that word came to dominate the machine. Like, what training dataset had an overwhelming use of this hilarious word? Or are the model devs just trolling us?