eBay Uses Computer Vision to Enable Sellers to Create Cleaner Images
We built an algorithm that lets users change the background of their listing photos.
Ellis Luk is a Product Manager with eBay, focused on the Mobile Seller Experience.
Prior to joining the eBay team, Ellis held a number of product-focused positions with Amazon and SaRA, a health tech startup. Through these roles, she gained broad experience driving activation strategies that synthesize customer insights, developing product roadmaps, and managing projects for bespoke products.
Ellis received a Bachelor of Arts in Advertising from Penn State University and a Master of Business Administration from UCLA Anderson School of Management.
A video is provided to viewers using a web-based platform without restricted audio, such as a copyrighted soundtrack. To do so, a video comprising at least two audio layers is received. The audio layers can include separate and distinct audio layers or a mix of audio from separate sources. A restricted audio element is identified in a first audio layer and a speech element is identified in a second audio layer. A stitched text string can be generated by performing speech-to-text on both audio layers and removing the text corresponding to the restricted audio element of the first audio layer. When playing back the video, a portion of the video is muted based on the restricted audio element. A voice synthesizer is employed to generate audible sound during the muted portion using the stitched text string.
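The muting-and-resynthesis flow described above can be sketched as a small planning step: given speech-to-text segments labeled by audio layer, collect the intervals to mute (where the restricted layer plays) and stitch together only the speech-layer text for the voice synthesizer. This is a minimal illustration, not the patented implementation; the segment format and layer names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class AudioSegment:
    start: float  # seconds
    end: float
    layer: str    # "restricted" (first layer) or "speech" (second layer)
    text: str     # speech-to-text output for this segment

def plan_playback(segments):
    """Return (mute_intervals, stitched_text): the intervals to mute
    where restricted audio plays, and the speech text to re-synthesize
    over the muted portions."""
    mute_intervals = [(s.start, s.end) for s in segments
                      if s.layer == "restricted"]
    # Keep only speech-layer text; text recognized from the restricted
    # layer (e.g. song lyrics) is dropped from the stitched string.
    stitched_text = " ".join(s.text for s in segments
                             if s.layer == "speech" and s.text)
    return mute_intervals, stitched_text
```

A real system would feed the mute intervals to the player and the stitched string to a text-to-speech engine aligned to those intervals.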
A web-based item listing platform provides item listings that users can create or search. Item listings can be generated using structured information extracted while capturing an item listing video of the item. During creation of the item listing video, input prompts are provided to the user that cause a mobile device to provide an input request, such as taking an image of a specific feature of the item or providing some other item description information. During the item listing video, image recognition models may also be employed to determine other item description information, such as the color, the brand, and the like. The item listing can be generated from the item listing video by populating a set of structured data elements associated with an item description type. Each structured data element is populated with the item description information corresponding to the associated item description type.
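The listing-generation step above amounts to populating a fixed schema of structured data elements from two sources: the user's answers to input prompts and attributes inferred by image recognition during the video. A minimal sketch of that merge is below; the field names ("title", "brand", "color", "condition") and the precedence of explicit user input over model output are assumptions for illustration.

```python
# Hypothetical schema of structured data elements for an item listing.
LISTING_SCHEMA = ["title", "brand", "color", "condition"]

def build_listing(prompt_answers, recognized_attributes):
    """Populate the listing schema from prompted user input and
    image-recognition output. User-provided answers take precedence
    over recognized attributes when both supply the same field."""
    listing = {field: None for field in LISTING_SCHEMA}
    # First fill in what the recognition models inferred from the video.
    for field, value in recognized_attributes.items():
        if field in listing:
            listing[field] = value
    # Then overwrite with anything the user supplied at a prompt.
    for field, value in prompt_answers.items():
        if field in listing and value:
            listing[field] = value
    return listing
```

Fields neither source provides stay unset, which a production flow might surface as further prompts to the seller.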
A web browser extension identifies graphic objects from images or video being presented by a web browser. Webpages related to the graphic objects are identified. Web links that facilitate navigation to the webpages are embedded over an area of the image corresponding to the identified graphic object. Where the graphic objects are identified within video, the web links are progressively embedded within the boundaries of each graphic object as it changes location during playback of the video. In this way, a user is able to interact with graphic objects of images and video to navigate to webpages related to those objects. Some implementations provide a webpage redirect command at a stop point of the video so that the user can interact with graphic objects while the video is playing and without interrupting the video.
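Progressively embedding a link within a moving object's boundary requires knowing the object's bounding box at every playback time. One common way to do this cheaply is to detect the object at sparse keyframes and interpolate between them, so the clickable region tracks the object; the sketch below assumes simple linear interpolation and axis-aligned (x, y, w, h) boxes, which are illustrative choices rather than the patented method.

```python
def interpolate_box(keyframes, t):
    """Linearly interpolate a graphic object's bounding box at time t
    from sparse (time, (x, y, w, h)) keyframes, so an embedded web link
    can follow the object as it moves through the video."""
    keyframes = sorted(keyframes)
    # Clamp to the first/last known box outside the tracked range.
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    if t >= keyframes[-1][0]:
        return keyframes[-1][1]
    for (t0, box0), (t1, box1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            alpha = (t - t0) / (t1 - t0)
            return tuple(c0 + alpha * (c1 - c0)
                         for c0, c1 in zip(box0, box1))
```

At each rendered frame, the extension would position the link's hit area at the interpolated box for the current playback time.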